Improving NLTK for Processing Portuguese

Ferreira, João; Oliveira, Hugo Gonçalo; Rodrigues, Ricardo

http://hdl.handle.net/10400.26/61201

Use this identifier to cite this record.

Abstract

Python has a growing community of users, especially in the AI and ML fields. Yet, Computational Processing of Portuguese in this programming language is limited, in both available tools and results. This paper describes NLPyPort, an NLP pipeline in Python, primarily based on NLTK, and focused on Portuguese. It is mostly assembled from pre-existing resources or their adaptations, but improves over the performance of existing alternatives in Python, namely in the tasks of tokenization, PoS tagging, lemmatization and NER.

Keywords

NLP; Tokenization; PoS tagging; Lemmatization; Named Entity Recognition

Citation

João Ferreira, Hugo Gonçalo Oliveira, and Ricardo Rodrigues. Improving NLTK for Processing Portuguese. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 18:1-18:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/OASIcs.SLATE.2019.18

DOI

10.4230/OASIcs.SLATE.2019.18

Collections

ESEC - Conference and congress papers

CC License

CC BY (Creative Commons Attribution)
