| Name | Description | Size | Format |
|---|---|---|---|
| | | 446.2 KB | Adobe PDF |
Abstract
Although lemmatization is a very common subtask in many natural language processing tasks, there is a lack of truly cross-platform lemmatization tools specifically targeted at Portuguese, particularly tools that can be integrated into projects developed in Java. To address this issue, we developed a lemmatizer, initially for our own use only, which we have since decided to make publicly available. The lemmatizer presented in this document achieves an overall accuracy above 98% when compared against a manually revised corpus.
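The keywords point to a lemmatizer built from a lexicon and normalization rules. The following is a minimal sketch of that general idea, not the authors' implementation. The class name `RuleLexiconLemmatizer`, the lexicon entries and the suffix rules are all hypothetical and illustrative, and only plural normalization of Portuguese nouns and adjectives is modelled. The lexicon of exceptional forms is consulted first. If no entry matches, ordered suffix rules are tried, longest suffix first. A small `accuracy` helper shows how a figure like the reported one could be computed against gold (form, lemma) pairs from a revised corpus.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lexicon-plus-rules lemmatizer sketch (illustrative only). */
public class RuleLexiconLemmatizer {

    // Hypothetical lexicon of irregular or exceptional forms, checked first.
    private final Map<String, String> lexicon = Map.of(
            "fui", "ir",
            "foi", "ir",
            "sou", "ser",
            "mais", "mais",              // would otherwise become "mai" via the "s" rule
            "l\u00e1pis", "l\u00e1pis"   // "lápis": invariant plural
    );

    // Hypothetical ordered suffix rules (LinkedHashMap keeps insertion order).
    // The first matching suffix wins, so longer suffixes come before "s".
    private final Map<String, String> pluralRules = new LinkedHashMap<>();

    public RuleLexiconLemmatizer() {
        pluralRules.put("\u00f5es", "\u00e3o"); // canções -> canção
        pluralRules.put("\u00e3es", "\u00e3o"); // pães -> pão
        pluralRules.put("ais", "al");           // animais -> animal
        pluralRules.put("\u00e9is", "el");      // papéis -> papel
        pluralRules.put("ns", "m");             // homens -> homem
        pluralRules.put("s", "");               // gatos -> gato
    }

    /** Returns a lowercase lemma: lexicon lookup first, then the first matching suffix rule. */
    public String lemmatize(String token) {
        String word = token.toLowerCase(Locale.ROOT);
        String known = lexicon.get(word);
        if (known != null) {
            return known;
        }
        for (Map.Entry<String, String> rule : pluralRules.entrySet()) {
            String suffix = rule.getKey();
            // Require a remaining stem of at least 2 characters so short words stay intact.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 2) {
                return word.substring(0, word.length() - suffix.length()) + rule.getValue();
            }
        }
        return word; // no rule applies: assume the form is already a lemma
    }

    /** Fraction of gold (form, lemma) pairs for which the predicted lemma matches. */
    public double accuracy(List<String[]> gold) {
        if (gold.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (String[] pair : gold) {
            if (lemmatize(pair[0]).equals(pair[1])) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        RuleLexiconLemmatizer lem = new RuleLexiconLemmatizer();
        List<String[]> gold = List.of(
                new String[] {"gatos", "gato"},
                new String[] {"animais", "animal"},
                new String[] {"homens", "homem"},
                new String[] {"foi", "ir"});
        for (String[] pair : gold) {
            System.out.println(pair[0] + " -> " + lem.lemmatize(pair[0]));
        }
        System.out.println("accuracy = " + lem.accuracy(gold));
    }
}
```

Putting the lexicon before the rules lets exceptions such as "mais" override the general suffix rules. A real Portuguese lemmatizer would also need verb inflection, gender normalization and part-of-speech information, none of which is attempted here.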
Keywords
lemmatization, normalization, rules, lexicon
