Diana Santos. "Comparação de corpora em português: alguns comentários", http://www.linguateca.pt/Diana/download/CCP.ps.
Translation of the title: Corpus comparison in Portuguese: some comments
In this paper I make some preliminary comments on corpus comparison in Portuguese.
In an introductory section, I report on some problems in encoding the six corpora dealt with, which come from widely different sources and formats. An interesting remark concerns the (mis)encoding of footnotes. The corpora are then compared with respect to:
- orthographic properties (capitalization, hyphenation, token form, etc.)
- the 100 most frequent words
- the 30 most frequent nouns
- proper noun frequency and structure
- the 15 most frequent one-word proper nouns
- perception verb frequencies
- the frequency of localizers (where and when)