Rapport : a fact-based question answering system for Portuguese

Rodrigues, Ricardo

2017, doctoral thesis

dc.contributor.advisor	Gomes, Paulo Jorge de Sousa
dc.contributor.advisor	Machado, Fernando Jorge Penousal Martins
dc.contributor.author	Rodrigues, Ricardo
dc.date.accessioned	2023-11-09T13:22:21Z
dc.date.available	2023-11-09T13:22:21Z
dc.date.issued	2017
dc.description.abstract	Question answering is one of the longest-standing problems in natural language processing. Although natural language interfaces for computer systems have become more common, the same cannot yet be said of access to specific textual information. Any full-text search engine can easily retrieve documents containing user-specified or closely related terms, but it is typically unable to answer user questions with small passages or short answers. The problem with question answering is that text is hard to process, because of its syntactic structure and, to a greater degree, its semantic content. At the sentence level, although the syntactic aspects of natural language follow well-known rules, the size and complexity of a sentence may make its structure difficult to analyze. Furthermore, semantic aspects are still arduous to address, with text ambiguity being one of the hardest problems to handle. There is also the need to correctly process the question in order to define its target, and then to select and process the answers found in a text. Additionally, the selected text that may yield the answer to a given question must be further processed in order to present just a passage instead of the full text. These issues also take longer to address in languages other than English, such as Portuguese, which have far fewer people working on them. This work focuses on question answering for Portuguese. In other words, our interest lies in presenting short answers, passages, and possibly full sentences, but not whole documents, in response to questions formulated in natural language. For that purpose, we have developed a system, RAPPORT, built upon open information extraction techniques that extract triples, so-called facts, characterizing the information in text files, and that then store and use them to answer user queries posed in natural language.
These facts, in the form of subject, predicate and object, alongside other metadata, constitute the basis of the answers presented by the system. Facts work both by storing short and direct information found in a text, typically entity-related information, and by containing in themselves the answers to the questions, already in the form of small passages. As for the results, although there is room for improvement, they provide tangible evidence of the adequacy of our approach and of its different modules for storing information and retrieving answers in question answering systems. In the process, in addition to contributing a new approach to question answering for Portuguese and validating the application of open information extraction to question answering, we have developed a set of tools that has been used in other natural language processing work, such as a lemmatizer, LEMPORT, which was built from scratch and has high accuracy. Many of these tools result from improving those found in the Apache OpenNLP toolkit, by pre-processing their input, post-processing their output, or both, and from training models for use in those tools or in others, such as MaltParser. Other tools include interfaces for other resources containing, for example, synonyms, hypernyms, and hyponyms, and rule-based lists of, for instance, relations between verbs and agents.	pt_PT
dc.identifier.uri	http://hdl.handle.net/10400.26/47913
dc.language.iso	eng	pt_PT
dc.title	Rapport : a fact-based question answering system for Portuguese	pt_PT
dc.type	doctoral thesis
dspace.entity.type	Publication
person.familyName	Rodrigues
person.givenName	Ricardo
person.identifier.ciencia-id	D31C-FB4A-FEAA
person.identifier.orcid	0000-0002-6262-7920
rcaap.rights	openAccess	pt_PT
rcaap.type	doctoralThesis	pt_PT
relation.isAuthorOfPublication	c64ccf7c-eca2-43cf-a4a2-78e684499c00
relation.isAuthorOfPublication.latestForDiscovery	c64ccf7c-eca2-43cf-a4a2-78e684499c00
thesis.degree.grantor	Instituto Politécnico de Coimbra
thesis.degree.name	Doutoramento em Ciências e Tecnologias da Informação	pt_PT
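
The abstract describes facts as subject, predicate and object triples, stored with other metadata and later used to answer questions with short passages. The following is only a minimal sketch of that idea, not the RAPPORT implementation: the names `Fact`, `FactStore` and `answer`, the sample facts, and the naive word-overlap scoring are all hypothetical assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical triple plus one piece of metadata (the source document)."""
    subject: str
    predicate: str
    obj: str
    source: str

    def as_passage(self) -> str:
        # The fact itself doubles as a short answer passage.
        return f"{self.subject} {self.predicate} {self.obj}"


class FactStore:
    """Toy in-memory store; a real system would use an index or database."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        self._facts.append(fact)

    def answer(self, question: str) -> str | None:
        """Return the passage of the fact sharing the most words with the question."""
        q_words = set(question.lower().strip("?").split())
        best, best_score = None, 0
        for fact in self._facts:
            f_words = set(fact.as_passage().lower().split())
            score = len(q_words & f_words)  # assumed scoring: raw word overlap
            if score > best_score:
                best, best_score = fact, score
        return best.as_passage() if best else None


store = FactStore()
store.add(Fact("Lisboa", "é", "a capital de Portugal", "doc1"))
store.add(Fact("Coimbra", "é", "uma cidade de Portugal", "doc2"))
print(store.answer("Qual é a capital de Portugal?"))
```

Comparing raw tokens is a simplification. A fuller pipeline would presumably normalize words first, for instance through lemmatization, which the abstract mentions as one of the tools developed.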

Files

Name: Tese.pdf
Size: 2.04 MB
Format: Adobe Portable Document Format

Collections

ESEC - Teses de Doutoramento