Authors
Advisor(s)
Abstract(s)
Most invoicing is still done on paper. Digitizing the processing of these documents promises financial and quality benefits. This work therefore studied the feasibility of developing a tool that uses Machine Learning to help identify and categorize the fields present in an invoice. Development began with a review of the state of the art in several fields, including Machine Learning, Optical Character Recognition, and classification tasks, followed by an analysis of prospective users, requirements gathering, and modeling of the system to be developed. Once the data needed to train a Machine Learning model had been collected, an application was developed that manages and processes documents, allowing its users to validate the inferred data, save the results of this processing, and export several results simultaneously. The final model achieves an accuracy of 69% when both exact and partially correct results are counted, with an average Levenshtein distance of 4, and thus serves as an aid to invoice processing. Finally, proposals for future work are presented, along with areas that could benefit from tools built on the same technology as the developed application.
Description
Keywords
Machine Learning, Named Entity Recognition, E-Invoicing, Web Development