Publication

Capturing the narrative : deep learning models for comics sequences

dc.contributor.advisor: Pereira, Francisco José Batista
dc.contributor.author: Marouvo, Gonçalo Ventura Lourenço
dc.date.accessioned: 2025-03-17T16:30:39Z
dc.date.available: 2025-03-17T16:30:39Z
dc.date.issued: 2025-02-03
dc.description.abstract: Comics represent the complex way humans can communicate and expose ideas, which poses additional challenges for image-to-text deep learning models. In this project, we investigate how multimodal deep learning architectures perform in describing a comics vignette. We investigate how well current state-of-the-art models (GIT and BLIP-2) are able to describe the narrative in 4-image comics sequences from a dataset we created. We find that some prompting can produce acceptable results. We also assess how to propagate information across the sequence’s images, by adding to the prompts the previous outputs for the images from the same sequence. The results show limited improvements from this strategy. While the overall meaning of the predicted descriptions is close to the semantic space of the real descriptions, they are still far from human-level descriptions. Therefore, we propose several future experiments, among which we highlight using reinforcement learning to train a large language model as a policy function for prompt generation. [pt_PT]
dc.identifier.tid: 203894898 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10400.26/57302
dc.language.iso: eng [pt_PT]
dc.subject: Comics
dc.subject: Computer vision
dc.subject: Image captioning
dc.subject: Multimodal Deep Learning Models
dc.subject: Prompt engineering
dc.title: Capturing the narrative : deep learning models for comics sequences [pt_PT]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess [pt_PT]
rcaap.type: masterThesis [pt_PT]

Files

Main
Name: Goncalo-Ventura-Lourenco-Marouvo.pdf
Size: 8.93 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 1.85 KB
Format: Item-specific license agreed upon to submission