Publication

Capturing the narrative : deep learning models for comics sequences

Name: Goncalo-Ventura-Lourenco-Marouvo.pdf
Size: 8.93 MB
Format: Adobe PDF

Abstract(s)

Comics represent the complex way humans can communicate and expose ideas, which poses additional challenges for image-to-text deep learning models. In this project, we investigate how multimodal deep learning architectures perform in describing a comics vignette. We investigate how current state-of-the-art models (GIT and BLIP-2) are able to describe the narrative in 4-image comics sequences from a dataset we created. We find that some prompting can produce acceptable results. We also assess how to propagate information across a sequence's images by adding to the prompts the previous outputs for images in the same sequence. The results show limited improvements from this strategy. While the overall meaning of the predicted descriptions is close to the semantic space of the real descriptions, they are still far from human-level descriptions. Therefore, we propose several future experiments, among which we highlight using reinforcement learning to train a large language model as a policy function for prompt generation.


Keywords

Comics; Computer vision; Image captioning; Multimodal Deep Learning Models; Prompt engineering
