| Name | Description | Size | Format |
|---|---|---|---|
| | | 13.58 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
This work addresses the automatic tracking of multiple objects (Multiple Object Tracking, MOT) in videos recorded by a camera mounted on an Unmanned Aerial Vehicle (UAV). A literature review of the methods most commonly used for this task was carried out, and the ByteTrack system was chosen for the processing. ByteTrack is very simple to use; internally, it relies on the YOLOX detector to recognize targets in each video frame, followed by a Kalman filter and several distance metrics to associate targets across frames. One of the critical components of ByteTrack is the detector (YOLOX), a neural network whose weights are adjusted by training the system on various datasets.
Initially, the ByteTrack system was initialized with parameters obtained by training on the MOTChallenge and MS COCO datasets. To adapt the weights to the scenario under test (images of people taken from UAVs), additional training (transfer learning and fine-tuning) was performed with three more datasets. These included generic images of people (the VisDrone dataset), images recorded during military exercises with cadets on the Troia Peninsula and with marines of the Força Nacional Destacada in Lithuania, and a set of "open source" images of military personnel taken from an internet site.
Finally, the performance of the tracking system was tested on images obtained in military environments in Ukraine and the Central African Republic (CAR). Training with these datasets was performed on Google Colab. With the improvements introduced in this work, performance on the test images was considerably better than that of the generic ByteTrack system (initialized with MS COCO only), opening the way to possible operational use in the future.
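The association step mentioned in the abstract (matching targets across frames with a distance metric) can be illustrated with a minimal sketch. It is not the implementation used in this work or in ByteTrack: it assumes intersection over union (IoU) as the only metric, replaces optimal assignment with a simple greedy match, and omits the Kalman filter prediction. The names `iou`, `associate`, and `min_iou` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # assumed threshold, not taken from the thesis
) -> Dict[int, int]:
    """Greedily map track ids to detection indices, highest IoU first."""
    pairs = sorted(
        (
            (iou(track_box, det_box), track_id, det_idx)
            for track_id, track_box in tracks.items()
            for det_idx, det_box in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, track_id, det_idx in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches


# Two existing tracks and three detections in the next frame (made-up boxes).
tracks = {1: (0.0, 0.0, 10.0, 10.0), 2: (20.0, 20.0, 30.0, 30.0)}
detections = [
    (21.0, 21.0, 31.0, 31.0),
    (1.0, 0.0, 11.0, 10.0),
    (100.0, 100.0, 110.0, 110.0),
]
print(associate(tracks, detections))  # {1: 1, 2: 0}
```

In this toy case the third detection overlaps no track, so it stays unmatched; a tracker would typically treat such a detection as a candidate for a new track.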
Description
Keywords
Tracking; Multiple Object Tracking (MOT); Image Recognition; UAVs; VisDrone