Evaluation of pre-trained object detection models for the use in the SURE Project
Laitila, Gamze (2021)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202105057192
Abstract
The main purpose of this study was to evaluate a broad range of pre-trained object detection models and determine a suitable model for the SURE project. The chosen model would then process outdoor images taken by drones, containing mostly vehicles and crowds of people.
Out of 41 pre-trained object detection models, 17 were selected for evaluation. Over 1000 photographs taken by the drones were received as test data, and 240 of them were chosen for processing. In these photographs, 1754 objects were annotated using an image annotation tool. The detection results were then compared against the ground-truth annotations, and evaluation metrics were calculated using code written in Python.
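The thesis does not reproduce the Python evaluation code here, but the comparison it describes can be sketched as follows. This is a minimal, hypothetical illustration of one common approach: detections are greedily matched to annotations by intersection-over-union (IoU), and precision and recall are derived from the match counts. The box format, threshold, and function names are assumptions, not the study's actual implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, annotations, iou_threshold=0.5):
    """Greedily match each detection to at most one unused annotation.

    A detection matching an annotation with IoU >= threshold is a true
    positive; unmatched detections are false positives and unmatched
    annotations are false negatives.
    """
    matched = set()
    tp = 0
    for det in detections:
        best, best_iou = None, iou_threshold
        for i, ann in enumerate(annotations):
            if i in matched:
                continue
            score = iou(det, ann)
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(detections) - tp
    fn = len(annotations) - tp
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if annotations else 0.0
    return precision, recall
```

Standard benchmarks such as COCO extend this idea by averaging precision over multiple IoU thresholds and confidence cut-offs (mAP); the sketch above shows only the single-threshold case.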
As a result of this process, EfficientDet D7 1536x1536, EfficientDet D6 1280x1280 and EfficientDet D5 1280x1280 were found to be the three top-ranking models in terms of accuracy, whereas CenterNet Resnet50 V1 FPN Keypoints 512x512, CenterNet Resnet50 V2 Keypoints 512x512 and Faster R-CNN ResNet101 V1 800x1333 were the three top-ranking models in terms of image processing speed.
The findings indicate that models proven to be the most accurate in international competitions were not necessarily the most useful for the SURE project. One-stage models such as YOLO and RetinaNet could be investigated later, but this study recommends EfficientDet D6 1280x1280 when accuracy is the priority and Faster R-CNN Inception ResNet V2 1024x1024 when image processing speed is the priority.