Temporal self-ensembling teacher for semi-supervised object detection
Chen, Cong; Dong, Shouyang; Tian, Ye; Cao, Kunlin; Liu, Li; Guo, Yuanhao (2022-08-09)
C. Chen, S. Dong, Y. Tian, K. Cao, L. Liu and Y. Guo, "Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection," in IEEE Transactions on Multimedia, vol. 24, pp. 3679-3692, 2022, doi: 10.1109/TMM.2021.3105807
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
https://rightsstatements.org/vocab/InC/1.0/
https://urn.fi/URN:NBN:fi-fe2023040535144
Abstract
This paper focuses on semi-supervised object detection (SSOD), which exploits unlabeled data to boost detection performance. We face the following obstacles when adapting the knowledge distillation (KD) framework to SSOD. (1) The teacher model serves a dual role as both teacher and student, so the teacher's predictions on unlabeled images may limit the upper bound of the student. (2) The data imbalance caused by the large number of consistent predictions between the teacher and the student hinders efficient knowledge transfer between them. To mitigate these issues, we propose a novel SSOD model called Temporal Self-Ensembling Teacher (TSET). Our teacher model ensembles its temporal predictions for unlabeled images under stochastic perturbations. In addition, the teacher ensembles its model weights with those of the student model by an exponential moving average. These ensembling strategies ensure data and model diversity and lead to better teacher predictions for unlabeled images. Furthermore, we adapt the focal loss to formulate a consistency loss that handles the data imbalance issue. Together with a thresholding method, the focal loss automatically reweights the inconsistent predictions, which preserves the knowledge of objects that are difficult to detect in the unlabeled images. Our model reaches an mAP of 80.73% on the VOC2007 test set and 40.52% on the COCO2014 minival5k set, outperforming a strong fully supervised detector by 2.37% and 1.49%, respectively. This mAP of 80.73% sets a new state of the art for SSOD on the VOC2007 test set.
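The abstract names three mechanisms: an exponential moving average (EMA) of model weights from student to teacher, a temporal ensemble of the teacher's predictions under stochastic perturbations, and a focal-loss consistency term with thresholded teacher targets. The framework-free sketch below illustrates each one under assumptions that the abstract does not state and that are not taken from the paper. Model weights are a hypothetical dict of scalars standing in for tensors. Predictions are per-anchor foreground probabilities. The values `decay=0.99`, `alpha=0.6`, `gamma=2.0` and `threshold=0.5` are illustrative placeholders. The helper names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               decay: float = 0.99) -> Dict[str, float]:
    """Move teacher weights toward the student: t <- decay * t + (1 - decay) * s.

    Hypothetical: a dict of scalars stands in for real parameter tensors.
    """
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}


def temporal_ensemble(accumulated: List[float], current: List[float],
                      alpha: float = 0.6) -> List[float]:
    """Blend this pass's teacher probabilities (one per anchor, computed under a
    random perturbation) into the running per-image ensemble via an EMA.

    Assumption: an EMA over passes is one common form of temporal ensembling;
    the exact form used by TSET is not given in the abstract.
    """
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(accumulated, current)]


def focal_consistency_loss(student_probs: List[float], teacher_probs: List[float],
                           threshold: float = 0.5, gamma: float = 2.0,
                           eps: float = 1e-7) -> float:
    """Mean binary focal loss of the student against thresholded teacher targets.

    The factor (1 - p_t) ** gamma down-weights anchors where the student already
    agrees with the teacher, so inconsistent anchors dominate the loss.
    """
    total = 0.0
    for p, q in zip(student_probs, teacher_probs):
        target = 1 if q >= threshold else 0      # hard pseudo-label from the teacher
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if target == 1 else 1.0 - p      # student probability of the pseudo-label
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(student_probs)


if __name__ == "__main__":
    teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.99)
    ensembled = temporal_ensemble([0.8, 0.1], [0.9, 0.3])
    consistent = focal_consistency_loss([0.9], [0.95])    # student agrees with teacher
    inconsistent = focal_consistency_loss([0.1], [0.95])  # student disagrees
    print(teacher, ensembled, consistent, inconsistent)
```

In this toy run, the anchor where the student disagrees with the teacher receives a loss more than a thousand times larger than the anchor where they agree. This is the automatic reweighting of inconsistent predictions that the abstract attributes to the focal loss.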