LBP inspired efficient deep convolutional neural networks for visual representation learning

Su, Zhuo

University of Oulu

2023-10-09

This item is protected by copyright and/or related rights. You may use the item in any way permitted by the copyright and related-rights legislation that applies to your use. For other uses, you need permission from the rights holder(s).

Permanent address of the publication:
https://urn.fi/URN:ISBN:9789526238098

Description

Academic dissertation to be presented with the assent of the Doctoral Programme Committee of Information Technology and Electrical Engineering of the University of Oulu for public defence in the OP auditorium (L10), Linnanmaa, on 16 October 2023, at 12 noon.

Abstract

In the past decades, deep neural networks (DNNs) have revolutionized computer vision, achieving significant success across a wide range of tasks. Recent work has focused intensely on accuracy, resulting in a large number of huge and complex models. However, with the ubiquitous use of edge devices such as mobile phones, robots, and embedded systems, efficiency is becoming increasingly important for modern computer vision models. In computer vision, the quality of feature representation learning directly determines the quality of the whole machine learning model. The core challenge is therefore to develop feature representation learning algorithms that are both effective and efficient.

In this thesis, we address this challenge from two directions. On the one hand, we exploit the computational simplicity and efficiency of traditional local binary pattern (LBP) descriptors and improve the learnability of LBP so that it extracts more discriminative features. On the other hand, we take advantage of the high representational capacity of DNNs and build compact DNN modules with lower computational cost and smaller model size. The thesis develops these two directions both separately and in combination.
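
For readers unfamiliar with LBP, the sketch below shows the basic 3×3 operator that the thesis takes as its starting point: each of the 8 neighbours is thresholded against the centre pixel, the resulting bits form an 8-bit code, and a histogram of codes over an image region serves as the descriptor. It is a minimal pure-Python illustration of the classic handcrafted operator, not the learnable descriptor proposed in the thesis; the clockwise neighbour ordering and the names `lbp_code` and `lbp_histogram` are illustrative assumptions.

```python
from typing import List

# Offsets of the 8 neighbours of a pixel, visited clockwise from the top-left
# corner. This ordering is an assumption; any fixed order works as long as it
# is used consistently.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # Threshold each neighbour against the centre: 1 if >= centre, else 0.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 6, 3]]
print(lbp_code(patch, 1, 1))  # 42: bits 1, 3 and 5 are set (9, 7 and 6 >= 6)
```

Only comparisons and bit shifts are involved, which is why LBP is cheap to compute; because the code depends only on the sign of each neighbour-minus-centre difference, it is also unchanged by any monotonic change of brightness.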

We start by extending traditional LBP to learnable descriptors, so that the descriptors are learned from data rather than handcrafted. The resulting model achieves a better trade-off among distinctiveness, computational cost, and robustness than earlier LBP variants. Next, we propose two novel types of convolution that combine LBP with the convolution operation. The new convolutions capture higher-order local differential information, are computationally efficient, and integrate well into existing DNNs. Then, we propose an efficient convolutional neural network (CNN) module based on group convolution and dynamic execution, which retains the efficiency of standard group convolution without sacrificing representational ability. Finally, we develop a novel binary DNN module for robust point cloud analysis. The proposed point cloud models achieve runtime efficiency, through network binarization, and rotation invariance at the same time.
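
The pixel difference networks listed among the original papers build on the LBP idea of operating on local pixel differences rather than raw intensities. The sketch below illustrates one operator of this kind, a central-difference convolution y = sum_i w_i (x_i - x_c) over a 3×3 window, and shows that it can be folded into an ordinary convolution by subtracting the kernel sum from the centre weight, so at inference time it costs no more than a vanilla convolution. This is an illustrative assumption and not necessarily either of the two convolutions proposed in the thesis; the function names are hypothetical, and the code uses plain Python lists with a single channel for readability.

```python
from typing import List

Kernel = List[List[float]]
Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Kernel) -> Image:
    """Plain 3x3 'valid' cross-correlation (what DNN libraries call convolution)."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            s = 0.0
            for i in range(3):
                for j in range(3):
                    s += kernel[i][j] * image[r - 1 + i][c - 1 + j]
            row.append(s)
        out.append(row)
    return out


def central_pdc(image: Image, kernel: Kernel) -> Image:
    """Central-difference convolution: each weight multiplies x_i - x_c."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = image[r][c]
            s = 0.0
            for i in range(3):
                for j in range(3):
                    s += kernel[i][j] * (image[r - 1 + i][c - 1 + j] - center)
            row.append(s)
        out.append(row)
    return out


def fold_central_pdc(kernel: Kernel) -> Kernel:
    """Rewrite a central-difference kernel as an equivalent vanilla kernel.

    sum_i w_i (x_i - x_c) = sum_i w_i x_i - (sum_i w_i) x_c, so subtracting the
    kernel sum from the centre weight yields the same output with one
    ordinary convolution.
    """
    folded = [row[:] for row in kernel]
    folded[1][1] -= sum(sum(row) for row in kernel)
    return folded


img = [[4 * r + c + 1 for c in range(4)] for r in range(4)]  # 4x4 ramp
k = [[1, 2, 0], [-1, 3, 1], [0, 1, -2]]
print(central_pdc(img, k))                      # [[-17.0, -17.0], [-17.0, -17.0]]
print(conv2d_valid(img, fold_central_pdc(k)))   # same values, vanilla conv
```

A constant image gives an all-zero response, so, like LBP, the operator ignores a uniform brightness offset, while the folded kernel keeps inference cost identical to a standard convolution.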

Original papers

Original papers are not included in the electronic version of the dissertation.

Su, Z., Pietikäinen, M., & Liu, L. (2019). BIRD: Learning binary and illumination robust descriptor for face recognition. In 30th British Machine Vision Conference (BMVC 2019), 1–12.
Su, Z., Fang, L., Kang, W., Hu, D., Pietikäinen, M., & Liu, L. (2020). Dynamic group convolution for accelerating convolutional neural networks. In Computer Vision – ECCV 2020, 138–155. https://doi.org/10.1007/978-3-030-58539-6_9
Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q., Pietikäinen, M., & Liu, L. (2021). Pixel difference networks for efficient edge detection. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 5097–5107. https://doi.org/10.1109/ICCV48922.2021.00507
Su, Z., Welling, M., Pietikäinen, M., & Liu, L. (2022). SVNet: Where SO(3) equivariance meets binarization on point cloud representation. In 2022 International Conference on 3D Vision (3DV), 547–556. https://doi.org/10.1109/3DV57658.2022.00084
Su, Z., Zhang, J., Wang, L., Zhang, H., Liu, Z., Pietikäinen, M., & Liu, L. (2023). Lightweight pixel difference networks for efficient visual representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. Advance online publication. https://doi.org/10.1109/TPAMI.2023.3300513
Su, Z., Müller, M., Wofk, D., Pietikäinen, M., & Liu, L. (2023). Spatial and temporal difference network for real-time salient object detection. Manuscript submitted for publication.
