Dinh N., Siying L., Vicky S., Yue W., Jack H., ZhaoYong L., Ryan L., Karianto L. "A Spatiotemporal Excitation Classifier Head for Action Recognition Applications", 2024 IEEE Intl. Conf. Artificial Intelligence (CAI2024)
Abstract:
Transfer learning is a convenient way to quickly adapt state-of-the-art deep learning models to specific applications with small datasets. Typically, the network backbone is kept fixed and only the last layer, which acts as the classifier, is modified to match the new number of target classes. The performance of the model is then limited by its predefined structure. In this research, we address this constraint by studying the effect of the common classifier layer and then proposing an extension classifier module for action recognition applications. We focus on the local spatiotemporal representation of the deep features encoded by pre-trained models and exploit it further in the proposed classifier to enrich the deep feature representation. In addition, the extension classifier is designed so that it can be plugged on top of any image or video encoder to perform action recognition. One public dataset, TinyVIRAT2, and two private datasets, Scratch and AtomicA, were used for evaluation, and the experiments show a significant performance improvement from the proposed extension classifier.
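The conventional setup that the abstract takes as its starting point (a fixed backbone with only the last classifier layer replaced) can be sketched roughly as follows. This is a minimal illustration, not the proposed extension classifier: the random-projection "backbone", the feature sizes, and the names `encode`, `make_head`, `classify`, and `train_step` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained encoder: a fixed projection
# from 32 input channels to 16 feature channels. It is never updated.
W_backbone = rng.standard_normal((32, 16))

def encode(clip):
    """Map a clip of shape (T, H, W, 32) to features of shape (T, H, W, 16)."""
    return np.maximum(clip @ W_backbone, 0.0)  # linear projection + ReLU

def make_head(feat_dim, num_classes):
    """New last layer sized for the target number of classes."""
    return {"W": np.zeros((feat_dim, num_classes)), "b": np.zeros(num_classes)}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(head, feats):
    """Common classifier: global average pool over T, H, W, then one linear layer."""
    pooled = feats.mean(axis=(0, 1, 2))
    return pooled @ head["W"] + head["b"]

def train_step(head, feats, label, lr=0.1):
    """One SGD step on cross-entropy; only the head's parameters change."""
    pooled = feats.mean(axis=(0, 1, 2))
    probs = softmax(pooled @ head["W"] + head["b"])
    grad = probs.copy()
    grad[label] -= 1.0  # gradient of cross-entropy w.r.t. the logits
    head["W"] -= lr * np.outer(pooled, grad)
    head["b"] -= lr * grad
    return float(-np.log(probs[label]))  # loss measured before the update
```

In this baseline the global average pool collapses all temporal and spatial positions into one vector before classification, which is the kind of local spatiotemporal information the abstract says the proposed module exploits further.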
License type:
Publisher Copyright
Funding Info:
This research is funded by the Tell-Tail Indicator project, which develops a vision-based system for identifying suspicious commuters based on their behavior.