Adaptive Masked Autoencoder Transformer for image classification

Title:
Adaptive Masked Autoencoder Transformer for image classification
Journal Title:
Applied Soft Computing
Publication Date:
09 July 2024
Citation:
Chen, X., Liu, C., Hu, P., Lin, J., Gong, Y., Chen, Y., Peng, D., & Geng, X. (2024). Adaptive Masked Autoencoder Transformer for image classification. Applied Soft Computing, 164, 111958. https://doi.org/10.1016/j.asoc.2024.111958
Abstract:
Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks, yet their computational requirements often exceed those of prevailing CNN-based models. Token sparsification techniques have been employed to alleviate this issue, but they often discard semantic information and consequently degrade performance. To address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT integrates a novel adaptive masking mechanism and a training objective function applied in both the pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while concurrently enhancing their final accuracy. In experiments on the ILSVRC-2012 dataset, the proposed method saves up to 40% of the FLOPs of the original ViT, and it outperforms the efficient DynamicViT model by 0.1% accuracy while saving a further 4% FLOPs. On the Places365 dataset, AMAT incurs only a 0.3% accuracy loss while saving 21% FLOPs compared to MAE. These findings demonstrate the efficacy of AMAT in mitigating computational complexity while maintaining high accuracy.
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research/project is supported by the A*STAR AME Programmatic Funds (Grant Reference No.: A1892b0026).

This research/project is supported by the A*STAR MTC Programmatic Funds (Grant Reference No.: M23L7b0021).
ISSN:
1568-4946
Files uploaded:

amat-asoc.pdf (1.83 MB, PDF, available on request)