Liu, C., Chen, X., Lin, J., Hu, P., Wang, J., & Geng, X. (2024). Evolving masked low-rank transformer for long text understanding. Applied Soft Computing, 152, 111207. https://doi.org/10.1016/j.asoc.2023.111207
Abstract:
Processing long text sequences is time-consuming owing to the very large scale of self-attention computation. Recent advances show that attention in transformers can be accelerated by removing redundancy, and various sparse attention variants for long sequences have been proposed, leading to state-of-the-art performance on language and vision tasks. Low-rank methods have achieved outstanding success in the field of efficient transformers. Dynamic token sparsification saves both time and cost, and can easily be extended to prune redundant spans and yield semantic features. Evolutionary algorithms are attractive for selecting hyperparameters, which have a significant impact on effectiveness. Motivated by these works, we propose an efficient transformer model, termed EMLT, that reduces time and cost without sacrificing accuracy. EMLT combines the strengths of low-rank transformers, dynamic token sparsification, and evolutionary algorithms to further cut redundant tokens while maintaining the original precision, achieving linear memory and time complexity. We compress the transformer in three stages. Firstly, a sliding window serves as local attention to capture fine-grained dependency semantics. After that, a low-rank approximation of the attention matrix is applied as global attention to store long-range dependency semantics, and is aggregated with the local attention. On this basis, we consistently prune redundant tokens according to an importance score to further sparsify the attention operation. Finally, an evolutionary algorithm is used to optimize the hyperparameters of every layer. The results of comprehensive experiments and analysis show that our method rivals others in accuracy and outperforms them by a large margin in computational efficiency.
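The sketch below illustrates, in PyTorch, the three core ideas the abstract combines: sliding-window local attention, a low-rank approximation of global attention, and importance-score token pruning. It is a minimal toy example with assumed function names, untrained projections, and arbitrary window size, rank, and keep ratio; it is not the authors' EMLT implementation, and the evolutionary per-layer hyperparameter search is not shown.

```python
# Illustrative sketch only (not the paper's code): local + low-rank global attention
# followed by importance-score token pruning. All names and settings are assumptions.
import torch
import torch.nn.functional as F

def local_attention(q, k, v, window=4):
    """Sliding-window attention: each token attends only to a +/- window neighbourhood."""
    n = q.size(0)
    idx = torch.arange(n)
    band = (idx[None, :] - idx[:, None]).abs() <= window        # (n, n) band mask
    scores = (q @ k.T) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~band, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def low_rank_global_attention(q, k, v, proj):
    """Low-rank global attention: project keys/values down to rank r before attending."""
    k_r, v_r = proj @ k, proj @ v                                # (r, d)
    scores = (q @ k_r.T) / q.size(-1) ** 0.5                     # (n, r)
    return F.softmax(scores, dim=-1) @ v_r                       # (n, d)

def prune_by_importance(x, attn_probs, keep_ratio=0.5):
    """Keep the tokens that receive the most attention mass; drop the rest."""
    importance = attn_probs.sum(dim=0)                           # (n,) attention received
    keep = max(1, int(keep_ratio * x.size(0)))
    kept = importance.topk(keep).indices.sort().values
    return x[kept], kept

if __name__ == "__main__":
    n, d, r = 16, 8, 4
    x = torch.randn(n, d)
    q = k = v = x                                                # untrained toy projections
    proj = torch.randn(r, n) / n ** 0.5                          # low-rank projection matrix
    out = local_attention(q, k, v, window=2) + low_rank_global_attention(q, k, v, proj)
    probs = F.softmax((q @ k.T) / d ** 0.5, dim=-1)              # proxy importance scores
    pruned, kept = prune_by_importance(out, probs, keep_ratio=0.5)
    print(out.shape, pruned.shape, kept.tolist())
```

In an EMLT-style pipeline the window size, projection rank, and keep ratio used above would be the per-layer hyperparameters that the evolutionary search tunes.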
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research / project is supported by the A*STAR - AME Programmatic Funds (Hardware-Software Co-optimisation for Deep Learning)
Grant Reference no. : A1892b0026
This research / project is supported by the A*STAR - AME Programmatic Funds (Accelerating Homomorphic Encryption)
Grant Reference no. : A19E3b0099