Evolving masked low-rank transformer for long text understanding

Title:
Evolving masked low-rank transformer for long text understanding
Journal Title:
Applied Soft Computing
Keywords:
Publication Date:
29 December 2023
Citation:
Liu, C., Chen, X., Lin, J., Hu, P., Wang, J., & Geng, X. (2024). Evolving masked low-rank transformer for long text understanding. Applied Soft Computing, 152, 111207. https://doi.org/10.1016/j.asoc.2023.111207
Abstract:
Long-sequence text processing is time-consuming owing to ultra-large-scale self-attention computation. Recent advances demonstrate that attention in transformers can be accelerated by redundancy removal, and various sparse attention variants for long sequences have been proposed, leading to state-of-the-art performance on language and vision tasks. Low-rank methods have achieved outstanding success in the field of efficient transformers. Dynamic token sparsification saves both time and cost, and can be easily extended to prune redundant spans and yield semantic features. Evolutionary algorithms are attractive for selecting hyperparameters, which are of significant importance for effectiveness. Motivated by these works, we propose an efficient transformer model, termed EMLT, that reduces time and cost without sacrificing accuracy. EMLT combines the strengths of low-rank transformers, dynamic token sparsification, and evolutionary algorithms to further cut redundant tokens while maintaining the original precision, achieving linear memory and time complexity. We compress the transformer in three stages. First, a sliding window serves as local attention to capture fine-grained dependency semantics. Next, a low-rank approximation of the attention matrix is applied as global attention to store long-range dependency semantics and is aggregated with the local attention. On this basis, we prune redundant tokens according to an importance score to further sparsify the attention operation. Finally, an evolutionary algorithm is used to optimize the hyper-parameters of every layer. Comprehensive experiments and analysis show that our method rivals others in accuracy and outperforms them by a large margin in efficiency in terms of computational complexity.
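The following is a minimal, hypothetical sketch of the three attention-compression stages described in the abstract: sliding-window local attention, low-rank (Linformer-style) global attention aggregated with it, and importance-based token pruning. All names, dimensions, the random projection, the additive aggregation, and the norm-based importance score are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of the EMLT-style attention pipeline described in the abstract.
# Not the paper's implementation: the projection is random (learned in practice), the
# aggregation is a plain sum, and the importance score is a simple norm proxy.
import torch
import torch.nn.functional as F


def sliding_window_attention(q, k, v, window=4):
    """Local attention: each query attends only to keys within a fixed window."""
    n = q.size(0)
    idx = torch.arange(n)
    mask = (idx[None, :] - idx[:, None]).abs() <= window  # band mask |i - j| <= window
    scores = (q @ k.T) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


def low_rank_attention(q, k, v, rank=8):
    """Global attention: project keys/values to `rank` rows so the score matrix is
    n x rank instead of n x n (linear in sequence length)."""
    n = q.size(0)
    proj = torch.randn(rank, n) / n ** 0.5  # learned projection in practice
    k_r, v_r = proj @ k, proj @ v
    scores = (q @ k_r.T) / q.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ v_r


def prune_tokens(x, importance, keep_ratio=0.5):
    """Keep only the highest-scoring tokens for subsequent layers."""
    k = max(1, int(x.size(0) * keep_ratio))
    keep = importance.topk(k).indices.sort().values
    return x[keep]


if __name__ == "__main__":
    n, d = 32, 16
    x = torch.randn(n, d)
    q = k = v = x  # single head, no learned projections, for brevity
    local_out = sliding_window_attention(q, k, v, window=4)
    global_out = low_rank_attention(q, k, v, rank=8)
    out = local_out + global_out           # aggregate local and global branches
    importance = out.norm(dim=-1)          # proxy importance score per token
    pruned = prune_tokens(out, importance, keep_ratio=0.5)
    print(out.shape, pruned.shape)         # torch.Size([32, 16]) torch.Size([16, 16])
```

In the paper's framing, per-layer hyper-parameters such as the window size, approximation rank, and keep ratio would be tuned by the evolutionary algorithm rather than fixed by hand as in this sketch.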
License type:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Funding Info:
This research / project is supported by the A*STAR - AME Programmatic Funds (Hardware-Software Co-optimisation for Deep Learning)
Grant Reference no. : A1892b0026

This research / project is supported by the A*STAR - AME Programmatic Funds (Accelerating Homomorphic Encryption)
Grant Reference no. : A19E3b0099
Description:
ISSN:
1568-4946
Files uploaded:

evolving-masked-low-rank-transformer-for-long-text-understanding-all-black-1.pdf (1.29 MB, PDF)