Formento, B., Ng, S.-K., & Foo, C.-S. (2021). Special Symbol Attacks On NLP Systems. 2021 International Joint Conference on Neural Networks (IJCNN). doi:10.1109/ijcnn52387.2021.9534254
Adversarial attacks/perturbations are becoming important for NLP research, as it has recently been shown that text-attacking adversaries can degrade an NLP model's performance without the victim's knowledge. This has far-reaching implications, especially when an NLP system is deployed in critical applications such as health or finance. In fact, the robustness of state-of-the-art models such as NLP transformers has increasingly been scrutinised due to their vulnerability to adversarial attacks such as TextFooler and BERT-Attack. These methods, however, focus on changing words, which at times ruins the readability and semantics of the sample. This paper introduces Special Symbol Text Attacks (SSTA), a technique that improves language adversarial perturbations by using special symbols that carry downstream-task information even though they should not. Our tests show that inserting such symbols, which are meaningless to a human, into a sentence can perturb the sample in a particular direction. When this technique is combined with TextFooler, a recent benchmark for the creation of NLP adversaries, through the TextAttack framework, it improves all main evaluation metrics on three sentiment classification tasks and a fake news detection task. A simple, novel and symbol-specific adversarial learning technique is then introduced to reduce the influence of such special symbols.
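The core idea can be illustrated with a minimal sketch (not the paper's code): a toy bag-of-words sentiment scorer whose vocabulary has, by assumption, picked up a spurious learned weight for a special symbol during training. Inserting that symbol, which a human reader would ignore, shifts the score in a chosen direction. The weights, tokens, and symbol here are all hypothetical, chosen only to demonstrate the mechanism.

```python
# Toy demonstration of a special-symbol perturbation (illustrative only).
# Assumption: the model's vocabulary accidentally assigned a nonzero weight
# to the section-sign symbol during training, as SSTA exploits.
WEIGHTS = {
    "great": 2.0,     # positive sentiment weight
    "boring": -2.0,   # negative sentiment weight
    "\u00a7": -1.5,   # spurious weight on a special symbol (assumed)
}

def score(text: str) -> float:
    """Sum the weights of known tokens; > 0 is read as 'positive'."""
    return sum(WEIGHTS.get(tok, 0.0) for tok in text.split())

clean = "a great movie"
perturbed = "a great \u00a7 movie"  # symbol inserted; readability intact

print(score(clean))      # positive score for the clean sample
print(score(perturbed))  # score pushed toward negative by the symbol
```

The symbol leaves the human-readable content untouched, yet moves the model's decision boundary score, which is the behaviour the paper's symbol-specific adversarial learning technique is designed to suppress.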
No specific funding was received for this research.