Hariram Veeramani, Surendrabikram Thapa, Rajaraman Kanagasabai and Usman Naseem, UniteToModerate at DeHate: The Winning Approach for Segmentation-based Content Moderation with Vision-Text-Mask Modality Fused Large Multimodal Models. AAAI Conference on Artificial Intelligence (AAAI) Workshop (2024)
Abstract:
This paper presents a novel approach for detecting and masking hateful content in multimodal online
media, utilizing a blend of the NExT-Chat and UniFusion models. We demonstrate how this combination
effectively identifies and obscures harmful elements in images and text, addressing the critical need for a
safer digital environment. Our methodology leverages the strengths of both models, with NExT-Chat
providing initial mask generation through its innovative pix2emb method, and UniFusion enhancing
precision with its hierarchical fusion of visual and reference features. The effectiveness of our model is
evidenced by its first-place finish in the DeHate 2024 challenge. This achievement not only showcases the
potential of our system in combating online hate but also sets a new benchmark in multimodal content
moderation.
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the ID HTPO - NA
Grant Reference no. : C211418007