The Winning Approach for Segmentation-based Content Moderation with Vision-Text-Mask Modality Fused Large Multimodal Models

Title:
The Winning Approach for Segmentation-based Content Moderation with Vision-Text-Mask Modality Fused Large Multimodal Models
Journal Title:
AAAI Conference on Artificial Intelligence (AAAI) Workshop (2024)
DOI:
Publication Date:
01 December 2024
Citation:
Hariram Veeramani, Surendrabikram Thapa, Rajaraman Kanagasabai and Usman Naseem, UniteToModerate at DeHate: The Winning Approach for Segmentation-based Content Moderation with Vision-Text-Mask Modality Fused Large Multimodal Models. AAAI Conference on Artificial Intelligence (AAAI) Workshop (2024)
Abstract:
This paper presents an approach for detecting and masking hateful content in multimodal online media that combines the NExT-Chat and UniFusion models. We show that this combination identifies and obscures harmful elements in images and text, addressing the need for a safer digital environment. The method draws on the strengths of both models: NExT-Chat generates the initial masks through its pix2emb method, and UniFusion improves precision through hierarchical fusion of visual and reference features. The system placed first in the DeHate 2024 challenge, which demonstrates its potential for combating online hate and sets a benchmark for multimodal content moderation.
License type:
Attribution 4.0 International (CC BY 4.0)
Funding Info:
This research / project is supported by the ID HTPO - NA
Grant Reference no. : C211418007
Description:
ISSN:
