Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models

Title:
Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models
Journal Title:
ICLR 2024
DOI:
Publication URL:
Publication Date:
31 May 2024
Citation:
nil
Abstract:
Recent advancements in large vision-language models (LVLMs) have demonstrated their impressive capability in understanding visual information with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating objects in descriptions that are not present in the visual information. However, the underlying fundamental reasons for multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic-shift bias related to paragraph breaks (‘\n\n’), where the content before and after ‘\n\n’ in the training data frequently exhibits significant semantic changes. This pattern leads the model to infer that the content following ‘\n\n’ should be notably different from the preceding, less hallucinatory content, thereby increasing the probability of hallucinatory descriptions after the ‘\n\n’. We have validated this hypothesis on multiple publicly available LVLMs. Besides, we find that deliberately inserting ‘\n\n’ into a generated description can induce more hallucinations. A simple method is proposed to effectively mitigate LVLM hallucination by avoiding the output of ‘\n’.
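The mitigation the abstract describes (never emitting ‘\n’, so the ‘\n\n’ paragraph break that triggers the semantic-shift bias cannot form) can be sketched as a decoding-time logits mask. The following is a minimal illustration of that idea, not the authors' code; the toy vocabulary and the set of newline token ids are hypothetical.

```python
import math

def skip_newline_logits(logits, newline_token_ids):
    """Suppress newline tokens at decoding time.

    Setting the logits of all vocabulary ids whose decoded text
    contains '\n' to -inf guarantees the sampler/argmax can never
    select them, so the generated description never contains a
    '\n\n' paragraph break.
    """
    out = list(logits)
    for tid in newline_token_ids:
        out[tid] = -math.inf  # token becomes unselectable
    return out

# Toy 5-token vocabulary where id 2 stands for '\n'.
logits = [1.0, 0.5, 3.2, 0.1, 2.0]
masked = skip_newline_logits(logits, newline_token_ids=[2])

# Greedy decoding now picks id 4 (logit 2.0) instead of the
# newline token id 2, which originally had the highest logit.
best = max(range(len(masked)), key=masked.__getitem__)
```

In a real LVLM decoding loop this mask would be applied to the model's next-token logits at every step, e.g. as a custom logits processor in a generation framework.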
License type:
Publisher Copyright
Funding Info:
There was no specific funding for the research done
Description:
ISSN:
nil
Files uploaded: