- The paper presents a novel visual-centric jailbreak method that bypasses MLLM safety checks using malicious image inputs.
- It employs a dual-stage approach with context fabrication and prompt refinement to craft realistic adversarial dialogue.
- Experimental evaluations on MM-SafetyBench reveal significant improvements in toxicity scores and attack success rates.
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
The paper "Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection" explores the security vulnerabilities of multimodal LLMs (MLLMs) by integrating malicious visual inputs to induce harmful behavior. This work presents a novel 'visual-centric jailbreak' setting where visual information plays a critical role in constructing realistic adversarial contexts to exploit MLLMs.
Introduction to Visual-Centric Jailbreak
The concept of a visual-centric jailbreak is introduced, where visual inputs become essential in defining a complete jailbreak scenario. Prior methods often use visual triggers as semantic complements, lacking depth in realistic context construction. The paper proposes the VisCo Attack, which strategically employs visual-centric context to craft a multi-turn adversarial dialogue history, ultimately baiting the MLLM into generating harmful responses.
Figure 1: Illustration of the vision-centric jailbreak setting. The visual input is an essential component that constitutes the complete jailbreak scenario.
VisCo Attack Methodology
Attack Strategy
VisCo Attack operates via two primary stages: context fabrication and attack prompt refinement. In the fabrication stage, enhanced visual details are utilized to generate a deceptive conversational history using one of four predefined strategies. These strategies include simulating scenarios, analyzing from multiple perspectives, interrogating iteratively, and exploiting visual hallucinations.
Figure 2: Workflow of the VisCo Attack. (Left) generation of fabricated visual context and the initial attack prompt using vision-focused strategies. (Right) iterative toxicity obfuscation and semantic refinement.
Context Fabrication
Focused on creating an environment where malicious dialogue is embedded naturally, each strategy generates an initial attack prompt interwoven with fabricated visual context. This approach closely mimics natural interactions between user and model, encouraging the model to bypass safety checks.
Prompt Refinement
In the refinement stage, the initial attack prompt is iteratively optimized to align with harmful intent semantically and evade safety mechanisms. The iterative process ensures high semantic relevance to the original malicious query without explicit unsafe language. This refined prompt effectively triggers the desired harmful response.
Experimental Evaluation
The VisCo Attack was evaluated using benchmarks including MM-SafetyBench, demonstrating significant improvements in toxicity scores and attack success rates compared to existing approaches. The experimental setup incorporated various models like GPT-4o and Gemini-2.0, highlighting VisCo's versatile applicability across different MLLMs.
Figure 3: Results of VisCo Attack with different red team assistants ($\pi_{\text{red}$), showcasing its effectiveness across models.
Case Studies and Visualization
Several case studies reinforce the utility of VisCo's image-driven attack methodology. Real world scenarios simulated in experiments successfully elicited unsafe outputs, verifying the strategic combination of visual inputs with textual prompts.

Figure 4: Case Study (VS: Image-Grounded Scenario Simulation)
Figure 5: Case Study (VM: Image Multi-Perspective Analysis)
Conclusion
The VisCo Attack exemplifies a sophisticated method of leveraging visual inputs for jailbreaking MLLMs, addressing existing limitations in multimodal safety frameworks. By embedding visual detail into adversarial contexts, VisCo enhances the efficacy of jailbreak attempts while preserving realistic interaction semantics. These findings prompt reconsideration of MLLM safety strategies in open-world applications, guiding future development of countermeasures against multimodal adversarial exploits.
Limitations and Ethical Considerations
Despite its effectiveness, VisCo relies on manual strategy templates, presenting challenges in adaptability and scalability. Future research may explore automatic generation techniques to further enhance generalizability. Ethical considerations underscore the intention for academic exploration, urging stringent evaluations before deploying MLLMs publicly.
By presenting a robust methodology for exploiting multimodal vulnerabilities, the VisCo Attack calls attention to the importance of fortified safety protocols in protecting against complex adversarial scenarios. The full potential of VisCo lies not only in its immediate application but also in guiding the evolution of future defensive and evaluative practices in AI systems.