Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

Published 3 Jul 2025 in cs.CV, cs.CL, and cs.CR | (2507.02844v2)

Abstract: With the emergence of strong vision language capabilities, multimodal LLMs (MLLMs) have demonstrated tremendous potential for real-world applications. However, the security vulnerabilities exhibited by the visual modality pose significant challenges to deploying such models in open-world environments. Recent studies have successfully induced harmful responses from target MLLMs by encoding harmful textual semantics directly into visual inputs. However, in these approaches, the visual modality primarily serves as a trigger for unsafe behavior, often exhibiting semantic ambiguity and lacking grounding in realistic scenarios. In this work, we define a novel setting: vision-centric jailbreak, where visual information serves as a necessary component in constructing a complete and realistic jailbreak context. Building on this setting, we propose the VisCo (Visual Contextual) Attack. VisCo fabricates contextual dialogue using four distinct vision-focused strategies, dynamically generating auxiliary images when necessary to construct a vision-centric jailbreak scenario. To maximize attack effectiveness, it incorporates automatic toxicity obfuscation and semantic refinement to produce a final attack prompt that reliably triggers harmful responses from the target black-box MLLMs. Specifically, VisCo achieves a toxicity score of 4.78 and an Attack Success Rate (ASR) of 85% on MM-SafetyBench against GPT-4o, significantly outperforming the baseline, which achieves a toxicity score of 2.48 and an ASR of 22.2%. Code: https://github.com/Dtc7w3PQ/Visco-Attack.

Abstract PDF Upgrade to Chat

Summary

The paper presents a novel visual-centric jailbreak method that bypasses MLLM safety checks using malicious image inputs.
It employs a dual-stage approach with context fabrication and prompt refinement to craft realistic adversarial dialogue.
Experimental evaluations on MM-SafetyBench reveal significant improvements in toxicity scores and attack success rates.

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

The paper "Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection" explores the security vulnerabilities of multimodal LLMs (MLLMs) by integrating malicious visual inputs to induce harmful behavior. This work presents a novel 'visual-centric jailbreak' setting where visual information plays a critical role in constructing realistic adversarial contexts to exploit MLLMs.

Introduction to Visual-Centric Jailbreak

The concept of a visual-centric jailbreak is introduced, where visual inputs become essential in defining a complete jailbreak scenario. Prior methods often use visual triggers as semantic complements, lacking depth in realistic context construction. The paper proposes the VisCo Attack, which strategically employs visual-centric context to craft a multi-turn adversarial dialogue history, ultimately baiting the MLLM into generating harmful responses.

Figure 1: Illustration of the vision-centric jailbreak setting. The visual input is an essential component that constitutes the complete jailbreak scenario.

VisCo Attack Methodology

Attack Strategy

VisCo Attack operates via two primary stages: context fabrication and attack prompt refinement. In the fabrication stage, enhanced visual details are utilized to generate a deceptive conversational history using one of four predefined strategies. These strategies include simulating scenarios, analyzing from multiple perspectives, interrogating iteratively, and exploiting visual hallucinations.

Figure 2: Workflow of the VisCo Attack. (Left) generation of fabricated visual context and the initial attack prompt using vision-focused strategies. (Right) iterative toxicity obfuscation and semantic refinement.

Context Fabrication

Focused on creating an environment where malicious dialogue is embedded naturally, each strategy generates an initial attack prompt interwoven with fabricated visual context. This approach closely mimics natural interactions between user and model, encouraging the model to bypass safety checks.

In the refinement stage, the initial attack prompt is iteratively optimized to align with harmful intent semantically and evade safety mechanisms. The iterative process ensures high semantic relevance to the original malicious query without explicit unsafe language. This refined prompt effectively triggers the desired harmful response.

Experimental Evaluation

The VisCo Attack was evaluated using benchmarks including MM-SafetyBench, demonstrating significant improvements in toxicity scores and attack success rates compared to existing approaches. The experimental setup incorporated various models like GPT-4o and Gemini-2.0, highlighting VisCo's versatile applicability across different MLLMs.

Figure 3: Results of VisCo Attack with different red team assistants ($\pi_{\text{red}$), showcasing its effectiveness across models.

Case Studies and Visualization

Several case studies reinforce the utility of VisCo's image-driven attack methodology. Real world scenarios simulated in experiments successfully elicited unsafe outputs, verifying the strategic combination of visual inputs with textual prompts.

Figure 4: Case Study (VS: Image-Grounded Scenario Simulation)

Figure 5: Case Study (VM: Image Multi-Perspective Analysis)

Conclusion

The VisCo Attack exemplifies a sophisticated method of leveraging visual inputs for jailbreaking MLLMs, addressing existing limitations in multimodal safety frameworks. By embedding visual detail into adversarial contexts, VisCo enhances the efficacy of jailbreak attempts while preserving realistic interaction semantics. These findings prompt reconsideration of MLLM safety strategies in open-world applications, guiding future development of countermeasures against multimodal adversarial exploits.

Limitations and Ethical Considerations

Despite its effectiveness, VisCo relies on manual strategy templates, presenting challenges in adaptability and scalability. Future research may explore automatic generation techniques to further enhance generalizability. Ethical considerations underscore the intention for academic exploration, urging stringent evaluations before deploying MLLMs publicly.

By presenting a robust methodology for exploiting multimodal vulnerabilities, the VisCo Attack calls attention to the importance of fortified safety protocols in protecting against complex adversarial scenarios. The full potential of VisCo lies not only in its immediate application but also in guiding the evolution of future defensive and evaluative practices in AI systems.