Defenses for Structure-Based MLLM Jailbreaks
Develop effective defense mechanisms against structure-based jailbreak attacks on multimodal large language models (MLLMs), in which harmful content is embedded within images and paired with crafted textual instructions to bypass safety alignment. Demonstrate that these defenses reliably prevent harmful or unauthorized model outputs across the relevant threat scenarios.
References
In contrast to the relatively minor perturbations characteristic of the first category, structure-based attacks pose a more profound challenge, as the development of effective defense mechanisms remains an open research problem [dress,adashield,mllmprotector,ECSO,jailguard].
— Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
(2510.21214 - Zhong et al., 24 Oct 2025) in Section 2.2 (Jailbreak Attacks on MLLMs)