Does instruction-based unlearning extend beyond large language models to other generative models?

Determine whether instruction-based unlearning—modifying model behavior at inference time via natural-language instructions—extends from large language models to other generative models, including diffusion-based image generation systems.

Background

Instruction-based unlearning has been shown to effectively adjust the behavior of LLMs at inference time. Whether similar instruction-only control can be used to suppress or remove concepts in other types of generative models had not been established at the outset of this study.
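Concretely, instruction-based unlearning requires no retraining: a natural-language "forget" directive is simply prepended to the prompt at inference time. The sketch below illustrates the idea; the function name and instruction wording are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of instruction-based unlearning at inference time.
# No model weights change; the unlearning request is expressed purely
# as a natural-language instruction wrapped around the user's query.
# The helper name and instruction text are hypothetical examples.

def build_unlearning_prompt(forget_concept: str, user_query: str) -> str:
    """Wrap a query with an instruction telling the model to behave
    as if it had no knowledge of `forget_concept`."""
    instruction = (
        f"Behave as if you have no knowledge of '{forget_concept}'. "
        "Refuse to reveal or use any detail about it.\n\n"
    )
    return instruction + user_query

prompt = build_unlearning_prompt("Concept X", "Describe Concept X.")
print(prompt)
```

For an LLM, such a wrapped prompt can measurably suppress the target concept; the open question is whether an analogous instruction in a diffusion model's text conditioning achieves the same suppression.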

This paper focuses on diffusion-based image generation and presents evidence that instruction-based unlearning fails in such models, highlighting the broader question of generalizability across generative model families.

References

Instruction-based unlearning has proven effective for modifying the behavior of LLMs at inference time, but whether this paradigm extends to other generative models remains unclear.

Why Instruction-Based Unlearning Fails in Diffusion Models? (2604.01514 - Zhang et al., 2 Apr 2026) in Abstract