
Self-Consuming Generative Models with Adversarially Curated Data

Published 14 May 2025 in cs.LG (arXiv:2505.09768v1)

Abstract: Recent advances in generative models have made it increasingly difficult to distinguish real data from model-generated synthetic data. Using synthetic data for successive training of future model generations creates "self-consuming loops", which may lead to model collapse or training instability. Furthermore, synthetic data is often subject to human feedback and curated by users based on their preferences. Ferbach et al. (2024) recently showed that when data is curated according to user preferences, the self-consuming retraining loop drives the model to converge toward a distribution that optimizes those preferences. However, in practice, data curation is often noisy or adversarially manipulated. For example, competing platforms may recruit malicious users to adversarially curate data and disrupt rival models. In this paper, we study how generative models evolve under self-consuming retraining loops with noisy and adversarially curated data. We theoretically analyze the impact of such noisy data curation on generative models and identify conditions for the robustness of the retraining process. Building on this analysis, we design attack algorithms for competitive adversarial scenarios, where a platform with a limited budget employs malicious users to misalign a rival's model from actual user preferences. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithms.

Summary

  • The paper explores how adversarial data curation can disrupt self-consuming generative models, analyzing vulnerabilities and proposing competitive attack algorithms.
  • Theoretical analysis identifies conditions affecting robustness to noisy or adversarial curation, leading to the design of algorithms that strategically flip preference labels to misalign competitor models.
  • Experimental validation demonstrates the effectiveness of adversarial curation in misaligning model outputs from preferred distributions and shows that adding real data alone does not fully defend against preference attacks.

Exploring Adversarial Data Curation in Self-Consuming Generative Models

This paper examines the implications of self-consuming training loops in generative models, focusing on adversarial data curation by competing platforms. The authors present theoretical findings alongside empirical results, highlighting vulnerabilities and proposing new attack algorithms.

Overview of Self-Consuming Training Loops

Generative models increasingly produce synthetic data that is often indistinguishable from real-world data. As synthetic data proliferates, models tend to enter "self-consuming loops," in which generated data is fed back as training input for successive model generations. Such loops can lead to model collapse, training instability, or biased outputs. Prior research has shown that when the data in these loops is curated according to user preferences, the model converges toward a distribution that maximizes the corresponding reward. The novelty here lies in studying scenarios where the curation is noisy or adversarially manipulated, for instance by competitors aiming to disrupt a rival model's outputs.
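The preference-curated self-consuming loop can be illustrated with a toy simulation (my construction, not the paper's setup): a one-dimensional Gaussian "model" is refit each generation to best-of-two curated samples, and its mean drifts toward the optimum of the curation reward. The reward function and the best-of-two curation rule below are illustrative assumptions.

```python
import random

# Toy self-consuming loop: each generation, sample pairs from the current
# 1-D Gaussian model, keep the sample the (hypothetical) user prefers, and
# refit the model's mean to the curated data.

def reward(x):
    # Hypothetical user preference: samples near 2.0 are preferred.
    return -(x - 2.0) ** 2

def retrain_loop(mu=0.0, sigma=1.0, generations=20, n=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        curated = []
        for _ in range(n):
            a = rng.gauss(mu, sigma)
            b = rng.gauss(mu, sigma)
            # Best-of-two curation: keep the preferred sample of each pair.
            curated.append(a if reward(a) >= reward(b) else b)
        # "Retraining" = refitting the Gaussian mean to the curated data.
        mu = sum(curated) / len(curated)
    return mu

print(retrain_loop())  # mean drifts from 0.0 toward the reward optimum at 2.0
```

This mirrors the convergence result attributed to prior work: under clean preference curation, the retraining loop pulls the model toward the reward-maximizing distribution.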

Theoretical Analysis and Attack Algorithms

The authors theoretically analyze how self-consuming generative models evolve under noisy and adversarial data curation. They identify conditions for the robustness of the retraining process, centered on the correlation between the genuine and adversarial reward functions: when the two rewards are positively correlated, the model may still converge to outputs aligned with user preferences, whereas negative correlations expose vulnerabilities.
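The role of reward correlation can be sketched by extending the toy loop so that a fraction of curation decisions follow an adversarial reward instead of the genuine one. The specific reward functions, the mixing fraction `eps`, and the best-of-two rule are my illustrative assumptions, not the paper's formal conditions.

```python
import random

def genuine(x):
    return -(x - 2.0) ** 2       # users prefer samples near 2.0

def adv_aligned(x):
    return -2.0 * (x - 2.0) ** 2  # positively correlated: same optimum

def adv_opposed(x):
    return -(x + 2.0) ** 2       # negatively correlated: prefers -2.0

def curated_mean(adv, eps=0.4, mu=0.0, generations=20, n=2000, seed=1):
    """Each curation decision follows the adversarial reward with prob. eps."""
    rng = random.Random(seed)
    for _ in range(generations):
        kept = []
        for _ in range(n):
            a, b = rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)
            r = adv if rng.random() < eps else genuine
            kept.append(a if r(a) >= r(b) else b)
        mu = sum(kept) / len(kept)
    return mu

# With an aligned adversarial reward the loop still converges near the
# preferred 2.0; with an opposed reward the mean is pulled away from it.
print(curated_mean(adv_aligned))
print(curated_mean(adv_opposed))
```

The aligned case behaves like clean curation because the two rewards induce the same preference ordering; the opposed case settles at a compromise point misaligned from the genuine optimum.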

Building on this analysis, the paper introduces attack algorithms for competitive adversarial scenarios. These algorithms strategically flip preference labels within curated datasets, guiding malicious users to misalign a competitor's model away from true user preferences. The techniques leverage parametric reward models learned from the data, optimizing perturbations to preference labels to maximize disruption. The algorithms come in gradient-based and heuristic variants, trading off computational cost against effectiveness.
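The heuristic flavor of such an attack can be sketched as a budget-constrained label-flipping rule. The scoring function, the `attacker_reward` target, and the greedy selection below are my illustrative choices, not the paper's exact objective.

```python
# Heuristic label-flipping sketch: with a budget of B flips, reverse the
# preference labels whose flip most favors the attacker's reward.

def attacker_reward(x):
    # Hypothetical adversarial target: the attacker prefers samples near -2.
    return -(x + 2.0) ** 2

def flip_labels(pairs, labels, budget):
    """pairs: list of (a, b); labels[i] = 1 if a preferred, 0 if b preferred."""
    def gain(i):
        # How much does flipping pair i push the label toward the sample
        # the attacker's reward prefers?
        a, b = pairs[i]
        margin = attacker_reward(a) - attacker_reward(b)
        return -margin if labels[i] == 1 else margin

    # Greedily spend the budget on the highest-gain flips.
    order = sorted(range(len(pairs)), key=gain, reverse=True)
    flipped = list(labels)
    for i in order[:budget]:
        if gain(i) > 0:  # only flip when it actually helps the attacker
            flipped[i] = 1 - flipped[i]
    return flipped

pairs = [(-1.0, 1.0), (0.5, -0.5), (2.0, -2.0)]
labels = [0, 0, 1]
print(flip_labels(pairs, labels, budget=2))  # flips the two highest-gain pairs
```

A gradient-based variant would instead differentiate a learned reward model's loss with respect to the label assignment; the greedy rule here trades that precision for simplicity, matching the computational trade-off the paper describes.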

Experimental Validation

Empirical validation of the proposed attack methods is provided on synthetic and real-world datasets, including CIFAR-10 and CIFAR-100. The findings confirm the theoretical predictions, showing that adversarial curation significantly misaligns model outputs from user-preferred distributions. The study also examines whether mixing real data into the training loop serves as a defense, finding that real data keeps the model anchored to the data distribution but does not adequately counteract adversarial manipulation of preference alignment.
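The partial-defense finding can be illustrated by adding a real-data fraction to the toy loop under fully adversarial curation. The setup below (real data centered at 0, genuine preference at +2, adversarial preference at -2, mixing fraction `lam`) is my illustrative construction, not the paper's experimental protocol.

```python
import random

def adv_reward(x):
    # Adversarially flipped curation favors -2 instead of the genuine +2.
    return -(x + 2.0) ** 2

def retrain_with_real(lam, generations=20, n=2000, seed=2):
    """Each generation trains on a lam fraction of real data (mean 0)
    plus a (1 - lam) fraction of adversarially curated synthetic data."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(generations):
        data = []
        n_real = int(lam * n)
        data += [rng.gauss(0.0, 1.0) for _ in range(n_real)]  # real data
        for _ in range(n - n_real):  # adversarially curated synthetic data
            a, b = rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)
            data.append(a if adv_reward(a) >= adv_reward(b) else b)
        mu = sum(data) / len(data)
    return mu

# Real data anchors the model near the data distribution (mean 0) but does
# not restore alignment with the genuine preference target at +2.
print(retrain_with_real(0.0))  # driven toward the attacker's target of -2
print(retrain_with_real(0.5))  # pulled back toward 0, still far from +2
```

This matches the qualitative conclusion: real data limits drift toward the data distribution, yet the curated preference signal, once flipped, still governs where within that neighborhood the model settles.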

Implications and Future Directions

The paper's findings emphasize the vulnerability of self-consuming models to adversarial data curation, underscoring the necessity for improved defenses in competitive environments. These results prompt considerations for developing more robust generative model training processes and defense mechanisms capable of maintaining alignment with genuine user preferences amidst adversarial interventions.

The implications are significant for the development and deployment of generative models wherever synthetic data plays a central role in training next-generation models, such as automated creative workflows and personalized user experiences. Further research could pursue more advanced defenses against such attacks, examining outlier detection or more sophisticated data-blending methods that guard against preference misalignment while preserving user diversity.

In conclusion, this paper contributes valuable insight into the dynamics of self-consuming training loops in generative models, highlighting potential vulnerabilities to adversarial manipulation and laying the groundwork for ongoing exploration into more resilient AI deployments.
