Overview of LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
The paper "LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders" presents a framework for improving the robustness of visual encoders against adversarial perturbations. Visual encoders such as CLIP's image encoder now underpin a wide range of downstream computer-vision tasks, including classification and object detection. However, their vulnerability to adversarial attacks and backdoor threats calls for robust training methods that preserve their reliability and versatility.
Key Contributions
- Identification of Critical Challenges: The authors highlight two major limitations of existing adversarial fine-tuning strategies: instability early in training, which leads to suboptimal convergence, and a poor trade-off between adversarial robustness and clean-data accuracy.
- Proposed Solution: LORE: The paper introduces LORE, an unsupervised adversarial fine-tuning framework based on constrained optimization. By enforcing an embedding-space proximity constraint, it balances the competing objectives of robustness and nominal performance while keeping the degradation in clean-data accuracy minimal.
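In generic form, such a constrained fine-tuning objective can be sketched as follows (the notation here is illustrative and may differ from the paper's exact formulation):

$$
\min_{\theta} \; \mathbb{E}_{x}\Big[\max_{\|\delta\|\le\epsilon} \mathcal{L}_{\mathrm{adv}}\big(f_\theta(x+\delta),\, f_\theta(x)\big)\Big]
\quad \text{s.t.} \quad \mathbb{E}_{x}\big[\|f_\theta(x) - f_{\theta_0}(x)\|\big] \le \rho,
$$

where $f_{\theta_0}$ is the frozen reference encoder and $\rho$ bounds the allowed drift in embedding space. The corresponding Lagrangian, $\mathcal{L}(\theta,\lambda) = \mathbb{E}[\mathcal{L}_{\mathrm{adv}}] + \lambda\big(\mathbb{E}[\|f_\theta(x)-f_{\theta_0}(x)\|] - \rho\big)$ with $\lambda \ge 0$, is minimized over $\theta$ and maximized over $\lambda$.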
- Implementation Details: LORE applies the Lagrangian dual method to enforce proximity to a reference model in embedding space, preserving semantic fidelity during adversarial fine-tuning. A dual network adaptively weighs the constraint, avoiding the sharp clean-accuracy degradation often seen in naive penalty-based approaches. The encoder is optimized through alternating primal and dual updates, maintaining a dynamic balance between robustness and accuracy.
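As a concrete illustration of alternating primal-dual updates, here is a toy sketch, not the paper's implementation: a linear encoder stands in for the vision model, random sign perturbations stand in for a real adversarial attack, and a single scalar multiplier replaces LORE's dual network. All names and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W_ref = rng.normal(size=(d_out, d_in))   # frozen reference encoder
W = W_ref.copy()                         # encoder being fine-tuned
X = rng.normal(size=(32, d_in))          # toy clean inputs
eps, rho = 0.5, 0.1                      # attack budget, proximity budget
lr_primal, lr_dual, lam = 0.01, 0.5, 0.0

def robust_loss(W, deltas):
    # embedding stability under perturbation: ||W(x+d) - Wx||^2 = ||W d||^2
    return np.mean(np.sum((deltas @ W.T) ** 2, axis=1))

def constraint_gap(W):
    # mean squared embedding drift from the reference, minus the budget rho
    drift = np.mean(np.sum((X @ (W - W_ref).T) ** 2, axis=1))
    return drift - rho

eval_deltas = eps * rng.choice([-1.0, 1.0], size=(256, d_in))
loss0 = robust_loss(W, eval_deltas)      # robustness loss before fine-tuning

for step in range(500):
    # stand-in for an adversarial attack: random sign perturbations
    deltas = eps * rng.choice([-1.0, 1.0], size=X.shape)
    # primal step: gradient descent on L(W, lam) = robust + lam * gap
    grad_robust = 2.0 * (W @ deltas.T) @ deltas / len(X)
    grad_constr = 2.0 * ((W - W_ref) @ X.T) @ X / len(X)
    W = W - lr_primal * (grad_robust + lam * grad_constr)
    # dual step: gradient ascent on lam, projected onto lam >= 0
    lam = max(0.0, lam + lr_dual * constraint_gap(W))
```

The dual variable `lam` rises while the proximity constraint is violated and decays once the encoder drifts back within budget, which is the mechanism that avoids the clean-performance collapse a fixed penalty weight can cause.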
Experimental Results
LORE achieves significant improvements in zero-shot adversarial robustness with minimal loss in clean accuracy across diverse architectures and settings. On zero-shot image classification benchmarks, for instance, LORE consistently outperforms FARE, its unconstrained counterpart, particularly at larger perturbation budgets. LORE also improves out-of-distribution performance on corruption benchmarks such as ImageNet-C, and it strengthens cross-modal alignment in vision-language models such as CLIP, as measured by increased cosine similarity between clean image embeddings and text-template embeddings.
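The alignment metric referenced above is plain cosine similarity between embedding vectors; a minimal sketch with made-up stand-in vectors (in practice these would come from the image and text encoders of a model like CLIP):

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine of the angle between two embedding vectors, in [-1, 1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical stand-ins for a clean image embedding and a text-template embedding
img_emb = np.array([0.2, 0.9, 0.1, 0.4])
txt_emb = np.array([0.3, 0.8, 0.0, 0.5])
score = cosine_similarity(img_emb, txt_emb)  # close to 1.0 = well aligned
```

Higher scores between clean image embeddings and their matching text templates indicate that fine-tuning has not disrupted the shared embedding space.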
Theoretical Insights
The paper includes an in-depth analysis of the trade-offs inherent in unsupervised adversarial fine-tuning. It formalizes the approach as restricting the search to a hypothesis space H_ρ defined by the proximity constraint, so that the radius ρ directly controls the robustness-accuracy trade-off. The authors derive suboptimality bounds showing LORE's advantage in maintaining robust performance without sacrificing nominal accuracy.
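In illustrative notation (not necessarily the paper's exact definition), the restricted hypothesis space can be written as

$$
\mathcal{H}_\rho = \big\{\, f_\theta \;:\; \mathbb{E}_x\big[\|f_\theta(x) - f_{\theta_0}(x)\|\big] \le \rho \,\big\},
$$

so that ρ → 0 recovers the reference encoder (clean performance preserved, little room for robustness gains), while larger ρ enlarges the search space, permitting more robustness at possible cost to nominal accuracy.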
Implications and Future Directions
LORE's robust embeddings enhance the reliability of visual encoders in critical applications, fostering AI trustworthiness. The framework presents a pathway to principled adversarial robustness without relying on heuristic loss balancing. Future research could extend LORE to supervised settings, explore alternative parametrizations of the dual network, or investigate other constrained optimization techniques for stronger guarantees.
In conclusion, LORE offers a promising advancement in adversarial fine-tuning by effectively managing the trade-off between robustness and nominal performance, paving the way for more resilient AI systems in complex environments.