Test-Time Conditioning with Representation-Aligned Visual Features

Published 3 Feb 2026 in cs.CV | (2602.03753v1)

Abstract: While representation alignment with self-supervised models has been shown to improve diffusion model training, its potential for enhancing inference-time conditioning remains largely unexplored. We introduce Representation-Aligned Guidance (REPA-G), a framework that leverages these aligned representations, with rich semantic properties, to enable test-time conditioning from features in generation. By optimizing a similarity objective (the potential) at inference, we steer the denoising process toward a conditioned representation extracted from a pre-trained feature extractor. Our method provides versatile control at multiple scales, ranging from fine-grained texture matching via single patches to broad semantic guidance using global image feature tokens. We further extend this to multi-concept composition, allowing for the faithful combination of distinct concepts. REPA-G operates entirely at inference time, offering a flexible and precise alternative to often ambiguous text prompts or coarse class labels. We theoretically justify how this guidance enables sampling from the potential-induced tilted distribution. Quantitative results on ImageNet and COCO demonstrate that our approach achieves high-quality, diverse generations. Code is available at https://github.com/valeoai/REPA-G.
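The abstract's idea of steering generation by optimizing a similarity potential can be illustrated with a toy sketch. This is not the REPA-G implementation: it assumes the potential is cosine similarity, it updates a feature vector directly instead of running a diffusion denoiser, and the names `cosine`, `cosine_grad`, `guidance_step`, and `guide` are hypothetical. It shows only that repeated gradient steps on the potential move a representation toward the conditioning target.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_grad(h, c):
    """Gradient of cosine(h, c) with respect to h."""
    norm_h = math.sqrt(sum(a * a for a in h))
    norm_c = math.sqrt(sum(b * b for b in c))
    sim = cosine(h, c)
    # d/dh [h.c / (|h||c|)] = c / (|h||c|) - sim * h / |h|^2
    return [b / (norm_h * norm_c) - sim * a / (norm_h ** 2) for a, b in zip(h, c)]


def guidance_step(h, c, scale=0.5):
    """One gradient-ascent step on the (assumed) cosine-similarity potential."""
    grad = cosine_grad(h, c)
    return [a + scale * g for a, g in zip(h, grad)]


def guide(h, c, steps=20, scale=0.5):
    """Apply several guidance steps; a stand-in for guidance during sampling."""
    for _ in range(steps):
        h = guidance_step(h, c, scale)
    return h


# Hypothetical target feature (the condition) and starting representation.
target = [1.0, 0.0, 0.0]
start = [0.2, 1.0, -0.5]
guided = guide(start, target)

print("similarity before:", round(cosine(start, target), 3))
print("similarity after: ", round(cosine(guided, target), 3))
```

In an actual diffusion sampler the gradient would presumably be taken through the model's aligned features at each denoising step and combined with the model's own update; the `scale` parameter here only mimics a guidance strength.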


Authors (5)

Summary

The paper serves as a submission template, providing detailed formatting and anonymization guidelines without technical findings on test-time conditioning.
It outlines strict procedures, including page limits, camera-ready adjustments, and reproducibility standards for ICML 2026 submissions.
The document emphasizes accessibility and proper structuring to streamline the peer review process and ensure consistency across submissions.

Test-Time Conditioning with Representation-Aligned Visual Features: A Technical Appraisal

Overview

The document under consideration, titled "Test-Time Conditioning with Representation-Aligned Visual Features" (2602.03753), is not a research manuscript but a template and set of guidelines for submissions to ICML 2026. Although its title refers to test-time conditioning and representation alignment in visual features, the content is limited to formatting, anonymization, and procedural guidance for prospective ICML authors. It contains no technical methodology, experimental results, or discussion of test-time conditioning or representation alignment.

Content Analysis

The main body of this document outlines:

Submission Procedures: Strict requirements for electronic PDF submissions, page limits (8 pages main body, 1 extra for the camera-ready version), and double-blind peer review protocols.
Formatting Specifications: Typographic and structural guidelines for typesetting using LaTeX, with rules on author listing, figure and table presentation, sectioning, and references.
Camera-Ready Processes: Handling of affiliations, acknowledgements, and changing certain template macros for papers accepted to the conference.
Accessibility and Impact: Encouragement of accessible writing for inclusivity and requirement for a "Broader Impact Statement" on potential societal implications.
Supplementary Materials: Guidance on the attachment of appendices and the inclusion of code and datasets for reproducibility.

The appendix continues this pattern, explaining non-technical details about optional sections (appendices) in submissions.

Absence of Technical Contribution

Despite the reference in the title to "Test-Time Conditioning," "Representation-Aligned Visual Features," and related state-of-the-art research concepts, the document does not contain any:

Problem formulation or motivation of test-time conditioning methods
Technical background on representation learning or visual feature alignment
Algorithmic innovations or architectures
Experimental protocols, benchmarks, or quantitative results
Theoretical analysis or proofs
Comparative discussion with prior work

There are no claims, whether bold, contradictory, or otherwise, about technical methods, empirical performance, or theoretical implications.

Implications and Future Directions

As the document is strictly a formatting and procedural template for ICML submissions, it makes no assertions, recommendations, or claims about research methodology, model architectures, evaluation frameworks, or downstream applications in AI or computer vision. Consequently, it has no direct bearing on theoretical or practical developments in the field. Its only broader effect is to standardize the submission format and review procedure, which streamlines peer review and supports reproducibility.

Conclusion

"Test-Time Conditioning with Representation-Aligned Visual Features" (2602.03753) is presently a template and guideline resource for ICML 2026 and does not contain research findings on visual feature alignment, test-time adaptation, or related domains. It offers no technical methods, numerical results, or claims for scrutiny. Any evaluation of research implications, experimental merits, or theoretical advances on the stated topic must await a substantive manuscript that addresses the title's theme.
