Zero-Cost Proxies for Efficient NAS
- Zero-cost proxies are analytical metrics that predict neural network performance without any gradient-based training, using only a single computation pass.
- They leverage gradient, activation, and structural signals to efficiently rank architectures in Neural Architecture Search pipelines.
- Ensemble and automated proxy methods drastically reduce computational costs while maintaining strong correlations with final test accuracy.
Zero-cost proxies are analytical or computational metrics that estimate the performance of candidate neural network architectures without any gradient-based training or fine-tuning of model parameters. Within Neural Architecture Search (NAS), zero-cost proxies serve as highly efficient surrogates for model evaluation, replacing expensive full or partial training with cheap, architecture-level signals derived from initial network statistics or topology. These proxies are now central to training-free and zero-shot NAS pipelines, as they enable rapid ranking and selection among potentially vast search spaces, while requiring only a single forward or backward pass—or, in some approaches, purely combinatorial analysis of network graphs. Recent research has resulted in a diverse landscape of zero-cost proxies, spanning from expert-designed gradient or activation-based metrics to comprehensive data-driven and auto-discovered ensemble frameworks.
1. Core Concepts and Methodologies
Zero-cost proxies—also known as zero-shot or training-free proxies—are scalar functions mapping a randomly-initialized, untrained architecture (or its DAG representation) to a real-valued score. Their goal is to rank architectures according to their expected generalization performance on a given task. The dominant methodology is to compute per-layer or per-block statistics using a single mini-batch (typically real or synthetic data, or even data-free all-ones input), aggregate these statistics, and use the result as a ranking signal (Abdelfattah et al., 2021, Krishnakumar et al., 2022).
A canonical zero-cost NAS routine thus comprises:
- Generation or sampling of candidate architectures from the search space;
- Per-candidate computation of one or several proxy scores in parallel;
- Ranking and selection of promising candidates for full training or further evaluation.
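The three steps above can be sketched in a few lines of Python. Here `score_candidate` is a hypothetical stand-in for any real proxy (e.g. SNIP or SynFlow evaluated on one mini-batch); the sampling and selection logic is what the routine actually illustrates:

```python
import random

def score_candidate(arch):
    """Hypothetical stand-in for a real zero-cost proxy; here a toy
    deterministic function of the architecture encoding."""
    return sum(arch) / len(arch)

def zero_cost_nas(search_space, n_samples, top_k, seed=0):
    """Sample candidates, score each with the proxy once, keep the top-k."""
    rng = random.Random(seed)
    candidates = [rng.choice(search_space) for _ in range(n_samples)]
    scored = [(score_candidate(a), a) for a in candidates]  # one cheap pass each
    scored.sort(key=lambda t: t[0], reverse=True)           # rank by proxy score
    return [a for _, a in scored[:top_k]]                   # survivors go to full training
```

In a real pipeline, only the handful of survivors returned here would ever see a full training run.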
Proxies can be categorized into:
- Gradient-based proxies: SNIP, SynFlow, GraSP, Fisher, GradNorm, NWOT, Jacobian covariance (jacov), Hessian-based.
- Activation and expressivity proxies: Number of linear regions, NTK spectrum, Zen-NAS, NASWOT, epe-nas.
- Structural/topological proxies: FLOPs, number of parameters, operation counts, GRAF, SED.
- Composite and learned proxies: Ensemble models, genetic programming–discovered proxy formulas (EZNAS, GreenMachine), parametric models (ParZC), transformer-GCN models (TG-NAS).
2. Mathematical Foundations and Principal Formulas
Most zero-cost proxies reduce to a per-architecture aggregation of measurements computable at initialization. Key representatives are as follows:
| Proxy | Formula/Computation (LaTeX syntax) | Intuition |
|---|---|---|
| GradNorm | $S = \lVert \nabla_\theta \mathcal{L}(x;\theta) \rVert_2$ | Trainability |
| SNIP | $S = \sum_i \lvert \theta_i \, \partial \mathcal{L} / \partial \theta_i \rvert$ | Sensitivity to pruning |
| SynFlow | $S = \sum_i \theta_i \, \partial \mathcal{R} / \partial \theta_i$, where $\mathcal{R}$ is the output of the $\lvert\theta\rvert$-network on an all-ones input | Gradient flow in data-free mode |
| Zen-NAS | $S \approx \log \mathbb{E}_{x,\epsilon} \lVert f(x) - f(x + \alpha\epsilon) \rVert$ (plus BN variance terms) | Expressivity under perturbations |
| Jacov | Score on the eigenvalue spectrum of the per-input Jacobian correlation matrix | Class-discriminative Jacobians |
| Params | $S = \lvert\theta\rvert$, the number of trainable parameters | Capacity and expressivity |
| FLOPs | Total multiply–accumulate count of one forward pass | Capacity and compute, baseline |
Composite and learned proxies, such as those found by EZNAS (Akhauri et al., 2022), LPZero (Dong et al., 2024), and GreenMachine (Cortês et al., 2024), combine multiple such statistics with parametric or non-parametric operators to maximize empirical rank correlation with ground-truth test accuracy.
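To make the data-free character of SynFlow concrete, here is a minimal NumPy sketch for a purely linear chain of weight matrices. It computes the per-parameter saliency $\theta_i \, \partial\mathcal{R}/\partial\theta_i$ analytically rather than via autograd, which is a simplification of the original formulation (one forward/backward pass on an all-ones input with weights replaced by their absolute values):

```python
import numpy as np

def synflow_score(weights):
    """SynFlow-style score for a linear chain of weight matrices W_1..W_L."""
    abs_w = [np.abs(w) for w in weights]
    # Forward pass on an all-ones input (no data needed).
    h = [np.ones(abs_w[0].shape[1])]
    for a in abs_w:
        h.append(a @ h[-1])
    # Backward pass: gradient of R = 1^T h_L w.r.t. each |W_l|,
    # accumulating sum(theta * dR/dtheta) layer by layer.
    g = np.ones(abs_w[-1].shape[0])
    score = 0.0
    for a, h_in in zip(reversed(abs_w), reversed(h[:-1])):
        score += float(g @ (a @ h_in))  # this layer's saliency
        g = a.T @ g
    return score
```

For this linear chain each layer's saliency collapses to the same value $\mathcal{R}$, which reflects SynFlow's known layer-balanced behavior; nonlinear networks require the full autograd computation.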
3. Evaluation Protocols and Benchmarks
Evaluation of proxy effectiveness is primarily conducted in terms of correlation between proxy-induced rankings and final test accuracy on large NAS benchmarks, including NAS-Bench-101, NAS-Bench-201, NATS-Bench SSS/TSS, and DARTS macro-cell spaces. Protocols include:
- Rank-correlation metrics: Kendall's $\tau$, Spearman's $\rho$, or, for regression models, $R^2$ on predicted accuracy (Krishnakumar et al., 2022, Li et al., 2023, Lukasik et al., 2023).
- Benchmarks: Uniform or stratified sampling across the full architecture space to distinguish both high- and low-performing models.
- Downstream impact: Proxy-guided NAS loops (random search, evolution, RL, surrogate-based) are measured on the number of full-training runs to reach a fixed performance threshold and on wall-clock total search time (Abdelfattah et al., 2021, Shen et al., 2021, Qiao et al., 2024).
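The central rank-correlation metric, Kendall's $\tau$, is simple enough to implement directly; a naive $O(n^2)$ version over pairs makes the concordant/discordant counting explicit (production evaluations would normally use `scipy.stats.kendalltau`):

```python
from itertools import combinations

def kendall_tau(proxy_scores, accuracies):
    """Kendall's tau between a proxy ranking and final accuracies
    (tau-a convention: tied pairs count toward neither side)."""
    assert len(proxy_scores) == len(accuracies)
    concordant = discordant = 0
    for (p1, a1), (p2, a2) in combinations(zip(proxy_scores, accuracies), 2):
        s = (p1 - p2) * (a1 - a2)
        if s > 0:
            concordant += 1   # proxy and accuracy agree on this pair's order
        elif s < 0:
            discordant += 1   # they disagree
    n_pairs = len(proxy_scores) * (len(proxy_scores) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A perfect proxy ranking yields $\tau = 1$, a perfectly inverted one $\tau = -1$, and an uninformative one $\tau \approx 0$.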
A table of typical proxy performance (Kendall's $\tau$ on NAS-Bench-201/CIFAR-10) is below:
| Proxy | Kendall's $\tau$ |
|---|---|
| Params | 0.55 |
| SynFlow | 0.54 |
| SNIP | 0.41 |
| Zen-NAS | 0.29 |
| TG-NAS | 0.57 |
| GreenMachine | 0.89* |
| GreenFactory | 0.90+ |
*Indicates stratified sampling or ensemble; see (Cortês et al., 2024, Cortês et al., 14 May 2025) for details.
4. Ensemble, Automated, and Learned Proxies
Limitations of individual proxies—generalization gaps across search spaces, spurious correlations with trivial statistics (e.g. params, depth), and reduced discrimination among top-10% architectures—have led to new automated and ensemble approaches:
- Genetic Programming (GP) proxies: EZNAS, GreenMachine, and LPZero search a symbolic program space, evolving novel zero-cost metrics that outperform hand-designed proxies and show superior transferability (Akhauri et al., 2022, Cortês et al., 2024, Dong et al., 2024).
- Parametric models: ParZC utilizes a Mixer Architecture with Bayesian layers, learning to aggregate node-level proxies with uncertainty estimation and explicitly optimizing a differentiable Kendall's Tau loss (Dong et al., 2024).
- Graph/Topology-only proxies: SED (Wu et al., 2024), GRAF (Kadlecová et al., 2024), and operator-level graph convolutional models (TG-NAS (Qiao et al., 2024)) extract purely combinatorial or language-embedded features for performance estimation.
- Multi-proxy ensembles: GreenFactory (Cortês et al., 14 May 2025) ensembles over 21 zero-cost proxies using a random forest regressor, achieving state-of-the-art correlations on NATS-Bench SSS/TSS.
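The differentiable Kendall's Tau loss used by ParZC can be illustrated by relaxing the hard sign of pairwise score differences into a `tanh`, making the statistic usable as a gradient-based training objective. This is an illustrative sketch; ParZC's exact relaxation may differ:

```python
import numpy as np

def soft_kendall_tau(scores, accs, temperature=0.1):
    """Differentiable surrogate for Kendall's tau: tanh of pairwise
    score differences, matched against the sign of accuracy differences.
    As temperature -> 0 this approaches the hard tau statistic."""
    s = np.asarray(scores, dtype=float)
    a = np.asarray(accs, dtype=float)
    diff_s = s[:, None] - s[None, :]
    diff_a = np.sign(a[:, None] - a[None, :])
    iu = np.triu_indices(len(s), k=1)  # each unordered pair once
    soft = np.tanh(diff_s[iu] / temperature) * diff_a[iu]
    return float(soft.mean())
```

Minimizing `1 - soft_kendall_tau(...)` trains a parametric proxy directly toward rank agreement rather than toward pointwise accuracy regression.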
Empirical evidence confirms that ensembles leveraging complementary proxies frequently exceed the performance of any single measure and mitigate biases due to over-reliance on architecture size or skip-connections (Krishnakumar et al., 2022, Cortês et al., 14 May 2025).
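A minimal way to see the value of complementary proxies is rank averaging — a much simpler aggregator than GreenFactory's random-forest regressor, but it captures the core idea that an architecture must score well under several signals to rank well overall:

```python
def rank_average_ensemble(proxy_tables):
    """Combine proxies by averaging each architecture's rank under
    every proxy (higher score -> better, i.e. lower, rank).
    proxy_tables: {proxy_name: [score per architecture]}."""
    n = len(next(iter(proxy_tables.values())))
    total_rank = [0.0] * n
    for scores in proxy_tables.values():
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order):
            total_rank[i] += rank
    # Lower aggregate rank = stronger consensus candidate.
    return sorted(range(n), key=lambda i: total_rank[i])
```

An architecture that one proxy spuriously favors (e.g. through a size bias) gets pulled back down by the proxies that do not share that bias.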
5. Integration in NAS Workflows and Empirical Acceleration
Zero-cost proxies are now tightly integrated into the inner and outer loops of NAS algorithms. Common modes of use include:
- Warmup: Score large candidate pools by proxy, restrict expensive training to the top-$k$ architectures (Abdelfattah et al., 2021).
- Move proposal: Use proxies at each selection or mutation step in evolutionary or RL-based NAS, rapidly generating high-quality offspring (Abdelfattah et al., 2021).
- Surrogate modeling: Proxy scores serve as inputs to fast regression or ranking surrogates (e.g., XGBoost, random forests) that predict performance or guide sampling (Krishnakumar et al., 2022, Cortês et al., 2024, Cortês et al., 14 May 2025).
- One-shot NAS augmentation: Ranking distillation from zero-cost proxies improves ranking quality in weight-sharing supernets (Dong et al., 2023).
- Differentiable NAS operation selection: Zero-cost operation scoring (Zero-Cost-PT) outperforms classic DARTS $\alpha$-based selection for discrete operations in one-shot NAS, enabling scalable, high-fidelity NAS in macro and cell-based spaces (Xiang et al., 2021).
Speedups are substantive: ProxyBO reaches the same test error as regularized evolution on NAS-Bench-101 with markedly fewer full-training steps; AZ-NAS and GreenFactory achieve effective search using roughly 1% of the compute of training-based methods on NATS-Bench; SED delivers a large runtime reduction relative to gradient-based proxies (Shen et al., 2021, Lee et al., 2024, Wu et al., 2024, Cortês et al., 14 May 2025).
6. Limitations, Pitfalls, and Design Best Practices
While zero-cost proxies are indispensable in training-free NAS, several limitations persist:
- Generalization gaps: Many proxies, especially size-based or gradient proxies, exhibit strong bias towards models with more parameters or deeper layers, failing in search spaces with non-monotonic parameter–performance relationships (Krishnakumar et al., 2022, Kadlecová et al., 2024).
- Discrimination among top-tiers: Correlations typically degrade when restricted to the top few percent of highest-accuracy models (Li et al., 2023).
- Robustness prediction: While clean accuracy can often be regressed from one or two proxies, prediction of adversarial robustness requires aggregation of multiple (often weakly correlated) measures (Lukasik et al., 2023).
- Domain transfer: Operator-specific or search-space–tuned proxies (e.g., ViT, CNN, NLP) degrade outside their domain. Universal proxies such as TG-NAS, Auto-Prox, and LPZero aim to address this but remain an active area (Qiao et al., 2024, Wei et al., 2023, Dong et al., 2024).
- Hyperparameter tuning and proxy selection: Performance and runtime are sensitive to proxy hyperparameters (e.g., mini-batch size, initializations), smoothing temperatures, and aggregation functions (Shen et al., 2021, Dong et al., 2024).
Best practices—substantiated by large-scale empirical studies—include:
- Ensemble several non-redundant proxies and use regression or ranking models to aggregate their signal (Krishnakumar et al., 2022, Cortês et al., 14 May 2025).
- Pre-assess and bias-correct proxies with respect to trivial architecture features (cell size, op count) (Krishnakumar et al., 2022, Kadlecová et al., 2024).
- Evaluate proxies in stratified accuracy bins, not only on randomly drawn architectures, to ensure signal across the full spectrum (Cortês et al., 2024, Cortês et al., 14 May 2025).
- Introduce task and hardware-features where applicable for hardware-aware NAS (Li et al., 2023, Qiao et al., 2024).
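The stratified-bin evaluation recommended above amounts to partitioning architectures into equal-size accuracy strata and checking the proxy's signal within each one. A minimal sketch of the binning step (rank correlation would then be computed per bin):

```python
def stratified_bins(accuracies, n_bins=4):
    """Partition architecture indices into equal-size accuracy bins,
    ordered from worst to best, so a proxy can be evaluated within
    each stratum rather than only on a random global sample."""
    order = sorted(range(len(accuracies)), key=lambda i: accuracies[i])
    k, r = divmod(len(order), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        size = k + (1 if b < r else 0)  # spread any remainder evenly
        bins.append(order[start:start + size])
        start += size
    return bins
```

A proxy whose global correlation is driven entirely by separating terrible from decent models will show near-zero correlation inside the top bin, which is exactly the failure mode this protocol exposes.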
7. Advances, Open Challenges, and Future Directions
Recent technical progress includes:
- Automated design of universal and transfer-friendly symbolic proxies for non-CNN architectures (ViT, LLM; e.g., Auto-Prox, LPZero) (Wei et al., 2023, Dong et al., 2024).
- Genetic programming and differentiable proxy learning (GreenMachine, ParZC) for task-adaptive and uncertainty-aware zero-cost estimation (Cortês et al., 2024, Dong et al., 2024).
- Purely topological, data- and parameter-free proxies for ultrafast scoring (SED, GRAF) and interpretability (Wu et al., 2024, Kadlecová et al., 2024).
- Combined multi-objective zero-cost NAS, e.g., balancing accuracy, robustness, and hardware constraints (Lukasik et al., 2023, Qiao et al., 2024, Li et al., 2023).
Open challenges remain in designing proxies that generalize across architecture families, modalities (vision, NLP, speech), and resource regimes, as well as in robustly fusing proxy sets without substantial human tuning. The theoretical underpinnings—e.g., why certain proxies correlate with accuracy in deep, structured networks—are only partially elucidated. Ongoing research is expanding zero-cost proxy theory to integrate hardware-awareness, multi-objective tradeoffs, and more principled connections to optimization and generalization.
In summary, zero-cost proxies are fundamental tools for scaling and democratizing NAS, offering several orders of magnitude acceleration over classical model evaluation and expanding the feasible scope of automated architecture search (Abdelfattah et al., 2021, Shen et al., 2021, Akhauri et al., 2022, Krishnakumar et al., 2022, Qiao et al., 2024, Wu et al., 2024, Cortês et al., 14 May 2025).