Fast Compound Scaling in Neural Networks

Updated 22 February 2026
  • Fast compound scaling is a method for scaling neural networks by emphasizing width to achieve sublinear activation growth, optimizing resource use while preserving accuracy.
  • It reallocates computing resources in CNNs to primarily increase model width, resulting in activation growth near O(√s) and improved runtime and accuracy compared to standard scaling.
  • For ensemble inference, it determines the optimal number of model calls using sample-efficient techniques to balance accuracy gains against economic cost.

Fast compound scaling refers to algorithmic regimes and design methodologies for scaling neural architectures, or compound inference systems, with minimal resource overhead while maintaining or improving empirical performance. The concept covers scaling laws for convolutional neural networks (CNNs), where model width is emphasized to suppress activation growth, as well as compound ensemble methods for LLMs, where the number of calls is rapidly optimized relative to accuracy and economic cost.

1. Overview and Definition

Fast compound scaling in deep learning refers to procedures for increasing model or system capacity—such as model width, depth, spatial resolution, or ensemble call count—such that the resulting accuracy, latency, and resource scaling are jointly optimized. The principal aim is to decouple or sublinearly relate resource increases (memory footprint, wall-clock latency, compute) to increments in predictive performance, particularly in contexts constrained by hardware or budget (Dollár et al., 2021).

In CNN scaling, fast compound scaling specifies the allocation of the available floating-point operation (FLOP) budget primarily into model width, with lesser emphasis on depth and input resolution, so that activation (and thus memory) growth scales closer to $O(\sqrt{s})$ than $O(s)$ with respect to an upscaling factor $s$ (Dollár et al., 2021). In compound inference with LLMs, fast compound scaling denotes the ability to derive, from a few pilot runs, the optimal ensemble size (e.g., number of majority votes) that maximizes accuracy per unit cost (Chen et al., 2024).

2. Compound Scaling Formats in Neural Networks

Let $w_0, d_0, r_0$ denote base network width, depth, and input resolution. For a desired upscaling factor $s$, the canonical compound scaling family is characterized by exponents $(\alpha, \beta, \gamma)$ solving $\beta + 2\alpha + 2\gamma = 1$, applied as

$$w = w_0 \cdot s^\alpha, \qquad d = d_0 \cdot s^\beta, \qquad r = r_0 \cdot s^\gamma,$$

ensuring total FLOPs scale by $s$ (since FLOPs $\propto d \cdot w^2 \cdot r^2$) (Dollár et al., 2021).
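As a quick sanity check, the constraint can be verified numerically; the base dimensions below are illustrative, not taken from any specific model:

```python
# FLOPs of a simple conv stack are proportional to d * w**2 * r**2, so any
# exponent triple with beta + 2*alpha + 2*gamma = 1 scales FLOPs by exactly s.
def flops(d, w, r):
    return d * w**2 * r**2

d0, w0, r0 = 16, 64, 224             # illustrative base depth, width, resolution
alpha, beta, gamma = 1/6, 1/3, 1/6   # satisfies 1/3 + 2/6 + 2/6 = 1
s = 8.0

w = w0 * s**alpha
d = d0 * s**beta
r = r0 * s**gamma
ratio = flops(d, w, r) / flops(d0, w0, r0)  # equals s up to float error
```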

Fast compound scaling defines a one-parameter regime,

$$e_d = \tfrac{1-\alpha}{2}, \quad e_w = \alpha, \quad e_r = \tfrac{1-\alpha}{2},$$

such that

$$d = d_0 \cdot s^{e_d}, \quad w = w_0 \cdot s^{e_w/2}, \quad r = r_0 \cdot s^{e_r/2}$$

and

$$a(s) = a_0 \cdot s^{\frac{2-\alpha}{2}},$$

where $a(s)$ denotes total activations. When $\alpha = 0.8$, activation growth is $s^{0.6}$, significantly sublinear and in contrast with the nearly linear growth ($s^{0.83}$) of “standard” compound scaling (Dollár et al., 2021).
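The closed form can be checked against a direct activation count (taking the activations of a plain conv stack as $d \cdot w \cdot r^2$; base dimensions are illustrative):

```python
# Verify a(s) = a0 * s**((2 - alpha) / 2) under the fast-scaling exponents.
def activations(d, w, r):
    # d layers, each producing w channels at r x r spatial resolution
    return d * w * r**2

alpha = 0.8
e_d, e_w, e_r = (1 - alpha) / 2, alpha, (1 - alpha) / 2
d0, w0, r0 = 16, 64, 224   # illustrative base dimensions
s = 16.0

d = d0 * s**e_d
w = w0 * s**(e_w / 2)
r = r0 * s**(e_r / 2)
growth = activations(d, w, r) / activations(d0, w0, r0)
predicted = s**((2 - alpha) / 2)   # s**0.6 for alpha = 0.8
```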

Pseudocode for Fast Compound Scaling

e_d = (1 - alpha) / 2   # depth exponent
e_w = alpha             # width exponent (dominant term)
e_r = (1 - alpha) / 2   # resolution exponent

d_new = round(d0 * s**e_d)
w_new = round_width(w0 * (s**0.5)**e_w)
r_new = round_resolution(r0 * (s**0.5)**e_r)

Here, round_width and round_resolution apply divisibility constraints for efficient hardware execution (Dollár et al., 2021).
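A minimal runnable sketch of the procedure, assuming illustrative base dimensions and a channel-divisibility quantum of 8 (both are assumptions, not values from the paper):

```python
def round_width(w, q=8):
    # Assumed divisibility constraint: round channel width to a multiple of q.
    return max(q, int(round(w / q)) * q)

def round_resolution(r):
    # Assumed constraint: round input resolution to the nearest integer.
    return int(round(r))

def fast_scale(d0, w0, r0, s, alpha=0.8):
    """Scale (depth, width, resolution) by total FLOP factor s, width-dominant."""
    e_d = (1 - alpha) / 2
    e_w = alpha
    e_r = (1 - alpha) / 2
    d = round(d0 * s**e_d)
    w = round_width(w0 * (s**0.5)**e_w)
    r = round_resolution(r0 * (s**0.5)**e_r)
    return d, w, r

# Illustrative base network: 16 blocks, 64 channels, 224-px input, scaled 4x
print(fast_scale(16, 64, 224, s=4))
```

Note that width grows fastest (64 to 112 channels) while depth and resolution barely move, which is exactly the width-dominant allocation the regime prescribes.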

3. Theoretical Rationale and Complexity Analysis

The underlying insight of fast compound scaling is that width-dominant scaling induces only $O(\sqrt{s})$ growth in activations, as opposed to $O(s)$ for more evenly balanced scaling policies. For instance, when all compute is allocated to width ($\alpha = 1$, i.e., $e_w = 1$ and $e_d = e_r = 0$), the activation count grows as $a(s) = a_0\, s^{0.5}$. In contrast, standard compound scaling (e.g., EfficientNet’s regime, with per-dimension exponents $\alpha = 1/6$, $\beta = 1/3$, $\gamma = 1/6$) yields an activation scaling exponent of $5/6$, i.e., nearly linear growth (Dollár et al., 2021, Tan et al., 2019).
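These exponents can be tabulated with a small helper; each regime is expressed as a per-dimension triple $(\alpha, \beta, \gamma)$ in the notation of Section 2 (for fast scaling with parameter $\alpha = 0.8$, the corresponding per-dimension triple is $(0.4, 0.1, 0.05)$):

```python
def act_exponent(alpha, beta, gamma):
    # Activations a ∝ d * w * r**2 scale as s**(beta + alpha + 2*gamma),
    # under the FLOP constraint beta + 2*alpha + 2*gamma = 1.
    assert abs(beta + 2 * alpha + 2 * gamma - 1) < 1e-9, "not FLOP-preserving"
    return beta + alpha + 2 * gamma

width_only = act_exponent(1/2, 0, 0)        # 0.5: sublinear
efficientnet = act_exponent(1/6, 1/3, 1/6)  # 5/6: nearly linear
fast = act_exponent(0.4, 0.1, 0.05)         # 0.6 for alpha = 0.8
```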

This sublinear growth directly benefits hardware where memory traffic or on-chip activation footprint is the primary bottleneck, as supported empirically by a tight correlation ($\rho \simeq 0.99$) between activation count and runtime for major CNN architectures on GPUs/TPUs (Dollár et al., 2021).

4. Empirical Performance and Trade-Offs

Empirical benchmarks on EfficientNet-B0 and RegNet variants demonstrate that fast compound scaling ($\alpha = 0.8$) achieves ImageNet accuracy within a few tenths of a percent of the best accuracy at fixed FLOPs, while enabling up to a $2\times$ reduction in epoch runtime relative to classical compound scaling strategies (Dollár et al., 2021).

Model + Scaling      FLOPs   Params (M)   Activations (M)   Time (min)   Top-1 Error (%)
Eff.-B0, width       4.0 B   36           29                10.8         19.9
Eff.-B0, standard    4.1 B   27           49                19.4         18.4
Eff.-B0, fast        4.1 B   36           29                11.1         17.7

Across all evaluated scales, fast scaling matches or improves runtime relative to conventional strategies while achieving comparable or better top-1 accuracy, affirming its suitability under memory-bandwidth constraints.

5. Fast Compound Scaling in Compound Inference Systems

For compound ensemble systems such as majority-vote LLM querying, fast compound scaling refers to sample-efficient determination of the optimal number of system calls $n^*$ that maximizes aggregate accuracy for a given task mixture (Chen et al., 2024). The model assumes queries are divided between “easy” ($p_1 > 0.5$) and “hard” ($p_2 < 0.5$) ones, with mixture parameter $\alpha$.

Given binomial majority voting, the closed-form expression for the optimal call count $n^*$ (rounded to an odd integer) is

$$n^* \approx 2\,\frac{\ln\left(\frac{1-\alpha}{\alpha}\cdot\frac{1-2p_2}{2p_1-1}\right)}{\ln\left(\frac{p_1(1-p_1)}{p_2(1-p_2)}\right)} - 1,$$

making it possible to optimize the cost-accuracy trade-off with only a handful of empirical samples. Non-monotonic (inverse-U) accuracy behavior as a function of $n$ is analytically predicted and empirically observed when the mixture of easy and hard queries crosses a critical threshold (Chen et al., 2024).
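The inverse-U can be reproduced with a brute-force sketch; the parameter values below ($\alpha = 0.5$, $p_1 = 0.9$, $p_2 = 0.4$) are illustrative, not taken from the paper:

```python
import math

def maj_acc(n, p):
    # P(majority of n i.i.d. calls with per-call accuracy p is correct), n odd
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def mixture_acc(n, alpha, p1, p2):
    # Aggregate accuracy over a mixture of easy (p1) and hard (p2) queries
    return alpha * maj_acc(n, p1) + (1 - alpha) * maj_acc(n, p2)

alpha, p1, p2 = 0.5, 0.9, 0.4   # illustrative mixture
accs = {n: mixture_acc(n, alpha, p1, p2) for n in range(1, 16, 2)}
best = max(accs, key=accs.get)  # peaks at a small odd n, then declines
```

Easy-query accuracy rises toward 1 with $n$ while hard-query accuracy decays toward 0, so the mixture peaks at a small odd $n$ and then declines, matching the non-monotonic behavior described above.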

6. Practical Application and Guidelines

Implementation of fast compound scaling in vision proceeds as follows: starting from a tuned base architecture, select a compute upscaling factor $s$; if memory- or latency-constrained, choose as large an $\alpha$ as accuracy permits, typically $\alpha \approx 0.8$ (Dollár et al., 2021). Compute the new width, depth, and resolution via the formulae above, retrain the scaled model, and validate both accuracy and runtime.

For ensemble inference systems, collect a small batch of queries, run micro-ensembles ($n = 1 \ldots 5$), infer $(\alpha, p_1, p_2)$, and compute the optimal $n^*$. This avoids brute-force ensemble sweeps and can halve cost compared to large, fixed-size ensembles with no loss in accuracy (Chen et al., 2024).
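A hedged sketch of that pilot workflow on simulated data; the helper names and the simple thresholding estimator are illustrative choices, not the paper's exact procedure:

```python
import math
import random

def maj_acc(n, p):
    # P(majority of n calls is correct) for per-call accuracy p, n odd
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def best_n(alpha, p1, p2, max_n=51):
    # Brute-force the odd n maximizing expected mixture accuracy
    return max(range(1, max_n + 1, 2),
               key=lambda n: alpha * maj_acc(n, p1) + (1 - alpha) * maj_acc(n, p2))

def estimate_from_pilot(hits, n_pilot):
    # hits[i] = correct votes out of n_pilot calls on pilot query i;
    # classify queries as easy/hard by thresholding the empirical accuracy.
    phat = [h / n_pilot for h in hits]
    easy = [p for p in phat if p > 0.5] or [0.5]
    hard = [p for p in phat if p <= 0.5] or [0.5]
    alpha_hat = sum(p > 0.5 for p in phat) / len(phat)
    return alpha_hat, sum(easy) / len(easy), sum(hard) / len(hard)

# Simulated pilot: 40 queries, 5 calls each; true (alpha, p1, p2) = (0.5, 0.9, 0.4)
random.seed(0)
hits = [sum(random.random() < (0.9 if i < 20 else 0.4) for _ in range(5))
        for i in range(40)]
a_hat, p1_hat, p2_hat = estimate_from_pilot(hits, 5)
n_star = best_n(a_hat, p1_hat, p2_hat)
```

The thresholding estimator is biased (queries near $p = 0.5$ get misclassified), but with a few dozen pilot queries it is typically adequate to pick a near-optimal odd $n^*$ without an exhaustive ensemble sweep.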

7. Connections to Hardware Constraints and Future Directions

Fast compound scaling is intimately linked to the memory-bandwidth ceilings of current GPU/TPU accelerators, as architectural designs with $O(\sqrt{s})$ activation scaling maintain throughput without incurring prohibitive memory or data-transfer penalties (Dollár et al., 2021). As model architectures and compound inference systems become increasingly cost-aware and bandwidth-limited, fast compound scaling principles are likely to permeate neural scaling law construction, resource allocation heuristics, and automated architecture search procedures.


References:

  • “Fast and Accurate Model Scaling” (Dollár et al., 2021)
  • “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks” (Tan et al., 2019)
  • “Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems” (Chen et al., 2024)
