Hourglass MLPs: High-Dimensional Residual Refinement

Updated 19 January 2026
  • Hourglass MLPs are neural network architectures that invert conventional residual block designs by employing wide high-dimensional skip connections and narrow bottleneck paths.
  • They leverage fixed random projections to efficiently lift input vectors, reducing trainable parameters while preserving geometric properties for robust performance.
  • Empirical results in generative, denoising, and image restoration tasks highlight their superior expressivity and parameter efficiency over conventional MLPs.

Hourglass MLPs are multi-layer perceptron architectures characterized by an inversion of the conventional block shape, employing a wide–narrow–wide structure. In these designs, residual (skip) connections operate in an expanded high-dimensional latent space, while the learnable computation proceeds through a sequence of narrow bottlenecks. This configuration facilitates highly expressive incremental refinement within a rich latent representation, while optimizing parameter economy and efficiency. Hourglass MLPs leverage fixed random projections into high-dimensional spaces, yielding further savings in trainable parameters and memory bandwidth. Empirical studies demonstrate consistent superiority of Hourglass architectures over conventional MLPs in generative, denoising, and image restoration tasks, with distinctly different scaling behaviors as parameter budgets increase (Chen et al., 2 Oct 2025).

1. Architectural Principles and Motivation

Conventional residual MLP blocks employ a narrow–wide–narrow schema:

  • Input/output dimension $d_x$ corresponds to token or pixel-vector size.
  • Hidden expansion $d_h > d_x$.
  • Block operation: $x_{i+1} = x_i + W_2\, \sigma(W_1\, \mathrm{norm}(x_i))$, where $W_1 \in \mathbb{R}^{d_h \times d_x}$, $W_2 \in \mathbb{R}^{d_x \times d_h}$.
  • The skip connection operates at dimension $d_x$, confining learnable residuals to the input/output space.

Hourglass MLP blocks reverse this configuration:

  • Use a high-dimensional latent space $d_z \gg d_x$ for the skip connection.
  • Employ a narrow bottleneck $d_h < d_z$ for the computation pathway.
  • Structured as:

    1. Input lift: $z_0 = W_{\text{in}} x_0$, $W_{\text{in}} \in \mathbb{R}^{d_z \times d_x}$.
    2. $L$ residual Hourglass blocks: $z_{i+1} = z_i + W_{i,2}\, \sigma(W_{i,1}\, \mathrm{norm}(z_i))$, $W_{i,1} \in \mathbb{R}^{d_h \times d_z}$, $W_{i,2} \in \mathbb{R}^{d_z \times d_h}$.
    3. Final projection: $\hat{y} = W_{\text{out}} z_L$, $W_{\text{out}} \in \mathbb{R}^{d_y \times d_z}$.
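
The three-stage structure above admits a compact sketch. The following NumPy forward pass is a minimal illustration, assuming ReLU for $\sigma$ and a simple per-feature layer normalization, since the concrete choices of activation and norm are not specified here:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm(z):
    # simple layer norm over the feature axis (the exact norm is an assumption)
    mu, sd = z.mean(-1, keepdims=True), z.std(-1, keepdims=True)
    return (z - mu) / (sd + 1e-6)

def hourglass_forward(x, W_in, blocks, W_out):
    """x: (batch, d_x); blocks: list of (W1, W2) with shapes (d_h, d_z), (d_z, d_h)."""
    z = x @ W_in.T                              # lift to latent space: (batch, d_z)
    for W1, W2 in blocks:
        h = np.maximum(0.0, norm(z) @ W1.T)     # narrow bottleneck, ReLU as sigma
        z = z + h @ W2.T                        # wide high-dimensional residual update
    return z @ W_out.T                          # project back: (batch, d_y)

# toy dimensions for illustration: d_z >> d_x, d_h < d_z
d_x, d_z, d_h, L = 16, 128, 8, 4
W_in = rng.normal(0.0, 1.0 / np.sqrt(d_x), (d_z, d_x))   # fixed random lift
blocks = [(rng.normal(0.0, 0.02, (d_h, d_z)),
           rng.normal(0.0, 0.02, (d_z, d_h))) for _ in range(L)]
W_out = rng.normal(0.0, 1.0 / np.sqrt(d_z), (d_x, d_z))

y = hourglass_forward(rng.normal(size=(2, d_x)), W_in, blocks, W_out)
```

Note that only the block matrices (and $W_{\text{out}}$) carry trainable parameters when the lift is fixed, which is the configuration analyzed in the sections below.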

This design enables residual pathways to live in richer, high-dimensional feature spaces, potentially allowing for more expressive incremental corrections. The bottleneck restricts the cost of each block, facilitating greater model depth under a fixed parameter budget.

2. Fixed Random Projection Strategies

Hourglass MLPs frequently employ a fixed random projection $W_{\text{in}}$ to lift input vectors into the expanded latent space. Theoretical foundations in reservoir computing, random-feature models, the Johnson–Lindenstrauss lemma, and compressive sensing indicate that such projections preserve essential geometric and discriminative properties with high probability, provided $d_z \gg d_x$.

Key benefits include:

  • Elimination of trainable parameters for $W_{\text{in}}$.
  • Reduced memory and bandwidth overhead, as random matrices can be generated on the fly.
  • Comparable empirical performance: in ImageNet-32 denoising with $(d_z, d_h, L) = (3546, 270, 5)$, models with fixed versus trainable $W_{\text{in}}$ yield nearly identical PSNR curves (difference $\ll 0.1$ dB).

Across evaluated tasks, Hourglass MLPs with fixed projections consistently align with the Pareto frontier of their fully trainable counterparts.
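
The geometry-preservation claim is easy to check numerically. This sketch assumes a Gaussian lift with entry variance $1/d_z$ (so squared norms are preserved in expectation) and compares pairwise squared distances before and after the fixed random lift:

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_z, n = 32, 2048, 40

X = rng.normal(size=(n, d_x))
# fixed Gaussian lift with entries N(0, 1/d_z): norm-preserving in expectation
W_in = rng.normal(0.0, 1.0 / np.sqrt(d_z), (d_z, d_x))
Z = X @ W_in.T

def pdist2(A):
    # all pairwise squared Euclidean distances
    sq = (A ** 2).sum(-1)
    return sq[:, None] + sq[None, :] - 2.0 * A @ A.T

orig, lifted = pdist2(X), pdist2(Z)
mask = ~np.eye(n, dtype=bool)
ratio = lifted[mask] / orig[mask]
print(ratio.min(), ratio.max())   # distance ratios concentrate near 1
```

With $d_z = 2048$ the relative deviation of each distance ratio is on the order of $\sqrt{2/d_z} \approx 3\%$, consistent with the high-probability preservation guarantees cited above.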

3. Parameter Budget and Computational Complexity

Let $d = d_x$, let $e$ be the expansion factor ($d_z = e d$), $b = d_h$ the bottleneck width, and $L$ the stack depth. The parameter count for an Hourglass MLP is:

  • Trainable: $P_{\text{hr}}(d, e, b, L) = d\, d_z$ (input lift) $+\, 2 L\, d_z b$ (per-block) $= e d^2 + 2 L e d b$.
  • With fixed $W_{\text{in}}$: $P_{\text{fix}} = 2 L e d b$.

Contrast with conventional MLPs (expansion $f$):

  • $P_{\text{conv}}(d, f, L) = 2 L d f$.

To match parameter budgets, Hourglass architectures select $e \gg 1$, $b \ll f$, and $L \gg 1$ such that $e d^2 + 2 L e d b \approx 2 L d f$.
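
The budget arithmetic above can be sketched directly; the concrete values below are illustrative choices, not configurations from the paper:

```python
def params_hourglass(d, e, b, L, fixed_lift=False):
    """Trainable parameters of an Hourglass MLP (input lift + L bottleneck blocks)."""
    lift = 0 if fixed_lift else e * d * d        # W_in: (e*d) x d, free if fixed
    return lift + 2 * L * (e * d) * b            # per block: W1 (b x e*d) + W2 (e*d x b)

def params_conventional(d, f, L):
    """Trainable parameters of a conventional residual MLP stack."""
    return 2 * L * d * f                         # per block: W1 (f x d) + W2 (d x f)

d = 1024
conv = params_conventional(d, f=4 * d, L=4)      # common 4x expansion, 4 blocks
hr = params_hourglass(d, e=4, b=512, L=8, fixed_lift=True)  # wider skip, deeper stack
print(conv, hr)   # both 33,554,432: matched budgets
```

With a fixed lift, doubling the depth while quartering the bottleneck (relative to $f$) lands exactly on the conventional budget, illustrating how the "deeper and narrower" trade is made.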

Forward FLOPs per block:

  • Hourglass: $2 d_z d_h = 2 e d b$.
  • Conventional: $2 d f$.

The bottleneck width $b \ll f$ and increased depth $L$ allow Hourglass MLPs to sustain cost parity while enhancing expressivity through deeper stacks operating in wider latent dimensions.
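
Per-block cost parity follows directly from these expressions; a small check, under the assumption that $e$ and $b$ are chosen so that $e b = f$:

```python
def flops_hourglass_block(d, e, b):
    # matrix-multiply cost proxy for one block: W1 and W2 each cost (e*d)*b
    return 2 * (e * d) * b

def flops_conventional_block(d, f):
    # matrix-multiply cost proxy: W1 and W2 each cost d*f
    return 2 * d * f

# e*b == f gives exact per-block parity (here f = 4096, e = 4, b = 1024)
hg = flops_hourglass_block(1024, e=4, b=1024)
cv = flops_conventional_block(1024, f=4096)
assert hg == cv
```

Under parity per block, the Hourglass stack spends its remaining budget on depth rather than width, which is the scaling direction the empirical results below favor.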

4. Empirical Performance and Scaling Behavior

Hourglass MLPs have been empirically evaluated on image-generation, denoising, and super-resolution tasks using MNIST and ImageNet-32 datasets:

| Task             | Dataset     | Hourglass Params | Conventional Params | Hourglass PSNR | Conventional PSNR |
|------------------|-------------|------------------|---------------------|----------------|-------------------|
| Denoising        | MNIST       | 66 M             | 75 M                | 22.31 dB       | 22.31 dB          |
| Super-resolution | ImageNet-32 | 69 M             | 87 M                | 24.00 dB       | 24.00 dB          |

Metrics employed include PSNR (dB), SSIM for reconstruction, and classification accuracy via prototype generation.

Hourglass MLPs consistently achieve superior performance–parameter Pareto frontiers in all evaluated settings. Optimization under increasing parameter budgets consistently drives Hourglass designs toward very large $d_z$ ($e d \approx 3$–4 K) and moderate $d_h$ ($\approx 100$–$300$), while increasing network depth $L$ (4–8) rather than bottleneck width. This "wider skip + narrower bottleneck + deeper stack" scaling is not Pareto-optimal for conventional MLPs.

5. Broader Implications and Application Extensions

These findings suggest reconsidering the dimensionality of skip connections in residual networks. Replacing conventional feed-forward layers in Transformers with hourglass-style FFNs ($d \rightarrow e d \rightarrow b \rightarrow e d \rightarrow d$), and adapting self-attention mechanisms to operate within the expanded latent space of width $e d$, offers potential parameter savings in large-scale LLMs.

In architectures such as U-Nets and MLP-Mixers, injecting a fixed random lift into high-dimensional latent space and operating through narrow-bottleneck Hourglass blocks allows flexible adaptation for tasks including classification, segmentation, and generation. Any residual network currently employing skips at a narrow feature size may achieve increased expressivity and parameter efficiency by relocating skip connections into expanded spaces and routing learned incremental changes through cost-effective bottlenecks.
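
As a hypothetical comparison (not reported in the source), the parameter cost of an hourglass-style FFN with the $d \rightarrow e d \rightarrow b \rightarrow e d \rightarrow d$ shape can be set against a standard 4x-expansion Transformer FFN, assuming the lift is a fixed random matrix while the projection back to $d$ remains trainable:

```python
def ffn_params_standard(d, f=None):
    # standard Transformer FFN: d -> f -> d with trainable up/down projections
    f = f if f is not None else 4 * d
    return 2 * d * f

def ffn_params_hourglass(d, e, b):
    # d -> e*d (fixed random lift, no trainable params) -> b -> e*d -> d
    bottleneck = 2 * (e * d) * b    # trainable W1, W2 around the bottleneck
    proj = d * (e * d)              # trainable projection back to width d
    return bottleneck + proj

d = 4096
print(ffn_params_standard(d))                  # 134,217,728
print(ffn_params_hourglass(d, e=4, b=1024))    # 100,663,296
```

The illustrative widths here ($e = 4$, $b = 1024$) are assumptions; the point is only that fixing the lift removes the largest single matrix from the trainable count.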

6. Practical Guidelines for Construction

Recommendations for Hourglass MLP configuration:

  1. Select $e$ such that $e d \approx 3$–$5$ K when $d \approx 1$ K, ensuring geometry preservation via random lifts.
  2. Set $d_h$ to a moderate range (50–300) to maintain per-block cost parity with conventional blocks.
  3. Use the maximal depth $L$ allowed by the parameter budget; empirically, $L = 4$–$8$ saturates performance gains.
  4. Employ a fixed random $W_{\text{in}}$ to save trainable parameters and memory bandwidth.
  5. Assess model selection along the performance–parameter frontier; Hourglass MLPs typically dominate across varied generative and classification benchmarks.
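
As an illustration only, a small helper that applies guidelines 1–4 to choose an expansion under a trainable-parameter budget; the default bottleneck and depth are assumed values within the recommended ranges, not settings from the paper:

```python
def hourglass_config(d, param_budget, b=256, L=8):
    """Pick the largest expansion e whose trainable parameter count, with a
    fixed W_in, fits the budget: P_fix = 2*L*e*d*b (Section 3)."""
    e = param_budget // (2 * L * d * b)
    if e < 1:
        raise ValueError("budget too small for the chosen b and L")
    return {"d_z": e * d, "d_h": b, "L": L,
            "trainable_params": 2 * L * e * d * b}

cfg = hourglass_config(d=1024, param_budget=20_000_000)
print(cfg)   # d_z = 4096, within the recommended e*d ~ 3-5 K range
```

A practical workflow would sweep $(b, L)$ near these defaults and compare candidates along the performance–parameter frontier, per guideline 5.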

The scaling and architectural principles identified in Hourglass MLPs suggest wide applicability and invite further investigation into expanded skip-dimensionality and bottleneck routing within modern neural architectures (Chen et al., 2 Oct 2025).
