Residual Block Formulation
- Residual block formulation is a design that adds the output of a mapping F(x) to the block's own input x, enabling iterative refinement and efficient convergence.
- In deep networks, residual blocks mitigate vanishing gradients by incorporating identity skip connections alongside normalization and activation layers.
- In numerical linear algebra, block residuals drive methods like Krylov subspace techniques and block Kaczmarz iterations to enforce orthogonality and accelerate convergence.
A residual block formulation defines an architectural or algorithmic primitive in which outputs of a (possibly nonlinear, learned, or iterative) mapping are combined with their own inputs via addition or projection. This operator is central both in modern deep networks, as exemplified by ResNets, and in block-structured iterative linear algebra, such as block Krylov subspace methods and block Kaczmarz iterations. While the semantics and purpose of the "residual block" differ by context, its generic mathematical form is an update $x_{l+1} = x_l + F(x_l)$ or a projection of a block residual $R_k = B - A X_k$, with $F$ or the projection structured to enable efficient learning, iterative refinement, orthogonalization, or multi-vector acceleration.
1. General Formulation of Residual Blocks
The canonical expression for a residual block in feedforward architectures is

$$x_{l+1} = x_l + F(x_l),$$

where $F$ is a nonlinear operator, typically parameterized by weight matrices and incorporating batch normalization and activation functions. In Krylov methods and iterative block solvers, the block residual at iteration $k$ is usually defined as

$$R_k = B - A X_k,$$

with $X_k$ a block of approximate solutions and $B$ the block of right-hand sides. Block residuals serve both as a direction for further refinement and as an object for enforcing block-orthogonality, block-minimization, or block-projection (Jastrzębski et al., 2017, Soodhalter, 2013, Gu et al., 2016, Sun et al., 2024, Massei et al., 7 Apr 2025).
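The two canonical forms above can be illustrated numerically. A minimal sketch, assuming a toy `tanh` branch for $F$ and a small random system for the linear-algebra case (all names and sizes here are illustrative, not from any cited method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Deep-learning form: x_{l+1} = x_l + F(x_l), with F a toy nonlinear map.
def residual_block(x, W):
    return x + np.tanh(W @ x)          # y = x + F(x)

# Linear-algebra form: block residual R_k = B - A X_k for a block of
# right-hand sides B and a block of approximate solutions X_k.
def block_residual(A, X, B):
    return B - A @ X

x = rng.standard_normal(4)
W = 0.1 * rng.standard_normal((4, 4))
y = residual_block(x, W)               # same shape as the input

A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 3))        # 3 right-hand sides
X_exact = np.linalg.solve(A, B)
R = block_residual(A, X_exact, B)      # vanishes at the exact block solution
```

The residual vanishes exactly when the block of solutions is exact, which is what makes $R_k$ usable both as a stopping criterion and as a search direction.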
2. Residual Block Design in Deep Networks
In deep learning, residual blocks enable identity skip connections, directly mitigating vanishing gradient phenomena and enabling iterative feature refinement. The general two-layer residual unit is

$$x_{l+1} = x_l + F(x_l; W_l),$$

with $F$ typically a sequence of two convolution–BN–ReLU (or similar) layers. Systematic investigations reveal multiple implementation alternatives, differing in the placement of batch normalization (BN) and activation (ReLU) with respect to the addition:
| Variant | Main Branch $F(x)$ | Residual Merge $y$ |
|---|---|---|
| RB1 | BN(Conv2(ReLU(Conv1(x)))) | ReLU(x + F(x)) |
| RB2 | Conv2(ReLU(Conv1(x))) | ReLU(BN(x + F(x))) |
| RB3 | BN(Conv2(ReLU(Conv1(x)))) | ReLU(x + F(x)) |
| RB4 | BN(Conv2(Conv1(ReLU(x)))) | x + F(x) |
| RB5 | Conv2(ReLU(BN(Conv1(ReLU(BN(x)))))) | x + F(x) |
| RB6 | BN(Conv2(ReLU(Conv1(x)))) | ReLU(BN(x + F(x))) |
These alternatives significantly affect end-to-end accuracy and optimization stability, with the best-performing variant depending on input normalization and domain (Naranjo-Alcazar et al., 2019).
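Two of the variants can be sketched concretely. This is a minimal illustration, assuming dense layers stand in for the convolutions and a toy, parameter-free batch normalization; `rb1` and `rb5` are hypothetical names for a post-activation-merge variant and a pre-activation variant, in the spirit of the table above:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bn(x):
    # Toy batch norm: standardize over the batch axis (no learned scale/shift).
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)

# Dense layers stand in for Conv1/Conv2 so the sketch stays self-contained.
rng = np.random.default_rng(1)
W1, W2 = 0.1 * rng.standard_normal((2, 8, 8))

def conv1(x): return x @ W1.T
def conv2(x): return x @ W2.T

def rb1(x):
    # Post-activation merge: ReLU(x + BN(Conv2(ReLU(Conv1(x)))))
    return relu(x + bn(conv2(relu(conv1(x)))))

def rb5(x):
    # Pre-activation main branch with a plain additive merge.
    return x + conv2(relu(bn(conv1(relu(bn(x))))))

x = rng.standard_normal((32, 8))   # batch of 32 feature vectors
```

Note the practical difference: a ReLU after the merge (as in `rb1`) constrains block outputs to be nonnegative, while a plain additive merge (as in `rb5`) preserves a clean identity path through the network.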
Analytically, the residual block structure induces an update in feature space that approximates gradient descent on the layerwise loss:

$$x_{l+1} = x_l + F(x_l), \qquad F(x_l) \approx -\eta \, \nabla_{x_l} \mathcal{L}.$$

Empirically, $F(x_l)$ aligns negatively with the loss gradient $\nabla_{x_l} \mathcal{L}$, especially in higher network layers, confirming the iterative refinement interpretation (Jastrzębski et al., 2017).
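The gradient-descent reading of the residual update can be made exact in a toy setting. The sketch below assumes a quadratic layerwise loss and a residual branch defined to be a gradient step (an idealization of the empirical alignment, not the learned $F$ of an actual network):

```python
import numpy as np

# Quadratic layerwise loss L(x) = 0.5 * ||x - t||^2, with gradient x - t.
t = np.array([1.0, -2.0, 0.5])

def grad(x):
    return x - t

# A residual branch that implements a gradient step: F(x) = -eta * grad(x).
eta = 0.3

def F(x):
    return -eta * grad(x)

x = np.array([5.0, 5.0, 5.0])
losses = []
for _ in range(50):
    losses.append(0.5 * np.sum((x - t) ** 2))
    x = x + F(x)                      # x_{l+1} = x_l + F(x_l)
```

Here the cosine between $F(x)$ and the gradient is exactly $-1$ by construction, and stacking the update drives $x$ toward the minimizer $t$; real networks only exhibit this alignment approximately and layer-dependently.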
3. Block Residuals in Krylov and Subspace Methods
Block Krylov subspace methods generalize single-vector approaches by propagating and updating blocks of vectors simultaneously. The block Arnoldi or Lanczos process produces an orthonormal basis spanning a block Krylov subspace, with each iteration enforcing residual orthogonality conditions of the form

$$R_k = B - A X_k \perp \mathcal{L}_k.$$

Here, $R_k$ is a block residual and the constraint subspace $\mathcal{L}_k$ is constructed to enforce either orthogonality or conjugate orthogonality of the residual blocks (Gu et al., 2016). In block MINRES based on the banded Lanczos method, the block residual at iteration $k$,

$$R_k = B - A X_k,$$

is minimized in Frobenius norm over a block Krylov space, with the minimization reducible to a small block least-squares system (Soodhalter, 2013).
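The Frobenius-norm minimization over a block Krylov space can be sketched directly. This is a naive illustration, not the banded-Lanczos algorithm of block MINRES: it builds an explicit (unstable for large powers) block Krylov basis, orthonormalizes it with a QR factorization, and solves the resulting small least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, m = 30, 2, 10                   # matrix size, block width, Krylov steps
G = rng.standard_normal((n, n))
A = G + G.T + n * np.eye(n)           # symmetric, well-conditioned test matrix
B = rng.standard_normal((n, s))       # block of right-hand sides

# Orthonormal basis Q of K_m(A, B) = span{B, AB, ..., A^{m-1} B}.
blocks = [B]
for _ in range(m - 1):
    blocks.append(A @ blocks[-1])
Q, _ = np.linalg.qr(np.hstack(blocks))

# Minimize ||B - A Q Y||_F over Y: a small least-squares problem standing in
# for the banded block least-squares system of block MINRES.
Y, *_ = np.linalg.lstsq(A @ Q, B, rcond=None)
X = Q @ Y

res0 = np.linalg.norm(B)              # initial block residual norm (X_0 = 0)
res = np.linalg.norm(B - A @ X)       # minimized block residual norm
```

In practice the basis is generated by a (banded) Lanczos recurrence rather than explicit powers of $A$, precisely so that the least-squares system stays small and the basis stays numerically orthonormal.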
In block rational Krylov approximations of matrix functions, the residual is further characterized by a block generalization of characteristic polynomials: the block residual satisfies a collinearity relation with the value of a block characteristic polynomial associated with the projected problem, enabling a hierarchy of error formulas and a posteriori norm bounds (Massei et al., 7 Apr 2025).
4. Recycling and Augmented Block Arnoldi Residuals
In recycled Krylov and augmented Arnoldi methods, the block residual formulation is central for integrating a recycled subspace $U$ with new Krylov bases $V_m$. Decomposing the search space over the augmented basis $[U, V_m]$ yields a residual expressed jointly in the recycled and Krylov components. A block lower-triangular correction is included to orthogonalize the Krylov block against $U$, followed by an inverse compact WY modified Gram–Schmidt step for efficient and robust inter-block orthogonalization. To further accelerate convergence, a weighted oblique projection step is applied to the residual, where the weight matrix $W$ reflects the alignment between the residual and the recycled subspace (Thomas et al., 2023).
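The core of such a projection step can be sketched in the unweighted case. This is a simplified illustration with the weight matrix taken as the identity (reducing the oblique projection to a standard Galerkin deflation step against the recycled subspace), not the weighted scheme of the cited method:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 4
G = rng.standard_normal((n, n))
A = G + n * np.eye(n)                             # well-conditioned test matrix
U = np.linalg.qr(rng.standard_normal((n, k)))[0]  # recycled subspace basis
R = rng.standard_normal((n, 2))                   # current block residual

# Galerkin-type projection of the residual against A U. With a weight
# matrix W this becomes oblique: replace U.T below by (W @ U).T.
AU = A @ U
coeff = np.linalg.solve(U.T @ AU, U.T @ R)
R_new = R - AU @ coeff
```

After the step, the corrected residual is orthogonal to the recycled subspace, so subsequent Krylov iterations no longer have to resolve components already captured by $U$.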
5. Residual Block Methods in Stochastic Iterative Linear Solvers
Block partitioning and residual updates underpin the design of block Kaczmarz-type methods. For a row partition $\{(A_{(i)}, b_{(i)})\}_{i=1}^{p}$, the block residuals at iterate $x_k$ are

$$r_k^{(i)} = b_{(i)} - A_{(i)} x_k, \qquad i = 1, \dots, p.$$

The maximum-residual block Kaczmarz method deterministically selects the block $i_k = \arg\max_i \|r_k^{(i)}\|_2$ with the maximal residual norm and projects orthogonally via the pseudoinverse:

$$x_{k+1} = x_k + A_{(i_k)}^{\dagger} \bigl( b_{(i_k)} - A_{(i_k)} x_k \bigr).$$
A relaxation-based version (MRABK) computes a tailored step-size, replaces the full pseudoinverse with row-averaged updates, and provably achieves faster linear convergence rates than randomized block Kaczmarz (Sun et al., 2024).
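The maximum-residual selection and pseudoinverse projection translate directly into code. A minimal sketch on a small consistent system, assuming a uniform row partition and the deterministic max-residual rule described above (the relaxed MRABK step-size is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 60, 12, 6                     # rows, columns, number of row blocks
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                          # consistent system
blocks = np.array_split(np.arange(m), p)

x = np.zeros(n)
for _ in range(100):
    # Deterministically select the block with maximal residual norm.
    norms = [np.linalg.norm(b[idx] - A[idx] @ x) for idx in blocks]
    idx = blocks[int(np.argmax(norms))]
    # Orthogonal projection onto the solution set of the selected block,
    # via the pseudoinverse of the block row submatrix.
    x = x + np.linalg.pinv(A[idx]) @ (b[idx] - A[idx] @ x)
```

Each projection zeroes the selected block's residual exactly, so the greedy rule always moves to a different block on the next step; the relaxed variant replaces the full pseudoinverse with cheaper row-averaged updates.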
6. Invertible Residual Block Flows in Generative Modeling
Residual block composition is also fundamental in the construction of invertible normalizing flows. A residual block on $\mathbb{R}^d$,

$$f(x) = x + g(x), \qquad \mathrm{Lip}(g) < 1,$$

ensures invertibility by Banach's fixed-point theorem. Stacking such blocks yields a flow

$$F = f_L \circ \cdots \circ f_1.$$

Universal approximation in maximum mean discrepancy (MMD) can be achieved by stacking such blocks, with explicit first- and second-order bounds on MMD reduction rates (Kong et al., 2021).
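The fixed-point inversion behind this construction is easy to demonstrate. A minimal sketch, assuming a `tanh` branch whose weight matrix is rescaled so that $\mathrm{Lip}(g) \le 0.5 < 1$ (illustrative choices, not the parameterization of any specific flow):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)        # spectral norm 0.5, so Lip(g) <= 0.5

def g(x):
    return np.tanh(W @ x)              # contractive residual branch

def f(x):
    return x + g(x)                    # invertible residual block

def f_inv(y, iters=60):
    # Banach fixed-point iteration for x = y - g(x); the map is a
    # contraction with factor Lip(g) < 1, so it converges geometrically.
    x = y.copy()
    for _ in range(iters):
        x = y - g(x)
    return x

x = rng.standard_normal(d)
y = f(x)
x_rec = f_inv(y)                       # recovers x to high precision
```

The same iteration inverts a stacked flow block by block, applied in reverse order; the contraction factor also controls how many fixed-point steps are needed per block.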
7. Analysis of Residual Block Effectiveness and Practical Considerations
Residual block formulations, whether in neural architectures or iterative block methods, share the objective of enabling efficient, stable, and scalable progression toward solution or representation refinement. Empirical studies confirm that fine details within block structure (e.g., nonlinearity placement, normalization order, block-relative orthogonality) can control convergence and generalization in numerical and learning contexts. The adoption of residual block orthogonalization, block correction, and weighted projections in large-scale, multi-right-hand-side, or recycling contexts further improves efficiency, with measurable reductions in iteration counts and computational cost (Thomas et al., 2023, Massei et al., 7 Apr 2025).
A comprehensive view reveals that block residual formulations are not only a structural convenience but a mathematically expressive and algorithmically pivotal ingredient across numerical linear algebra, optimization, and modern machine learning.