
Residual Block Formulation

Updated 22 January 2026
  • Residual block formulation is a design that adds the output of a mapping, $F(x)$, to its input, enabling iterative refinement and efficient convergence.
  • In deep networks, residual blocks mitigate vanishing gradients by incorporating identity skip connections alongside normalization and activation layers.
  • In numerical linear algebra, block residuals drive methods like Krylov subspace techniques and block Kaczmarz iterations to enforce orthogonality and accelerate convergence.

A residual block formulation defines an architectural or algorithmic primitive in which outputs of a (possibly nonlinear, learned, or iterative) mapping are combined with their own inputs via addition or projection. This operator is central both in modern deep networks, as exemplified by ResNets, and in block-structured iterative linear algebra, such as block Krylov subspace methods and block Kaczmarz iterations. While the semantics and purpose of the "residual block" differ by context, its generic mathematical form is an update $y = x + F(x)$ or a projection $y = x - A^\dagger r$, with $F$ or $A$ structured to enable efficient learning, iterative refinement, orthogonalization, or multi-vector acceleration.

1. General Formulation of Residual Blocks

The canonical expression for a residual block in feedforward architectures is

$$h_{i+1} = h_i + F_i(h_i)$$

where $F_i$ is a nonlinear operator, typically parameterized by weight matrices and incorporating batch normalization and activation functions. In Krylov methods and iterative block solvers, the block residual at iteration $k$ is usually defined as

$$R_k = B - A X_k$$

with $X_k$ a block of approximate solutions and $B$ the block of right-hand sides. Block residuals serve as both a direction for further refinement and an object for enforcing block-orthogonality, block-minimization, or block-projection (Jastrzębski et al., 2017, Soodhalter, 2013, Gu et al., 2016, Sun et al., 2024, Massei et al., 7 Apr 2025).
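As a concrete illustration, both generic forms can be sketched in a few lines of NumPy; the map `F`, the matrix sizes, and the random data below are illustrative stand-ins, not drawn from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive residual update y = x + F(x), with an arbitrary nonlinear map
# standing in for a learned branch.
x = rng.standard_normal(4)
F = lambda v: np.tanh(v)
y = x + F(x)

# Block residual R = B - A X for a linear system with multiple
# right-hand sides; at the exact block solution the residual vanishes.
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 3))   # block of 3 right-hand sides
X = np.linalg.solve(A, B)
R = B - A @ X
print(np.linalg.norm(R))
```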

2. Residual Block Design in Deep Networks

In deep learning, residual blocks enable identity skip connections, directly mitigating vanishing gradient phenomena and enabling iterative feature refinement. The general two-layer residual unit is

$$y = x + \mathcal{F}(x)$$

with $\mathcal{F}(x)$ typically a sequence of two convolution-BN-ReLU (or similar) layers. Systematic investigations reveal multiple implementation alternatives, differing in the placement of batch normalization (BN) and activation (ReLU) with respect to the addition:

| Variant | $\mathcal{F}(x)$ (main branch) | $y$ (residual merge) |
|---------|--------------------------------|----------------------|
| RB1 | BN(Conv2(ReLU(Conv1($x$)))) | ReLU($\mathcal{F}(x) + x$) |
| RB2 | Conv2(ReLU(Conv1($x$))) | ReLU(BN($\mathcal{F}(x) + x$)) |
| RB3 | BN(Conv2(ReLU(Conv1($x$)))) | $\mathcal{F}(x)$ + ReLU($x$) |
| RB4 | BN(Conv2(Conv1(ReLU($x$)))) | $\mathcal{F}(x) + x$ |
| RB5 | Conv2(ReLU(BN(Conv1(ReLU(BN($x$)))))) | $\mathcal{F}(x) + x$ |
| RB6 | BN(Conv2(ReLU(Conv1($x$)))) | ReLU(BN($\mathcal{F}(x) + x$)) |

These alternatives significantly affect end-to-end accuracy and optimization stability, with the best-performing variant depending on input normalization and domain (Naranjo-Alcazar et al., 2019).
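The ordering differences can be made concrete with a small NumPy sketch. Dense matrices stand in for the convolutions and `bn` is a simplified batch-normalization surrogate, so `rb1` and `rb5` only mirror the structure of those two variants rather than implementing them faithfully:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

def bn(z, eps=1e-5):
    # Simplified batch-norm surrogate: per-feature standardization over the batch
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)

d = 8
W1 = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-ins for Conv1, Conv2
W2 = rng.standard_normal((d, d)) / np.sqrt(d)

def rb1(x):
    # RB1: y = ReLU(BN(Conv2(ReLU(Conv1(x)))) + x)  -- post-addition activation
    f = bn(relu(x @ W1) @ W2)
    return relu(f + x)

def rb5(x):
    # RB5: y = Conv2(ReLU(BN(Conv1(ReLU(BN(x)))))) + x  -- pre-activation style
    f = relu(bn(x)) @ W1
    f = relu(bn(f)) @ W2
    return f + x

x = rng.standard_normal((16, d))   # batch of 16 feature vectors
print(rb1(x).shape, rb5(x).shape)
```

Note one structural consequence visible even in this sketch: RB1's post-addition ReLU constrains outputs to be nonnegative, while RB5 leaves the identity path unobstructed.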

Analytically, the residual block structure induces an update in feature space that approximates gradient descent on the layerwise loss:

$$\mathcal{L}(h_L) = \mathcal{L}(h_{L-1}) + \langle F_{L-1}(h_{L-1}),\ \partial\mathcal{L}/\partial h_{L-1} \rangle + O(\|F_{L-1}\|^2)$$

Empirically, $F_j(h_j)$ aligns negatively with the loss gradient, especially in higher network layers, confirming the iterative refinement interpretation (Jastrzębski et al., 2017).
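A toy quadratic loss makes the first-order expansion easy to verify numerically. The branch `F` below is chosen by hand to be anti-aligned with the gradient (mimicking the empirical finding, not a learned quantity), and `alpha` is an arbitrary step size:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal(6)                 # target of the toy quadratic loss
loss = lambda h: 0.5 * np.sum((h - t) ** 2)
grad = lambda h: h - t

h = rng.standard_normal(6)
alpha = 0.1
F = -alpha * grad(h)                       # branch anti-aligned with the gradient

inner = F @ grad(h)                        # <F(h), dL/dh> term of the expansion
predicted = loss(h) + inner                # first-order prediction, O(||F||^2) error
actual = loss(h + F)
print(inner, predicted, actual)
```

Because the inner product is negative, the residual update decreases the loss, exactly as the gradient-descent interpretation suggests.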

3. Block Residuals in Krylov and Subspace Methods

Block Krylov subspace methods generalize single-vector approaches by propagating and updating blocks of vectors simultaneously. The block Arnoldi or Lanczos process produces an orthonormal basis $U_j$ spanning a block Krylov subspace, with each iteration enforcing residual orthogonality conditions:

$$R_{k+1} \perp \mathcal{L}, \qquad A P_k \perp \mathcal{L}$$

Here, $R_k$ is a block residual, $P_k$ a block search direction, and the constraint subspace $\mathcal{L}$ is constructed using either $A$- or $A^2$-conjugate orthogonality (Gu et al., 2016). In block MINRES based on the banded Lanczos method, the block residual at iteration $j$,

$$F_j = B - A X_j$$

is minimized in Frobenius norm over a block Krylov space, with the minimization reducible to a small block least-squares system (Soodhalter, 2013).
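A minimal sketch of this minimization, assuming a small SPD test matrix and a naive power basis in place of the banded Lanczos recurrence used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
M = rng.standard_normal((n, n))
A = M + M.T + n * np.eye(n)            # shifted symmetric matrix (SPD here)
B = rng.standard_normal((n, p))        # block of right-hand sides

# Block Krylov basis span{B, AB, A^2B}; QR replaces the Lanczos recurrence
K = np.hstack([B, A @ B, A @ A @ B])
Q, _ = np.linalg.qr(K)

# min_Y ||B - A Q Y||_F reduces to a small least-squares problem
Y, *_ = np.linalg.lstsq(A @ Q, B, rcond=None)
X = Q @ Y
print(np.linalg.norm(B - A @ X))
```

Each additional block enlarges the search space, so the Frobenius-norm residual is monotonically nonincreasing in the iteration count.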

In block rational Krylov approximations of matrix functions, the residual is further characterized by a block generalization of characteristic polynomials and collinearity relations:

$$R_{B,j}(z) = [\Lambda^U(A) \circ B]\,[\Lambda^U(z)]^{-1}$$

with $\Lambda^U(\lambda)$ a block characteristic polynomial, enabling a hierarchy of error formulas and a posteriori norm bounds (Massei et al., 7 Apr 2025).

4. Recycling and Augmented Block Arnoldi Residuals

In recycled Krylov and augmented Arnoldi methods, the block residual formulation is central for integrating a recycled subspace $U$ with new Krylov bases $V_k$. The decomposition

$$A\,[U,\ V_k] = [U,\ V_{k+1}]\, H_{k+1}$$

leads to a residual expression

$$R_k = [U,\ V_{k+1}]\,(H_{k+1} y_k - \beta e_1)$$

A block lower-triangular correction $T_k = U^\top A V_k$ is included to orthogonalize the Krylov block against $U$, followed by an inverse compact WY-modified Gram-Schmidt step for efficient and robust inter-block orthogonalization. To further accelerate convergence, a weighted oblique projection step is used:

$$P = U (U^\top W U)^{-1} U^\top W$$

applied to the residual, where $W$ is a weight matrix reflecting residual and recycle-subspace alignment (Thomas et al., 2023).
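The projector identity is easy to check numerically; the sketch below builds $P$ for an arbitrary orthonormal $U$ and an SPD weight $W$ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 12, 3
U = np.linalg.qr(rng.standard_normal((n, k)))[0]   # recycled-subspace basis
S = rng.standard_normal((n, n))
W = S @ S.T + n * np.eye(n)                        # illustrative SPD weight

# Weighted oblique projector P = U (U^T W U)^{-1} U^T W
P = U @ np.linalg.solve(U.T @ W @ U, U.T @ W)

r = rng.standard_normal(n)          # stand-in for a residual vector
r_proj = P @ r                      # component aligned with the recycled space
print(np.linalg.norm(P @ P - P))    # idempotent: P is a projector
```

Because $W \neq I$, the projection is oblique rather than orthogonal: $P$ fixes $\mathrm{span}(U)$ but projects along a $W$-dependent complement.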

5. Residual Block Methods in Stochastic Iterative Linear Solvers

Block partitioning and residual updates underpin the design of block Kaczmarz-type methods. At iterate $x_k$, the block residual for partition $V_i$ is

$$r_k^{(i)} = b_{V_i} - A_{V_i} x_k$$

The maximum-residual block Kaczmarz method deterministically selects the block with the maximal residual norm and projects $x_k$ orthogonally via the pseudoinverse:

$$x_{k+1} = x_k + A_{V_{i_k}}^\dagger \left( b_{V_{i_k}} - A_{V_{i_k}} x_k \right)$$

A relaxation-based version (MRABK) computes a tailored step-size, replaces the full pseudoinverse with row-averaged updates, and provably achieves faster linear convergence rates than randomized block Kaczmarz (Sun et al., 2024).
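A compact sketch of the maximum-residual selection rule with full pseudoinverse projections (the deterministic variant above, not the relaxed MRABK update); the even row partition and the problem sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, nblocks = 40, 10, 8
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true                                 # consistent system
blocks = np.array_split(np.arange(m), nblocks) # even row partition (illustrative)

x = np.zeros(n)
for _ in range(200):
    # deterministic max-residual block selection
    norms = [np.linalg.norm(b[V] - A[V] @ x) for V in blocks]
    V = blocks[int(np.argmax(norms))]
    # orthogonal projection onto the solution set of the selected block
    x = x + np.linalg.pinv(A[V]) @ (b[V] - A[V] @ x)
print(np.linalg.norm(x - x_true))
```

Each step zeroes the selected block's residual exactly, and for a consistent system the iterates converge linearly to the solution.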

6. Invertible Residual Block Flows in Generative Modeling

Residual block composition is also fundamental in the construction of invertible normalizing flows. A residual block on $\mathbb{R}^d$,

$$f_\theta(x) = x + g_\theta(x), \qquad \mathrm{Lip}(g_\theta) \leq 1/2$$

ensures invertibility by Banach’s fixed-point theorem. Stacking such blocks yields a flow

$$F(x) = f_{\theta_N} \circ \cdots \circ f_{\theta_1}(x)$$

Universal approximation in maximum mean discrepancy (MMD) can be achieved by stacking $N = O(\log(1/\delta))$ such blocks, with explicit first- and second-order bounds on MMD reduction rates (Kong et al., 2021).
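A sketch of one such block, assuming a single spectrally normalized tanh layer for $g_\theta$ (an illustrative choice); inversion uses the fixed-point iteration $x \leftarrow y - g(x)$, which contracts at rate $\mathrm{Lip}(g)$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
Wg = rng.standard_normal((d, d))
Wg *= 0.5 / np.linalg.norm(Wg, 2)   # spectral scaling: tanh is 1-Lipschitz,
g = lambda x: np.tanh(x @ Wg)       # so this enforces Lip(g) <= 1/2

f = lambda x: x + g(x)              # invertible residual block

def f_inverse(y, iters=50):
    # Banach fixed-point iteration x_{t+1} = y - g(x_t);
    # converges geometrically at rate Lip(g) <= 1/2
    x = y.copy()
    for _ in range(iters):
        x = y - g(x)
    return x

x = rng.standard_normal(d)
y = f(x)
x_rec = f_inverse(y)
print(np.linalg.norm(x - x_rec))
```

The Lipschitz bound is what makes the inverse well defined: since $\mathrm{Lip}(g) < 1$, the map $x \mapsto y - g(x)$ is a contraction with a unique fixed point for every $y$.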

7. Analysis of Residual Block Effectiveness and Practical Considerations

Residual block formulations, whether in neural architectures or iterative block methods, share the objective of enabling efficient, stable, and scalable progression toward solution or representation refinement. Empirical studies confirm that fine details within block structure (e.g., nonlinearity placement, normalization order, block-relative orthogonality) can control convergence and generalization in numerical and learning contexts. The adoption of residual block orthogonalization, block correction, and weighted projections in large-scale, multi-right-hand-side, or recycling contexts further improves efficiency, with measurable reductions in iteration counts and computational cost (Thomas et al., 2023, Massei et al., 7 Apr 2025).

A comprehensive view reveals that block residual formulations are not only a structural convenience but a mathematically expressive and algorithmically pivotal ingredient across numerical linear algebra, optimization, and modern machine learning.
