
Convolutional Fully Recursive Perceptron Network (C-FRPN)

Updated 28 January 2026
  • The paper introduces a convolutional adaptation of recursive networks that achieves up to a 30–50% reduction in parameters while maintaining competitive accuracy.
  • The architecture uses a recursive update mechanism with tied weights and a convergence check to simulate deep feature extraction without increasing model size.
  • Empirical evaluations on benchmarks like CIFAR-10 and SVHN demonstrate faster convergence and improved generalization compared to conventional CNNs.

The Convolutional Fully Recursive Perceptron Network (C-FRPN) is a neural network architecture that embeds a recursive equilibrium-solving loop within each convolutional layer, enabling parameter-efficient and depth-adaptive modeling for image classification and related tasks. Introduced as an extension of the fully recursive perceptron network (FRPN) to convolutional neural networks (CNNs), the architecture supports deep feature extraction by simulating an unrolled stack of convolutional layers with tied weights, resulting in superior parameter efficiency compared to conventional CNNs (Rossi et al., 2019).

1. Mathematical Formulation and Recursive Mechanism

A single C-FRPN layer operates on an input tensor $u \in \mathbb{R}^{C \times H \times W}$ (the feature maps from the previous stage) and maintains a state tensor $x^{(t)} \in \mathbb{R}^{F \times H \times W}$, updated recursively via a convolutional operation. The update rule is

$$x^{(t+1)} = f\left(W_u * u + W_x * x^{(t)} + b\right)$$

where:

  • $W_u \in \mathbb{R}^{F \times C \times k \times k}$ is the input-to-state kernel,
  • $W_x \in \mathbb{R}^{F \times F \times k \times k}$ is the state-to-state feedback kernel,
  • $b \in \mathbb{R}^F$ is a bias,
  • $f(\cdot)$ is a pointwise nonlinearity (ReLU),
  • $*$ denotes discrete convolution with stride 1 and appropriate zero-padding.

The recursion is terminated when either $\| x^{(t+1)} - x^{(t)} \|_2 < \varepsilon$ for some threshold $\varepsilon$ (typically $0.1$), or when a maximum number of iterations $t_{\max}$ (set to 8 in published experiments) is reached.

This process simulates stacking a deep sequence of layers with weight sharing. At equilibrium, the state $x^*$ satisfies the fixed-point equation:

$$x^* = f(W_u * u + W_x * x^* + b)$$

Each recursive update thus implicitly corresponds to an additional hidden layer, enabling the network to represent computations of greater effective depth without increasing the parameter count (Rossi et al., 2019).
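
The recursion and its stopping rule can be sketched in a few lines. The toy below runs the update on a single "pixel" with one channel, so the convolutions reduce to scalar multiplications; the weight values ($w_u$, $w_x$, $b$) and input $u$ are illustrative, while the stopping rule ($\varepsilon = 0.1$, $t_{\max} = 8$) follows the section above.

```python
# Toy scalar version of the C-FRPN recursion: convolutions collapse to
# multiplications for a 1x1 input with one channel. Weight values are
# illustrative assumptions, not taken from the paper.

def relu(v):
    return max(0.0, v)

def cfrpn_recursion(u, w_u=0.5, w_x=0.3, b=0.1, eps=0.1, t_max=8):
    """Iterate x <- f(W_u*u + W_x*x + b) until the step norm drops below eps
    or t_max iterations are reached; return the state and the step count."""
    x = 0.0
    for t in range(1, t_max + 1):
        x_new = relu(w_u * u + w_x * x + b)
        if abs(x_new - x) < eps:
            return x_new, t
        x = x_new
    return x, t_max

x_star, steps = cfrpn_recursion(u=1.0)
# The returned state approximately satisfies x* = f(W_u*u + W_x*x* + b);
# the residual is below the stopping threshold eps.
residual = abs(relu(0.5 * 1.0 + 0.3 * x_star + 0.1) - x_star)
print(steps, residual)
```

Because $|w_x| < 1$ here, the scalar map is a contraction and the iteration provably converges; the paper's convergence caveats for general kernels are discussed in Section 6.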

2. Architectural Structure

The C-FRPN architecture instantiates standard CNN topologies in which each convolutional block is replaced with a recursive C-FRPN block. The canonical design comprises four such blocks in series, with details as follows:

  • The first block uses $k = 5$ kernels; subsequent blocks use $k = 3$.
  • The number of output feature maps $F$ per block is adjusted to match a baseline CNN’s total parameter count. For example, one instantiation uses:
    • Baseline CNN: [135, 120, 104, 85] filters, vs.
    • C-FRPN: [96, 85, 74, 60] filters per block.
  • Inside each block:

    1. Apply the recursive update $x^{(t+1)} = \textrm{ReLU}(W_u * u + W_x * x^{(t)} + b)$.
    2. Perform Local Response Normalization over the $F$ channels.
    3. Check for convergence ($\|x^{(t+1)} - x^{(t)}\| < 0.1$) or $t = 8$.
    4. On convergence, apply $3 \times 3$ max pooling (stride 2), feeding the next block.
    5. Apply Dropout ($p = 0.5$) after each convolution and pooling (except for the final block).

After the final C-FRPN block, the resulting features are processed by one or more standard fully connected layers producing class scores.
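
The downsampling implied by the pooling steps above can be traced with simple arithmetic. The sketch below assumes a 32×32 input (as in CIFAR-10) and valid (no-padding) $3 \times 3$, stride-2 max pooling; the padding choice is an assumption, since the section does not state it.

```python
# Spatial resolution after each of the four C-FRPN blocks, assuming a
# 32x32 input and 3x3, stride-2 max pooling with no padding (assumption).

def pooled_size(n, k=3, s=2):
    """Output side length of k x k pooling with stride s and no padding."""
    return (n - k) // s + 1

side = 32
sizes = [side]
for _ in range(4):          # one pooling step per C-FRPN block
    side = pooled_size(side)
    sizes.append(side)
print(sizes)                # side length entering each successive block
```

Under these assumptions the feature maps shrink to a single spatial position after the fourth block, which is consistent with feeding the result directly into fully connected layers.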

3. Training Regimen and Backpropagation

Training employs a standard cross-entropy loss over softmax outputs. Optimization is performed with Adam, using a learning rate of $1 \times 10^{-4}$ and weight decay of $5 \times 10^{-4}$. For backpropagation, the recursion is unrolled for up to 8 iterations, with gradient accumulation handled automatically by the computational graph. Weight sharing is intrinsic and managed by the parameter definitions. The implementation does not use implicit differentiation of the fixed-point solution; instead, it relies on direct backpropagation through the unfolded computation graph (Rossi et al., 2019).

The per-block convergence check is

$$\|x^{(t+1)} - x^{(t)}\|_2 < 0.1 \quad \textrm{or} \quad t \geq 8$$

ensuring that compute requirements per forward pass remain bounded.
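
Backpropagation through the unrolled, tied-weight recursion can be illustrated in the same scalar setting as before: because the feedback weight is shared across steps, its gradient accumulates one contribution per unrolled iteration, exactly what an autodiff graph computes. All numeric values below are illustrative assumptions.

```python
# Manual backprop through an unrolled tied-weight recursion (scalar case).
# The gradient of the loss L = x_T w.r.t. the shared feedback weight w_x
# sums one term per unrolled step, mirroring what autodiff does.

def relu(v):
    return max(0.0, v)

def unrolled_grad_wx(u, w_u, w_x, b, steps):
    """Forward: unroll the recursion from x_0 = 0; backward: dL/dw_x."""
    xs = [0.0]                                   # stored states x_0..x_T
    for _ in range(steps):
        xs.append(relu(w_u * u + w_x * xs[-1] + b))
    g, d_wx = 1.0, 0.0                           # g = dL/dx_t, from t = T
    for t in range(steps, 0, -1):
        pre = w_u * u + w_x * xs[t - 1] + b
        dr = 1.0 if pre > 0 else 0.0             # ReLU derivative
        d_wx += g * dr * xs[t - 1]               # step t's contribution
        g *= dr * w_x                            # propagate to x_{t-1}
    return xs[-1], d_wx

x_T, grad = unrolled_grad_wx(u=1.0, w_u=0.5, w_x=0.3, b=0.1, steps=3)
print(x_T, grad)
```

Note the stored list `xs`: this is the memory cost of retaining intermediate activations for the backward pass discussed in the next section.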

4. Parameter Efficiency and Computational Considerations

Each C-FRPN block requires $(C \cdot F \cdot k^2 + F \cdot F \cdot k^2 + F)$ parameters, shared across recursive iterations. In practice, a C-FRPN block with $F = 96$ yields a parameter count comparable to a 4-layer CNN with filter counts [135, 120, 104, 85]. While a standard CNN executes one convolution per block per input, a C-FRPN block can require up to 8 convolutions (though typically 3–5 suffice for convergence).
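
Applying the per-block formula to the filter configuration quoted in Section 2 gives concrete numbers. The sketch below chains channel counts from an assumed 3-channel RGB input and counts only the four recursive blocks, not the fully connected head.

```python
# Parameter count of C-FRPN blocks: C*F*k^2 (input kernel) + F*F*k^2
# (feedback kernel) + F (biases). The 3-channel input is an assumption.

def cfrpn_block_params(c_in, f, k):
    return c_in * f * k * k + f * f * k * k + f

filters = [96, 85, 74, 60]    # C-FRPN configuration from Section 2
kernels = [5, 3, 3, 3]        # k = 5 in the first block, then k = 3
c_in, total = 3, 0
for f, k in zip(filters, kernels):
    total += cfrpn_block_params(c_in, f, k)
    c_in = f                  # this block's outputs feed the next block
print(total)                  # convolutional parameters across all blocks
```

The feedback kernel ($F \cdot F \cdot k^2$) dominates the count, which is the term a standard CNN layer does not pay; the recursion amortizes it across all simulated depth.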

Intermediate activations $x^{(t)}$ must be retained for backpropagation, modestly increasing memory usage, but with $t_{\max}$ capped, the memory footprint remains practical.

Empirical results indicate that, for a given accuracy target, C-FRPN achieves up to 30–50% reduction in parameter count compared to baseline CNNs of equal depth (Rossi et al., 2019).

5. Empirical Results and Evaluation

C-FRPNs have been evaluated on several image classification benchmarks using standard data splits and augmentations:

  • CIFAR-10: 50,000 train / 10,000 test, with random cropping and horizontal flips.

  • SVHN: 73,000 train / 26,000 test, with augmentations as in Liang & Hu (2015).
  • ISIC (Melanoma): Custom splits, random horizontal/vertical flips, and rotations.

Six model sizes were considered, ranging from "large" (~1.2M parameters) to "tiny" (~50k parameters), with C-FRPN and standard CNNs matched for total parameter count.

Performance metrics are mean test accuracy across five initializations. Across all model capacities, C-FRPN outperforms baseline CNNs by 1–2%, with the performance gap widening as model size decreases. Learning curves for SVHN further reveal faster and more stable accuracy gains for C-FRPN across epochs.

6. Key Properties, Insights, and Limitations

Three principal properties are observed:

  • Adaptive Depth: Iterating until convergence allows each C-FRPN layer to develop an effective depth adaptive to input complexity. Simpler inputs achieve equilibrium in fewer steps, while more complex cases use additional depth via further recursion.
  • Expanded Approximation Power per Parameter: Recursive, tied-weight usage provides strong approximation efficiency, enabling deep feature extraction with proportionally fewer weights and reduced memory usage.
  • Built-in Regularization: Weight-tying and early cessation of recursion together serve as an implicit regularizer, supporting improved generalization, particularly in small networks.
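
The adaptive-depth property can be demonstrated directly with the scalar recursion: under the same tied weights and stopping rule, a smaller-magnitude input settles to its equilibrium in fewer steps than a larger one, because the iterates start further from the fixed point. Weights and inputs here are illustrative, not taken from the paper.

```python
# Adaptive depth in the scalar case: the number of recursive steps needed
# to meet the eps = 0.1 stopping rule depends on the input. Weight values
# are illustrative assumptions.

def relu(v):
    return max(0.0, v)

def steps_to_converge(u, w_u=1.0, w_x=0.5, b=0.0, eps=0.1, t_max=8):
    x = 0.0
    for t in range(1, t_max + 1):
        x_new = relu(w_u * u + w_x * x + b)
        if abs(x_new - x) < eps:
            return t
        x = x_new
    return t_max

print(steps_to_converge(0.5), steps_to_converge(8.0))
```

The larger input here exhausts the full iteration budget while the smaller one stops early, which is the per-input effective-depth behavior described above.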

Noted limitations include the necessity of choosing an iteration budget (with $t_{\max} = 8$ in reported experiments), the lack of convergence guarantees for all possible kernel settings, and increased computational and memory costs due to unrolled backpropagation (especially as recursion depth grows). Input-dependent iteration counts may also complicate hardware acceleration. The paper suggests potential improvements such as implicit differentiation, learned convergence thresholds, ResNet/Highway-style gating, or hybrid dynamic scheduling approaches.

7. Context and Prospective Directions

C-FRPN offers a general framework for embedding recursion-based equilibrium solvers in CNNs, introducing a mechanism wherein feature extraction depth adapts naturally to task demand. The empirical finding that C-FRPNs attain equivalent or better accuracy with substantially fewer parameters, especially for small networks, highlights their applicability in resource-constrained scenarios. Future work may address limitations around fixed-point differentiation, convergence adaptivity, and integration with other forms of architectural adaptivity and gating (Rossi et al., 2019).

