Fully Recursive Perceptron Network (FRPN)
- FRPN is a recursive neural model that refines its hidden state via iterative updates until reaching a fixed-point equilibrium, simulating deep feedforward networks with shared parameters.
- The C-FRPN extension applies recursive computation to convolutional layers, yielding adaptive, multi-dimensional feature maps and outperforming standard CNNs in low-parameter scenarios.
- Training FRPNs involves backpropagation through time with convergence checks, combined with regularization techniques like L2 weight decay and dropout to ensure stability and efficiency.
A Fully Recursive Perceptron Network (FRPN) is a recursive neural architecture in which the hidden representation is refined via an iterative process, converging to a fixed-point equilibrium rather than propagating through a fixed-depth stack of distinct layers. This mechanism enables the simulation of arbitrarily deep feedforward networks within a compact, parameter-shared framework. The Convolutional Fully Recursive Perceptron Network (C-FRPN) extends this recursive formulation to convolutional neural networks (CNNs), enabling recursive computation over multi-dimensional feature maps and yielding variable-depth architectures adaptive to input and convergence dynamics. Empirical evaluation demonstrates that C-FRPNs consistently surpass standard CNNs in accuracy for a given parameter budget, especially in the small-network regime, indicating superior parameter efficiency and modeling capacity (Rossi et al., 2019).
1. Mathematical Formulation of FRPN
The FRPN core model is defined by an iterative update for the hidden state vector. For input vector $x$, the hidden state at iteration $t$ is $h^{(t)}$. Let $A$ denote the input-to-hidden weight matrix, $B$ the hidden-to-hidden weight matrix, $b$ the bias, and $\sigma$ an elementwise nonlinearity (e.g., ReLU). The recurrence is:

$$h^{(t)} = \sigma\!\left(A x + B h^{(t-1)} + b\right),$$

with $h^{(0)} = 0$, for $t = 1, \dots, T$.
The network output is computed by a standard feedforward head: $y = \phi\!\left(C h^{(T)} + c\right)$, where $C$ and $c$ are the output weight matrix and bias, and $\phi$ is the output nonlinearity.
At convergence, the hidden state satisfies the fixed-point condition:

$$h^{*} = \sigma\!\left(A x + B h^{*} + b\right).$$
Provided $\sigma$ is Lipschitz with constant $L$ and $L\,\|B\|_2 < 1$, the iteration yields a unique fixed point by the Banach contraction-mapping theorem.
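The fixed-point iteration can be sketched in a few lines of NumPy. In this sketch the symbols $A$, $B$, $b$ follow the definitions above; the tanh nonlinearity, tolerance, and matrix sizes are illustrative choices rather than the paper's exact settings, and $B$ is rescaled so the contraction condition holds:

```python
import numpy as np

def frpn_forward(x, A, B, b, sigma=np.tanh, eps=1e-6, t_max=100):
    """Iterate h <- sigma(A x + B h + b) from h = 0 until the update
    moves less than eps, or t_max iterations are reached."""
    h = np.zeros(B.shape[0])
    for _ in range(t_max):
        h_next = sigma(A @ x + B @ h + b)
        if np.linalg.norm(h_next - h) < eps:
            return h_next
        h = h_next
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=4)
A = rng.normal(size=(8, 4))
B = rng.normal(size=(8, 8))
# tanh has Lipschitz constant 1, so rescaling the spectral norm of B
# below 1 guarantees a unique fixed point (contraction mapping).
B *= 0.5 / np.linalg.norm(B, 2)
b = rng.normal(size=8)

h_star = frpn_forward(x, A, B, b)
# h_star satisfies the fixed-point condition up to the tolerance:
residual = np.linalg.norm(h_star - np.tanh(A @ x + B @ h_star + b))
print(residual < 1e-5)  # True
```

Because the map is contractive, the number of iterations actually executed depends only on the tolerance and the contraction factor, not on a hand-chosen depth.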
2. Unfolding and Connection to Deep Feedforward Networks
Unrolling the iteration over $T$ steps recovers a $T$-layer feedforward network with parameter sharing across layers, since each transformation is identical in form. The unfolded representation is:

$$h^{(T)} = \sigma\!\left(A x + B\,\sigma\!\left(A x + B\,\sigma(\cdots) + b\right) + b\right).$$
Formally, any deep multilayer perceptron of a given depth can be exactly represented, for a suitable hidden dimension and parameter selection, by an FRPN unfolded for the same number of steps (Rossi et al., 2019). The equilibrium perspective eliminates the arbitrary selection of “depth” in favor of dynamically determined iterative computation.
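The unfolding view can be made concrete: an explicit tied-weight MLP whose every layer applies the same transformation is literally the FRPN iteration, and with contractive recurrent weights the per-layer change shrinks geometrically, which is the equilibrium behavior. A minimal sketch (assuming the update $h \leftarrow \sigma(Ax + Bh + b)$ described above; all sizes and the tanh nonlinearity are illustrative):

```python
import numpy as np

def tied_mlp(x, A, B, b, depth):
    # A 'depth'-layer feedforward net in which every layer applies the
    # same (A, B, b): this is the FRPN iteration unrolled 'depth' times.
    h = np.zeros(B.shape[0])
    for _ in range(depth):
        h = np.tanh(A @ x + B @ h + b)
    return h

rng = np.random.default_rng(1)
x = rng.normal(size=3)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(5, 5))
B *= 0.5 / np.linalg.norm(B, 2)    # contractive recurrent weights
b = rng.normal(size=5)

# Successive depths change the representation less and less:
d1 = np.linalg.norm(tied_mlp(x, A, B, b, 5) - tied_mlp(x, A, B, b, 4))
d2 = np.linalg.norm(tied_mlp(x, A, B, b, 20) - tied_mlp(x, A, B, b, 19))
print(d2 < d1)  # True: the unrolled stack converges toward equilibrium
```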
3. Training Methodology
FRPNs are trained with objective functions standard to their task domain, e.g., cross-entropy loss for classification or MSE for regression, evaluated at the output layer. Training employs backpropagation through time (BPTT), potentially truncated at a maximum iteration count $T_{\max}$, to compute gradients, and uses Adam or SGD with momentum for parameter updates. Regularization techniques include weight decay, dropout (applied to output or recurrent links), and optional batch normalization interleaved with iteration steps. Convergence in practice is monitored by the criterion

$$\left\| h^{(t)} - h^{(t-1)} \right\| < \epsilon,$$

with iteration halted once $t$ reaches $T_{\max}$ (specific values of $\epsilon$ and $T_{\max}$ are fixed per reported experiment).
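BPTT over the shared-weight unrolling can be written out by hand in a few dozen lines: the gradient contributions of every iteration are accumulated into the same $A$, $B$, $b$. The sketch below assumes a linear output head $C$ with squared-error loss and the recurrence $h^{(t)} = \tanh(Ax + Bh^{(t-1)} + b)$; the sizes, truncation length, and the finite-difference check are illustrative, not the paper's training setup:

```python
import numpy as np

def frpn_loss_and_grads(x, y, A, B, b, C, T=15):
    """Run T FRPN iterations, evaluate a squared-error loss at a linear
    output head, and backpropagate through time over the unrolled
    iterations, accumulating gradients of the shared parameters."""
    hs = [np.zeros(B.shape[0])]
    for _ in range(T):
        hs.append(np.tanh(A @ x + B @ hs[-1] + b))
    out = C @ hs[-1]
    loss = 0.5 * np.sum((out - y) ** 2)

    dA = np.zeros_like(A)
    dB = np.zeros_like(B)
    db = np.zeros_like(b)
    dC = np.outer(out - y, hs[-1])
    dh = C.T @ (out - y)                  # dL/dh at the final iteration
    for t in range(T, 0, -1):
        da = dh * (1.0 - hs[t] ** 2)      # backprop through tanh
        dA += np.outer(da, x)             # shared weights: accumulate
        dB += np.outer(da, hs[t - 1])
        db += da
        dh = B.T @ da                     # pass back to iteration t - 1
    return loss, dA, dB, db, dC

rng = np.random.default_rng(2)
x, y = rng.normal(size=4), rng.normal(size=2)
A = 0.3 * rng.normal(size=(6, 4))
B = 0.3 * rng.normal(size=(6, 6))
b = 0.1 * rng.normal(size=6)
C = 0.3 * rng.normal(size=(2, 6))

loss, dA, dB, db, dC = frpn_loss_and_grads(x, y, A, B, b, C)

# Central finite-difference check on one entry of B.
eps = 1e-6
Bp = B.copy(); Bp[0, 0] += eps
Bm = B.copy(); Bm[0, 0] -= eps
lp = frpn_loss_and_grads(x, y, A, Bp, b, C)[0]
lm = frpn_loss_and_grads(x, y, A, Bm, b, C)[0]
print(abs((lp - lm) / (2 * eps) - dB[0, 0]) < 1e-6)  # True
```

Truncating the loop at $T_{\max}$ (here hard-coded as `T`) is exactly truncated BPTT; in a framework with autodiff the same gradients fall out of unrolling the forward loop.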
4. Convolutional Extension: C-FRPN
In C-FRPN, each FRPN hidden state and input are stacks of 2D feature maps, replacing vectorial operations with convolutions. For input feature maps $X$ and state maps $H^{(t)}$, the update is

$$H^{(t)} = \sigma\!\left(K_{x} * X + K_{h} * H^{(t-1)} + b\right),$$

where $K_{x}$ and $K_{h}$ are convolution kernels, $*$ denotes 2D convolution, and $b$ is a bias term. Each recursive block comprises one such C-FRPN layer followed by pooling.
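A single-map NumPy sketch of this update follows. Real C-FRPN layers operate on stacks of maps with multi-channel convolutions; here one input map and one state map, a ReLU nonlinearity, a small recurrent kernel (so the map is contractive in the sup norm), and the tolerance are all illustrative assumptions:

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive single-channel 2D convolution with zero padding,
    preserving the image size (odd kernel sizes assumed)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def cfrpn_update(X, H, Kx, Kh, bias, t_max=50, eps=1e-5):
    """Iterate H <- relu(Kx * X + Kh * H + bias) on a single feature
    map until the map stops changing (or t_max is reached)."""
    relu = lambda z: np.maximum(z, 0.0)
    for _ in range(t_max):
        H_next = relu(conv2d_same(X, Kx) + conv2d_same(H, Kh) + bias)
        if np.abs(H_next - H).max() < eps:
            break
        H = H_next
    return H_next

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 8))
H0 = np.zeros((8, 8))
Kx = 0.2 * rng.normal(size=(3, 3))
Kh = np.full((3, 3), 0.05)   # |Kh| sums to 0.45 < 1 -> contraction
H_star = cfrpn_update(X, H0, Kx, Kh, bias=0.1)
```

Since the absolute values of `Kh` sum to less than 1 and ReLU is 1-Lipschitz, the iteration contracts and `H_star` approximately satisfies the convolutional fixed-point condition.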
C-FRPN architectures typically stack four recursive blocks, each followed by max-pooling (stride 2) and dropout (omitted after the final block). Within each block, the first convolution and the subsequent recursive iterations use fixed kernel sizes, and local-response normalization is applied after each iteration.
5. Experimental Results and Performance Analysis
Evaluation on standard image benchmarks—including CIFAR-10, SVHN, and ISIC melanoma classification—demonstrates that C-FRPN consistently outperforms parameter-matched CNNs, with the clearest accuracy margins over the baseline CNNs in the low-parameter regime (under $100$ K parameters) on all three datasets.
For wider models ($100$ K–$500$ K parameters), the advantage narrows to $1$–$2$ percentage points but remains consistent. All experiments used the Adam optimizer with weight decay, medians over five trials, and standard augmentation.
The reported experiments also tabulate classification accuracy (mean ± std) for small CNN and C-FRPN models on CIFAR-10, SVHN, and ISIC.
The largest gains are found in small models (under $100$ K parameters), indicating increased expressive efficiency.
6. Implementation Considerations and Parameter Efficiency
The depth of computation in each FRPN or C-FRPN block is dynamically determined by a convergence test, rather than fixed architectural design. Typical networks cap the number of iterations per block at a fixed maximum $T_{\max}$. C-FRPN widths (feature maps per layer) in the reported benchmarks were chosen to match the parameter budgets of the baseline CNNs. The architecture exploits recurrence and parameter sharing to reduce redundancy, enabling smaller models to attain accuracies comparable to substantially larger CNNs.
Regularization is essential for training stability, with weight decay, batch normalization after each iterative step, and dropout after pooling. The contraction property of $\sigma$ and $B$ (i.e., $L\,\|B\|_2 < 1$) guarantees a stable equilibrium under standard settings.
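One common way to enforce the contraction condition in practice is spectral rescaling of the recurrent weights; this is a generic stabilization technique, not necessarily the authors' procedure. A minimal sketch, assuming a 1-Lipschitz nonlinearity such as ReLU or tanh so that $\|B\|_2 < 1$ suffices:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(16, 16))   # recurrent weight matrix (illustrative)

# ReLU and tanh are 1-Lipschitz, so the recursion h <- sigma(Ax + Bh + b)
# is a contraction whenever the spectral norm of B is below 1.
spec = np.linalg.norm(B, 2)     # largest singular value
if spec >= 1.0:
    B *= 0.95 / spec            # project back inside the unit ball

assert np.linalg.norm(B, 2) < 1.0
```

Such a projection can be applied after each optimizer step to keep the equilibrium well-defined throughout training.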
A plausible implication is that the recursive mechanism inherent to FRPN and C-FRPN architectures induces a form of implicit depth adaptation and parameter reuse, leading to superior performance-to-parameter ratios relative to conventional feedforward designs.
7. Significance and Applications
FRPN and C-FRPN architectures generalize the deep neural network paradigm by replacing rigid, architecturally determined depth with a learned, recursive computation converging to equilibrium. This results in adaptive, parameter-efficient models. Their demonstrated advantages on image classification tasks, especially in low-parameter or resource-constrained settings, mark these models as strong alternatives for compact or embedded deployment.
These architectures also suggest a principled route to simulating arbitrarily deep networks without explicit depth design, and their underlying principles parallel recurrent equilibrium models and deep implicit networks, contributing to the broader understanding of recursive computation within neural architectures (Rossi et al., 2019).