
Fully Recursive Perceptron Network (FRPN)

Updated 1 January 2026
  • FRPN is a recursive neural model that refines its hidden state via iterative updates until reaching a fixed-point equilibrium, simulating deep feedforward networks with shared parameters.
  • The C-FRPN extension applies recursive computation to convolutional layers, yielding adaptive, multi-dimensional feature maps and outperforming standard CNNs in low-parameter scenarios.
  • Training FRPNs involves backpropagation through time with convergence checks, combined with regularization techniques like L2 weight decay and dropout to ensure stability and efficiency.

A Fully Recursive Perceptron Network (FRPN) is a recursive neural architecture in which the hidden representation is refined via an iterative process, converging to a fixed-point equilibrium rather than propagating through a fixed-depth stack of distinct layers. This mechanism enables the simulation of arbitrarily deep feedforward networks within a compact, parameter-shared framework. The Convolutional Fully Recursive Perceptron Network (C-FRPN) extends this recursive formulation to convolutional neural networks (CNNs), enabling recursive computation over multi-dimensional feature maps and yielding variable-depth architectures adaptive to input and convergence dynamics. Empirical evaluation demonstrates that C-FRPNs consistently surpass standard CNNs in accuracy for a given parameter budget, especially in the small-network regime, indicating superior parameter efficiency and modeling capacity (Rossi et al., 2019).

1. Mathematical Formulation of FRPN

The FRPN core model is defined by an iterative update for the hidden state vector. For input vector $u \in \mathbb{R}^m$, the hidden state at iteration $t$ is $x(t) \in \mathbb{R}^n$. Let $\alpha \in \mathbb{R}^{n \times m}$ denote the input-to-hidden weight matrix, $\beta \in \mathbb{R}^{n \times n}$ the hidden-to-hidden weight matrix, $b \in \mathbb{R}^n$ the bias, and $f : \mathbb{R} \to \mathbb{R}$ an elementwise nonlinearity (e.g., ReLU). The recurrence is:

$$x(t) = f\left( \alpha u + \beta x(t-1) + b \right), \quad t = 1, 2, 3, \dots$$

with $x_i(t) = f\left(\sum_{j=1}^m \alpha_{ij} u_j + \sum_{k=1}^n \beta_{ik} x_k(t-1) + b_i \right)$ for $i = 1, \ldots, n$.

The network output is computed by a standard feedforward head, $y = g(W_o x(T) + b_o)$, where $g$ is the output nonlinearity and $W_o$, $b_o$ are the output-layer weights and bias.

At convergence, the hidden state satisfies the fixed-point condition:

$$x^* = f(\alpha u + \beta x^* + b)$$

Provided $f$ is Lipschitz with constant $L_f$ and $\|\beta\|_2 L_f < 1$, the update is a contraction mapping and the iteration converges to a unique fixed point.
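The fixed-point iteration and its convergence check can be sketched in NumPy; the dimensions, weight scales, and the rescaling of $\beta$ to enforce the contraction condition below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical dimensions and weights; the names alpha, beta, b follow the text.
rng = np.random.default_rng(0)
m, n = 4, 8                          # input and hidden dimensionality
u = rng.normal(size=m)               # input vector
alpha = rng.normal(size=(n, m)) * 0.1
beta = rng.normal(size=(n, n))
b = rng.normal(size=n) * 0.1

# Scale beta so that ||beta||_2 * L_f < 1 (ReLU has Lipschitz constant 1),
# which guarantees a unique fixed point by the contraction mapping principle.
beta *= 0.9 / np.linalg.norm(beta, 2)

def relu(z):
    return np.maximum(z, 0.0)

def frpn_fixed_point(u, alpha, beta, b, eps=1e-10, t_max=1000):
    """Iterate x(t) = f(alpha u + beta x(t-1) + b) until ||x(t) - x(t-1)|| < eps."""
    x = np.zeros(len(b))
    for _ in range(t_max):
        x_new = relu(alpha @ u + beta @ x + b)
        if np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x

x_star = frpn_fixed_point(u, alpha, beta, b)
# At equilibrium the fixed-point condition holds: x* = f(alpha u + beta x* + b).
assert np.allclose(x_star, relu(alpha @ u + beta @ x_star + b), atol=1e-8)
```

Scaling $\beta$ so its spectral norm stays below $1/L_f$ (here $L_f = 1$ for ReLU) is what makes the loop a contraction; without it the iteration may diverge or oscillate.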

2. Unfolding and Connection to Deep Feedforward Networks

Unrolling the iteration over $T$ steps recovers a $T$-layer feedforward network with parameters shared across layers, since each transformation is identical in form. The unfolded representation is:

$$
\begin{aligned}
x(1) &= f(\alpha u + \beta x(0) + b) \\
x(2) &= f(\alpha u + \beta x(1) + b) \\
&\;\;\vdots \\
x(T) &= f(\alpha u + \beta x(T-1) + b)
\end{aligned}
$$

Formally, any deep multilayer perceptron of depth $T$ can be exactly represented, for suitable $n$ and parameter selection, by an FRPN after $T$ unfolding steps (Rossi et al., 2019). The equilibrium perspective eliminates the arbitrary selection of "depth" in favor of dynamically determined iterative computation.
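The equivalence between the recursive and unrolled views can be verified numerically. The following sketch (hypothetical weights) runs the same shared update both as a loop and as an explicit depth-$T$ stack with tied parameters:

```python
import numpy as np

# Minimal check that T iterations of the FRPN update coincide with a
# T-layer feedforward pass whose layers share parameters.
rng = np.random.default_rng(1)
m, n, T = 3, 5, 4
u = rng.normal(size=m)
alpha = rng.normal(size=(n, m))
beta = rng.normal(size=(n, n)) * 0.3
b = rng.normal(size=n)
relu = lambda z: np.maximum(z, 0.0)

# Recursive view: iterate the shared update T times from x(0) = 0.
x = np.zeros(n)
for _ in range(T):
    x = relu(alpha @ u + beta @ x + b)

# Unrolled view: a depth-T MLP in which every layer applies the same transform.
layers = [(alpha, beta, b)] * T          # parameter sharing across layers
h = np.zeros(n)
for a_l, b_l, bias_l in layers:
    h = relu(a_l @ u + b_l @ h + bias_l)

assert np.allclose(x, h)                  # identical by construction
```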

3. Training Methodology

FRPNs are trained with objective functions standard to their task domain, e.g., cross-entropy loss for classification or MSE for regression, evaluated at the output layer. Training employs backpropagation through time (BPTT), potentially truncated at $t_{\max}$, to compute gradients, and uses Adam or SGD with momentum for parameter updates. Regularization techniques include $L_2$ weight decay, dropout (applied to output or recurrent links), and optional batch normalization interleaved with iteration steps. Convergence in practice is monitored by the criterion:

$$\|x(t) - x(t-1)\|_2 < \varepsilon$$

or $t = t_{\max}$ (with $\varepsilon = 0.1$, $t_{\max} = 8$ in reported experiments).
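As a concrete sketch of BPTT through the shared update (a toy squared-error objective with hand-derived gradients, not the paper's training code), the following accumulates parameter gradients over the unrolled iterations and checks one entry against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, T = 3, 4, 5                         # hypothetical sizes
u = rng.normal(size=m)
target = rng.normal(size=n)
alpha = rng.normal(size=(n, m)) * 0.5
beta = rng.normal(size=(n, n)) * 0.3
b = rng.normal(size=n) * 0.5

def forward(alpha, beta, b):
    """Run T iterations, caching states and pre-activations for the backward pass."""
    xs, zs = [np.zeros(n)], []
    for _ in range(T):
        z = alpha @ u + beta @ xs[-1] + b
        zs.append(z)
        xs.append(np.maximum(z, 0.0))     # ReLU
    loss = 0.5 * np.sum((xs[-1] - target) ** 2)
    return loss, xs, zs

def bptt(alpha, beta, b):
    """Accumulate gradients over the unrolled iterations (parameters are shared)."""
    loss, xs, zs = forward(alpha, beta, b)
    g_alpha = np.zeros_like(alpha)
    g_beta = np.zeros_like(beta)
    g_b = np.zeros_like(b)
    dx = xs[-1] - target                  # dL/dx(T)
    for t in reversed(range(T)):
        dz = dx * (zs[t] > 0)             # backprop through ReLU
        g_alpha += np.outer(dz, u)        # same parameters at every step,
        g_beta += np.outer(dz, xs[t])     # so gradients accumulate over t
        g_b += dz
        dx = beta.T @ dz                  # propagate to x(t-1)
    return loss, g_alpha, g_beta, g_b

loss, g_alpha, g_beta, g_b = bptt(alpha, beta, b)

# Finite-difference check on one entry of beta.
eps = 1e-6
beta_p = beta.copy(); beta_p[0, 0] += eps
num = (forward(alpha, beta_p, b)[0] - loss) / eps
assert abs(num - g_beta[0, 0]) <= 1e-4 * (1 + abs(g_beta[0, 0]))
```

Because the same $\alpha$, $\beta$, $b$ appear at every step, their gradients are sums over the unrolled iterations; truncating the loop at $t_{\max}$ gives truncated BPTT.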

4. Convolutional Extension: C-FRPN

In C-FRPN, the FRPN hidden state and input are stacks of 2D feature maps, and the vector operations are replaced by convolutions. For $S$ input feature maps $\mathbf{u} = \{u_s\}_{s=1}^S$ and $K$ state maps $\mathbf{x}(t) = \{x_k(t)\}_{k=1}^K$, the update is

$$x_k(t) = f\!\left( \sum_{s=1}^S U_{k,s} * u_s + \sum_{k'=1}^K W_{k,k'} * x_{k'}(t-1) + b_k \right)$$

where $U_{k,s}$ and $W_{k,k'}$ are convolution kernels, $*$ denotes 2D convolution, and $b_k$ is a bias term. Each recursive block comprises one such C-FRPN layer followed by pooling.
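One C-FRPN update step can be sketched directly from the formula; the map sizes, kernel sizes, and "same" zero padding below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
S, K, H, W = 2, 3, 8, 8                        # hypothetical shapes
u_maps = rng.normal(size=(S, H, W))            # input feature maps u_s
x_maps = np.zeros((K, H, W))                   # state maps x_k(0)
U = rng.normal(size=(K, S, 3, 3)) * 0.1        # input kernels U_{k,s}
Wk = rng.normal(size=(K, K, 3, 3)) * 0.1       # recurrent kernels W_{k,k'}
bias = rng.normal(size=K) * 0.1                # biases b_k

def conv2d_same(img, ker):
    """Naive 2D convolution with zero padding ('same' output size)."""
    kh, kw = ker.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    fk = ker[::-1, ::-1]                       # flip kernel: true convolution
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * fk)
    return out

def cfrpn_step(u_maps, x_maps):
    """x_k(t) = f( sum_s U_{k,s}*u_s + sum_{k'} W_{k,k'}*x_{k'}(t-1) + b_k )."""
    out = np.empty_like(x_maps)
    for k in range(K):
        acc = np.full((H, W), bias[k])
        for s in range(S):
            acc += conv2d_same(u_maps[s], U[k, s])
        for kp in range(K):
            acc += conv2d_same(x_maps[kp], Wk[k, kp])
        out[k] = np.maximum(acc, 0.0)          # ReLU
    return out

x1 = cfrpn_step(u_maps, x_maps)                # first recursive iteration
x2 = cfrpn_step(u_maps, x1)                    # second recursive iteration
assert x1.shape == (K, H, W) and x2.shape == (K, H, W)
```

Note that the same kernel banks `U` and `Wk` are reused at every iteration; only the state maps change between steps.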

C-FRPN architectures typically stack four recursive blocks, each followed by $3 \times 3$ max-pooling (stride 2) and dropout with $p = 0.5$ (except after the final block). Within blocks, the first convolution uses $5 \times 5$ kernels, subsequent recursions use $3 \times 3$ kernels, and local-response normalization is applied after each iteration.

5. Experimental Results and Performance Analysis

Evaluation on standard image benchmarks, including CIFAR-10, SVHN, and ISIC melanoma classification, demonstrates that C-FRPN consistently outperforms parameter-matched CNNs. In the low-parameter regime ($\sim 20$ K parameters):

  • CIFAR-10: baseline CNN $69.1\% \pm 1.2\%$, C-FRPN $72.4\% \pm 0.8\%$
  • SVHN: CNN $84.5\% \pm 0.9\%$, C-FRPN $87.1\% \pm 0.7\%$
  • ISIC: CNN $78.3\% \pm 1.5\%$, C-FRPN $81.9\% \pm 1.2\%$

For wider models ($100$ K–$500$ K parameters), the advantage narrows to $1$–$2$ percentage points but remains consistent. All experiments used the Adam optimizer (learning rate $10^{-4}$, weight decay $5 \times 10^{-4}$), five-trial medians, and standard augmentation.

The following table summarizes classification accuracy (mean ± std) from the reported experiments:

Dataset     CNN (small model)      C-FRPN (small model)
CIFAR-10    $69.1\% \pm 1.2\%$     $72.4\% \pm 0.8\%$
SVHN        $84.5\% \pm 0.9\%$     $87.1\% \pm 0.7\%$
ISIC        $78.3\% \pm 1.5\%$     $81.9\% \pm 1.2\%$

The largest gains are found in small models ($< 50$ K parameters), indicating increased expressive efficiency.

6. Implementation Considerations and Parameter Efficiency

The depth of computation in each FRPN or C-FRPN block is dynamically determined by a convergence test, rather than fixed architectural design. Typical networks use up to $t_{\max} = 8$ iterations per block. C-FRPN widths in reported benchmarks were $[96, 85, 74, 60, 30, 15]$ feature maps per layer. The architecture exploits recurrence and parameter sharing to reduce redundancy, enabling smaller models to attain accuracies comparable to substantially larger CNNs.
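The parameter-efficiency argument reduces to simple counting: a recursive convolutional layer reuses one kernel bank across all iterations, whereas a conventional stack pays for each layer separately. A back-of-envelope comparison (hypothetical sizes, not the paper's exact counts):

```python
# One recursive 3x3 conv layer with K maps, reused for T iterations, versus
# T distinct 3x3 conv layers of the same width. Sizes here are illustrative.
K, T, k = 64, 8, 3
shared = K * K * k * k + K            # one kernel bank + biases, for any T
stacked = T * (K * K * k * k + K)     # distinct parameters at every depth
assert stacked == T * shared          # effective depth T at 1/T the parameters
```

The recursive model thus buys effective depth with iterations rather than parameters, which is consistent with the largest accuracy gains appearing in the small-model regime.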

Regularization is essential for training stability: $L_2$ weight decay, batch normalization after each iterative step, and dropout after pooling are employed. The contraction property of $f$ and $\beta$ guarantees a stable equilibrium under standard settings.

A plausible implication is that the recursive mechanism inherent to FRPN and C-FRPN architectures induces a form of implicit depth adaptation and parameter reuse, leading to superior performance-to-parameter ratios relative to conventional feedforward designs.

7. Significance and Applications

FRPN and C-FRPN architectures generalize the deep neural network paradigm by replacing rigid, architecturally determined depth with a learned, recursive computation converging to equilibrium. This results in adaptive, parameter-efficient models. Their demonstrated advantages on image classification tasks, especially in low-parameter or resource-constrained settings, mark these models as strong alternatives for compact or embedded deployment.

These architectures also suggest a principled route to simulating arbitrarily deep networks without explicit depth design, and their underlying principles parallel recurrent equilibrium models and deep implicit networks, contributing to the broader understanding of recursive computation within neural architectures (Rossi et al., 2019).
