Convolutional Fully Recursive Perceptron Network (C-FRPN)
- The paper introduces a convolutional adaptation of recursive networks that achieves up to a 30–50% reduction in parameters while maintaining competitive accuracy.
- The architecture uses a recursive update mechanism with tied weights and a convergence check to simulate deep feature extraction without increasing model size.
- Empirical evaluations on benchmarks like CIFAR-10 and SVHN demonstrate faster convergence and improved generalization compared to conventional CNNs.
The Convolutional Fully Recursive Perceptron Network (C-FRPN) is a neural network architecture that embeds a recursive equilibrium-solving loop within each convolutional layer, enabling parameter-efficient and depth-adaptive modeling for image classification and related tasks. The architecture extends the fully recursive perceptron network (FRPN) to convolutional neural networks (CNNs): each block simulates an unrolled stack of convolutional layers with tied weights, supporting deep feature extraction with superior parameter efficiency compared to conventional CNNs (Rossi et al., 2019).
1. Mathematical Formulation and Recursive Mechanism
A single C-FRPN layer operates on an input tensor $X$ (the feature maps from the previous stage) and maintains a state tensor $H^{(t)}$, updated recursively via a convolutional operation. The update rule is

$$H^{(t)} = \sigma\!\left(W_x * X + W_h * H^{(t-1)} + b\right),$$

where:
- $W_x$ is the input-to-state kernel,
- $W_h$ is the state-to-state feedback kernel,
- $b$ is a bias,
- $\sigma$ is a pointwise nonlinearity (ReLU),
- $*$ denotes discrete convolution with stride 1 and appropriate zero-padding.

The recursion is terminated when either $\|H^{(t)} - H^{(t-1)}\| < \epsilon$ for some threshold $\epsilon$ (typically $0.1$), or when a maximum number of iterations $T_{\max}$ (set to 8 in published experiments) is reached.

This process simulates stacking a deep sequence of layers with weight sharing. At equilibrium, the state satisfies the fixed-point equation:

$$H^{*} = \sigma\!\left(W_x * X + W_h * H^{*} + b\right).$$
Each recursive update thus implicitly corresponds to an additional hidden layer, enabling the network to represent complex computations to greater effective depth without increasing parameter count (Rossi et al., 2019).
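A minimal NumPy sketch may make the recursion concrete. The single-channel setting, the zero initialization $H^{(0)} = 0$, and all function and variable names are illustrative assumptions, not details from the paper:

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2-D convolution (cross-correlation, as is conventional
    in deep learning), stride 1, zero padding so output matches input size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def cfrpn_layer(x, w_x, w_h, b, eps=0.1, t_max=8):
    """Iterate H^(t) = relu(W_x * X + W_h * H^(t-1) + b) until the update
    falls below eps or the iteration cap t_max is hit."""
    drive = conv2d_same(x, w_x) + b       # input-to-state term, fixed across iterations
    h = np.zeros_like(x, dtype=float)     # assumed initialization H^(0) = 0
    for t in range(1, t_max + 1):
        h_new = np.maximum(0.0, drive + conv2d_same(h, w_h))  # ReLU update
        if np.max(np.abs(h_new - h)) < eps:                   # convergence check
            return h_new, t
        h = h_new
    return h, t_max
```

Because the input-to-state term does not change across iterations, it is computed once; only the feedback convolution is repeated.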
2. Architectural Structure
The C-FRPN architecture instantiates standard CNN topologies in which each convolutional block is replaced with a recursive C-FRPN block. The canonical design comprises four such blocks in series, with details as follows:
- Each block uses a fixed kernel size, with the first block's kernel size differing from that of the subsequent blocks.
- The number of output feature maps per block is adjusted to match a baseline CNN’s total parameter count. For example, one instantiation uses:
- Baseline CNN: [135, 120, 104, 85] filters, vs.
- C-FRPN: [96, 85, 74, 60] filters per block.
- Inside each block:
- Apply the recursive update $H^{(t)} = \sigma\!\left(W_x * X + W_h * H^{(t-1)} + b\right)$.
- Perform Local Response Normalization over channels.
- Check for convergence ($\|H^{(t)} - H^{(t-1)}\| < \epsilon$) or the iteration cap ($t = T_{\max}$).
- On convergence, apply max pooling (stride 2), feeding to the next block.
- Apply Dropout after each convolution and pooling stage (except for the final block).
After the final C-FRPN block, the resulting features are processed by one or more standard fully connected layers producing class scores.
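The normalization and pooling operations inside each block are standard; a minimal NumPy sketch of the two (with AlexNet-style LRN hyperparameters assumed as defaults, since the paper's values are not given here) might look like:

```python
import numpy as np

def lrn(h, k=2.0, alpha=1e-4, beta=0.75, n=5):
    """Local Response Normalization across the channel axis (h: C x H x W).
    Hyperparameter defaults follow AlexNet; the paper's values are assumed."""
    c = h.shape[0]
    out = np.empty_like(h)
    for i in range(c):
        lo, hi = max(0, i - n // 2), min(c, i + n // 2 + 1)
        denom = (k + alpha * np.sum(h[lo:hi] ** 2, axis=0)) ** beta
        out[i] = h[i] / denom
    return out

def max_pool_2x2(h):
    """2x2 max pooling with stride 2 (h: C x H x W, H and W even)."""
    c, ht, wt = h.shape
    return h.reshape(c, ht // 2, 2, wt // 2, 2).max(axis=(2, 4))
```

Within a block, the recursive update runs first, LRN is applied to its output, and pooling halves the spatial resolution before the features are passed to the next block.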
3. Training Regimen and Backpropagation
Training employs a standard cross-entropy loss over softmax outputs. Optimization is performed with Adam together with weight decay. For backpropagation, the recursion is unrolled for up to 8 iterations, with gradient accumulation handled automatically by the computational graph; weight sharing is intrinsic and managed by the parameter definitions. The implementation does not use implicit differentiation of the fixed-point solution; instead, it relies on direct backpropagation through the unfolded computation graph (Rossi et al., 2019).
The per-block convergence check $\|H^{(t)} - H^{(t-1)}\| < \epsilon$, together with the iteration cap $T_{\max} = 8$, ensures that compute requirements per forward pass remain bounded.
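Backpropagating through the unrolled recursion means the tied feedback weight accumulates a gradient contribution from every iteration. A scalar analogue, with illustrative values and a finite-difference check (all names and numbers here are assumptions for illustration, not the paper's implementation), sketches that accumulation:

```python
def unrolled_forward(x, w_x, w_h, b, T=8):
    """Scalar analogue of the C-FRPN recursion, unrolled for T steps."""
    hs = [0.0]
    for _ in range(T):
        hs.append(max(0.0, w_x * x + w_h * hs[-1] + b))
    return hs

def grad_wh(x, w_x, w_h, b, T=8):
    """Reverse-mode sweep through the unrolled graph: the tied weight w_h
    picks up one gradient term per unrolled step."""
    hs = unrolled_forward(x, w_x, w_h, b, T)
    g, total = 1.0, 0.0                      # dL/dh_T = 1 for the loss L = h_T
    for t in range(T, 0, -1):
        relu_gate = 1.0 if hs[t] > 0 else 0.0
        total += g * relu_gate * hs[t - 1]   # direct contribution of step t
        g = g * relu_gate * w_h              # propagate sensitivity to h_{t-1}
    return total

# finite-difference check of the accumulated gradient
x, w_x, w_h, b = 1.0, 0.5, 0.3, 0.1
eps = 1e-6
fd = (unrolled_forward(x, w_x, w_h + eps, b)[-1]
      - unrolled_forward(x, w_x, w_h - eps, b)[-1]) / (2 * eps)
assert abs(grad_wh(x, w_x, w_h, b) - fd) < 1e-5
```

An autodiff framework performs exactly this accumulation automatically once the loop is part of the computational graph.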
4. Parameter Efficiency and Computational Considerations
Each C-FRPN block with $k \times k$ kernels, $C_{\text{in}}$ input channels, and $C_{\text{out}}$ output maps requires $k^2 C_{\text{in}} C_{\text{out}} + k^2 C_{\text{out}}^2 + C_{\text{out}}$ parameters (input-to-state kernel, state-to-state feedback kernel, and bias), shared across all recursive iterations. In practice, a C-FRPN network with filter counts [96, 85, 74, 60] yields a parameter count comparable to a 4-layer CNN with filter counts [135, 120, 104, 85]. While a standard CNN executes one convolution per block per input, a C-FRPN block can require up to 8 convolutions (though typically 3–5 suffice for convergence).
Intermediate activations must be retained for backpropagation, modestly increasing memory usage, but with $T_{\max}$ capped at 8, the memory footprint remains practical.
Empirical results indicate that, for a given accuracy target, C-FRPN achieves up to 30–50% reduction in parameter count compared to baseline CNNs of equal depth (Rossi et al., 2019).
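The parameter matching between the two filter configurations can be sketched by counting weights per layer. Assuming $3 \times 3$ kernels throughout and 3 input channels (the paper's actual kernel sizes are not recoverable here, so the exact numbers are illustrative), the two totals come out within roughly 20% of each other:

```python
def cfrpn_block_params(c_in, c_out, k=3):
    # input-to-state kernel + state-to-state feedback kernel + bias
    return k * k * c_in * c_out + k * k * c_out * c_out + c_out

def cnn_layer_params(c_in, c_out, k=3):
    # single convolution kernel + bias
    return k * k * c_in * c_out + c_out

def total(filters, layer_fn, c_in=3):
    n = 0
    for c_out in filters:
        n += layer_fn(c_in, c_out)
        c_in = c_out
    return n

cfrpn_total = total([96, 85, 74, 60], cfrpn_block_params)   # C-FRPN filter counts
cnn_total = total([135, 120, 104, 85], cnn_layer_params)    # baseline CNN filter counts
```

The feedback kernel $W_h$ adds a $k^2 C_{\text{out}}^2$ term per block, which is why the C-FRPN uses fewer feature maps per block to stay near the baseline's budget.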
5. Empirical Results and Evaluation
C-FRPNs have been evaluated on several image classification benchmarks using standard data splits and augmentations:
- CIFAR-10: 50,000 train / 10,000 test, with random cropping and horizontal flips.
- SVHN: 73,257 train / 26,032 test, with augmentations as in Liang & Hu (2015).
- ISIC (Melanoma): Custom splits, random horizontal/vertical flips, and rotations.
Six model sizes were considered, ranging from "large" (1.2M parameters) to "tiny" (50k parameters), with C-FRPN and standard CNNs matched for total parameter count.
Reported performance is mean test accuracy across five random initializations. Across all model capacities, C-FRPN outperforms baseline CNNs by 1–2%, with the performance gap widening as model size decreases. Learning curves for SVHN further reveal faster and more stable accuracy gains for C-FRPN across epochs.
6. Key Properties, Insights, and Limitations
Three principal properties are observed:
- Adaptive Depth: Iterating until convergence allows each C-FRPN layer to develop an effective depth adaptive to input complexity. Simpler inputs achieve equilibrium in fewer steps, while more complex cases use additional depth via further recursion.
- Expanded Approximation Power per Parameter: Recursive, tied-weight usage provides strong approximation efficiency, enabling deep feature extraction with proportionally fewer weights and reduced memory usage.
- Built-in Regularization: Weight-tying and early cessation of recursion together serve as an implicit regularizer, supporting improved generalization, particularly in small networks.
Noted limitations include the necessity of choosing an iteration budget (with $T_{\max} = 8$ in reported experiments), the lack of convergence guarantees for all possible kernel settings, and increased computational and memory costs due to unrolled backpropagation (especially as the recursion deepens). Input-dependent iteration counts may also complicate hardware acceleration. The paper suggests potential improvements such as implicit differentiation, learned convergence thresholds, incorporation of ResNet/Highway-style gates, or hybrid dynamic scheduling approaches.
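The absence of a general convergence guarantee can be illustrated with a scalar analogue of the recursion: in the linear (active-ReLU) regime the iteration converges only when the feedback is contractive, e.g. $|w_h| < 1$ in one dimension (a standard fixed-point argument, not a result stated in the paper):

```python
def iterate(w_h, drive=1.0, T=50):
    """Scalar analogue of the state update: h <- relu(drive + w_h * h).
    With the ReLU active, this converges iff |w_h| < 1 (contraction)."""
    h = 0.0
    for _ in range(T):
        h = max(0.0, drive + w_h * h)
    return h

stable = iterate(0.5)      # contractive feedback: approaches 1 / (1 - 0.5) = 2
unstable = iterate(1.5)    # expansive feedback: grows without bound
```

This is why the iteration cap $T_{\max}$ matters: it bounds compute even for kernel settings where the recursion would not settle on its own.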
7. Context and Prospective Directions
C-FRPN offers a general framework for embedding recursion-based equilibrium solvers in CNNs, introducing a mechanism wherein feature extraction depth adapts naturally to task demand. The empirical finding that C-FRPNs attain equivalent or better accuracy with substantially fewer parameters, especially for small networks, highlights their applicability in resource-constrained scenarios. Future work may address limitations around fixed-point differentiation, convergence adaptivity, and integration with other forms of architectural adaptivity and gating (Rossi et al., 2019).