- The paper demonstrates that incorporating direct input connections in every hidden layer ensures that nonlinearity is both necessary and sufficient for universal approximation.
- It introduces a recursive IC–MLP architecture that leverages linear combinations and polynomial constructions to overcome classical depth limitations.
- The study opens avenues for future research on approximation rates, optimization landscapes, and efficiency comparisons with standard MLP architectures.
Introduction and Motivation
The universal approximation property (UAP) is central to theoretical analysis of MLPs, asserting that neural architectures can approximate arbitrary continuous functions on compact domains under suitable choices of activation functions and network parameters. Classical results—beginning with Cybenko (1989), Hornik et al. (1989), and Leshno et al. (1993)—clarified that for shallow networks, any non-polynomial (in fact, any nonlinear continuous) activation is sufficient for UAP, with precise characterizations depending on the input dimension and structural constraints.
With the transition to deep networks, the analysis becomes more delicate. Deep networks have, in general, more restrictive UAP criteria, especially under width or architectural constraints. Hanin and Sellke (2018) established sharp width-depth tradeoffs for ReLU networks, and Johnson (2019) demonstrated limits to expressiveness for deep, narrow networks with particular activations.
The paper "Universal Approximation Theorem for Input-Connected Multilayer Perceptrons" (2601.14026) advances this landscape by introducing the IC–MLP, an architecture where every hidden neuron, not just the first layer, receives direct affine connections from the raw input. The study systematically develops the theory for scalar and vector-valued inputs, gives explicit network constructions, and establishes an if-and-only-if universality result—nonlinearity of the activation is necessary and sufficient—without any further smoothness, monotonicity, or structural assumptions. This is a marked contrast with classical deep MLPs, where more intricate conditions are often required.
In the IC–MLP, the architecture is recursively defined. For an input x ∈ R (or x ∈ R^n), neuron j in hidden layer ℓ computes
hℓ,j = σ( ∑i wℓ,j,i hℓ−1,i + aℓ,j x + bℓ,j )
in the scalar case (with a vector-valued input weight aℓ,j in the multivariate case), where h0,1 = x. The network output is
HL(x) = ∑j vj hL,j + c x + d,
allowing for affine dependence on the input at the output node as well.
This defines a strict extension of conventional feedforward MLPs: the classical architecture is recovered by setting aℓ,j = 0 for all ℓ ≥ 2. Thus, any function representable by an MLP is expressible by an IC–MLP, but the reverse inclusion does not hold.
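The recursive definition above can be sketched directly. The following is a minimal Python rendering for scalar input, with tanh standing in as an example nonlinear σ; the per-neuron parameter layout (w, a, b) is an illustrative choice, not notation from the paper.

```python
import math

def icmlp_forward(x, layers, v, c, d):
    """Scalar-input IC-MLP: every hidden neuron also sees the raw input x.

    layers: one list per hidden layer; each neuron is a tuple (w, a, b)
            with w the weights on the previous layer's activations,
            a the direct input weight, and b the bias.
    v, c, d: output weights, direct input weight, and bias of the output node.
    """
    h = [x]  # h_{0,1} = x
    for layer in layers:
        h = [math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + a * x + b)
             for (w, a, b) in layer]
    return sum(vj * hj for vj, hj in zip(v, h)) + c * x + d
```

Setting a = 0 in every layer beyond the first (and c = 0 at the output) collapses this to a conventional MLP; conversely, a single hidden neuron with a = 1 together with c = 1 already yields x + tanh(x).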
Theoretical Guarantees: Universal Approximation Theorems
Scalar Case
The primary theorem is the following equivalence for scalar inputs and networks of arbitrary (finite) depth:
- σ is nonlinear ⇔ For any closed interval [α,β] and f∈C([α,β]), IC–MLPs can uniformly approximate f arbitrarily well.
The proof uses algebraic closure properties and the recursive structure of IC–MLPs. Because linear functions are available, the architecture is closed under linear combinations, affine compositions, and superpositions with σ. Nonlinearity ensures that x² can be constructed as the limit of difference quotients involving mollified versions of σ. The argument then proceeds stepwise via Taylor expansions, closure under composition, and the Weierstrass approximation theorem to establish density of polynomials and hence of all continuous functions.
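The pivotal step, extracting x² from a nonlinear σ, can be checked numerically. The sketch below is only an illustration under extra smoothness assumptions: it takes σ = tanh and a base point a with σ''(a) ≠ 0, whereas the paper's argument uses mollified versions of σ to cover arbitrary continuous nonlinearities.

```python
import math

def sigma(u):
    return math.tanh(u)  # smooth stand-in for a general nonlinear activation

def sigma_pp(a, eps=1e-5):
    # Numerical second derivative of sigma at a (must be nonzero here).
    return (sigma(a + eps) - 2 * sigma(a) + sigma(a - eps)) / eps**2

def approx_square(x, a=1.0, t=1e-3):
    # Taylor expansion: sigma(a + t*x) + sigma(a - t*x) - 2*sigma(a)
    #   = sigma''(a) * t^2 * x^2 + O(t^4), so dividing recovers x^2.
    return (sigma(a + t * x) + sigma(a - t * x) - 2 * sigma(a)) / (t**2 * sigma_pp(a))
```

Each term sigma(a ± t·x) is itself a single IC–MLP neuron (input weight ±t, bias a), so the whole combination stays inside the architecture.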
If σ is linear, the entire network class collapses to affine functions, precluding UAP.
Multivariate Case
The analysis extends by induction and leveraging the closure properties for scalar functions. With the direct input connections and nonlinearity, any coordinate function and thus any polynomial can be realized. The vector-valued input case uses the Stone–Weierstrass theorem, showing that Fn (the closure of IC–MLP outputs) is an algebra containing the coordinate projections and constants; thus polynomials are dense, and hence every continuous function on a compact subset is uniformly approximable.
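Once squares are available, products of coordinates (and hence arbitrary monomials) follow from the polarization identity xy = ((x + y)² − x² − y²)/2, which is one concrete way to see that the function class forms an algebra. A numerical sketch reusing a tanh-based square gadget; this is an illustration, not the paper's construction:

```python
import math

def sq(x, a=1.0, t=1e-3):
    # x^2 from second difference quotients of a nonlinear sigma (tanh here).
    s = math.tanh
    spp = (s(a + 1e-5) - 2 * s(a) + s(a - 1e-5)) / 1e-10  # sigma''(a), numerically
    return (s(a + t * x) + s(a - t * x) - 2 * s(a)) / (t * t * spp)

def prod(x, y):
    # Polarization identity: x*y = ((x + y)^2 - x^2 - y^2) / 2.
    return (sq(x + y) - sq(x) - sq(y)) / 2.0
```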
Structural Significance and Architectural Implications
The expressiveness of IC–MLPs fundamentally arises from architectural closure under linear combinations. In classical MLPs, it is generally not possible to represent x+σ(x) with a finite-depth network when only the first hidden layer receives the input directly. The direct connections present in all layers rectify this, ensuring closure under linear combinations and enabling the construction of polynomials and their superpositions without architectural constraints on activation smoothness or monotonicity.
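For instance, x + σ(x) needs only a single hidden neuron once the output node carries its own input connection. A minimal sketch with tanh as the example σ:

```python
import math

sigma = math.tanh  # example nonlinear activation (illustrative choice)

def f(x):
    # One hidden neuron with direct input weight a = 1 and bias b = 0 ...
    h = sigma(1.0 * x + 0.0)
    # ... combined at the output with v = 1, direct input weight c = 1, d = 0.
    return 1.0 * h + 1.0 * x + 0.0
```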
Unlike input-convex neural networks (ICNNs), which include direct input connections to enforce convexity, IC–MLPs impose no sign constraints or convexity requirements. They are thus a clean, strict generalization of standard MLPs and a class distinct from ICNNs, residual networks, and densely connected networks, with the input signal accessible at every depth.
Implications and Open Problems
Practical Implications:
The introduction of IC–MLPs simplifies the design of universal approximators, especially when minimal conditions on the nonlinearity of activations are valuable (e.g., in custom architectures, function approximation scenarios with exotic or restricted activations, or theoretical studies of neural expressivity).
Theoretical Implications:
This result recontextualizes the UAP for deep networks, demonstrating that, with input-connectedness, classical depth-based pathologies vanish: depth and nonlinear activation alone grant universality, independent of other function properties.
Contrast with Classical Results:
Classical deep UAP results usually require additional activation smoothness or width constraints. IC–MLPs circumvent these limitations, contrasting with Johnson's negative result for standard networks with activations approximable by injective functions.
Open Problems and Future Directions:
- The current analysis is qualitative—sharp quantitative results on approximation rates, explicit error estimates in terms of network depth and width, and resource-efficient realizations are open.
- Comparison of expressivity and efficiency (in terms of parameter count and training dynamics) between IC–MLPs and standard MLPs is unresolved.
- The consequences of input-connectedness for optimization landscapes and trainability remain to be elucidated.
Conclusion
The IC–MLP architecture establishes an elegant universality theorem: continuous nonlinearity of the activation is both necessary and sufficient for the UAP in both scalar and vector-input settings, for networks of arbitrary finite depth. Input-connectedness at every hidden layer endows the architecture with algebraic closure and direct access to polynomials and enables approximation without further technical conditions on the activation. This marks a conceptual simplification of universality for deep architectures and generates new questions regarding network design, quantitative approximation efficiency, and training in the context of input-connected neural networks (2601.14026).