
IC-MLP: Input-Connected MLP

Updated 27 January 2026
  • IC-MLP is a feedforward neural network architecture that integrates direct affine connections from the input to every hidden layer, ensuring robust universal approximation.
  • Its recursive formulation and explicit input connections enable algebraic closure and richer function spaces compared to standard MLPs.
  • The design accommodates both scalar and vector inputs, promoting enhanced expressiveness and practical applications in network theory and approximation.

The Input-Connected Multilayer Perceptron (IC-MLP) is a feedforward neural network architecture distinguished by direct affine connections from the raw input to every hidden unit in all hidden layers, in addition to standard inter-layer connectivity. Each hidden neuron, rather than relying solely on the output of the preceding layer, also incorporates an affine transformation of the input vector. This architectural modification, studied in both univariate and multivariate formulations, gives rise to a network class with robust algebraic closure properties and a universal approximation theorem under the minimal criterion that the activation function $\sigma:\mathbb{R}\to\mathbb{R}$ is nonlinear (Ismailov, 20 Jan 2026).

1. Formal Definition of IC-MLP Architecture

For scalar input $x\in\mathbb{R}$ and a fixed continuous activation function $\sigma:\mathbb{R}\to\mathbb{R}$, the IC-MLP is defined recursively over $L$ hidden layers as follows:

  • Let $h^{(0)}(x)=x$ (input node output).
  • For $k=1,\ldots,L$ (hidden layers), each hidden layer output is

$$h^{(k)}(x) = \sigma\left(W^{(k)} h^{(k-1)}(x) + U^{(k)} x + b^{(k)}\right) \in \mathbb{R}^{N_k},$$

with $W^{(k)}$ an $N_k \times N_{k-1}$ weight matrix, $U^{(k)}$ an $N_k \times 1$ input-weight vector, and $b^{(k)} \in \mathbb{R}^{N_k}$ a bias vector.

  • The output layer computes

$$f(x) = W^{(L+1)} h^{(L)}(x) + U^{(L+1)} x + b^{(L+1)},$$

reducing to a scalar output when $N_{L+1}=1$. Here, $W^{(L+1)}$ is typically a $1 \times N_L$ row vector ($v^T$), $U^{(L+1)}=c \in \mathbb{R}$, and $b^{(L+1)}=d \in \mathbb{R}$.

In the multivariate setting with $x \in \mathbb{R}^n$, the affine input term is replaced by $\langle a_{k,j}, x \rangle$ for each neuron.
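The recursion above can be sketched directly in NumPy. The parameter names mirror the formulas; the layer widths, random parameters, and the choice of $\sigma=\tanh$ are illustrative assumptions, not part of the paper's construction:

```python
import numpy as np

def icmlp_forward(x, Ws, Us, bs, sigma=np.tanh):
    """Scalar-input IC-MLP: h^(k) = sigma(W^(k) h^(k-1) + U^(k) x + b^(k)),
    with an output layer that also receives x through an affine term."""
    h = np.array([[x]])                    # h^(0)(x) = x, as a 1x1 column
    for W, U, b in zip(Ws[:-1], Us[:-1], bs[:-1]):
        h = sigma(W @ h + U * x + b)       # inter-layer term plus direct input term
    W, U, b = Ws[-1], Us[-1], bs[-1]
    return (W @ h + U * x + b).item()      # scalar output (N_{L+1} = 1)

# Illustrative instance with L = 2 hidden layers, widths N_1 = 3, N_2 = 2.
rng = np.random.default_rng(0)
sizes = [1, 3, 2, 1]
Ws = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
Us = [rng.standard_normal((o, 1)) for o in sizes[1:]]
bs = [rng.standard_normal((o, 1)) for o in sizes[1:]]
print(icmlp_forward(0.5, Ws, Us, bs))
```

Note that with an affine activation the whole network collapses to an affine map of $x$, which is the easy direction of the universality theorem below.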

2. Layerwise and Iterated Formulas

The IC-MLP structure supports explicit, systematic descriptions of its function space for any finite depth:

  • Depth 0: $H_0(x) = c\,x + d$
  • Depth 1: With $N_1$ hidden units, parameters $(a_{1,i}, b_{1,i})$, output weights $v_i$, input weight $c$, and bias $d$:

$$h_{1,i}(x) = \sigma(a_{1,i} x + b_{1,i}), \qquad H_1(x) = \sum_{i=1}^{N_1} v_i\, h_{1,i}(x) + c x + d$$

  • Depth 2: For $N_2$ second-layer units, with weights $w_{2,ji}$, input weights $a_{2,j}$, biases $b_{2,j}$, and output weights $v_j$:

$$h_{2,j}(x) = \sigma\left(\sum_{i=1}^{N_1} w_{2,ji}\, \sigma(a_{1,i} x + b_{1,i}) + a_{2,j} x + b_{2,j}\right)$$

$$H_2(x) = \sum_{j=1}^{N_2} v_j\, h_{2,j}(x) + c x + d$$

In general, the $L$-layer functional form is

$$H_L(x) = \sum_{j=1}^{N_L} v_j\, \sigma\left( \sum_{i=1}^{N_{L-1}} w_{L,ji}\, \sigma\Big(\cdots \sigma(a_{1,i} x + b_{1,i}) \cdots\Big) + a_{L,j} x + b_{L,j} \right) + c x + d,$$

or, in matrix notation, $H_L(x) = W^{(L+1)} h^{(L)}(x) + U^{(L+1)} x + b^{(L+1)}$.
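As a consistency check, the explicit depth-2 sum formula for $H_2$ can be evaluated against the matrix recursion; all parameters below are randomly drawn illustrations, with $\sigma=\tanh$ assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2 = 4, 3
a1, b1 = rng.standard_normal(N1), rng.standard_normal(N1)  # a_{1,i}, b_{1,i}
w2 = rng.standard_normal((N2, N1))                         # w_{2,ji}
a2, b2 = rng.standard_normal(N2), rng.standard_normal(N2)  # a_{2,j}, b_{2,j}
v = rng.standard_normal(N2)                                # output weights v_j
c, d = 0.7, -0.2                                           # direct input weight, bias

def H2_explicit(x):
    # H_2(x) = sum_j v_j sigma(sum_i w_{2,ji} sigma(a_{1,i} x + b_{1,i})
    #                          + a_{2,j} x + b_{2,j}) + c x + d
    h1 = np.tanh(a1 * x + b1)
    h2 = np.tanh(w2 @ h1 + a2 * x + b2)
    return v @ h2 + c * x + d

def H2_matrix(x):
    # Same network via h^(k) = sigma(W^(k) h^(k-1) + U^(k) x + b^(k)):
    # W^(1) = a1 as a column, U^(1) = 0 (redundant for scalar input at layer 1).
    h = np.array([[x]])
    h = np.tanh(a1[:, None] @ h + b1[:, None])
    h = np.tanh(w2 @ h + a2[:, None] * x + b2[:, None])
    return (v[None, :] @ h + c * x + d).item()

print(H2_explicit(0.3), H2_matrix(0.3))  # agree up to rounding
```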

3. Universal Approximation Theorem for IC-MLP

Let $\sigma : \mathbb{R} \to \mathbb{R}$ be continuous. The following are equivalent:

  1. $\sigma$ is nonlinear (i.e., not of the form $\sigma(t) = At + B$).
  2. For every closed interval $[\alpha, \beta]$, every $f \in C([\alpha, \beta])$, and every $\varepsilon > 0$, there exist an $L$ and an $L$-layer IC-MLP $H_L$ such that

$$\sup_{x \in [\alpha, \beta]} |f(x) - H_L(x)| < \varepsilon.$$

Proof Outline: If $\sigma$ is affine, every IC-MLP computes an affine function and therefore cannot approximate arbitrary continuous functions. For nonlinear $\sigma$, one constructs smooth approximants via mollification and shows, by explicit symmetric differences, that $x^2$, and hence all monomials, lie in the closure of the function space realized by IC-MLPs; consequently all polynomials do, and then all continuous functions by the Weierstrass theorem (Ismailov, 20 Jan 2026).
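The symmetric-difference step can be illustrated numerically: for a smooth $\sigma$ with $\sigma''(b)\neq 0$, the combination $\bigl[\sigma(ax+b) - 2\sigma(b) + \sigma(-ax+b)\bigr]/(a^2\sigma''(b))$ converges to $x^2$ as $a \to 0$. This is a numerical sketch with $\sigma=\tanh$ and an assumed expansion point $b=1$, not the paper's exact construction:

```python
import numpy as np

def approx_square(x, a=1e-3, b=1.0):
    """Rescaled second symmetric difference of tanh around b.
    Taylor expansion: sigma(b+ax) + sigma(b-ax) - 2 sigma(b)
                      = sigma''(b) (a x)^2 + O(a^4), so dividing by
    a^2 sigma''(b) yields x^2 + O(a^2)."""
    t = np.tanh(b)
    sigma_pp = -2.0 * t * (1.0 - t**2)      # tanh''(b), nonzero at b = 1
    return (np.tanh(a * x + b) - 2.0 * np.tanh(b)
            + np.tanh(-a * x + b)) / (a**2 * sigma_pp)

xs = np.linspace(-1.0, 1.0, 201)
err = np.max(np.abs(approx_square(xs) - xs**2))
print(err)  # small: three sigma-neurons plus an affine term recover x^2
```

The three $\sigma$-terms and the affine correction fit inside a depth-1 IC-MLP, which is why $x^2$ (and, by iteration, higher monomials) lies in the closure of the class.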

4. Extension to Vector-Valued Inputs

For $x \in \mathbb{R}^n$, the IC-MLP maintains the direct affine input connection at every layer by incorporating terms of the form $\langle a_{k,j}, x \rangle$ in each hidden neuron. The universal approximation result also extends:

  • For any compact $K \subset \mathbb{R}^n$, any $f \in C(K)$, and any $\varepsilon > 0$, there exists an IC-MLP $H_L$ such that

$$\sup_{x \in K} |f(x) - H_L(x)| < \varepsilon$$

if and only if $\sigma$ is nonlinear.

The function class $\mathcal{F}_n$ realized by IC-MLPs is closed under addition and under superposition with scalar IC-MLPs, contains all constants and coordinate projections $x \mapsto x_i$, and supports the construction of all multivariate monomials. The Stone–Weierstrass theorem then ensures density in $C(K)$.
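In the vector case, the only change to the forward pass is that each layer's direct-input term becomes a matrix–vector product $A^{(k)}x$, whose $j$-th row realizes $\langle a_{k,j}, x\rangle$. A minimal sketch, with illustrative shapes and parameters:

```python
import numpy as np

def icmlp_forward_nd(x, Ws, As, bs, sigma=np.tanh):
    """Multivariate IC-MLP: h^(k) = sigma(W^(k) h^(k-1) + A^(k) x + b^(k)),
    where row j of A^(k) holds the direct-input weights a_{k,j}."""
    h = x                                  # h^(0)(x) = x in R^n
    for W, A, b in zip(Ws[:-1], As[:-1], bs[:-1]):
        h = sigma(W @ h + A @ x + b)       # every hidden layer sees x directly
    W, A, b = Ws[-1], As[-1], bs[-1]
    return (W @ h + A @ x + b).item()      # scalar output

# Illustrative instance: input dimension n = 3, widths N_1 = 5, N_2 = 4.
rng = np.random.default_rng(2)
sizes = [3, 5, 4, 1]
Ws = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
As = [rng.standard_normal((o, sizes[0])) for o in sizes[1:]]
bs = [rng.standard_normal(o) for o in sizes[1:]]
x = rng.standard_normal(3)
print(icmlp_forward_nd(x, Ws, As, bs))
```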

5. Algebraic and Expressive Properties Compared to Standard MLPs

IC-MLPs differ from standard MLPs in key architectural and algebraic respects:

  • In standard MLPs, only the first hidden layer receives the input directly; in IC-MLPs, every hidden layer and the output receive an independent affine input.
  • The IC-MLP function class is closed under finite linear combinations: summing two IC-MLPs yields another IC-MLP, obtained by concatenating their hidden layers and combining the final-layer outputs. In typical MLPs, closure under addition holds only under strong restrictions on $\sigma$.
  • For shallow MLPs, universal approximation requires $\sigma$ to be non-polynomial; for deep MLPs, non-affinity plus smoothness conditions are needed. IC-MLPs admit the sharp minimal condition that $\sigma$ is nonlinear, in both the scalar and vector-valued cases.
  • The algebraic structure of the IC-MLP hypothesis class supports direct use of classical density arguments with less technical machinery.
Property               | IC-MLP                    | Standard MLP
---------------------- | ------------------------- | ------------------------------------------
Direct input per layer | Yes                       | Only first layer
Universality criterion | $\sigma$ non-affine       | Non-polynomial, or non-affine + smoothness
Algebraic closure      | Addition, multiplication  | Restricted
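The closure-under-addition property can be checked constructively: stacking two same-depth IC-MLPs with block-diagonal hidden weights, stacked input weights and biases, and concatenated output weights yields a single IC-MLP computing the sum. The sketch below assumes equal depth (unequal depths would need padding) and random illustrative parameters:

```python
import numpy as np

def forward(x, params, sigma=np.tanh):
    """Scalar IC-MLP forward pass; params is a list of (W, U, b) per layer."""
    h = np.array([[x]])
    for W, U, b in params[:-1]:
        h = sigma(W @ h + U * x + b)
    W, U, b = params[-1]
    return (W @ h + U * x + b).item()

def random_icmlp(rng, sizes):
    return [(rng.standard_normal((o, i)), rng.standard_normal((o, 1)),
             rng.standard_normal((o, 1))) for i, o in zip(sizes[:-1], sizes[1:])]

def block_diag(A, B):
    return np.block([[A, np.zeros((A.shape[0], B.shape[1]))],
                     [np.zeros((B.shape[0], A.shape[1])), B]])

def add_icmlps(p1, p2):
    """Sum network: block-diagonal hidden weights, stacked input weights and
    biases; the output layers' weights concatenate and their affine parts add."""
    out = []
    for k, ((W1, U1, b1), (W2, U2, b2)) in enumerate(zip(p1, p2)):
        if k == len(p1) - 1:               # output layer
            out.append((np.hstack([W1, W2]), U1 + U2, b1 + b2))
        elif k == 0:                       # both first layers read the same x
            out.append((np.vstack([W1, W2]), np.vstack([U1, U2]),
                        np.vstack([b1, b2])))
        else:                              # interior hidden layers decouple
            out.append((block_diag(W1, W2), np.vstack([U1, U2]),
                        np.vstack([b1, b2])))
    return out

rng = np.random.default_rng(3)
sizes = [1, 3, 2, 1]
p1, p2 = random_icmlp(rng, sizes), random_icmlp(rng, sizes)
psum = add_icmlps(p1, p2)
print(forward(0.4, psum), forward(0.4, p1) + forward(0.4, p2))  # equal
```

The same block construction fails for standard MLPs only in the final affine-in-$x$ terms, which is exactly what the per-layer input connections supply.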

6. Implications for Network Theory and Approximation

The introduction of direct input connections at every hidden layer enables IC-MLPs to generate a strictly richer function space than standard MLPs: one closed under addition and multiplication and containing all affine and polynomial functions as special cases. The simplicity of the universality condition and the recursive, transparent structure of the proofs position IC-MLPs as a theoretically robust model for exploring the topology and algebraic structure of neural network hypothesis classes, with implications both for the functional analysis of neural networks and for the study of universal approximation in deep learning architectures (Ismailov, 20 Jan 2026).
