
IC-MLP: Input-Connected MLP

Updated 27 January 2026
  • IC-MLP is a feedforward neural network architecture that integrates direct affine connections from the input to every hidden layer, ensuring robust universal approximation.
  • Its recursive formulation and explicit input connections enable algebraic closure and richer function spaces compared to standard MLPs.
  • The design accommodates both scalar and vector inputs, promoting enhanced expressiveness and practical applications in network theory and approximation.

The Input-Connected Multilayer Perceptron (IC-MLP) is a feedforward neural network architecture distinguished by direct affine connections from the raw input to every hidden unit in all hidden layers, in addition to standard inter-layer connectivity. Each hidden neuron, rather than relying solely on the output of the preceding layer, also incorporates an affine transformation of the input vector. This architectural modification, studied in both univariate and multivariate formulations, gives rise to a network class with robust algebraic closure properties and a universal approximation theorem under the minimal criterion that the activation function $\sigma:\mathbb{R}\to\mathbb{R}$ is nonlinear (Ismailov, 20 Jan 2026).

1. Formal Definition of IC-MLP Architecture

For scalar input $x\in\mathbb{R}$ and a fixed continuous activation function $\sigma:\mathbb{R}\to\mathbb{R}$, the IC-MLP is defined recursively over $L$ hidden layers as follows:

  • Let $h^{(0)}(x)=x$ (input node output).
  • For $k=1,\ldots,L$ (hidden layers), each hidden layer output is

$$h^{(k)}(x) = \sigma\left(W^{(k)} h^{(k-1)}(x) + U^{(k)} x + b^{(k)}\right) \in \mathbb{R}^{N_k},$$

with $W^{(k)}$ an $N_k \times N_{k-1}$ weight matrix, $U^{(k)}$ an $N_k \times 1$ input-weight vector, and $b^{(k)} \in \mathbb{R}^{N_k}$ a bias vector.

  • The output layer computes

$$f(x) = W^{(L+1)} h^{(L)}(x) + U^{(L+1)} x + b^{(L+1)},$$

reducing to a scalar output when $N_{L+1}=1$. Here, $W^{(L+1)}$ is typically a $1 \times N_L$ row vector ($v^T$), $U^{(L+1)}=c \in \mathbb{R}$, and $b^{(L+1)}=d \in \mathbb{R}$.

In the multivariate setting with $x \in \mathbb{R}^n$, the affine input term is replaced by $\langle a_{k,j}, x \rangle$ for each neuron.
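The recursion above can be sketched directly in NumPy. The parameter names mirror the formulas; the layer widths, random parameters, and the choice of $\sigma=\tanh$ are illustrative assumptions, not part of the paper's construction:

```python
import numpy as np

def icmlp_forward(x, Ws, Us, bs, sigma=np.tanh):
    """Scalar-input IC-MLP: h^(k) = sigma(W^(k) h^(k-1) + U^(k) x + b^(k)),
    with an output layer that also receives x through an affine term."""
    h = np.array([[x]])                    # h^(0)(x) = x, as a 1x1 column
    for W, U, b in zip(Ws[:-1], Us[:-1], bs[:-1]):
        h = sigma(W @ h + U * x + b)       # inter-layer term plus direct input term
    W, U, b = Ws[-1], Us[-1], bs[-1]
    return (W @ h + U * x + b).item()      # scalar output (N_{L+1} = 1)

# Illustrative instance with L = 2 hidden layers, widths N_1 = 3, N_2 = 2.
rng = np.random.default_rng(0)
sizes = [1, 3, 2, 1]
Ws = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
Us = [rng.standard_normal((o, 1)) for o in sizes[1:]]
bs = [rng.standard_normal((o, 1)) for o in sizes[1:]]
print(icmlp_forward(0.5, Ws, Us, bs))
```

Note that with an affine activation the whole network collapses to an affine map of $x$, which is the easy direction of the universality theorem below.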

2. Layerwise and Iterated Formulas

The IC-MLP structure supports explicit, systematic descriptions of its function space for any finite depth:

  • Depth 0: $H_0(x) = c\,x + d$
  • Depth 1: With $N_1$ hidden units, parameters $(a_{1,i}, b_{1,i})$, output weights $v_i$, input weight $c$, and bias $d$:

$$h_{1,i}(x) = \sigma(a_{1,i} x + b_{1,i}), \qquad H_1(x) = \sum_{i=1}^{N_1} v_i\, h_{1,i}(x) + c x + d$$

  • Depth 2: For $N_2$ second-layer units, with weights $w_{2,ji}$, input weights $a_{2,j}$, biases $b_{2,j}$, and output weights $v_j$:

$$h_{2,j}(x) = \sigma\left(\sum_{i=1}^{N_1} w_{2,ji}\, \sigma(a_{1,i} x + b_{1,i}) + a_{2,j} x + b_{2,j}\right)$$

$$H_2(x) = \sum_{j=1}^{N_2} v_j\, h_{2,j}(x) + c x + d$$

In general, the $L$-layer functional form is

$$H_L(x) = \sum_{j=1}^{N_L} v_j\, \sigma\left( \sum_{i=1}^{N_{L-1}} w_{L,ji}\, \sigma\Big(\cdots \sigma(a_{1,i} x + b_{1,i}) \cdots\Big) + a_{L,j} x + b_{L,j} \right) + c x + d,$$

or, in matrix notation, $H_L(x) = W^{(L+1)} h^{(L)}(x) + U^{(L+1)} x + b^{(L+1)}$.
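As a consistency check, the explicit depth-2 sum formula for $H_2$ can be evaluated against the matrix recursion; all parameters below are randomly drawn illustrations, with $\sigma=\tanh$ assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2 = 4, 3
a1, b1 = rng.standard_normal(N1), rng.standard_normal(N1)  # a_{1,i}, b_{1,i}
w2 = rng.standard_normal((N2, N1))                         # w_{2,ji}
a2, b2 = rng.standard_normal(N2), rng.standard_normal(N2)  # a_{2,j}, b_{2,j}
v = rng.standard_normal(N2)                                # output weights v_j
c, d = 0.7, -0.2                                           # direct input weight, bias

def H2_explicit(x):
    # H_2(x) = sum_j v_j sigma(sum_i w_{2,ji} sigma(a_{1,i} x + b_{1,i})
    #                          + a_{2,j} x + b_{2,j}) + c x + d
    h1 = np.tanh(a1 * x + b1)
    h2 = np.tanh(w2 @ h1 + a2 * x + b2)
    return v @ h2 + c * x + d

def H2_matrix(x):
    # Same network via h^(k) = sigma(W^(k) h^(k-1) + U^(k) x + b^(k)):
    # W^(1) = a1 as a column, U^(1) = 0 (redundant for scalar input at layer 1).
    h = np.array([[x]])
    h = np.tanh(a1[:, None] @ h + b1[:, None])
    h = np.tanh(w2 @ h + a2[:, None] * x + b2[:, None])
    return (v[None, :] @ h + c * x + d).item()

print(H2_explicit(0.3), H2_matrix(0.3))  # agree up to rounding
```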

3. Universal Approximation Theorem for IC-MLP

Let $\sigma : \mathbb{R} \to \mathbb{R}$ be continuous. The following are equivalent:

  1. $\sigma$ is nonlinear (i.e., not of the form $\sigma(t) = At + B$).
  2. For every closed interval $[\alpha, \beta]$, every $f \in C([\alpha, \beta])$, and every $\varepsilon > 0$, there exist an $L$ and an $L$-layer IC-MLP $H_L$ such that

$$\sup_{x \in [\alpha, \beta]} |f(x) - H_L(x)| < \varepsilon.$$

Proof Outline: If $\sigma$ is affine, every IC-MLP computes an affine function and therefore cannot approximate arbitrary continuous functions. For nonlinear $\sigma$, one constructs smooth approximants via mollification and shows, by explicit symmetric differences, that $x^2$, and hence all monomials, lie in the closure of the function space realized by IC-MLPs; consequently all polynomials do, and then all continuous functions by the Weierstrass theorem (Ismailov, 20 Jan 2026).
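The symmetric-difference step can be illustrated numerically: for a smooth $\sigma$ with $\sigma''(b)\neq 0$, the combination $\bigl[\sigma(ax+b) - 2\sigma(b) + \sigma(-ax+b)\bigr]/(a^2\sigma''(b))$ converges to $x^2$ as $a \to 0$. This is a numerical sketch with $\sigma=\tanh$ and an assumed expansion point $b=1$, not the paper's exact construction:

```python
import numpy as np

def approx_square(x, a=1e-3, b=1.0):
    """Rescaled second symmetric difference of tanh around b.
    Taylor expansion: sigma(b+ax) + sigma(b-ax) - 2 sigma(b)
                      = sigma''(b) (a x)^2 + O(a^4), so dividing by
    a^2 sigma''(b) yields x^2 + O(a^2)."""
    t = np.tanh(b)
    sigma_pp = -2.0 * t * (1.0 - t**2)      # tanh''(b), nonzero at b = 1
    return (np.tanh(a * x + b) - 2.0 * np.tanh(b)
            + np.tanh(-a * x + b)) / (a**2 * sigma_pp)

xs = np.linspace(-1.0, 1.0, 201)
err = np.max(np.abs(approx_square(xs) - xs**2))
print(err)  # small: three sigma-neurons plus an affine term recover x^2
```

The three $\sigma$-terms and the affine correction fit inside a depth-1 IC-MLP, which is why $x^2$ (and, by iteration, higher monomials) lies in the closure of the class.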

4. Extension to Vector-Valued Inputs

For $x \in \mathbb{R}^n$, the IC-MLP maintains the direct affine input connection at every layer by incorporating terms of the form $\langle a_{k,j}, x \rangle$ in each hidden neuron. The universal approximation result also extends:

  • For any compact $K \subset \mathbb{R}^n$, any $f \in C(K)$, and any $\varepsilon > 0$, there exists an IC-MLP $H_L$ such that

$$\sup_{x \in K} |f(x) - H_L(x)| < \varepsilon$$

if and only if $\sigma$ is nonlinear.

The function class $\mathcal{F}_n$ realized by IC-MLPs is closed under addition and under superposition with scalar IC-MLPs, contains all constants and coordinate projections $x \mapsto x_i$, and supports the construction of all multivariate monomials. The Stone–Weierstrass theorem then ensures density in $C(K)$.
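In the vector case, the only change to the forward pass is that each layer's direct-input term becomes a matrix–vector product $A^{(k)}x$, whose $j$-th row realizes $\langle a_{k,j}, x\rangle$. A minimal sketch, with illustrative shapes and parameters:

```python
import numpy as np

def icmlp_forward_nd(x, Ws, As, bs, sigma=np.tanh):
    """Multivariate IC-MLP: h^(k) = sigma(W^(k) h^(k-1) + A^(k) x + b^(k)),
    where row j of A^(k) holds the direct-input weights a_{k,j}."""
    h = x                                  # h^(0)(x) = x in R^n
    for W, A, b in zip(Ws[:-1], As[:-1], bs[:-1]):
        h = sigma(W @ h + A @ x + b)       # every hidden layer sees x directly
    W, A, b = Ws[-1], As[-1], bs[-1]
    return (W @ h + A @ x + b).item()      # scalar output

# Illustrative instance: input dimension n = 3, widths N_1 = 5, N_2 = 4.
rng = np.random.default_rng(2)
sizes = [3, 5, 4, 1]
Ws = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
As = [rng.standard_normal((o, sizes[0])) for o in sizes[1:]]
bs = [rng.standard_normal(o) for o in sizes[1:]]
x = rng.standard_normal(3)
print(icmlp_forward_nd(x, Ws, As, bs))
```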

5. Algebraic and Expressive Properties Compared to Standard MLPs

IC-MLPs differ from standard MLPs in key architectural and algebraic respects:

  • In standard MLPs, only the first hidden layer receives the input directly; in IC-MLPs, every hidden layer and the output receive an independent affine input.
  • The IC-MLP function class is closed under finite linear combinations: summing two IC-MLPs yields another IC-MLP, obtained by concatenating their hidden layers and combining the final-layer outputs. In typical MLPs, closure under addition holds only under strong restrictions on $\sigma$.
  • For shallow MLPs, universal approximation requires $\sigma$ to be non-polynomial; for deep MLPs, non-affinity plus smoothness conditions are needed. IC-MLPs admit the sharp minimal condition that $\sigma$ is nonlinear, in both the scalar and vector-valued cases.
  • The algebraic structure of the IC-MLP hypothesis class supports direct use of classical density arguments with less technical machinery.
Property               | IC-MLP                    | Standard MLP
---------------------- | ------------------------- | ------------------------------------------
Direct input per layer | Yes                       | Only first layer
Universality criterion | $\sigma$ non-affine       | Non-polynomial, or non-affine + smoothness
Algebraic closure      | Addition, multiplication  | Restricted
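The closure-under-addition property can be checked constructively: stacking two same-depth IC-MLPs with block-diagonal hidden weights, stacked input weights and biases, and concatenated output weights yields a single IC-MLP computing the sum. The sketch below assumes equal depth (unequal depths would need padding) and random illustrative parameters:

```python
import numpy as np

def forward(x, params, sigma=np.tanh):
    """Scalar IC-MLP forward pass; params is a list of (W, U, b) per layer."""
    h = np.array([[x]])
    for W, U, b in params[:-1]:
        h = sigma(W @ h + U * x + b)
    W, U, b = params[-1]
    return (W @ h + U * x + b).item()

def random_icmlp(rng, sizes):
    return [(rng.standard_normal((o, i)), rng.standard_normal((o, 1)),
             rng.standard_normal((o, 1))) for i, o in zip(sizes[:-1], sizes[1:])]

def block_diag(A, B):
    return np.block([[A, np.zeros((A.shape[0], B.shape[1]))],
                     [np.zeros((B.shape[0], A.shape[1])), B]])

def add_icmlps(p1, p2):
    """Sum network: block-diagonal hidden weights, stacked input weights and
    biases; the output layers' weights concatenate and their affine parts add."""
    out = []
    for k, ((W1, U1, b1), (W2, U2, b2)) in enumerate(zip(p1, p2)):
        if k == len(p1) - 1:               # output layer
            out.append((np.hstack([W1, W2]), U1 + U2, b1 + b2))
        elif k == 0:                       # both first layers read the same x
            out.append((np.vstack([W1, W2]), np.vstack([U1, U2]),
                        np.vstack([b1, b2])))
        else:                              # interior hidden layers decouple
            out.append((block_diag(W1, W2), np.vstack([U1, U2]),
                        np.vstack([b1, b2])))
    return out

rng = np.random.default_rng(3)
sizes = [1, 3, 2, 1]
p1, p2 = random_icmlp(rng, sizes), random_icmlp(rng, sizes)
psum = add_icmlps(p1, p2)
print(forward(0.4, psum), forward(0.4, p1) + forward(0.4, p2))  # equal
```

The same block construction fails for standard MLPs only in the final affine-in-$x$ terms, which is exactly what the per-layer input connections supply.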

6. Implications for Network Theory and Approximation

The introduction of direct input connections at every hidden layer enables IC-MLPs to generate a strictly richer function space than standard MLPs: one closed under addition and multiplication and containing all affine and polynomial functions as special cases. The simplicity of the universality condition and the recursive, transparent structure of the proofs position IC-MLPs as a theoretically robust model for exploring the topology and algebraic structure of neural network hypothesis classes, with implications both for the functional analysis of neural networks and for the study of universal approximation in deep learning architectures (Ismailov, 20 Jan 2026).
