Set-Based MLPs
- Set-based Multi-Layer Perceptrons are neural networks that process unordered data by enforcing permutation invariance or equivariance using group-theoretic principles.
- They employ parameter-tying with regular and high-order group representations to achieve universal approximation while controlling hidden-layer size.
- Sparse and motif-based optimizations enhance computational efficiency, enabling practical applications in set classification, point cloud analysis, and modular network design.
Set-based Multi-Layer Perceptrons (MLPs) are a class of neural network architectures designed to process and learn functions on unordered data—sets—by leveraging group-theoretic invariance or equivariance, especially under the symmetric group $S_n$. These models formalize the requirement that outputs should not change (invariance) or should change predictably (equivariance) under permutations of set elements. Recent advances prove the universality of such architectures, showing that appropriately constructed MLPs can approximate any continuous invariant or equivariant function, and relate these structures to algebraic composition operations and to computational gains achievable through architectural innovations and sparsity.
1. Group Actions and Set Symmetry in MLPs
Let $S_n$ denote the symmetric group on $n$ elements; its action on input vectors $x \in \mathbb{R}^n$ is given by $(\pi \cdot x)_i = x_{\pi^{-1}(i)}$, or equivalently $\pi \cdot x = P_\pi x$ with the permutation matrix $P_\pi$. In Set-based MLPs, hidden and output layers are endowed with compatible permutation representations—i.e., the hidden layer is indexed by a set on which $S_n$ acts, with permutation matrices $P^{(h)}_\pi$, and the output layer (of size $m$) carries its own representation $P^{(o)}_\pi$. These structures require a linear layer $W$ between consecutive layers to be $S_n$-equivariant, $P'_\pi W = W P_\pi$ for all $\pi \in S_n$ (with $P_\pi$, $P'_\pi$ the representations on its input and output), and $S_n$-invariant when the output representation is trivial ($P'_\pi$ the identity), so that $W P_\pi = W$ for all $\pi$ (Ravanbakhsh, 2020).
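As a concrete illustration of these conditions (using the well-known permutation-equivariant layer $W = \lambda I + \gamma \mathbf{1}\mathbf{1}^\top$ from the DeepSets literature as an assumed example, not the specific construction of the cited paper), the following sketch checks equivariance and invariance numerically:

```python
# Numerical check of the equivariance condition W P_pi = P_pi W for the
# DeepSets-style layer W = lambda*I + gamma*1 1^T (illustrative example).
import numpy as np

rng = np.random.default_rng(0)
n = 5
lam, gam = 0.7, -0.3
W = lam * np.eye(n) + gam * np.ones((n, n))   # permutation-equivariant linear layer

pi = rng.permutation(n)
P = np.eye(n)[pi]                             # permutation matrix for a random pi

assert np.allclose(W @ P, P @ W)              # equivariance: W P_pi = P_pi W

# Invariance: a sum-pooling row w^T = 1^T satisfies w^T P_pi = w^T for all pi.
w_inv = np.ones((1, n))
assert np.allclose(w_inv @ P, w_inv)
```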
2. Universal Approximation Theorems for Set-Based MLPs
For the design of universal set-invariant and set-equivariant approximators, two key theorems emerge:
- Theorem A (Invariant universality): Consider a one-hidden-layer MLP with hidden layer size $|S_n| = n!$, where the group acts regularly on the hidden units. The network
  $$f(x) = \sum_{g \in S_n} a\,\sigma\big(\langle g \cdot w,\, x\rangle + b\big),$$
  with weights tied as $w_g = g \cdot w$, a shared output weight $a$, and a shared bias $b$, is a universal $S_n$-invariant approximator.
- Theorem B (Equivariant universality): Under the same construction, but allowing a general permutation action on the output layer,
  $$f(x) = \sum_{g \in S_n} \sigma\big(\langle g \cdot w,\, x\rangle + b\big)\,(g \cdot u),$$
  with tied input weights $w_g = g \cdot w$ and tied output weights $u_g = g \cdot u$, yields a universal $S_n$-equivariant approximator (Ravanbakhsh, 2020).
The construction relies on parameter-tying driven by the group action and on the regular representation for parameter efficiency. Universality follows because these configurations can uniformly approximate any continuous $S_n$-invariant/equivariant function on compact sets.
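A short reindexing argument, included here as a sanity check rather than a quotation from the paper, shows why the tied network of Theorem A is $S_n$-invariant:

```latex
% Invariance of the tied one-hidden-layer network under a permutation \pi:
\begin{aligned}
f(\pi \cdot x)
  &= \sum_{g \in S_n} a\,\sigma\big(\langle g \cdot w,\, \pi \cdot x\rangle + b\big)
   = \sum_{g \in S_n} a\,\sigma\big(\langle (\pi^{-1} g) \cdot w,\, x\rangle + b\big) \\
  &= \sum_{g' \in S_n} a\,\sigma\big(\langle g' \cdot w,\, x\rangle + b\big)
   = f(x),
\end{aligned}
% using \langle P_g w, P_\pi x \rangle = \langle P_{\pi^{-1} g}\, w,\, x \rangle
% (orthogonality of P_\pi) and the substitution g' = \pi^{-1} g, which merely
% reorders the terms of the sum over S_n. The equivariant case is analogous.
```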
3. Hidden Layer Representation: Regular and High-Order Actions
While the regular representation yields a hidden-layer bottleneck of size $|S_n| = n!$, higher-order set actions enable more tractable models. Denoting by $[n]^k$ the set of $k$-tuples over $[n] = \{1,\dots,n\}$, the group $S_n$ acts diagonally on $[n]^k$. Proposition C gives conditions under which this diagonal action admits a regular orbit, reducing the necessary hidden dimensionality, and Corollary D quantifies the resulting bound. For the pure set action, a sufficiently large tuple order guarantees a regular orbit, ensuring universality of the equivariant MLP (Ravanbakhsh, 2020). This insight allows for a polynomially rather than factorially sized hidden layer built from $k$-tuples.
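The following small enumeration illustrates these definitions (a toy check of the diagonal action and of regular-orbit existence for small $n$ and $k$, not a reproduction of the paper's proofs; the function name is ours):

```python
# Enumerate orbits of the diagonal S_n action on [n]^k and report representatives
# of regular orbits, i.e. free orbits of size n!. Feasible only for small n, k.
import itertools
from math import factorial

def regular_orbit_representatives(n, k):
    perms = list(itertools.permutations(range(n)))
    seen, reps = set(), []
    for t in itertools.product(range(n), repeat=k):
        if t in seen:
            continue
        orbit = {tuple(pi[i] for i in t) for pi in perms}   # diagonal action: pi.(t_1,...,t_k)
        seen |= orbit
        if len(orbit) == factorial(n):                      # free action <=> orbit size n!
            reps.append(min(orbit))
    return reps

print(regular_orbit_representatives(3, 1))   # [] : the pure set action itself has no regular orbit
print(regular_orbit_representatives(3, 2))   # [(0, 1)] : order-2 tuples already suffice for n = 3
```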
4. Algebraic Structures on Set-Based MLPs
A formal algebraic framework on the universe of layered MLPs enables systematic construction of complex networks from simpler components. Operations include:
- Complementation: the complement of a binary classifier inverts its output.
- Sum (Union): the sum of two networks produces a layered network representing the logical union of their accepted regions.
- Difference: the set difference of two classifiers; by De Morgan's laws it can be expressed through complementation and sum, e.g. $N_1 \setminus N_2 = \overline{\overline{N_1} + N_2}$.
- I-Product (Cartesian product): composes networks over direct-product input domains.
- O-Product (Output bundling): stacks the outputs of several networks for multi-label or structured outputs.
These operations possess formal algebraic properties such as involution (complement is its own inverse), commutativity and associativity (sum, I-product), and existence of identity/inverse elements (Peng, 2017).
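As a purely functional illustration (abstracting away the network-level constructions of (Peng, 2017), whose layer details are not reproduced here), the composition rules can be sketched by combining classifier functions directly:

```python
# Functional sketch of the classifier algebra: complement, sum (union),
# difference, and O-product. Networks are modeled as boolean-valued callables;
# the actual layer-level constructions in (Peng, 2017) are not reproduced.
from typing import Callable, Tuple

Classifier = Callable[[float], bool]

def complement(f: Classifier) -> Classifier:
    return lambda x: not f(x)

def union(f: Classifier, g: Classifier) -> Classifier:       # "sum"
    return lambda x: f(x) or g(x)

def difference(f: Classifier, g: Classifier) -> Classifier:   # f minus g, via De Morgan
    return complement(union(complement(f), g))

def o_product(f: Classifier, g: Classifier) -> Callable[[float], Tuple[bool, bool]]:
    return lambda x: (f(x), g(x))                              # bundle outputs

# Example: intervals as toy "classifiers".
in_unit  = lambda x: 0.0 <= x <= 1.0
positive = lambda x: x > 0.0
print(union(in_unit, positive)(-0.5))       # False
print(difference(positive, in_unit)(0.5))   # False: in both -> excluded
print(o_product(in_unit, positive)(2.0))    # (False, True)
```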
5. Concrete Architectures and Implementation Recipes
For the regular representation of $S_n$, a minimal $S_n$-equivariant MLP comprises:
- Hidden units indexed by group elements $g \in S_n$ (hidden dimension $n!$)
- Parameters: base input weights $w \in \mathbb{R}^n$, bias $b \in \mathbb{R}$, base output weights $u \in \mathbb{R}^n$
- Parameter-tying: $w_g = g \cdot w$, $u_g = g \cdot u$
- Forward pass:
  - $h_g = \sigma(\langle g \cdot w,\, x \rangle + b)$ for all $g \in S_n$
  - $y = \sum_{g \in S_n} h_g \,(g \cdot u)$, where $g \cdot u = P_g u$ and the $j$-th column of $P_g$ is the one-hot vector $e_{g(j)}$
  - Output transforms equivariantly: $y(\pi \cdot x) = \pi \cdot y(x)$ when $\pi \in S_n$ permutes the input
Worked examples clarify both the $S_n$-equivariant case (vector-valued output $y \in \mathbb{R}^n$ with tied output weights $u_g = g \cdot u$) and the $S_n$-invariant case (scalar-valued output obtained by pooling over all hidden units with a shared weight) (Ravanbakhsh, 2020).
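A minimal NumPy sketch of this recipe is given below (for small $n$ only, since the hidden layer enumerates all $n!$ permutations; the variable names and the tanh nonlinearity are illustrative choices, not taken from the paper's code):

```python
# One-hidden-layer S_n-equivariant MLP with regular-representation parameter
# tying: hidden unit g uses input weights g.w and output weights g.u.
import itertools
import numpy as np

def sn_equivariant_mlp(x, w, b, u):
    """x, w, u: shape (n,) arrays; b: scalar bias shared across hidden units."""
    n = x.shape[0]
    y = np.zeros(n)
    for g in itertools.permutations(range(n)):
        g = np.asarray(g)
        g_w = np.empty(n); g_w[g] = w            # tied input weights: g.w = P_g w
        h_g = np.tanh(g_w @ x + b)               # hidden activation of unit g
        g_u = np.empty(n); g_u[g] = u            # tied output weights: g.u = P_g u
        y += h_g * g_u                           # equivariant readout
    return y

# Equivariance check: permuting the input permutes the output identically.
rng = np.random.default_rng(1)
n = 4
x, w, u = rng.normal(size=(3, n))
b = 0.2
pi = rng.permutation(n)
px = np.empty(n); px[pi] = x                                  # pi . x
py = np.empty(n); py[pi] = sn_equivariant_mlp(x, w, b, u)     # pi . y(x)
assert np.allclose(sn_equivariant_mlp(px, w, b, u), py)

# The invariant variant replaces the tied readout with a shared scalar weight:
# f(x) = sum_g a * tanh(<g.w, x> + b), which is unchanged under any pi.
```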
6. Sparse and Motif-Based Optimization in Set-MLPs
Sparse Evolutionary Training (SET) introduces sparsity to MLPs through an Erdős–Rényi random initialization of the connectivity and periodic pruning/regrowth cycles. The motif-based structural optimization further imposes block-level structure by organizing neurons into blocks (motifs) of size $m$ and pruning/regrowing entire submatrices, with block selection guided by average weight magnitude; a schematic sketch of this block-level update appears after the comparison table below. This approach reduces parameter count and computational cost relative to standard SET while maintaining high accuracy (empirical results on Fashion-MNIST report a 43.3% reduction in training time for a 3.7% drop in accuracy at a fixed motif size) (Chen et al., 10 Jun 2025).
A concise comparison of SET and motif-based SET is shown below:
| Method | Param Savings | Accuracy Penalty | Training Time Savings |
|---|---|---|---|
| SET (m=1) | Baseline | None | Baseline |
| Motif-SET (m=2) | — | <4% | 30–43% |
| Motif-SET (m=4) | — | 10% | >60% |
Motif-SET achieves its best accuracy-efficiency tradeoff at small motif sizes such as $m = 2$ across tasks, with only minor loss in performance (Chen et al., 10 Jun 2025).
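The schematic sketch below illustrates the block-level prune-and-regrow step (illustrative only: the drop fraction, the exact form of the magnitude criterion, and the function name are assumptions rather than details from (Chen et al., 10 Jun 2025)):

```python
# Motif-based prune/regrow sketch: treat an (N x M) weight matrix as a grid of
# m x m blocks, zero out the weakest active blocks by mean |w|, and regrow the
# same number of currently inactive blocks at random.
import numpy as np

def motif_prune_regrow(W, m, drop_frac=0.3, rng=None):
    """W: (N, M) weight matrix with inactive connections stored as exact zeros."""
    rng = np.random.default_rng() if rng is None else rng
    N, M = W.shape
    assert N % m == 0 and M % m == 0, "layer dimensions must be divisible by motif size"
    blocks = np.abs(W).reshape(N // m, m, M // m, m).mean(axis=(1, 3))  # mean |w| per block
    active = blocks > 0
    n_drop = int(drop_frac * active.sum())
    # Prune: zero out the n_drop active blocks with the smallest mean magnitude.
    weakest = np.argsort(np.where(active, blocks, np.inf), axis=None)[:n_drop]
    for idx in weakest:
        i, j = np.unravel_index(idx, blocks.shape)
        W[i*m:(i+1)*m, j*m:(j+1)*m] = 0.0
    # Regrow: re-initialize the same number of randomly chosen inactive blocks.
    inactive = np.flatnonzero(~active)
    if inactive.size:
        for idx in rng.choice(inactive, size=min(n_drop, inactive.size), replace=False):
            i, j = np.unravel_index(idx, blocks.shape)
            W[i*m:(i+1)*m, j*m:(j+1)*m] = rng.normal(scale=0.1, size=(m, m))
    return W

# Example: a 16x16 layer with 4x4 motifs, half of which start inactive (zero).
rng0 = np.random.default_rng(0)
mask = np.kron(rng0.random((4, 4)) < 0.5, np.ones((4, 4)))   # block-structured sparsity mask
W = rng0.normal(size=(16, 16)) * mask
W = motif_prune_regrow(W, m=4, rng=rng0)
```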
7. Applications, Limitations, and Outlook
Set-based MLPs are applicable to any domain with unordered or permutation-symmetric data, including set classification, point cloud analysis, and tasks demanding invariance or equivariance under data permutation. The algebraic approach enables construction of architectures tailored to data with decomposable or product structure, facilitating modular design and interpretation (Peng, 2017). Sparse and motif-based variants provide computationally efficient realizations suitable for high-dimensional feature selection and large-scale learning (Chen et al., 10 Jun 2025).
Limitations include factorial hidden-layer size ($n!$ for the full regular representation) in the worst case; however, high-order set representations and polynomially sized hidden layers mitigate this. The underlying algebra is specific to MLPs with fixed activations; extensions to convolutional or recurrent structures and to the full range of group actions remain active research topics (Ravanbakhsh, 2020, Peng, 2017). Motif-based strategies open avenues for hardware-aware design and scaling, suggesting opportunities for further theoretical and empirical exploration (Chen et al., 10 Jun 2025).