
Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder

Published 7 Nov 2025 in cs.LG and cs.AI | (2511.05745v1)

Abstract: Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting LLMs by decomposing token activations into combinations of human-understandable features. While SAEs provide crucial insights into LLM explanations, their practical adoption faces a fundamental challenge: better interpretability demands that SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by partitioning SAEs into narrower expert networks with gated activation, thereby reducing computation. In a well-designed MoE, each expert should focus on learning a distinct set of features. However, we identify a \textit{critical limitation} in MoE-SAE: Experts often fail to specialize, which means they frequently learn overlapping or identical features. To deal with it, we propose two key innovations: (1) Multiple Expert Activation that simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling that enhances diversity through adaptive high-frequency scaling. Experiments demonstrate a 24\% lower reconstruction error and a 99\% reduction in feature redundancy compared to existing MoE-SAE methods. This work bridges the interpretability-efficiency gap in LLM analysis, allowing transparent model inspection without compromising computational feasibility.

Summary

  • The paper proposes Scale SAE that deploys multiple expert activation to reduce feature redundancy and enhance specialization.
  • It introduces a feature scaling mechanism inspired by high-pass filtering to amplify essential high-frequency components.
  • Experiments on GPT-2 activations demonstrate improved reconstruction error and feature diversity over traditional sparse autoencoders.

Introduction

The paper "Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder" introduces a strategy to improve the interpretability and computational efficiency of Sparse Autoencoders (SAEs) in analyzing LLMs. SAEs have faced limitations due to non-specialized learning within Mixture of Experts (MoE) architectures, resulting in feature redundancy. This paper proposes a novel architecture called Scale Sparse Autoencoder (Scale SAE) that integrates Multiple Expert Activation and Feature Scaling to enhance specialization and diversity.
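For context, the decomposition an SAE performs can be sketched as a minimal Top-K forward pass. This is a generic illustration in NumPy, not the paper's implementation; the dimensions, initialization, and the ReLU-after-Top-K ordering are all assumptions:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a generic Top-K sparse autoencoder (sketch).

    x: (d_model,) input activation vector from the LLM.
    W_enc: (d_model, d_hidden), W_dec: (d_hidden, d_model).
    Only the k largest pre-activations are kept (all others zeroed),
    which enforces the sparsity constraint directly.
    """
    pre = x @ W_enc + b_enc                 # (d_hidden,) feature pre-activations
    idx = np.argsort(pre)[-k:]              # indices of the k largest entries
    f = np.zeros_like(pre)
    f[idx] = np.maximum(pre[idx], 0.0)      # ReLU on the surviving features
    x_hat = f @ W_dec + b_dec               # reconstruction of the input
    return f, x_hat

rng = np.random.default_rng(0)
d_model, d_hidden, k = 8, 32, 3
W_enc = rng.normal(size=(d_model, d_hidden)) * 0.1
W_dec = rng.normal(size=(d_hidden, d_model)) * 0.1
b_enc, b_dec = np.zeros(d_hidden), np.zeros(d_model)
x = rng.normal(size=d_model)

f, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
print(int((f != 0).sum()))  # at most k features are active
```

The cost problem the paper targets is visible here: interpretability pushes `d_hidden` to be very large, so the dense `x @ W_enc` becomes expensive, which is what MoE-style partitioning of the SAE tries to avoid.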

Methodology

Scale Sparse Autoencoder Architecture

Scale SAE consists of two synergistic mechanisms designed to address the specialization and feature-redundancy challenges inherent in MoE-SAEs. Multiple Expert Activation dynamically engages subsets of experts, helping to decompose polysemantic neuron activations into distinct semantic components across experts. In parallel, Feature Scaling adaptively amplifies the high-frequency components of encoder features, encouraging diversity and reducing redundancy (Figure 1).

Figure 1: Scale Sparse Autoencoder Architecture. An illustration of the three core mechanisms in the Scale SAE architecture. (a) Multiple Expert Activation. A router selects a subset of experts (e.g., 2 out of 3 shown) to process each input. (b) Global Top-K Activation. The activations from the selected experts are aggregated, and a global Top-K operation (K=3 shown) is applied to enforce sparsity. (c) Feature Scaling. The encoder weights of each expert are decomposed and scaled to dynamically amplify high-frequency components.
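The global Top-K step in panel (b) can be sketched as follows. This is a hedged illustration of the idea, not code from the paper: activations from the selected experts share a single sparsity budget of K, rather than each expert applying its own Top-K:

```python
import numpy as np

def global_topk(expert_acts, K):
    """Aggregate activations from the selected experts and keep only the
    K largest across ALL of them (one global sparsity budget). Sketch of
    the mechanism shown in Figure 1(b); shapes are assumptions.

    expert_acts: list of 1-D arrays, one per selected expert.
    Returns the sparsified concatenated activation vector.
    """
    concat = np.concatenate(expert_acts)
    out = np.zeros_like(concat)
    idx = np.argsort(concat)[-K:]           # global winners across experts
    out[idx] = np.maximum(concat[idx], 0.0) # zero everything else
    return out

a = np.array([0.9, 0.1, 0.0])  # expert 1 activations
b = np.array([0.5, 0.7, 0.2])  # expert 2 activations
z = global_topk([a, b], K=3)
print(z)  # only the three largest across both experts survive
```

Because the budget is global, an expert that contributes nothing useful for a given input can end up with zero active features, while a well-matched expert keeps several.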

Multiple Expert Activation

Switch SAE activates only a single expert per input, and this design leads to high feature redundancy: with no pressure to divide the feature space, experts tend to learn overlapping features. Scale SAE modifies the routing mechanism to activate multiple experts simultaneously, thereby encouraging structured specialization and distinct domain sensitivity.
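A minimal sketch of multi-expert routing with weights over the selected subset (the routing function, subset size `m`, and softmax weighting are assumptions, not details confirmed by the paper):

```python
import numpy as np

def route_multi_expert(x, router_W, m):
    """Select the m highest-scoring experts for input x and return softmax
    weights over that subset (zero elsewhere). Activating several experts
    at once, instead of one winner-take-all expert, is what creates room
    for different experts to specialize on different features.
    """
    logits = router_W @ x                    # (n_experts,) routing scores
    top = np.argsort(logits)[-m:]            # indices of the m best experts
    w = np.zeros_like(logits)
    e = np.exp(logits[top] - logits[top].max())
    w[top] = e / e.sum()                     # weights sum to 1 over the subset
    return w

rng = np.random.default_rng(1)
n_experts, d_model, m = 5, 8, 2
router_W = rng.normal(size=(n_experts, d_model))
x = rng.normal(size=d_model)
w = route_multi_expert(x, router_W, m)
print(int((w > 0).sum()))  # exactly m experts receive nonzero weight
```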

Feature Scaling

The activation-collapse problem is mitigated through Feature Scaling, a mechanism inspired by high-pass filtering in signal processing. It amplifies high-frequency components of the encoder features, preserving fine-grained information and reducing redundancy (Figure 2).

Figure 2: Activation intensity across the different experts.
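One simple way to realize a high-pass-inspired scaling is to split each feature vector into a smooth component and a residual, then amplify the residual. This is a sketch under assumptions; the gain `alpha`, the use of the mean as the low-frequency part, and the per-vector decomposition are illustrative choices, not the paper's stated formulation:

```python
import numpy as np

def feature_scale(features, alpha):
    """High-pass-inspired feature scaling (sketch). Each feature vector
    is split into a low-frequency part (its mean) and a high-frequency
    residual; amplifying the residual by `alpha` pushes features apart
    and discourages near-duplicate features.
    """
    low = features.mean(axis=-1, keepdims=True)  # smooth / DC component
    high = features - low                        # fine-grained residual
    return low + alpha * high

f = np.array([[1.0, 1.2, 0.8, 1.0]])
scaled = feature_scale(f, alpha=2.0)
print(scaled)  # the mean is preserved, deviations from it are doubled
```

With `alpha > 1` the transformation leaves the coarse level of each feature intact while exaggerating the detail that distinguishes one feature from another, which is the diversity-enhancing effect the paper attributes to Feature Scaling.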

Experiment

The architecture was evaluated on intermediate-layer activations of GPT-2. Scale SAE consistently outperformed prior approaches such as Switch SAE in both reconstruction error and feature-similarity (redundancy) scores (Figure 3).

Figure 3: Performance comparison of the three feature decomposition strategies across a range of sparsity levels.

Implications and Future Directions

This research bridges the gap between LLM interpretability and efficiency, presenting a framework that could refine model evaluations in future experiments. While the current results are robust, future work may incorporate more sophisticated routing or scaling mechanisms to enhance specialization further.

Conclusion

Scale SAE represents a significant advancement in MoE architectures, not through radical changes but through effective utilization of multi-expert activation and feature scaling to resolve feature redundancy issues. The proposed framework promises enhancements in computational efficiency and interpretability, facilitating more insightful analysis of LLMs without compromising efficacy.
