Validate HyperMLP/HyperGLU at frontier LLM scales
Determine whether the HyperMLP and HyperGLU architectures retain their capability and efficiency advantages when scaled to very large, practical LLM sizes. This requires training and evaluating HyperMLP/HyperGLU at frontier scales and comparing them against strong softmax-attention Transformer baselines.
References
Additionally, due to resource constraints, we do not scale HyperMLP/HyperGLU to very large practical LLM sizes; validating performance at frontier scales is left for future work.
— HyperMLP: An Integrated Perspective for Sequence Modeling
(2602.12601 - Lu et al., 13 Feb 2026) in Section 4, Conclusion and Limitations