
Wave Network: An Ultra-Small Language Model

Published 4 Nov 2024 in cs.CL and cs.AI | (2411.02674v4)

Abstract: We propose an innovative token representation and update method in a new ultra-small language model: the Wave network. Specifically, we use a complex vector to represent each token, encoding both global and local semantics of the input text. A complex vector consists of two components: a magnitude vector representing the global semantics of the input text, and a phase vector capturing the relationships between individual tokens and global semantics. Experiments on the AG News text classification task demonstrate that, when generating complex vectors from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91% accuracy with wave interference and 91.66% with wave modulation - outperforming a single Transformer layer using BERT pre-trained embeddings by 19.23% and 19.98%, respectively, and approaching the accuracy of the pre-trained and fine-tuned BERT base model (94.64%). Additionally, compared to BERT base, the Wave Network reduces video memory usage and training time by 77.34% and 85.62% during wave modulation. In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.


Summary

  • The paper introduces the Wave Network, which uses complex vector token representations to capture both global and local semantics.
  • It employs innovative wave interference and modulation operations to dynamically update token representations with reduced computational demands.
  • Experimental results demonstrate competitive accuracy with up to 77.34% lower memory usage and an 85.62% reduction in training time compared to BERT.


Introduction

The paper presents the Wave Network, a novel approach to NLP that uses an ultra-small language model to achieve performance comparable to large-scale models like BERT with drastically reduced computational demands. The Wave Network encodes both global and local semantics using complex vector representations, drawing on concepts from signal processing to optimize the model architecture for efficiency in both memory and processing time.

Methodology

Complex Vector Token Representation

The model represents each token as a complex vector in polar coordinates, consisting of a magnitude vector representing global semantics and a phase vector capturing token-specific semantics. The magnitude encodes a holistic view of the entire text, while the phase captures each token's relational semantics within the sentence (Figure 1).

Figure 1: Converting complex-vector token representations from polar coordinates to Cartesian coordinates.
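As a minimal sketch of this conversion (the dimensions and example values below are illustrative assumptions, not taken from the paper), a polar representation with magnitude G and phase α maps to Cartesian coordinates as G·cos(α) + i·G·sin(α):

```python
import numpy as np

def polar_to_cartesian(magnitude, phase):
    """Convert a complex-vector token representation from polar
    coordinates (magnitude, phase) to a Cartesian complex vector."""
    return magnitude * np.cos(phase) + 1j * magnitude * np.sin(phase)

# Hypothetical 4-dimensional token representation.
G = np.array([1.0, 0.5, 2.0, 1.5])                 # global-semantics magnitudes
alpha = np.array([0.0, np.pi / 4, np.pi / 2, np.pi])  # per-dimension phases
z = polar_to_cartesian(G, alpha)
```

Round-tripping with `np.abs(z)` and `np.angle(z)` recovers the original magnitudes and phases, which is the property the figure illustrates.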

Wave Interference and Modulation

The Wave Network innovates on traditional network architectures by implementing wave-based operations to update token representations. Unlike the dot-product attention used in models such as the Transformer, the Wave Network employs operations akin to wave interference and modulation.

  • Wave Interference: Adds the semantic intensities of tokens, mimicking the constructive and destructive interference observed in physics and allowing nuanced representation updates based on token interactions.
  • Wave Modulation: Applies a multiplicative interaction, akin to amplitude and phase modulation of waves, enabling token representations to adapt dynamically (Figure 2).

    Figure 2: An example of constructing representations with complex vector token representations.
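A minimal sketch of the two update rules, assuming interference is element-wise complex addition and modulation is element-wise complex multiplication (the function names and the toy example are illustrative, not the paper's code):

```python
import numpy as np

def wave_interference(z1, z2):
    # Complex addition: magnitudes combine constructively or
    # destructively depending on the phase difference.
    return z1 + z2

def wave_modulation(z1, z2):
    # Complex multiplication: magnitudes multiply and phases add,
    # analogous to amplitude/phase modulation of signals.
    return z1 * z2

# Two unit-magnitude components with opposite phases cancel under
# interference, while modulation preserves the product of magnitudes.
z1 = 1.0 * np.exp(1j * 0.0)
z2 = 1.0 * np.exp(1j * np.pi)
print(abs(wave_interference(z1, z2)))  # destructive: ~0
print(abs(wave_modulation(z1, z2)))    # magnitudes multiply: 1.0
```

The phase-dependence of the sum is what distinguishes these updates from a plain real-valued addition of embeddings.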

Network Architecture

The network employs single-layer or multi-layer architectures, where initial embeddings of tokens are processed through a series of layers that apply wave operations. Each layer consists of linear transformations and normalization steps, replacing the attention-heavy architecture of traditional models with efficient wave-based operations.
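A single-layer forward pass might be sketched as follows, assuming the global magnitude is a column-wise norm over the sequence, phases are derived per token from that magnitude, and modulation combines each token with the sequence mean. These choices are one plausible reading for illustration, not the paper's exact formulation:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def wave_layer(tokens, W):
    """One wave-based layer: build complex token representations from
    real embeddings, combine them by complex multiplication (modulation),
    then project back with a linear map and normalize.

    tokens: (seq_len, d) real token embeddings
    W:      (d, d) weights of the linear transformation
    """
    # Global-semantics magnitude: column-wise L2 norm over the sequence,
    # shared by every token (an assumed formulation).
    G = np.linalg.norm(tokens, axis=0, keepdims=True)        # (1, d)
    # Phase: each token's share of the global magnitude, so that
    # G * cos(alpha) recovers the original embedding.
    alpha = np.arccos(np.clip(tokens / (G + 1e-8), -1, 1))   # (seq_len, d)
    z = G * np.exp(1j * alpha)                               # complex tokens
    # Modulation of each token with the mean sequence representation.
    z = z * z.mean(axis=0, keepdims=True)
    # Real-valued projection + normalization.
    return layer_norm(z.real @ W)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))     # 8 tokens, 16-dim embeddings
out = wave_layer(x, rng.normal(size=(16, 16)))
```

The key design point this sketch reflects is the absence of any attention matrix: token mixing happens entirely through the shared global magnitude and the complex-valued modulation step.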

Experimental Results

Performance Metrics

The experiments, conducted on datasets such as AG News and DBpedia14, showcased the model’s efficiency:

  • In AG News text classification, the Wave Network achieved accuracies of 90.91% using wave interference and 91.66% with wave modulation. These results approach BERT's 94.64% while offering significantly better resource efficiency (Figure 3).

    Figure 3: Comparison between Wave Network and Transformer.

  • The Wave Network's memory usage and training times are significantly reduced relative to BERT base: video memory by 77.34% and training time by 85.62%. Tabulated results confirm that the network performs competitively with large models across several datasets.

Discussion

The Wave Network achieves its efficiency through an innovative use of signal processing concepts, transforming NLP architecture design. By encoding text semantics in complex vector forms and updating through wave operations, the model minimizes resource demand without sacrificing performance. Its efficient architecture and reduced parameter count position it as a viable solution for deploying NLP in constrained environments.

Conclusion

The study posits the Wave Network as a sustainable and efficient alternative to large-scale NLP models, maintaining competitive accuracy while drastically reducing computational costs. The introduction of complex vector semantics and wave-based update mechanisms marks a promising direction for scalable and resource-efficient NLP applications. Future avenues may explore enhancements in semantic representation accuracy and broader multi-lingual performance metrics.
