NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

Published 30 Mar 2024 in cs.CL | (2404.00459v2)

Abstract: LLMs struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to non-intuitive textual numbers representation. When a digit is read or generated by a causal LLM it does not know its place value (e.g. thousands vs. hundreds) until the entire number is processed. To address this issue, we propose a simple adjustment to how numbers are represented by including the count of digits before each number. For instance, instead of "42", we suggest using "{2:42}" as the new format. This approach, which we term NumeroLogic, offers an added advantage in number generation by serving as a Chain of Thought (CoT). By requiring the model to consider the number of digits first, it enhances the reasoning process before generating the actual number. We use arithmetic tasks to demonstrate the effectiveness of the NumeroLogic formatting. We further demonstrate NumeroLogic applicability to general natural language modeling, improving language understanding performance in the MMLU benchmark.


Summary

  • The paper introduces NumeroLogic, a novel numerical prefix encoding that guides LLMs by indicating digit count, thereby enhancing their numerical reasoning.
  • The methodology integrates pre- and post-processing using regular expressions and special tokens (e.g., <sn>, <mn>, <en>) to maintain model architecture compatibility.
  • Experiments on NanoGPT and Llama2-7B demonstrate significant accuracy gains, including nearly doubling multiplication accuracy and reducing subtraction errors by over 80%.

NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

The paper "NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning" introduces NumeroLogic, a numerical encoding designed to improve the numerical reasoning capabilities of LLMs. By making a number's scale explicit in the text, the format addresses the sub-optimal representation of numbers and improves both the comprehension and the generation of numerical data.

Introduction

LLMs, despite their capabilities, often struggle with numerical data and arithmetic operations. The core difficulty is causal (left-to-right) reading: the place value of a number's first digit is unknown until the final digits have been processed. To mitigate this, the authors propose NumeroLogic, a numerical format in which the digit count is stated before the number itself. This gives the model the number's scale up front, enabling it to read and generate numbers more effectively (Figure 1).

Figure 1: Reading numbers in a causal manner from left to right is sub-optimal for LLMs, as it is for humans. The model has to reach the final digits of the number before it can infer the place value of the first digit.

NumeroLogic Format

NumeroLogic is a format for LLMs in which a prefix indicating the digit count is added before each number, giving the model a structured cue for numerical reasoning. For instance, the number "42" is encoded as "{2:42}", where the leading "2" indicates that two digits follow. This helps the model anticipate the scale of the number and, during generation, imposes a structured reasoning step akin to Chain of Thought (CoT): the model must commit to a digit count before producing the digits themselves.
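As a minimal sketch of the braces-style format, the transformation can be applied to free text with a single regular expression. The function name `numerologic` is a hypothetical helper for illustration; the authors' actual preprocessing may handle additional cases (decimals, signs, very long numbers) differently.

```python
import re


def numerologic(text: str) -> str:
    """Wrap every integer in the text as {digit_count:number}."""
    return re.sub(r"\d+", lambda m: f"{{{len(m.group())}:{m.group()}}}", text)


print(numerologic("42 + 7 = 49"))  # {2:42} + {1:7} = {2:49}
```

Text without digits passes through unchanged, so the encoding can be applied uniformly to a whole corpus.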

Implementation involves text pre- and post-processing using regular expressions, ensuring compatibility without altering the fundamental model architecture. Special tokens such as "<sn>", "<mn>", and "<en>" denote the start, middle, and end of a numerical encoding.
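The token-based variant described above can be sketched as a reversible pre/post-processing pair, again with regular expressions. This is an illustrative assumption about the wrapping scheme (count between "&lt;sn&gt;" and "&lt;mn&gt;", digits between "&lt;mn&gt;" and "&lt;en&gt;"); in practice the special tokens would also be added to the model's tokenizer vocabulary.

```python
import re

SN, MN, EN = "<sn>", "<mn>", "<en>"


def encode(text: str) -> str:
    # Replace each integer with <sn>digit_count<mn>number<en>.
    return re.sub(r"\d+", lambda m: f"{SN}{len(m.group())}{MN}{m.group()}{EN}", text)


def decode(text: str) -> str:
    # Strip the encoding, recovering the plain number.
    return re.sub(rf"{SN}\d+{MN}(\d+){EN}", r"\1", text)


s = encode("price is 1234")
print(s)          # price is <sn>4<mn>1234<en>
print(decode(s))  # price is 1234
```

Because decoding simply strips the markers, model outputs can be converted back to plain text before being shown to users or scored against references.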

Experiments and Results

Small Model Experiments

Using NanoGPT, a small transformer model, the authors trained models on arithmetic tasks including addition, subtraction, and multiplication. NumeroLogic yielded substantial gains; multiplication accuracy, for instance, improved from 13.81% to 28.94%.

Larger Model Experiments

Scaling up to the Llama2-7B model, NumeroLogic again delivered significant accuracy gains. Even on tasks where performance is typically saturated, such as addition and subtraction, improvements were observed; on subtraction, NumeroLogic eliminated over 80% of the remaining errors (Figure 2).

Figure 2: MMLU Accuracy of Llama2-7B. Continuing self-supervised pretraining on web-curated text tokens, when numbers are encoded with NumeroLogic, helps improve the performance beyond the pretrained model or a model trained on the same text with plain numbers.

Self-Supervised Pretraining

The authors extended self-supervised pretraining of Llama2-7B on the RefinedWeb dataset. NumeroLogic encoding during pretraining improved zero-shot performance on the MMLU benchmark, with statistically significant accuracy gains concentrated in STEM-related tasks rather than in the social sciences and humanities.

Ablation Studies

Ablation experiments separated the effect of encoding the operands from that of encoding the results. Both were beneficial, but encoding the results had the stronger effect, underscoring the CoT-like reasoning step that NumeroLogic induces at generation time. Alternative encodings were also explored, confirming that retaining both the prefix and the suffix tokens yields the best accuracy.

Conclusion

NumeroLogic introduces a practical solution for enhancing LLMs' numerical reasoning through a straightforward reformatting technique compatible with existing architectures. This format leverages the intrinsic scale of numbers, providing LLMs with a preemptive reasoning framework that improves comprehension and generation accuracy across diverse numerical tasks. Ultimately, NumeroLogic contributes to refining general language modeling capabilities within AI, particularly in domains requiring robust numerical understanding.

The paper demonstrates a meaningful improvement that requires no model alterations, ensuring broad applicability across architectures. As LLMs are deployed in ever more applications, techniques like NumeroLogic help close the gap between language understanding and reliable numerical computation.
