From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Published 11 Apr 2024 in cs.CL and cs.AI (arXiv:2404.07544v3)

Abstract: We analyze how well pre-trained LLMs (e.g., Llama 2, GPT-4, Claude 3) can perform linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several LLMs (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, and Gradient Boosting. We then investigate how the performance of LLMs scales with the number of in-context exemplars. Borrowing the notion of regret from online learning, we empirically show that LLMs are capable of achieving sub-linear regret.

Summary

  • The paper shows that LLMs can perform both linear and non-linear regression via in-context learning, often outperforming traditional regression methods.
  • The methodology employs synthetic regression datasets to compare LLM performance against traditional supervised methods and unsupervised heuristics.
  • Results reveal that models like GPT-4 and Claude 3 improve prediction accuracy and reduce cumulative regret as more examples are provided.

Regression Capabilities of LLMs Through In-Context Examples

Introduction

The paper "From Words to Numbers: Your LLM Is Secretly A Capable Regressor When Given In-Context Examples" investigates the ability of pre-trained LLMs to perform both linear and non-linear regression tasks when provided with in-context examples. This exploration reveals that certain LLMs, such as GPT-4 and Claude 3, can surpass traditional supervised methods like Random Forest and Gradient Boosting in regression tasks, all without additional parameter updates or training.

Methodology

The study evaluates LLMs' regression capabilities on synthetic regression datasets, leveraging their deterministic and controllable nature. These datasets range from simple linear regression problems to more complex, non-linear ones such as the Friedman benchmarks. In each case, an LLM is presented with a sequence of input-output examples and asked to predict the output for a new input. This in-context learning (ICL) setup is compared against traditional supervised models and unsupervised heuristics.
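
To make this setup concrete, the sketch below generates a Friedman #2 dataset with scikit-learn and serializes in-context examples into a plain-text prompt. The serialization format and feature labels are illustrative assumptions, not the paper's verbatim template.

```python
# A minimal sketch of the prompt-construction step, assuming a plain-text
# serialization. The "Feature i" and "Output:" labels are illustrative
# assumptions; the paper's exact prompt template may differ.
from sklearn.datasets import make_friedman2

def build_regression_prompt(X_train, y_train, x_query, precision=2):
    """Serialize (input, output) pairs into an in-context prompt string."""
    lines = []
    for x, y in zip(X_train, y_train):
        feats = ", ".join(f"Feature {i}: {v:.{precision}f}" for i, v in enumerate(x))
        lines.append(f"{feats}\nOutput: {y:.{precision}f}")
    feats = ", ".join(f"Feature {i}: {v:.{precision}f}" for i, v in enumerate(x_query))
    lines.append(f"{feats}\nOutput:")  # the LLM completes this final line
    return "\n".join(lines)

# 50 in-context examples plus one query point from the Friedman #2 benchmark.
X, y = make_friedman2(n_samples=51, noise=0.0, random_state=0)
prompt = build_regression_prompt(X[:50], y[:50], X[50])
# `prompt` is sent to the LLM; its completion is parsed as the numeric prediction.
```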

Results

Performance on Linear Regression

LLMs demonstrated strong performance on linear regression tasks, often outperforming unsupervised baselines and, in some cases, even supervised methods such as Random Forest and Gradient Boosting. Performance varied slightly with dataset complexity, but models such as Claude 3 and GPT-4 maintained competitive accuracy across datasets (see Figure 1).

Figure 1: The performance, as measured by Mean Absolute Error (lower is better), across LLMs, traditional supervised models, and unsupervised models on random regression tasks.
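
As a rough illustration of how the traditional baselines behind Figure 1 can be scored, the sketch below fits two supervised models and one unsupervised heuristic on a random linear regression task and reports Mean Absolute Error. An LLM's parsed predictions would be evaluated with the same metric; the hyperparameters here are scikit-learn defaults, not necessarily the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# A random linear regression task: 80 "in-context" points, 20 held-out queries.
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

for name, model in [
    ("Random Forest", RandomForestRegressor(random_state=0)),
    ("Gradient Boosting", GradientBoostingRegressor(random_state=0)),
]:
    model.fit(X_train, y_train)
    print(name, mean_absolute_error(y_test, model.predict(X_test)))

# Unsupervised heuristic: always predict the mean of the observed outputs.
print("Mean heuristic", mean_absolute_error(y_test, np.full_like(y_test, y_train.mean())))
```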

Performance on Non-Linear Regression

On non-linear regression tasks, LLMs continued to outperform unsupervised methods and remained competitive with traditional supervised models. Notably, Claude 3 outperformed several supervised methods on complex non-linear datasets such as Friedman #2 (see Figure 2).

Figure 2: The rank of each method across non-linear regression datasets; higher ranks indicate better performance.
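
For reference, the Friedman #2 benchmark highlighted above generates targets from a fixed non-linear function of four uniformly sampled inputs (this is the form used in scikit-learn's make_friedman2; the paper's noise model may differ):

```latex
y = \sqrt{x_1^2 + \left(x_2 x_3 - \frac{1}{x_2 x_4}\right)^2} + \varepsilon,
\qquad
x_1 \sim U[0, 100],\;
x_2 \sim U[40\pi, 560\pi],\;
x_3 \sim U[0, 1],\;
x_4 \sim U[1, 11].
```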

Cumulative Learning and Regret Analysis

A key finding is that LLM performance improves as more in-context examples are provided. This is quantified through cumulative regret analysis: LLMs such as GPT-4 exhibit sub-linear growth in regret, meaning their predictions improve over time and they make effective use of additional in-context data (see Figure 3).

Figure 3: The cumulative regret of two LLMs on different non-linear regression datasets, showing sub-linear growth.
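
To make the regret analysis concrete: cumulative regret at round T is the model's total prediction loss over rounds 1..T minus the total loss of the best fixed predictor chosen in hindsight, and sub-linear growth means the average per-round gap to that comparator vanishes. The sketch below computes an empirical regret curve against the best constant predictor under absolute loss (an illustrative simplification; the paper's comparator class may be richer).

```python
# A hedged sketch of an empirical cumulative-regret curve. The comparator is
# the best *constant* prediction in hindsight, which under absolute loss is
# the median of the targets (a simplification of the paper's setup).
import numpy as np

def cumulative_regret(preds, targets):
    preds, targets = np.asarray(preds, float), np.asarray(targets, float)
    model_loss = np.cumsum(np.abs(preds - targets))
    comparator_loss = np.cumsum(np.abs(np.median(targets) - targets))
    return model_loss - comparator_loss  # sub-linear growth => vanishing average gap

# Usage: preds[t] is the prediction made after observing rounds 0..t-1 in context.
rng = np.random.default_rng(0)
targets = rng.normal(5.0, 1.0, size=200)
running_mean = np.concatenate(([0.0], np.cumsum(targets)[:-1] / np.arange(1, 200)))
print(cumulative_regret(running_mean, targets)[-1])  # total regret after 200 rounds
```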

Discussion

The investigation suggests that the efficacy of LLMs on regression tasks stems from sophisticated in-context learning mechanisms rather than any task-specific training or parameter tuning. This capability covers both simple and complex regression tasks, including symbolic tasks beyond purely numerical setups. The results indicate that, even without explicit training for regression, LLMs can leverage in-context examples to achieve performance competitive with, and sometimes superior to, traditional regression methods.

Conclusion

The study reveals that LLMs possess intrinsic regression capabilities, accessible through in-context learning, that approach or exceed the effectiveness of conventional regression algorithms. The broader implication is that LLMs may be useful in domains where quick adaptation to new data is required, encouraging future exploration of practical regression scenarios without explicit task-specific training.

These findings challenge traditional assumptions about LLMs' abilities outside linguistic domains, opening new avenues for their utilization in fields demanding dynamic and adaptable machine learning models.
