Better Call GPT, Comparing Large Language Models Against Lawyers

Published 24 Jan 2024 in cs.CY and cs.CL | (2401.16212v1)

Abstract: This paper presents a groundbreaking comparison between LLMs and traditional legal contract reviewers, Junior Lawyers and Legal Process Outsourcers. We dissect whether LLMs can outperform humans in accuracy, speed, and cost efficiency during contract review. Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues. In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts. Cost wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods. These results are not just statistics, they signal a seismic shift in legal practice. LLMs stand poised to disrupt the legal industry, enhancing accessibility and efficiency of legal services. Our research asserts that the era of LLM dominance in legal contract review is upon us, challenging the status quo and calling for a reimagined future of legal workflows.

Summary

The paper demonstrates that LLMs can match or exceed junior lawyers in identifying legal issues, achieving comparable F-scores.
The paper utilizes an experimental approach with ground truth benchmarks from senior lawyers to assess accuracy, speed, and cost across LLMs and human practitioners.
The paper highlights that LLMs review contracts in minutes and at significantly lower costs, suggesting a disruptive potential for traditional legal review services.

A Comparison of LLMs and Human Legal Practitioners in Contract Review

Introduction

The paper "Better Call GPT, Comparing Large Language Models Against Lawyers" compares the performance of LLMs with that of Junior Lawyers and Legal Process Outsourcers (LPOs) on contract review tasks. The study evaluates the accuracy, speed, and cost-efficiency of these models against traditional legal practice. By benchmarking LLMs against assessments made by Senior Lawyers, it offers a perspective on the evolving capabilities of AI in legal services.

Methodology

The research utilizes an experimental approach where Senior Lawyers establish a ground truth for legal issue identification within contracts. LLMs are then benchmarked against this ground truth. Specifically, the study focuses on:

Accuracy in determining legal issues.
Speed in reviewing contracts.
Cost-effectiveness compared to human reviewers.

The dataset comprised real-world procurement contracts, anonymized for confidentiality and drawn from legal environments in both the United States and New Zealand. The models tested include GPT-4, Claude, and PaLM2, among others, evaluated with context windows ranging from 16,000 to 128,000 tokens.

Benchmarking Results

Accuracy in Determining Legal Issues

The evaluation reveals that LLMs, particularly GPT4-1106, match or exceed the precision and recall rates of Junior Lawyers in determining legal issues. However, there is variability in models' ability to locate these issues within contract text accurately. For example, while LPOs achieved an F-score of 0.77 in locating legal issues, GPT4-32k lagged slightly behind with an F-score of 0.74.

Figure 1: Level of agreement on issues by role.
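
The precision, recall, and F-score figures above compare each reviewer's flagged issues with the Senior Lawyers' ground truth. The following is a minimal sketch of that kind of comparison, assuming the F-score is the balanced F1 measure (the summary does not state which variant the paper uses). The function name and the clause labels are hypothetical and serve only as illustration; they are not data from the study.

```python
def precision_recall_f1(predicted, ground_truth):
    """Compare a reviewer's flagged issues with a ground-truth set.

    Both arguments are iterables of hashable issue labels.
    Returns a (precision, recall, f1) tuple of floats.
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)

    # Precision: share of flagged issues that are in the ground truth.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of ground-truth issues that the reviewer flagged.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels, invented for this example only.
senior_lawyer_issues = {
    "clause_3_liability",
    "clause_7_termination",
    "clause_9_indemnity",
    "clause_12_governing_law",
}
reviewer_issues = {
    "clause_3_liability",
    "clause_7_termination",
    "clause_5_payment",
}

p, r, f = precision_recall_f1(reviewer_issues, senior_lawyer_issues)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The same function could score a Junior Lawyer, an LPO, or an LLM, since each is simply a different source of flagged issues measured against the same ground truth.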

Time and Cost Efficiency

LLMs demonstrate significant advantages in processing speed and cost. Models such as GPT-4 process contracts in under 5 minutes, in stark contrast to the hours required by Junior Lawyers and LPOs. LLMs also review contracts at a fraction of the cost: the abstract reports a 99.97 percent reduction compared to traditional methods. This suggests potential for widespread adoption of AI in contract review, particularly for high-volume or standardized agreements.
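
To make the scale of such a cost reduction concrete, the arithmetic can be sketched as below. The per-contract costs are hypothetical placeholders chosen only so that the result lands near the percentage quoted in the abstract; they are not the figures reported in the paper.

```python
def percent_reduction(baseline_cost, new_cost):
    """Return the percentage by which new_cost undercuts baseline_cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - new_cost) / baseline_cost * 100


# Hypothetical per-contract costs, NOT taken from the paper.
hypothetical_human_cost = 100.00
hypothetical_llm_cost = 0.03

reduction = percent_reduction(hypothetical_human_cost, hypothetical_llm_cost)
print(f"Cost reduction: {reduction:.2f}%")
```

Under these assumed numbers, a review that costs 100 units when done by a human would cost 0.03 units when done by a model.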

Discussion

The study underscores the readiness of LLMs to supplant traditional legal review methods, especially in high-volume tasks. However, it also highlights the importance of model selection based on specific legal tasks, as different models exhibit varying performance across dimensions of task accuracy, time, and costs.

The potential for LLMs to disrupt LPO models is significant, presenting a challenge to traditional legal services and opening pathways to new efficiencies and service models. Despite these advances, caution about adopting LLMs stems partly from the variability in how they interpret nuanced legal language, which remains a focus for future technological improvement and for wider integration into legal workflows.

Conclusion

The findings of this paper provide compelling evidence that LLMs can adequately perform tasks traditionally conducted by LPOs and Junior Lawyers, offering substantial efficiencies in time and cost. This positions LLMs as formidable tools in legal contract review processes, paving the way for enhanced operational efficiencies in legal practice. However, ongoing development and testing of these models will be essential to address current limitations and further integrate AI into the wider legal domain. The study encourages future work to extend the application of LLMs to more complex legal processes like contract negotiation.