Large Language Models Meet Legal Artificial Intelligence: A Survey

Published 12 Sep 2025 in cs.CL and cs.AI | (2509.09969v1)

Abstract: Large language models (LLMs) have significantly advanced the development of Legal Artificial Intelligence (Legal AI) in recent years, enhancing the efficiency and accuracy of legal tasks. To advance research and applications of LLM-based approaches in the legal domain, this paper provides a comprehensive review of 16 legal LLM series and 47 LLM-based frameworks for legal tasks, and also gathers 15 benchmarks and 29 datasets to evaluate different legal capabilities. Additionally, we analyse the challenges and discuss future directions for LLM-based approaches in the legal domain. We hope this paper provides a systematic introduction for beginners and encourages future research in this field. Resources are available at https://github.com/ZhitianHou/LLMs4LegalAI.

Summary

The paper presents a comprehensive review of 16 legal LLM series and 47 task-specific frameworks, delineating their applications in legal judgment prediction and case summarization.
The paper evaluates 15 benchmarks and 29 datasets, highlighting challenges such as data bias, hallucination, and the need for improved multilingual models.
The paper outlines future research directions, including the use of Retrieval-Augmented Generation (RAG) to improve interpretability and the development of efficient, domain-specific legal AI models.

Introduction

The paper "Large Language Models Meet Legal Artificial Intelligence: A Survey" (2509.09969) reviews the convergence of LLMs and Legal AI and the advances LLMs have brought to the legal domain. The survey analyses 16 legal LLM series and 47 frameworks that use LLMs for legal tasks, and gathers 15 benchmarks and 29 datasets used to evaluate diverse legal capabilities.

Legal LLMs and Frameworks

LLMs in the legal sector are applied to a range of tasks, including legal judgment prediction, case retrieval, and summarization, as illustrated in Figure 1. The survey distinguishes two main approaches: fine-tuning new legal-specific LLMs, and using existing LLMs within task-specific frameworks.

Figure 1: An example of LLMs in the legal judgment prediction task. (a) is a pipeline for fine-tuning a new legal LLM. (b) and (c) are LLM-based frameworks for the task: (b) uses a legal syllogism within the prompt, and (c) uses a system that combines an LLM with a domain model.

The survey categorizes datasets and methods by the type of task they address and evaluates the impact of LLM-based techniques. Notably, legal LLMs are trained on substantial legal corpora to infuse domain-specific knowledge, which addresses the limitations of general-purpose LLMs when they are applied directly to legal language.
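As a rough illustration of the prompt-based framework in Figure 1(b), the sketch below builds a legal-syllogism prompt: the applicable law is the major premise, the case facts are the minor premise, and the model is asked to supply the conclusion. The class name `SyllogismPrompt`, its field names, the prompt wording, and the sample statute and facts are all hypothetical and not taken from the survey. The call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class SyllogismPrompt:
    """Hypothetical container for the two premises of a legal syllogism."""

    statute: str  # major premise: the applicable legal rule
    facts: str    # minor premise: the facts of the case

    def render(self) -> str:
        # Lay out the premises in order and leave the conclusion open
        # for the model to complete.
        return (
            "You are assisting with legal judgment prediction.\n"
            "Reason step by step using a legal syllogism.\n\n"
            f"Major premise (applicable law):\n{self.statute}\n\n"
            f"Minor premise (facts of the case):\n{self.facts}\n\n"
            "Conclusion (predicted judgment):"
        )


if __name__ == "__main__":
    # Illustrative inputs only; not drawn from any real statute or case.
    prompt = SyllogismPrompt(
        statute="Whoever takes another person's property without consent commits theft.",
        facts="The defendant took a bicycle from a neighbour's yard without asking.",
    )
    print(prompt.render())
```

The rendered string would then be passed to whichever LLM is in use. Keeping the premises as separate fields makes it easy to swap in retrieved statutes or extracted facts.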

Datasets and Benchmarks

A significant portion of the paper is dedicated to cataloguing the current landscape of legal AI datasets, particularly those tailored to LLM-specific applications. The datasets are examined by pre-training source, supervised fine-tuning scope, and benchmark origin. The survey highlights datasets such as "HanFei" and "LawGPT," which are foundational for subsequent model development. Bias, class imbalance, and the quality of synthesized data remain open challenges.

Figure 2: The capabilities assessed by benchmarks.

Benchmarking efforts have focused on establishing reliable evaluation metrics for LLMs across various legal tasks, as shown in Figure 2. Benchmarks such as LexGLUE and LegalBench reflect the nuanced requirements of evaluating legal reasoning and retrieval accuracy.
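To make the link between dataset imbalance and metric choice concrete, the following sketch compares accuracy with macro-averaged F1 on a small, made-up label set. The choice of these two metrics and the toy labels are assumptions for illustration. The summary above does not state which metrics the surveyed benchmarks use.

```python
def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1, so rare labels count equally."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, imbalanced data: a model that always predicts the majority label.
    gold = ["guilty"] * 8 + ["not_guilty"] * 2
    pred = ["guilty"] * 10
    print(f"accuracy = {accuracy(gold, pred):.3f}")
    print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

On this toy data the majority-class predictor scores well on accuracy but poorly on macro-F1. This is one reason imbalanced legal datasets are often evaluated with class-averaged metrics.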

Challenges and Directions

Despite this progress, the paper identifies persistent challenges in deploying LLM-based approaches in legal contexts. These include hallucination, which undermines the reliability of model outputs, the complexity of multilingual settings, and the need for interpretability in legal reasoning. The authors also stress the importance of creating smaller, more efficient models that maintain competitive performance.

Future Research Directions

The exploration of future research avenues emphasizes the synthesis of high-quality datasets, enhancement of model interpretability, and integration of multimodal data. The authors advocate for leveraging Retrieval-Augmented Generation (RAG) to address interpretability hurdles and call for comprehensive multilingual models that bridge cross-jurisdictional legal applications.
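One way to see why RAG can help with interpretability is that the prompt carries identifiable source passages that an answer can cite. The sketch below is a minimal illustration using bag-of-words cosine similarity from the standard library. The function names, the tiny corpus, and the prompt wording are hypothetical. A real system would likely use a dense retriever and an actual LLM call, neither of which is shown here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of the k passages most similar to the query (score > 0)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt whose sources are labelled so answers can cite them."""
    ids = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in ids)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up passages standing in for statutes or case summaries.
    corpus = {
        "s1": "Theft is the taking of property without consent.",
        "s2": "A contract requires offer and acceptance.",
        "s3": "Speed limits apply on public roads.",
    }
    print(build_prompt("What is required for a contract?", corpus, k=1))
```

Because each retrieved passage is labelled with an id, a reader can check whether a cited source actually supports the generated answer.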

Conclusion

The survey charts the intersection of LLMs and legal AI, offering a structured overview for researchers and developers in the field. It proposes paths for further work, particularly in areas that require robust multilingual capabilities and reliable decision-making frameworks.

In sum, the survey documents how LLMs are increasingly used to tackle complex legal tasks and points toward a future in which AI effectively augments legal processes.