- The paper demonstrates that LLMs can automate data extraction from legal documents to audit biases in jury selection and eviction cases.
- Experiments reveal varying accuracy across tasks, with marked difficulty on complex tasks such as inferring juror demographics and reading handwritten notes.
- Findings call for significant technical and legal investments to standardize legal data and enhance LLM performance.
Automating Transparency Mechanisms in the Judicial System Using LLMs
The paper "Automating Transparency Mechanisms in the Judicial System Using LLMs: Opportunities and Challenges" (arXiv:2408.08477) explores the potential and limitations of employing LLMs to enhance transparency in the judicial system. The authors focus on automating the extraction of information from unstructured legal documents to facilitate auditing for biases and errors in jury selection and housing eviction cases. The study highlights the challenges in accessing and processing legal data, assesses LLM performance on specific information extraction tasks, and emphasizes the need for both technical and legal investments to realize the potential of automated transparency mechanisms.
Background and Motivation
The judicial system is often scrutinized for structural biases that exacerbate social inequalities. Manual audits by journalists and researchers are essential for uncovering these biases, but they are resource-intensive and time-consuming. LLMs offer a promising avenue to automate and scale these transparency efforts by extracting key information from legal documents. The paper addresses the current gap in leveraging LLMs for transparency, distinguishing itself from prior work that primarily focuses on automating tasks for legal professionals. The authors aim to demonstrate the opportunities and challenges of using LLMs for transparency in jury selection and housing eviction processes.
Case Studies and Document Extraction Tasks
The paper presents two case studies: jury selection in criminal trials and housing eviction cases. Both areas are known for potential biases and exploitative practices.
Jury Selection
Transparency in jury selection requires analyzing court transcripts and jury strike sheets. The authors outline several document extraction tasks:
- Juror Demographic Information: name, race, gender, and occupation history.
- Trial Information: county, judge, attorneys, offense, and case verdict.
- Voir Dire Responses: stated reasons a prospective juror may be unable to be impartial.
- Selected Jurors: whether each prospective juror was selected or struck.
- Batson Challenges: whether a challenge claim was made and by whom.
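The extraction targets above imply a structured record per trial and per prospective juror. A minimal sketch of such a schema (the field names and status labels are illustrative assumptions, not the paper's actual schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class JurorRecord:
    # Juror demographic information
    name: str
    race: Optional[str] = None
    gender: Optional[str] = None
    occupation: Optional[str] = None
    # Voir dire: stated reasons the juror may be unable to be impartial
    impartiality_concerns: list = field(default_factory=list)
    # Outcome of selection, e.g. "selected", "state_strike",
    # "defense_strike", or "for_cause" (labels are assumptions)
    status: Optional[str] = None

@dataclass
class TrialRecord:
    # Trial information
    county: str
    judge: str
    attorneys: list
    offense: str
    verdict: Optional[str] = None
    # One JurorRecord per prospective juror
    jurors: list = field(default_factory=list)
    # Batson challenges: whether one was raised, and by which side
    batson_challenge_raised: bool = False
    batson_challenger: Optional[str] = None  # "state" or "defense"
```

Grouping the five task outputs into one record like this makes downstream audits (e.g. counting strikes by race per county) a matter of simple aggregation.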
Eviction
Transparency in eviction processes requires analyzing various court documents to uncover exploitative practices. Key document extraction tasks include:
- Case Background: address, tenancy details, landlord type, and legal representation.
- Procedural History of the Case: tenant defaults, executions issued, and case dispositions.
- Settlement Terms: specific settlement conditions and judgments.
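The eviction extraction tasks can likewise be grouped into a per-case record. A minimal sketch, with hypothetical field names chosen to mirror the three task groups above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvictionCase:
    # Case background
    property_address: str
    landlord_type: Optional[str] = None       # e.g. "individual", "LLC"
    tenant_represented: bool = False
    landlord_represented: bool = False
    # Procedural history of the case
    tenant_defaulted: bool = False
    execution_issued: bool = False
    disposition: Optional[str] = None         # e.g. "agreement for judgment"
    # Settlement terms
    settlement_terms: list = field(default_factory=list)
    judgment_amount: Optional[float] = None
```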
LLM Capabilities and Experimental Setup
The study identifies essential LLM capabilities for document extraction:
- Synthesis: Integrating information from multiple documents or sections.
- Inference: Deriving logical or legal conclusions from the extracted data.
- Non-Categorical Query: Answering queries whose outputs are free-form rather than drawn from a fixed set of categories.
- Handwritten Information: Processing and interpreting handwritten annotations within documents.
The authors conducted experiments using OpenAI's GPT-4 Turbo (gpt-4-turbo-2024-04-09) with zero-shot prompting, and gpt-3.5-turbo-0125 for fine-tuning, evaluating LLM performance on specific tasks within each case study.
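A zero-shot extraction run of this kind amounts to wrapping the document in a prompt that requests structured output, then parsing the reply. A minimal sketch (the prompt wording and JSON-output convention are assumptions; the paper's actual prompts are not reproduced here):

```python
import json

def build_extraction_prompt(document: str, fields: list[str]) -> str:
    """Zero-shot prompt asking the model to return the requested fields as JSON."""
    field_list = ", ".join(fields)
    return (
        "You are extracting information from a court document.\n"
        f"Return a JSON object with exactly these keys: {field_list}. "
        "Use null for any field not stated in the document.\n\n"
        f"Document:\n{document}"
    )

def parse_extraction(raw: str, fields: list[str]) -> dict:
    """Parse the model's JSON reply, keeping only the requested keys."""
    data = json.loads(raw)
    return {k: data.get(k) for k in fields}
```

The prompt returned by `build_extraction_prompt` would then be sent to the model (e.g. via OpenAI's chat completions API with model `gpt-4-turbo-2024-04-09`) and the reply fed to `parse_extraction`.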
Results and Challenges
The results reveal varying LLM performance across different tasks, with accuracy generally decreasing as task complexity increases.
The study explored few-shot prompting, reducing document length, and fine-tuning to improve performance on jury selection tasks. Two-shot prompting significantly improved the Batson challenges task, increasing accuracy from 23.2% to 76.8%. Limiting the input to final jury roll call excerpts improved jury gender composition accuracy. Fine-tuning further enhanced performance, reducing absolute error.
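The document-length reduction step, isolating the final jury roll call before prompting, can be sketched as a simple cue-phrase heuristic. The cue phrases and window size below are assumptions for illustration; the paper's exact excerpting method may differ:

```python
def roll_call_excerpt(transcript: str, window: int = 3000) -> str:
    """Return the portion of a transcript around the final jury roll call.

    Heuristic: find the last occurrence of a roll-call cue phrase and keep
    a fixed-size window of text starting there. Falls back to the end of
    the transcript if no cue is found.
    """
    cues = ("roll call", "the following jurors", "ladies and gentlemen of the jury")
    lower = transcript.lower()
    start = max(lower.rfind(c) for c in cues)
    if start == -1:
        return transcript[-window:]
    return transcript[start:start + window]
```

Feeding only this excerpt to the model shrinks the input while keeping the passage most likely to name the seated jurors.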
Downstream Impact Tests
The authors highlighted the importance of measuring model performance in the context of downstream auditing questions. Using LLM outputs to determine jury gender composition altered the outcomes of potential audits, affecting the ranking of counties and prosecutors with the most female bias in jury selection.
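The county-ranking audit described above reduces to aggregating per-jury gender shares, which makes it easy to see how extraction errors propagate into rank changes. A minimal sketch, assuming a hypothetical record shape of `{"county", "female", "total"}` per seated jury:

```python
from collections import defaultdict

def rank_counties_by_female_share(juries: list[dict]) -> list[tuple[str, float]]:
    """Rank counties by mean share of women on seated juries, lowest first.

    Counties with the lowest female share surface at the top, i.e. the
    jurisdictions an audit of gender bias in jury selection would flag first.
    """
    shares = defaultdict(list)
    for j in juries:
        shares[j["county"]].append(j["female"] / j["total"])
    ranked = [(county, sum(s) / len(s)) for county, s in shares.items()]
    ranked.sort(key=lambda pair: pair[1])
    return ranked
```

Because the ranking depends on small per-county averages, even modest errors in the extracted female/total counts can reorder which counties an audit flags, which is the downstream sensitivity the authors measure.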
Technical and Legal Investments
The paper underscores the need for significant technical and legal investments to facilitate the use of LLMs for legal auditing.
Technical Investments
- Re-Orienting Benchmarks: Developing benchmarks that align with real-world impact.
- Training Datasets: Expanding training on unstructured legal data.
- Pre-Processing Capabilities: Improving OCR tools for handwritten information and methods for identifying relevant document sections.
Legal Investments
- Data Accessibility and Standardization: Mandating standard document formats and digital databases.
- Model End-Users: Collaborating with legal experts and journalists to address hesitations in adopting LLMs.
- Mitigating Disparate Impacts: Addressing potential biases in model performance across different jurisdictions and communities.
Figure 3: Example strike sheets showing the variance in note-taking that occurs to document juror demographics and strike status. Common demarcations include 'W'/'B' for race, 'F'/'M' for gender, 'SX'/'DX' for state and defense strikes, and 'C' for for-cause strikes.
Figure 4: Example Summary Process Summons and Complaint issued by the landlord to call the tenant to court and inform them of the grounds of eviction.
Figure 5: Example docket entry page including the final disposition (Agreement for Judgement) of an eviction case. The variability in handwriting and format of this page makes it difficult to automatically extract information.
Conclusion
The paper provides valuable insights into the opportunities and challenges of using LLMs to automate transparency mechanisms in the judicial system. The authors demonstrate that while LLMs have the potential to assist in information extraction from legal documents, their performance is highly dependent on task complexity and data quality. The study emphasizes the need for targeted technical and legal investments to ensure that LLMs can effectively contribute to transparency and accountability in the judicial system.