HAMLETJudge: Logic and Legal AI
- HAMLETJudge is a multidisciplinary framework that merges logic puzzles with legal AI, employing a unique 'guilt-question' model to inform judgment prediction.
- It utilizes a trichotomous dogmatics approach—covering offense elements, unlawfulness, and culpability—via both zero-shot structured prompting and fine-tuning techniques.
- The system leverages the LJPIV dataset and structured methodology to achieve robust in-domain and cross-domain performance in legal judgment prediction.
HAMLETJudge refers to multiple concepts in logic, computer science, and legal artificial intelligence. It originated as a logic puzzle model involving truth-tellers and liars under specialized "guilt-question" constraints, and has since informed formal frameworks in legal judgment prediction, most notably as a blueprint for interpretable AI judgment systems built upon trichotomous dogmatics.
1. The HAMLET Judge Puzzle: Formal Model and Solution
The HAMLET Judge puzzle is a variant of the "Who is Guilty?" family of logic puzzles. The two agents (Hamlet and a Judge) are each either a truth-teller ($T$) or a liar ($L$) and face yes/no questions, most notably "Are you guilty?". The twist is a special deviation rule:
- Truth-tellers lie exactly on the guilt question if guilty.
- Liars tell the truth on the guilt question if guilty, but otherwise lie.
Definitions:
- $T(a)$: agent $a$ is a truth-teller.
- $L(a)$: agent $a$ is a liar ($L(a) \equiv \lnot T(a)$).
- $G(a)$: agent $a$ is guilty.
- $A(a,q)$: agent $a$'s answer to question $q$ ($1$ = "Yes", $0$ = "No").
Constraints:
- For any $q$ other than "Are you guilty?": agent $a$ answers truthfully iff $T(a)$ holds.
- For $q^{*}$ = "Are you guilty?": agent $a$ answers truthfully iff $T(a) \oplus G(a)$, i.e., truth-tellers deviate exactly when guilty and liars deviate exactly when innocent.
The system is solved by translating the observed answers and a global guilt constraint (e.g., exactly one guilty agent) into propositional formulas. For the canonical instance, all answers are "No" and exactly one agent is guilty, yielding a unique configuration:
- Hamlet is guilty and a truth-teller; the Judge is innocent and a liar (Chen et al., 2016).
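The constraint translation described above can be sketched as a brute-force enumeration over type and guilt assignments. This is an illustrative sketch, not code from the cited work; the agent names, the `solve` helper, and the encoding of answers as booleans are assumptions.

```python
from itertools import product

def guilt_answer(is_truthteller: bool, is_guilty: bool) -> bool:
    """Answer (True = "Yes") to the guilt question under the deviation rule:
    truth-tellers lie on this question exactly when guilty; liars tell the
    truth on it exactly when guilty."""
    truthful = is_truthteller != is_guilty  # equivalent to T(a) XOR G(a)
    return is_guilty if truthful else not is_guilty

def solve(observed: dict) -> list:
    """Enumerate all (types, culprit) assignments consistent with the
    observed guilt-question answers and an exactly-one-guilty constraint."""
    agents = list(observed)
    solutions = []
    for types in product([True, False], repeat=len(agents)):
        for culprit in agents:  # exactly one guilty agent
            consistent = all(
                guilt_answer(t, a == culprit) == observed[a]
                for a, t in zip(agents, types)
            )
            if consistent:
                solutions.append((dict(zip(agents, types)), culprit))
    return solutions
```

Running `solve({"Hamlet": False, "Judge": False})` enumerates every assignment compatible with both agents answering "No" on the guilt question.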
2. Trichotomous Dogmatics in the Design of HAMLETJudge
HAMLETJudge, as a legal AI system, is constructed to explicitly mirror the trichotomous dogmatics of criminal law, notably used in civil-law jurisdictions (China, Germany, Japan). The trichotomous structure consists of three stages:
- Offense Elements (Tatbestand): Whether the facts fulfill the statutory elements of a crime.
- Unlawfulness (Rechtswidrigkeit): Whether justifications (e.g., self-defense, necessity) apply.
- Culpability (Schuld): Whether the accused has criminal responsibility (age, mental state).
In HAMLETJudge, each stage is implemented as a separate sub-model or zero-shot prompt call, consuming specifically extracted case facts and returning a structured output:
- Stage 1 (offense elements): outputs a charge label or “无罪” (innocent).
- Stage 2 (unlawfulness): outputs a binary “是/否” (yes/no) on whether a justification applies.
- Stage 3 (culpability): outputs a binary “是/否” on whether the accused is culpable.
- Decision logic: the verdict is “无罪” unless all three stages affirm liability, i.e., stage 1 returns a charge, the justification check returns “否”, and the culpability check returns “是” (Zhang et al., 2024).
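The trichotomous combination of stage outputs can be sketched as a small decision function. The function name and signature are illustrative assumptions, not the system's actual interface; only the three-stage legal logic follows the source.

```python
def combine_verdict(stage1_charge: str,
                    stage2_justified: bool,
                    stage3_culpable: bool) -> str:
    """Combine trichotomous stage outputs into a verdict (sketch).

    stage1_charge: predicted charge label, or "无罪" if no offense elements.
    stage2_justified: True if a justification (e.g., self-defense) applies.
    stage3_culpable: True if the accused bears criminal responsibility.
    A guilty verdict requires all three stages to affirm liability."""
    if stage1_charge == "无罪" or stage2_justified or not stage3_culpable:
        return "无罪"
    return stage1_charge
```

For example, a theft charge with an applicable justification yields `"无罪"`, while the same charge with no justification and full culpability yields the charge itself.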
3. Dataset Construction: LJPIV
Central to robust trichotomous legal reasoning is the LJPIV dataset—Legal Judgment Prediction with Innocent Verdicts—engineered to supply both guilty and innocent cases, annotated at each trichotomous level. The pipeline:
- Extracts reasoning-relevant sentences using a fine-tuned LLM extractor.
- Randomly augments 50% of cases to inject exonerating facts at the element, unlawfulness, or culpability level using retrieval-augmented generation and structured prompting.
- Applies iterative human and LLM-based quality control to ensure logical and legal correctness.
- Produces a roughly 1:1 balance of guilty and innocent samples, tracking the grounds for innocence and supporting multi-jurisdictional expansion (Zhang et al., 2024).
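The balancing step of the pipeline can be sketched as follows. This is illustrative only: the function name and the direct sampling of grounds are assumptions, whereas the real pipeline injects exonerating facts via retrieval-augmented generation; the 50% flip rate and the ~3:1:1 ground ratio come from the source.

```python
import random

def assign_innocence_grounds(case_ids: list, seed: int = 0) -> dict:
    """Sketch of LJPIV balancing: mark half of the cases innocent and sample
    the ground for innocence at the reported ~3:1:1 ratio
    (elements : unlawfulness : culpability)."""
    rng = random.Random(seed)
    flipped = rng.sample(case_ids, len(case_ids) // 2)  # 1:1 guilty:innocent
    grounds = {}
    for cid in flipped:
        grounds[cid] = rng.choices(
            ["elements", "unlawfulness", "culpability"],
            weights=[3, 1, 1],
        )[0]
    return grounds
```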
Key statistics:
| Subset | #Cases | #Charges | Guilty:Innocent Ratio | Innocence Reason Ratio (Elements:Unlawfulness:Culpability) |
|---|---|---|---|---|
| LJPIV-CAIL | 1,680 | 112 | 1:1 | ~3:1:1 |
| LJPIV-ELAM | 500 | 63 | 1:1 | ~3:1:1 |
| LJPIV-LeCaRD | 80 | 30 | 1:1 | ~3:1:1 |
4. Modeling Strategies: Prompting and Fine-Tuning
HAMLETJudge supports two main modeling paradigms:
- Zero-Shot Structured Prompting:
- Sequentially prompt an LLM at each trichotomous stage, combining outputs according to legal logic.
- Prompt templates for the three stages are precisely crafted to solicit the charge, justification, and culpability judgments, respectively.
- Variants include direct zero-shot and chain-of-thought (CoT) prompting, as well as a bespoke "Zero-shot-Tri" setting—structured three-step prompting that better aligns with legal reasoning.
- Fine-Tuning via LoRA:
- Augment LLMs (Qwen2-7B-Instruct, Baichuan2-7B-Chat) to map extracted facts and prompts to appropriate charge/innocence labels using low-rank adaptation.
- Employs special tokens to indicate output structure and cross-entropy loss as the learning objective.
- Empirically, fine-tuning with trichotomous structured prompting delivers the highest in-domain and markedly improved cross-domain performance (Zhang et al., 2024).
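The sequential "Zero-shot-Tri" prompting loop can be sketched as below. The prompt wording, function names, and the `llm` callable interface are illustrative assumptions; only the three-step structure and its short-circuiting legal logic follow the source.

```python
def trichotomous_predict(llm, facts: str) -> str:
    """Zero-shot-Tri sketch: prompt an LLM once per dogmatic stage and
    combine the outputs. `llm` is any callable mapping prompt -> str."""
    # Stage 1: offense elements -> charge label or 无罪.
    charge = llm(f"Facts: {facts}\nWhich charge do the facts fulfill? "
                 "Answer with a charge label, or 无罪 if none.").strip()
    if charge == "无罪":
        return "无罪"
    # Stage 2: unlawfulness -> does a justification apply?
    justified = llm(f"Facts: {facts}\nDoes a justification such as "
                    "self-defense apply? Answer 是 or 否.").strip()
    if justified == "是":
        return "无罪"
    # Stage 3: culpability -> is the accused criminally responsible?
    culpable = llm(f"Facts: {facts}\nIs the accused criminally responsible "
                   "(age, mental state)? Answer 是 or 否.").strip()
    return charge if culpable == "是" else "无罪"
```

Each stage's answer gates the next, so a negation at any level immediately yields an innocent verdict, mirroring the trichotomous decision logic.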
5. Evaluation Metrics and Empirical Findings
Evaluation employs standard classification metrics—accuracy, precision, recall, and $F_1$—as well as cross-entropy loss for training. Comparative results establish current limitations and best practices:
- Legal-domain LLMs without trichotomous logic yield low $F_1$ and have negligible capacity to predict innocence.
- Open LLMs using "Zero-shot-Tri" prompts achieve substantially higher $F_1$, with Qwen2 performing best.
- Direct fine-tuning alone improves in-domain $F_1$ but generalizes poorly cross-domain.
- Fine-tuning combined with structured prompting yields the best $F_1$ both in-domain and cross-domain.
Notably, the model’s ability to capture exonerations was also broken down by reasoning level, with accuracy reported separately for elements-based, unlawfulness-based, and culpability-based innocence.
Ablation shows that omitting the unlawfulness or culpability stage dramatically decreases $F_1$ (by $8$–$10$ points per omitted level), underscoring the necessity of full trichotomous reasoning (Zhang et al., 2024).
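The per-class metrics used throughout this evaluation can be computed as follows; this is a generic sketch of precision, recall, and $F_1$ for a single class (e.g., the innocent verdict), with the function name being an illustrative assumption.

```python
def precision_recall_f1(y_true: list, y_pred: list, positive: str) -> tuple:
    """Compute precision, recall, and F1 for one target class,
    e.g. the innocent verdict 无罪."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```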
6. System Integration, Maintenance, and Prospects
The HAMLETJudge architecture encompasses pre-processing (sentence extraction), three-stage trichotomous reasoning, and post-processing (interpretable verdict rationales with statute citations). Key maintenance strategies include:
- Balanced expansion of LJPIV with novel charges.
- Manual review and multi-round random sampling to counter LLM over-modification.
- Extensions to other legal systems by building analogously trichotomously annotated datasets.
Future refinements propose integrating precedent retrieval, uncertainty scoring, and expanding trichotomous logic to additional legal factors (e.g., sentencing) (Zhang et al., 2024).
The HAMLET Judge model also bears on theoretical investigations of logic, AI, and law, as its formal basis—originally articulated in logic puzzles—centralizes edge-case exception handling and interpretability, properties that are increasingly recognized as crucial in explainable AI for high-stakes, human-centric domains (Chen et al., 2016).