HAMLETJudge: Logic and Legal AI

Updated 23 December 2025
  • HAMLETJudge is a multidisciplinary framework that merges logic puzzles with legal AI, employing a unique 'guilt-question' model to inform judgment prediction.
  • It utilizes a trichotomous dogmatics approach—covering offense elements, unlawfulness, and culpability—via both zero-shot structured prompting and fine-tuning techniques.
  • The system leverages the LJPIV dataset and structured methodology to achieve robust in-domain and cross-domain performance in legal judgment prediction.

HAMLETJudge refers to multiple concepts in logic, computer science, and legal artificial intelligence. It originated as a logic puzzle model involving truth-tellers and liars under specialized "guilt-question" constraints, and has since informed formal frameworks in legal judgment prediction, most notably as a blueprint for interpretable AI judgment systems built upon trichotomous dogmatics.

1. The HAMLET Judge Puzzle: Formal Model and Solution

The HAMLET Judge puzzle is a variant of the "Who is Guilty?" family of logical puzzles. The agents (Hamlet and a Judge) are each either truth-tellers ($T_i$) or liars ($L_i$) and face questions, notably “Are you guilty?”. The twist is a special deviation rule:

  • Truth-tellers lie on the guilt question (a guilty truth-teller denies guilt); on all other questions they answer truthfully.
  • Liars answer the guilt question truthfully (a guilty liar admits guilt); on all other questions they lie.

Definitions:

  • $T_i$: agent $i$ is a truth-teller.
  • $L_i$: agent $i$ is a liar ($L_i \leftrightarrow \neg T_i$).
  • $G_i$: agent $i$ is guilty.
  • $A_i(q)$: agent $i$’s answer to question $q$ ($\top$ = “Yes”, $\bot$ = “No”).
  • $\varphi_i(q)$: the true answer to question $q$ as posed to agent $i$.

Constraints:

  • For any $q \ne$ “Are you guilty?”:
    • $T_i \to (A_i(q) \leftrightarrow \varphi_i(q))$
    • $L_i \to (A_i(q) \leftrightarrow \neg\varphi_i(q))$
  • For $q_g$ = “Are you guilty?”:
    • $T_i \to (A_i(q_g) \leftrightarrow \neg G_i)$
    • $L_i \to (A_i(q_g) \leftrightarrow G_i)$

The system is solved by translating the observed answers and the global guilt constraint (e.g., exactly one guilty agent) into propositional logic and solving the resulting equations. For the canonical instance, all answers are "No" and one agent is guilty, which yields a unique configuration:

  • Hamlet is guilty and a truth-teller; the Judge is innocent and a liar (Chen et al., 2016).
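
This can be checked mechanically. The following Python sketch (the encoding of agents and answers is my own, not taken from Chen et al., 2016) verifies that the reported configuration satisfies the guilt-question constraints, with both observed answers being "No" and exactly one guilty agent:

```python
def guilt_answer(is_truth_teller: bool, is_guilty: bool) -> bool:
    """Answer to q_g = "Are you guilty?" (True = "Yes") under the deviation rule:
    truth-tellers satisfy A <-> not G, liars satisfy A <-> G."""
    return (not is_guilty) if is_truth_teller else is_guilty

# Reported configuration: Hamlet is a guilty truth-teller, the Judge an innocent liar.
hamlet = {"truth_teller": True, "guilty": True}
judge = {"truth_teller": False, "guilty": False}

# Both observed answers are "No" (False), and exactly one agent is guilty.
assert guilt_answer(hamlet["truth_teller"], hamlet["guilty"]) is False
assert guilt_answer(judge["truth_teller"], judge["guilty"]) is False
assert hamlet["guilty"] != judge["guilty"]
print("Reported configuration is consistent with the guilt-question constraints.")
```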

2. Trichotomous Dogmatics in the Design of HAMLETJudge

HAMLETJudge, as a legal AI system, is constructed to explicitly mirror the trichotomous dogmatics of criminal law, notably used in civil-law jurisdictions (China, Germany, Japan). The trichotomous structure consists of three stages:

  1. Offense Elements (Tatbestand): Whether the facts fulfill the statutory elements of a crime.
  2. Unlawfulness (Rechtswidrigkeit): Whether justifications (e.g., self-defense, necessity) apply.
  3. Culpability (Schuld): Whether the accused has criminal responsibility (age, mental state).

In HAMLETJudge, each stage is implemented as a separate sub-model or zero-shot prompt call, consuming specifically extracted case facts and returning a structured output:

  • $y_1$: charge label, or “无罪” (innocent).
  • $y_2$: binary “是/否” (yes/no) for justification.
  • $y_3$: binary “是/否” (yes/no) for culpability.
  • Decision logic (see the sketch below): verdict = “无罪” (innocent) if $y_2$ and $y_3$ both return “否” (no), otherwise $y_1$ (Zhang et al., 2024).
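
A minimal sketch of this three-stage combination, assuming a hypothetical `call_llm(prompt, facts)` helper for the per-stage calls; the prompt templates $p_1$–$p_3$ and the decision rule are used exactly as described above:

```python
def predict_verdict(facts: str, p1: str, p2: str, p3: str, call_llm) -> str:
    """Combine the three trichotomous stages into a single verdict.

    `call_llm` is a hypothetical helper that sends one prompt plus the extracted
    case facts to the underlying LLM and returns its raw text output.
    """
    y1 = call_llm(p1, facts).strip()  # stage 1: charge label, or "无罪" (innocent)
    y2 = call_llm(p2, facts).strip()  # stage 2: "是"/"否" (yes/no) on justification
    y3 = call_llm(p3, facts).strip()  # stage 3: "是"/"否" (yes/no) on culpability

    # Decision logic as stated above: innocent if stages 2 and 3 both return "否",
    # otherwise keep the stage-1 charge.
    if y2 == "否" and y3 == "否":
        return "无罪"
    return y1
```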

3. Dataset Construction: LJPIV

Central to robust trichotomous legal reasoning is the LJPIV dataset—Legal Judgment Prediction with Innocent Verdicts—engineered to supply both guilty and innocent cases, annotated at each trichotomous level. The pipeline:

  • Extracts reasoning-relevant sentences using a fine-tuned LLM extractor.
  • Randomly augments 50% of cases to inject exonerating facts at the element, unlawfulness, or culpability level using retrieval-augmented generation and structured prompting.
  • Applies iterative human and LLM-based quality control to ensure logical and legal correctness.
  • Produces a roughly 1:1 balance of guilty and innocent samples, tracking the grounds for innocence and supporting multi-jurisdictional expansion (Zhang et al., 2024).
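
The per-case records this pipeline yields can be pictured roughly as follows; the field names below are illustrative assumptions, not the published LJPIV schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LJPIVCase:
    """Hypothetical layout of one LJPIV sample (field names are illustrative)."""
    case_id: str
    facts: str                    # reasoning-relevant sentences from the LLM extractor
    charge: str                   # charge label, or "无罪" (innocent)
    elements_met: bool            # level 1: statutory offense elements fulfilled
    justification_applies: bool   # level 2: unlawfulness negated (e.g., self-defense)
    culpable: bool                # level 3: criminal responsibility present
    innocence_ground: Optional[str] = None  # "elements" | "unlawfulness" | "culpability"
    augmented: bool = False       # True if exonerating facts were injected
```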

Key statistics:

| Subset | #Cases | #Charges | Guilty:Innocent Ratio | Innocence Reason Ratio (Elements:Unlawfulness:Culpability) |
|---|---|---|---|---|
| LJPIV-CAIL | 1,680 | 112 | 1:1 | ~3:1:1 |
| LJPIV-ELAM | 500 | 63 | 1:1 | ~3:1:1 |
| LJPIV-LeCaRD | 80 | 30 | 1:1 | ~3:1:1 |

4. Modeling Strategies: Prompting and Fine-Tuning

HAMLETJudge supports two main modeling paradigms:

  1. Zero-Shot Structured Prompting:
    • Sequentially prompt an LLM at each trichotomous stage, combining outputs according to legal logic.
    • Prompt templates $p_1$, $p_2$, and $p_3$ are crafted to elicit the charge, justification, and culpability judgments, respectively.
    • Variants include direct zero-shot and chain-of-thought (CoT) prompting, as well as a bespoke "Zero-shot-Tri" setting—structured three-step prompting that better aligns with legal reasoning.
  2. Fine-Tuning via LoRA:
    • Adapts LLMs (Qwen2-7B-Instruct, Baichuan2-7B-Chat) with low-rank adaptation so that extracted facts and prompts map to the appropriate charge/innocence labels.
    • Employs special tokens to indicate output structure and cross-entropy loss as the learning objective.
    • Empirically, fine-tuning with trichotomous structured prompting delivers the highest in-domain and markedly improved cross-domain performance (Zhang et al., 2024).
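
As a rough illustration of the fine-tuning setup, the sketch below wires one of the mentioned base models into a LoRA adapter with Hugging Face `peft`; the rank, dropout, and target modules are assumed values, not the paper's reported hyperparameters:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "Qwen/Qwen2-7B-Instruct"  # one of the base models mentioned above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
)
model = get_peft_model(model, lora_config)

# Training then maps (extracted facts + structured trichotomous prompt) to the
# target charge/innocence label with a standard cross-entropy objective, e.g.
# via transformers.Trainer on tokenized prompt/label pairs.
model.print_trainable_parameters()
```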

5. Evaluation Metrics and Empirical Findings

Evaluation employs standard classification metrics (accuracy, precision, recall, $F_1$) as well as cross-entropy loss for training. Comparative results establish current limitations and best practices:

  • Legal-domain LLMs without trichotomous logic yield $F_1 < 0.30$ and have negligible capacity to predict innocence.
  • Open LLMs using "Zero-shot-Tri" prompts achieve $F_1$ up to 32.7% (Qwen2).
  • Direct fine-tuning alone: $F_1 \approx$ 60–61% in-domain, 8–10% cross-domain.
  • Fine-tuning + structured prompting: $F_1 \approx$ 67% in-domain, 20–23% cross-domain.

Notably, the model’s ability to capture exonerations was broken down by reasoning level:

  • Elements-based innocence: ~75% accuracy.
  • Unlawfulness: ~88%.
  • Culpability: ~92%.

Ablation shows that omitting unlawfulness or culpability dramatically decreases $F_1$ (by 8–10 points per level), underscoring the necessity of full trichotomous reasoning (Zhang et al., 2024).
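
The classification metrics above can be computed with standard tooling; a minimal scikit-learn sketch (the labels and the macro-averaging choice are illustrative assumptions, not the paper's exact protocol):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative predictions only; "无罪" (innocent) is treated as one label among the charges.
y_true = ["盗窃罪", "无罪", "故意伤害罪", "无罪"]
y_pred = ["盗窃罪", "无罪", "无罪", "故意伤害罪"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Acc={accuracy:.2f}  P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")
```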

6. System Integration, Maintenance, and Prospects

The HAMLETJudge architecture encompasses pre-processing (sentence extraction), three-stage trichotomous reasoning, and post-processing (interpretable verdict rationales with statute citations). Key maintenance strategies include:

  • Balanced expansion of LJPIV with novel charges.
  • Manual review and multi-round random sampling to counter LLM over-modification.
  • Extension to other legal systems by building analogous trichotomous datasets.

Future refinements propose integrating precedent retrieval, uncertainty scoring, and expanding trichotomous logic to additional legal factors (e.g., sentencing) (Zhang et al., 2024).

The HAMLET Judge model also bears on theoretical investigations of logic, AI, and law, as its formal basis, originally articulated in logic puzzles, emphasizes edge-case exception handling and interpretability, properties increasingly recognized as crucial in explainable AI for high-stakes, human-centric domains (Chen et al., 2016).
