Classifying complex documents: comparing bespoke solutions to large language models
Published 12 Dec 2023 in cs.CL and cs.LG (arXiv:2312.07182v1)
Abstract: Here we search for the best automated classification approach for a set of complex legal documents. Our classification task is not trivial: our aim is to classify approximately 30,000 public courthouse records from 12 states and 267 counties at two different levels using nine sub-categories. Specifically, we investigate whether a fine-tuned LLM can achieve the accuracy of a bespoke custom-trained model, and how much fine-tuning is necessary.