Case Law Grounding: Using Precedents to Align Decision-Making for Humans and AI
Abstract: From moderating content within an online community to producing socially appropriate generative outputs, decision-making tasks, whether conducted by humans or AI, often depend on subjective or socially established criteria. To keep such decisions consistent, prevailing processes primarily ground them in high-level rules and guidelines, similar to applying "constitutions" in the legal context. However, inconsistencies in specifying and interpreting constitutional grounding can lead to undesirable and even incorrect decisions. In this work, we introduce "case law grounding" (CLG), an approach for grounding subjective decision-making in past decisions, similar to how precedents are used in case law. We show how this approach can be implemented in both human and AI decision-making contexts, introducing both a human-led process and an LLM prompting setup. Evaluating with five groups and communities across two decision-making task domains, we find that decisions produced with CLG were significantly more accurately aligned to ground truth in 4 out of 5 groups, with accuracy 16.0 to 23.3 percentage points higher in the human process and 20.8 to 32.9 percentage points higher with LLMs. We also examine the impact of different configurations of retrieval window size and the binding nature of decisions, and find that binding decisions and larger retrieval windows were beneficial. Finally, we discuss the broader implications of using CLG to augment existing constitutional grounding when aligning human and AI decisions.
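To make the idea of a retrieval window and binding precedents concrete, the following is a minimal sketch and not the paper's implementation. It assumes precedents are retrieved by simple token-overlap (Jaccard) similarity, and it reads "binding" as an instruction in the prompt to follow the retrieved decisions; the abstract specifies neither. All names (`Precedent`, `retrieve`, `build_prompt`) and the example cases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Precedent:
    """A past case together with the decision that was made on it."""
    text: str
    decision: str


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two texts (assumed retrieval metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(case: str, precedents: list[Precedent], window: int) -> list[Precedent]:
    """Return the `window` precedents most similar to the new case."""
    ranked = sorted(precedents, key=lambda p: jaccard(case, p.text), reverse=True)
    return ranked[:window]


def build_prompt(case: str, retrieved: list[Precedent], binding: bool) -> str:
    """Assemble a prompt that grounds the decision in retrieved precedents.

    Assumption: 'binding' means the decision-maker is told to follow the
    precedents; 'non-binding' means they are offered as guidance only.
    """
    lines = ["Past decisions (precedents):"]
    for i, p in enumerate(retrieved, start=1):
        lines.append(f"{i}. Case: {p.text}\n   Decision: {p.decision}")
    if binding:
        lines.append("You must follow the precedents above when they apply.")
    else:
        lines.append("Treat the precedents above as guidance only.")
    lines.append(f"New case: {case}\nDecision:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical moderation precedents for illustration only.
    history = [
        Precedent("You are an idiot and nobody likes you", "remove"),
        Precedent("Thanks for sharing this helpful guide", "keep"),
        Precedent("You are a complete idiot", "remove"),
    ]
    new_case = "you are such an idiot"
    top = retrieve(new_case, history, window=2)
    print(build_prompt(new_case, top, binding=True))
```

The resulting prompt could be sent to any LLM, or the same retrieved precedents could be shown to a human decision-maker. A larger `window` surfaces more precedents per decision, which is the configuration the abstract reports as beneficial.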