Confidence-weighted integration of human and machine judgments for superior decision-making

Published 15 Aug 2024 in cs.HC, cs.AI, and q-bio.NC | (2408.08083v3)

Abstract: LLMs can surpass humans in certain forecasting tasks. What role does this leave for humans in the overall decision process? One possibility is that humans, despite performing worse than LLMs, can still add value when teamed with them. A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated and team members diverge in which tasks they find difficult (i.e., calibration and diversity are needed). We simplified and extended a Bayesian approach to combining judgments using a logistic regression framework that integrates confidence-weighted judgments for any number of team members. Using this straightforward method, we demonstrated its effectiveness in both image classification and neuroscience forecasting tasks. Combining human judgments with one or more machines consistently improved overall team performance. Our hope is that this simple and effective strategy for integrating the judgments of humans and machines will lead to productive collaborations.

Summary

The paper presents a logistic regression approach that weights human and machine judgments by confidence for enhanced decision-making.
Experiments in image classification and neuroscience forecasting show that human-machine teams outperform machine-only teams.
The findings underscore the importance of confidence calibration and error diversity in achieving superior human-machine collaboration.

Introduction

The increasing sophistication of LLMs and other machine learning systems raises the question of what role remains for human judgment, particularly in forecasting and decision-making tasks. Combining human and machine judgments may nonetheless improve overall performance, even when machines are stronger on their own. This paper proposes a logistic regression framework that integrates confidence-weighted human and machine judgments to achieve complementarity, where the team outperforms each of its members. The study demonstrates the approach in object recognition and neuroscience forecasting tasks.

Methodology

Framework and Implementation

The paper introduces a logistic regression method for combining the judgments of any number of team members, human or machine. Each member's judgment is weighted by that member's confidence, so confident judgments have more influence on the team decision, and the regression learns how much weight to give each member, so more accurate members count for more. The technique simplifies and extends a Bayesian approach to combining judgments; it is computationally less demanding and easier to extend to more than two teammates.
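
A minimal sketch of the combination step, assuming each teammate's judgment on a two-alternative item is encoded as a signed confidence (positive for option A, negative for option B, magnitude equal to confidence). The helper names, learning rate, epoch count, and toy data are hypothetical, and plain batch gradient descent stands in for whatever fitting routine the authors used; in practice a library implementation such as scikit-learn's LogisticRegression would normally be preferred.

```python
import math

# Assumption: a teammate's judgment is a signed confidence
# (+ = chose option A, - = chose option B). Label y = 1 if option A is correct.


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_team(X, y, lr=0.1, epochs=2000, l2=0.0):
    """Fit P(y=1 | x) = sigmoid(w . x + b) by batch gradient descent on log-loss.

    X: list of rows, one signed-confidence feature per teammate.
    y: list of 0/1 labels.
    Returns one learned weight per teammate plus a bias term.
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(k):
                grad_w[j] += err * row[j]
            grad_b += err
        # Mean log-loss gradient, plus an optional L2 penalty on the weights.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def team_prob(w, b, row):
    """Team's probability that option A is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy data: columns are [human, machine] signed confidences.
X = [[0.8, 0.9], [0.6, -0.2], [-0.7, -0.8], [0.3, -0.9], [-0.2, 0.7],
     [-0.9, 0.1], [0.5, 0.6], [-0.4, -0.6], [0.2, -0.5], [-0.3, 0.8]]
y = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

weights, bias = fit_team(X, y)
human_acc = accuracy([int(row[0] > 0) for row in X], y)
machine_acc = accuracy([int(row[1] > 0) for row in X], y)
team_acc = accuracy([int(team_prob(weights, bias, row) > 0.5) for row in X], y)
print(f"human={human_acc:.2f} machine={machine_acc:.2f} team={team_acc:.2f}")
```

Adding another teammate only adds a column to X, which is what makes a logistic-regression formulation easy to extend beyond two members.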

Experimental Settings

The evaluation used two tasks: a noisy image classification challenge and BrainBench, a neuroscience results-prediction benchmark. In the image classification task, human participants and several models pretrained on ImageNet classified images degraded by varying levels of noise. In BrainBench, participants chose which of two versions of a neuroscience abstract was correct, the original or one with altered results; both human experts and LLMs from the Llama series took part.
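
Before BrainBench judgments can enter the regression, each participant's choice between the two versions must be turned into a signed confidence. The sketch below assumes, as in BrainBench evaluations of LLMs, that a model picks the version with lower perplexity and that the perplexity gap serves as its confidence; the human rating scale, the exact scaling used in the paper, the function names, and the token log-probabilities are all assumptions for illustration.

```python
import math
from statistics import mean


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    return math.exp(-mean(token_logprobs))


def llm_signed_confidence(logprobs_a, logprobs_b):
    """Hypothetical LLM feature: positive when version A has lower perplexity
    (the model prefers A), negative when it prefers B; the magnitude is the gap."""
    return perplexity(logprobs_b) - perplexity(logprobs_a)


def human_signed_confidence(chose_a: bool, confidence: float) -> float:
    """Hypothetical human feature: the confidence rating, negated if B was chosen."""
    return confidence if chose_a else -confidence


# Hypothetical per-token log-probabilities for the two versions of one abstract.
version_a = [-1.0, -1.2, -0.8]
version_b = [-1.5, -2.0, -1.6]

llm_feature = llm_signed_confidence(version_a, version_b)
human_feature = human_signed_confidence(chose_a=False, confidence=0.4)
print(f"LLM feature: {llm_feature:+.3f}, human feature: {human_feature:+.2f}")
```

The perplexity gap and a human rating live on different scales, but because the regression learns a separate weight for each member, a rescaling of one member's feature is absorbed by that member's weight.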

Results and Analysis

Object Recognition Task

Human-machine teams outperform machine-only teams even when confidence weighting is omitted. This suggests that diversity in error profiles between humans and machines, not confidence weighting alone, drives the benefit of teaming. Adding a human consistently improved performance when combined with one or more machine classifiers, showing that human-machine teams pay off when humans and machines find some of the same images difficult but diverge on others.
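
One way to quantify this error diversity is the double-fault measure from the ensemble-learning literature: the fraction of items that both teammates get wrong. It is not necessarily the measure used in the paper, and the correctness vectors below are hypothetical. A low double-fault rate means one teammate is often right where the other is wrong, which is the room for improvement that combining judgments can exploit even without confidence weighting.

```python
def double_fault(correct_a, correct_b):
    """Fraction of items both teammates answer incorrectly (1 = correct, 0 = wrong)."""
    both_wrong = sum(1 for a, b in zip(correct_a, correct_b) if not a and not b)
    return both_wrong / len(correct_a)


# Hypothetical per-item correctness for one human and two models.
human = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
model_a = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]
model_b = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]  # errs on the same items as model_a, plus one

print("human + model A:", double_fault(human, model_a))
print("model A + model B:", double_fault(model_a, model_b))
```

Here the human is less accurate than model A, yet the two never fail on the same item, whereas the two models share their errors.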

Neuroscience Forecasting Task

In the BrainBench task, combining human and LLM judgments with the logistic model produced the best results, and including confidence improved them further, indicating that well-calibrated confidence is critical. Human-LLM teams consistently outperformed LLM-only teams, underscoring the value of human insight in complex, knowledge-intensive tasks even though LLMs perform better on their own.
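
Calibration can be checked for each teammate before teaming. A common summary is expected calibration error (ECE): bin items by stated confidence, then average the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch assumes confidences are expressed as the probability, between 0 and 1, that the chosen answer is correct; the bin count and example values are hypothetical, and the paper may assess calibration differently.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted mean |accuracy - mean confidence| over equal-width confidence bins.

    confidences: probability (0..1) that each chosen answer is correct.
    correct: 1 if the answer was correct, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((c, ok))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        acc = sum(ok for _, ok in items) / len(items)
        ece += len(items) / n * abs(acc - avg_conf)
    return ece


# Hypothetical teammates: one whose confidence matches its accuracy, one overconfident.
calibrated = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
overconfident = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 0, 1, 0])
print(f"calibrated ECE={calibrated:.2f}, overconfident ECE={overconfident:.2f}")
```

A teammate with a large ECE will mislead a confidence-weighted combination, because its confident errors receive as much influence as its confident correct answers.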

Discussion

The study suggests that the proposed logistic regression integration method supports effective human-machine teamwork by leveraging both confidence calibration and error diversity. While the approach consistently improves decision accuracy, the study also shows that the impact of confidence weighting varies across task domains.

This research provides insights into the conditions that enable human-machine complementarity, suggesting that well-calibrated confidence and diverse error patterns are vital. The findings also point to practical applications in collaborative AI systems where LLMs and humans work together and augment each other's capabilities.

Conclusion

The paper validates a confidence-weighted logistic regression framework that facilitates the integration of judgments from humans and machines, yielding superior team performance across different contexts. This study reinforces the idea that humans retain a complementary role in collaborative AI environments, demonstrating that strategic teaming can lead to enhanced decision-making outcomes. Future research could extend this approach to other collaborative settings, exploring additional domains where human insights remain indispensable alongside advancing machine capabilities.