Large Language Models for Automated Open-domain Scientific Hypotheses Discovery

Published 6 Sep 2023 in cs.CL and cs.AI | (2309.02726v3)

Abstract: Hypothetical induction is recognized as the main reasoning type when scientists make observations about the world and try to propose hypotheses to explain those observations. Past research on hypothetical induction is under a constrained setting: (1) the observation annotations in the dataset are carefully manually handpicked sentences (resulting in a close-domain setting); and (2) the ground truth hypotheses are mostly commonsense knowledge, making the task less challenging. In this work, we tackle these problems by proposing the first dataset for social science academic hypotheses discovery, with the final goal to create systems that automatically generate valid, novel, and helpful scientific hypotheses, given only a pile of raw web corpus. Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity. A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance, which exhibits superior performance in terms of both GPT-4 based and expert-based evaluation. To the best of our knowledge, this is the first work showing that LLMs are able to generate novel (''not existing in literature'') and valid (''reflecting reality'') scientific hypotheses.

Abstract PDF Upgrade to Chat

Citations (24)

View on Semantic Scholar

Summary

The paper introduces TOMATO, a task that leverages LLMs to autonomously generate innovative and valid scientific hypotheses from raw web data.
The MOOSE framework integrates multi-module feedback—past, present, and future—to iteratively refine the hypothesis generation process.
Evaluations by GPT-4 and social science experts show that the approach significantly enhances hypothesis novelty and creativity compared to traditional methods.

LLMs for Automated Open-domain Scientific Hypotheses Discovery (2309.02726)

Introduction

The paper presents a novel approach to developing an NLP system for automated generation of scientific hypotheses using LLMs. The central task, named TOMATO, involves creating systems that autonomously propose innovative and useful hypotheses in the social science domain by leveraging a raw web corpus as the sole data input. This methodology signifies a departure from prior constrained settings that relied on carefully curated datasets, which limited the scope of generated hypotheses to largely commonsensical concepts already familiar to scientific discourse.

New Task Definition: TOMATO

The authors introduce "auTOmated open-doMAin hypoThetical inductiOn" (TOMATO), an entirely automated task designed to generate hypotheses that are both novel and valid without human intervention. The dataset comprises 50 recent social science publications, with accompanying raw web corpus resources necessary for hypothesis development extracted from platforms like news sites and Wikipedia. Crucially, proposed hypotheses should not merely reiterate existing knowledge but instead offer new insights or explanations not previously documented—a quality the paper refers to as being "novel to humanity."

Figure 1: Overview of the new task setting of hypothetical induction and the role of the MOOSE framework.

MOOSE Framework

To tackle this task, the authors developed MOOSE (Multi-mOdule framewOrk with paSt, present, future feEdback), a multi-module architecture utilizing LLM prompting to facilitate the induction process. MOOSE comprises several interconnected modules executing in sequence: Background Finder, Inspiration Title Finder, Inspiration Finder, Hypothesis Suggestor, and Hypothesis Proposer. Each module processes raw web data to iteratively refine and propose hypotheses.

Figure 2: MOOSE: Our multi-module framework for TOMATO task. The black part is the base framework; \textcolor{ORANGE}{orange part} represents past-feedback; \textcolor{GREEN}{green part} represents present-feedback; \textcolor{BLUE}{blue part} represents future-feedback

MOOSE incorporates multiple feedback mechanisms:

Present-feedback: Real-time feedback is provided by LLMs to enable immediate refinement of newly generated hypotheses.
Past-feedback: Evaluates earlier module outputs based on the current state of hypothesis development, allowing backward correction of potential flaws.
Future-feedback: Proposes implications and guiding frameworks that assist subsequent module outputs, enhancing hypothesis generation fidelity.

Evaluation

The authors evaluated MOOSE using a dual approach—GPT-4 evaluation and assessments conducted by social science experts. The results indicate a marked improvement in generating hypotheses with increased novelty and helpfulness in comparison to baseline LLM configurations. Notably, the proposed framework was able to yield hypotheses deemed valid yet previously unasserted within scientific literature.

Analysis

A detailed analysis revealed intriguing capabilities of the MOOSE framework. For instance, introducing heuristic-based past-feedback substantially heightened the novelty of the developed hypotheses. Intriguingly, even hypotheses generated from randomized inspirations demonstrated a baseline level of creativity, suggesting inherent novelty in the data-to-hypothesis mapping recognized by the framework.

Conclusion

The research unearths the potential of LLMs not only to serve as computational aides in hypothesis generation but to autonomously contribute novel scientific propositions. This indicates a significant step towards AI systems capable of bolstering human researchers in the ideation and verification phases of the scientific method, particularly within the expansive context of social sciences. Nevertheless, enhancing cross-domain applications remains a compelling direction for future exploration.

Markdown Report Issue