Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Published 10 Jun 2024 in cs.AI, cs.CL, and cs.LG | arXiv:2406.06469v1

Abstract: Language agents perform complex tasks by using tools to execute each step precisely. However, most existing agents are based on proprietary models or designed to target specific tasks, such as mathematics or multi-hop question answering. We introduce Husky, a holistic, open-source language agent that learns to reason over a unified action space to address a diverse set of complex tasks involving numerical, tabular, and knowledge-based reasoning. Husky iterates between two stages: 1) generating the next action to take towards solving a given task and 2) executing the action using expert models and updating the current solution state. We identify a thorough ontology of actions for addressing complex tasks and curate high-quality data to train expert models for executing these actions. Our experiments show that Husky outperforms prior language agents across 14 evaluation datasets. Moreover, we introduce HuskyQA, a new evaluation set which stress tests language agents for mixed-tool reasoning, with a focus on retrieving missing knowledge and performing numerical reasoning. Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems. Our code and models are available at https://github.com/agent-husky/Husky-v1.


Summary

  • The paper introduces Husky, an agent that unifies numerical, tabular, and knowledge-based reasoning using a single action space.
  • The paper demonstrates its effectiveness with an iterative two-stage process and superior results on benchmarks like GSM-8K and Bamboogle.
  • The paper sets a new standard for open-source multi-step reasoning, promoting adoption in both academic research and industry applications.

A Professional Overview of "Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning"

Husky represents a significant advancement in the development of open-source language agents, tackling multi-step reasoning tasks that have largely been the province of proprietary models. A collaboration between the University of Washington, Meta AI, and the Allen Institute for AI, the paper introduces an agent designed to operate across diverse domains through a unified action space.

Key Contributions

The paper highlights several pivotal contributions:

  • Unified Action Space: Husky leverages a unified action space that spans numerical, tabular, and knowledge-based reasoning tasks, addressing each with optimal tools, including code generators, mathematical reasoners, and search query generators.
  • Iterative Two-Stage Process: In each iteration, Husky generates the next action through an action generator and executes it via specialized expert models, continually refining the task’s solution state.
  • Ontology of Actions: A thorough ontology of actions is defined for Husky, enabling it to consistently generate and execute high-precision steps toward solving tasks.
  • Evaluation Set and Performance: The paper introduces a new evaluation set, HuskyQA, which stress tests agents on mixed-tool reasoning, particularly the combination of retrieving missing knowledge and performing numerical reasoning. Husky outperforms prior language agents across 14 evaluation datasets and, despite using 7B models, matches or exceeds frontier models such as GPT-4 on these tasks.
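The iterative two-stage process described above can be sketched in a few lines of Python. This is a toy illustration only: the function names, the state dictionary, and the stub tools below are hypothetical stand-ins for Husky's trained action generator and expert models, not the paper's actual implementation.

```python
# Toy sketch of a two-stage agent loop: (1) generate the next action,
# (2) execute it with a tool/expert and update the solution state.
# All names and stub behaviors here are illustrative, not from the paper.

def generate_action(state):
    """Stage 1: the action generator picks the next tool and its input."""
    if "missing_value" in state:
        return ("search", state["missing_value"])   # retrieve missing knowledge
    if "expression" in state:
        return ("code", state["expression"])        # numerical reasoning via code
    return ("finish", state.get("answer"))

def execute_action(action, state):
    """Stage 2: an expert/tool executes the action and updates the state."""
    tool, arg = action
    if tool == "search":
        # Stand-in for a retrieval expert: pretend the lookup returned 21.
        state["expression"] = "2 * 21"
        state.pop("missing_value", None)
    elif tool == "code":
        # Stand-in for a code-generation expert plus executor.
        state["answer"] = eval(state.pop("expression"))
    return state

def run_agent(task_state, max_steps=5):
    """Iterate between the two stages until the agent decides to finish."""
    for _ in range(max_steps):
        action = generate_action(task_state)
        if action[0] == "finish":
            return action[1]
        task_state = execute_action(action, task_state)
    return task_state.get("answer")

print(run_agent({"missing_value": "x"}))  # → 42
```

The key design point this sketch mirrors is the separation of concerns: the action generator only decides *what* to do next, while specialized experts decide *how* each action is carried out on the evolving solution state.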

Empirical Results

The experimental results underscore Husky's efficacy:

  • Performance Metrics: Husky demonstrates superior performance across multiple domains:
    • On GSM-8K, Husky achieves an accuracy of 77.9%.
    • In knowledge-based tasks, Husky attains 54.4/65.8 EM/F1 on Bamboogle.
  • Comparative Outcomes: Husky outperforms notable language agents such as FireAct and Lumos, exemplified by its 20-point lead over Lumos on GSM-8K.

Implications and Future Directions

Practical Implications:

  • Adoption in Research and Industry: Given its open-source nature and effective multi-domain task performance, Husky is poised for adoption in both academic research and industry applications, providing a versatile alternative to reliance on costly proprietary models.
  • Enhancing Open-Source Solutions: Husky sets a benchmark for the development of robust open-source agents, encouraging further innovations in the public domain.

Theoretical Implications:

  • Model Generalization: The success of Husky underscores the viability of cross-domain training for action generators, pointing towards broader generalizability of AI models across divergent tasks.
  • Tool Integration Capabilities: The modular, tool-integrated approach of Husky highlights the importance of specialized reasoning capabilities within a unified framework.

Future Directions:

  • Enhanced Expert Models: Future research may explore further enhancements in expert models, potentially integrating larger models or domain-specific pretraining to bolster performance.
  • Action Space Expansion: Scaling the action space to encompass a broader set of tasks, including new categories such as multimedia processing or advanced simulation, could further broaden Husky's applicability.
  • Performance Optimization: Continuous refinement of inference procedures and model architectures will likely yield improvements in efficiency and accuracy, enabling real-time applications.

Conclusion

Husky stands as a testament to the power of collaborative, open-source initiatives in advancing the field of artificial intelligence. By addressing a wide variety of complex reasoning tasks through a unified and efficient framework, Husky not only pushes the boundaries of what open-source models can achieve but also lays a robust foundation for future multi-step reasoning agents. The framework's scalability and demonstrated efficacy position it as a valuable tool within the AI community and invite further exploration and development in this domain.
