AudioToolAgent: An Agentic Framework for Audio-Language Models

Published 3 Oct 2025 in cs.SD | (2510.02995v1)

Abstract: Large Audio-Language Models (LALMs) perform well on audio understanding tasks but lack the multi-step reasoning and tool calling found in recent LLMs. This paper presents AudioToolAgent, a framework that coordinates audio-language models as tools through a central LLM agent, which accesses tool adapters for audio question answering and speech-to-text. The agent selects tools, asks follow-up questions, and compares outputs for verification. Experiments on MMAU, MMAR, and MMAU-Pro show state-of-the-art accuracy: up to 74.10% on MMAU, 68.80% on MMAR, and 57.96% on MMAU-Pro. Monte Carlo sampling of Shapley values across 374 configurations identifies effective agent-tool combinations. The modular design allows new tools to be integrated and eliminates data collection and training costs. Code and reproduction materials are available at: github.com/GLJS/AudioToolAgent
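
The abstract mentions Monte Carlo sampling of Shapley values to judge which tools contribute most. The sketch below shows the general permutation-sampling estimator only; it is not taken from the paper's code. The tool names, the toy scores, and the "best tool wins" value function are all hypothetical stand-ins for a real benchmark evaluation.

```python
import random
from typing import Callable, Dict, FrozenSet, List


def monte_carlo_shapley(
    players: List[str],
    value_fn: Callable[[FrozenSet[str]], float],
    num_samples: int = 2000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        prev = value_fn(coalition)
        for p in order:
            # Add one player at a time and credit it with the change in value.
            coalition = coalition | {p}
            cur = value_fn(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: totals[p] / num_samples for p in players}


# Hypothetical per-tool accuracies (illustrative numbers, NOT from the paper).
TOY_SCORES = {"tool_a": 0.6, "tool_b": 0.5, "tool_c": 0.3}


def toy_value(coalition: FrozenSet[str]) -> float:
    # Assumption: a tool set is worth the accuracy of its best tool.
    return max((TOY_SCORES[t] for t in coalition), default=0.0)


if __name__ == "__main__":
    estimates = monte_carlo_shapley(list(TOY_SCORES), toy_value)
    for tool, phi in sorted(estimates.items()):
        print(f"{tool}: {phi:.3f}")
```

In a real study, `toy_value` would be replaced by the measured benchmark accuracy of the agent when restricted to the given tool subset. Because each sampled ordering telescopes to the value of the full set, the estimates always sum to the value of using every tool together.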
