
OpusLM: A Family of Open Unified Speech Language Models

Published 21 Jun 2025 in cs.CL, cs.SD, and eess.AS | (2506.17611v1)

Abstract: This paper presents Open Unified Speech Language Models (OpusLMs), a family of open foundational speech language models (SpeechLMs) of up to 7B parameters. Initialized from decoder-only text LLMs, the OpusLMs are continuously pre-trained on 213K hours of speech-text pairs and 292B text-only tokens. We demonstrate that our OpusLMs achieve comparable (or even superior) performance to existing SpeechLMs in speech recognition, speech synthesis, and text-only capabilities. Technically, this paper articulates our SpeechLM designs on tokenization, multi-stream language models, and multi-stage training strategies. We experimentally demonstrate the importance of model size scaling and the effect of annealing data selection. The OpusLMs are all built from publicly available materials and are fully transparent models. We release our code, data, checkpoints, and training logs to facilitate open SpeechLM research.
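The abstract's mention of multi-stream modeling refers to handling several parallel token streams (e.g., text tokens plus one token per speech codec codebook) in a single decoder-only LM. As a rough illustration only, and not the paper's actual scheme, the sketch below flattens multi-codebook speech frames into one sequence by offsetting each stream into a disjoint vocabulary range; the function name and offset layout are hypothetical.

```python
def build_multistream_sequence(text_ids, speech_frames,
                               text_vocab_size, codebook_size):
    """Flatten text tokens plus multi-stream speech codec frames into one
    token sequence for a decoder-only LM.

    Hypothetical illustration (not OpusLM's actual layout): each speech
    frame holds one token per codec stream, and stream k's tokens are
    offset into a disjoint id range so the model can distinguish streams.
    """
    seq = list(text_ids)  # text tokens keep their original ids [0, text_vocab_size)
    for frame in speech_frames:
        for k, tok in enumerate(frame):
            # shared vocab layout: text block first, then one block per stream
            seq.append(text_vocab_size + k * codebook_size + tok)
    return seq


# Two text tokens followed by two speech frames, each with two codec streams
combined = build_multistream_sequence(
    text_ids=[5, 9],
    speech_frames=[[1, 2], [3, 0]],
    text_vocab_size=100,
    codebook_size=1024,
)
print(combined)  # → [5, 9, 101, 1126, 103, 1124]
```

In practice, SpeechLMs often predict the parallel codebook streams jointly per frame (or with a delay pattern) rather than fully serializing them, since full flattening multiplies sequence length by the number of streams.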
