Operationalizing Budget Control and Identifying Design Axes for Runtime Agent Memory

Determine how to operationalize budget control for runtime agent memory extraction in large language model agents, identify the design axes that best capture meaningful performance–cost trade-offs, and characterize how these choices behave across different budget regimes.

Background

The paper studies on-demand, runtime memory extraction for LLM agents, where computation must be managed at inference time rather than through offline, fixed pipelines. While prior systems often employ monolithic or query-agnostic approaches, the authors argue that treating cost and latency as first-class runtime concerns raises fundamental questions about how to control computation effectively.

Specifically, even after specifying a budgeting scheme for memory processing, the field lacks clarity on practical mechanisms for operationalizing such control, on which design dimensions (e.g., procedure, reasoning behavior, model capacity) provide the most meaningful performance–cost trade-offs, and on how these choices interact with different budget regimes. The paper proposes BudgetMem as a step toward addressing these issues, but explicitly flags this uncertainty as a key challenge motivating the work.
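One concrete way to operationalize such budget control is the tiered routing suggested by the referenced paper's title: route each query to the most capable extraction tier that still fits the remaining budget. The sketch below is a minimal, hypothetical illustration of that idea only; the tier names, cost values, and the `BudgetRouter` interface are assumptions for exposition, not the paper's actual mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Tier:
    """One point on a design axis: an extraction procedure with an
    associated per-query cost (e.g., token or latency units)."""
    name: str
    cost: float                      # hypothetical per-query cost
    extract: Callable[[str], str]    # hypothetical extraction procedure


class BudgetRouter:
    """Hypothetical budget controller: greedily assigns each query the
    most expensive (assumed most capable) tier the budget still allows."""

    def __init__(self, tiers: List[Tier], budget: float):
        # Order tiers from most to least expensive.
        self.tiers = sorted(tiers, key=lambda t: t.cost, reverse=True)
        self.remaining = budget

    def route(self, query: str) -> Optional[str]:
        for tier in self.tiers:
            if tier.cost <= self.remaining:
                self.remaining -= tier.cost
                return tier.extract(query)
        return None  # budget exhausted: skip extraction for this query


tiers = [
    Tier("deep", 8.0, lambda q: f"deep extraction for {q!r}"),
    Tier("shallow", 2.0, lambda q: f"shallow extraction for {q!r}"),
]
router = BudgetRouter(tiers, budget=12.0)
results = [router.route(f"q{i}") for i in range(4)]
```

Under this toy policy the budget regime directly shapes behavior: a large budget favors the deep tier early on, while a tight budget forces the router toward shallow extraction or skipping queries entirely, which is exactly the kind of performance–cost trade-off the design axes above are meant to capture.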

References

As a result, even after a budgeting scheme is specified, it remains unclear how to operationalize budget control, which design axes best capture meaningful trade-offs, and how these choices behave across different budget regimes.

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory (2602.06025 - Zhang et al., 5 Feb 2026) in Section 1 (Introduction)