Quantifying personalized performance of memory‑equipped LLM agents in noisy real‑world settings
Develop rigorous, standardized methods to quantify the personalized performance of long‑term memory–equipped large language model agents under complex, noisy real‑world interaction scenarios, where user preferences evolve over time and interactions contain in‑session noise and linguistic variability.
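To make the setting concrete, here is a minimal, hypothetical sketch of such a quantification harness: sessions carry a user preference that drifts at some point in time, utterances are perturbed with lightweight linguistic noise, and the agent is scored on recalling the *current* preference after each session. All names here (`make_sessions`, `score_preference_tracking`, `LastMentionAgent`) are illustrative assumptions, not constructs from the paper.

```python
import random

FILLERS = ["um,", "you know,", "btw", "actually,"]

def noisy(utterance, rng):
    """Inject lightweight linguistic noise (a filler word) into an utterance."""
    return f"{rng.choice(FILLERS)} {utterance}"

def make_sessions(n_sessions, rng):
    """Each session states the user's preference at that moment; the
    preference drifts at a random session boundary."""
    prefs = ["tea", "coffee"]
    switch = rng.randrange(1, n_sessions)  # session index where the preference flips
    sessions = []
    for t in range(n_sessions):
        pref = prefs[0] if t < switch else prefs[1]
        sessions.append((noisy(f"I prefer {pref} these days", rng), pref))
    return sessions

def score_preference_tracking(agent, sessions):
    """After each session, query the agent for the user's current preference
    and score it against the ground truth of that time step."""
    correct = 0
    for utterance, true_pref in sessions:
        agent.observe(utterance)
        correct += int(agent.current_preference() == true_pref)
    return correct / len(sessions)

class LastMentionAgent:
    """Trivial memory baseline: keep only the most recently mentioned preference."""
    def __init__(self):
        self.pref = None

    def observe(self, utterance):
        for candidate in ("tea", "coffee"):
            if candidate in utterance:
                self.pref = candidate

    def current_preference(self):
        return self.pref

rng = random.Random(0)
sessions = make_sessions(10, rng)
acc = score_preference_tracking(LastMentionAgent(), sessions)
print(f"time-aware preference accuracy: {acc:.2f}")
```

A real benchmark would replace the toy preference vocabulary with event-driven preference updates, richer noise models, and retrieval-backed agents; the point of the sketch is only that scoring must be indexed by time, so stale memories count as errors once the preference has drifted.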
References
While these architectural innovations endow agents with the potential for long-range memory, how to quantify their personalized performance in complex, noisy real-world scenarios remains an open challenge.
— PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
(2603.23231 - Liu et al., 24 Mar 2026) in Introduction (Section 1)