MemFactory: Unified Memory Training for RL Agents

This presentation explores MemFactory, a groundbreaking unified framework that standardizes reinforcement learning-based memory augmentation for large language model agents. By decomposing the memory lifecycle into modular atomic operations and integrating efficient Group Relative Policy Optimization, MemFactory enables researchers to rapidly prototype, train, and benchmark memory-augmented agents on commodity hardware while achieving consistent performance gains across diverse evaluation scenarios.
Script
Training language models to remember and adapt across long conversations has been held back by a fundamental problem: every research team rebuilds the same memory infrastructure from scratch, in mutually incompatible ways. MemFactory changes that by providing the first unified framework for training memory-augmented agents with reinforcement learning.
The core challenge is fragmentation. Memory-augmented agents require stateful operations that existing frameworks simply don't provide, forcing researchers to rebuild infrastructure for every experiment. This isn't just inconvenient; it's actively preventing the field from advancing at the pace it should.
MemFactory solves this through radical modularity.
The authors decompose the agent lifecycle into four composable layers. The Module and Agent layers handle memory operations and policy execution, while the Environment and Trainer layers manage reward computation and optimization. The breakthrough is GRPO, a critic-free reinforcement learning algorithm that normalizes each reward against the other sampled outputs for the same prompt, dramatically reducing memory requirements while maintaining training stability.
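The group-relative normalization at the heart of GRPO can be sketched in a few lines. This is a minimal illustration of the general idea, not MemFactory's actual implementation; the function name and the group size of four are assumptions for the example. The key point is that the advantage for each sampled output comes from statistics of its own group, so no learned value network (critic) needs to be kept in GPU memory.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages (sketch): normalize each sampled
    output's reward by the mean and std of its own group, replacing
    the per-token value estimates a critic would otherwise provide."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Rewards for four completions sampled from the same prompt:
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
# Advantages sum to ~0; above-average completions are reinforced,
# below-average ones are discouraged.
```

Because the baseline is the group mean rather than a critic's prediction, the only extra cost over plain sampling is computing these per-group statistics, which is why the approach fits on a single GPU.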
The empirical results are compelling. Agents trained with MemFactory show consistent double-digit improvements on in-domain tasks and maintain those gains under out-of-distribution evaluation, proving the policies genuinely generalize. What makes this practical is the efficiency: the entire workflow fits on a single research-grade GPU, democratizing access to memory reinforcement learning.
The real impact is what this infrastructure enables. Researchers can now prototype novel memory policies, benchmark against state-of-the-art methods, and explore sophisticated reward schedules without rebuilding foundational code. The modular design means innovations in one component, like retrieval reranking or consolidation strategies, can be immediately tested across different agent architectures.
MemFactory transforms memory-augmented agents from bespoke experiments into systematic science. By standardizing the infrastructure, it lets researchers focus on the hard problems: what should agents remember, when should they forget, and how should memory shape reasoning. Visit EmergentMind.com to explore the paper and create your own research videos.