Stateless Decision Memory for Enterprise AI Agents

Published 22 Apr 2026 in cs.AI | (2604.20158v1)

Abstract: Enterprise deployment of long-horizon decision agents in regulated domains (underwriting, claims adjudication, tax examination) is dominated by retrieval-augmented pipelines despite a decade of increasingly sophisticated stateful memory architectures. We argue this reflects a hidden requirement: regulated deployment is load-bearing on four systems properties (deterministic replay, auditable rationale, multi-tenant isolation, statelessness for horizontal scale), and stateful architectures violate them by construction. We propose Deterministic Projection Memory (DPM): an append-only event log plus one task-conditioned projection at decision time. On ten regulated decisioning cases at three memory budgets, DPM matches summarization-based memory at generous budgets and substantially outperforms it when the budget binds: at a 20x compression ratio, DPM improves factual precision by +0.52 (Cohen's h=1.17, p=0.0014) and reasoning coherence by +0.53 (h=1.13, p=0.0034), paired permutation, n=10. DPM is additionally 7-15x faster at binding budgets, making one LLM call at decision time instead of N. A determinism study of 10 replays per case at temperature zero shows both architectures inherit residual API-level nondeterminism, but the asymmetry is structural: DPM exposes one nondeterministic call; summarization exposes N compounding calls. The audit surface follows the same one-versus-N pattern: DPM logs two LLM calls per decision while summarization logs 83-97 on LongHorizon-Bench. We conclude with TAMS, a practitioner heuristic for architecture selection, and a failure analysis of stateful memory under enterprise operating conditions. The contribution is the argument that statelessness is the load-bearing property explaining enterprise's preference for weaker but replayable retrieval pipelines, and that DPM demonstrates this property is attainable without the decisioning penalty retrieval pays.


Author

Vasundra Srinivasan

Summary

The paper introduces Deterministic Projection Memory (DPM), a stateless architecture designed to provide deterministic replay (up to residual API-level nondeterminism), auditability, and multi-tenant isolation in regulated enterprise settings.
Evaluation results show that DPM significantly improves factual precision and reasoning coherence over summarization-based stateful memory when the memory budget is tight, while matching it at generous budgets.
The study argues that a stateless memory design reduces operational complexity and cost, aligning agent memory with stringent compliance requirements.

Stateless Decision Memory for Enterprise AI Agents: An Authoritative Review

Architectural Motivation and Problem Formulation

The paper presents Deterministic Projection Memory (DPM), a deliberately minimalist agent memory architecture built for enterprise AI deployment in regulated decision-making domains such as underwriting, insurance claims adjudication, and tax examination. The work identifies a persistent mismatch between the academic literature on agent memory, which has produced a series of increasingly sophisticated stateful, path-dependent architectures, and enterprise practice, which still favors retrieval-augmented generation (RAG) pipelines. A central claim is that accuracy results do not explain this preference. Instead, it reflects four underappreciated system-level properties that regulated environments demand: deterministic replay, auditable rationale, multi-tenant isolation, and stateless operation at serving scale.

The paper argues that stateful memory architectures violate these constraints by construction, and that retrofitting statelessness and replayability onto them is costly and fragile. As summarized in Figure 1, DPM satisfies all four properties by design, while stateful architectures accumulate retrofit engineering debt as their sophistication grows.

Figure 1: Enterprise properties supported by each memory architecture family. DPM satisfies all four by construction. Stateful architectures require retrofits that compound with architectural sophistication.

Deterministic Projection Memory: Design and Theoretical Properties

DPM represents agent memory as an append-only, immutable event log. It defers all consolidation until decision time, when an LLM produces a single task-conditioned projection in one temperature-zero call. The projection is organized into explicit sections (raw facts, reasoning chains, and compliance notes), cites event indices, and preserves numeric values and identifiers verbatim. This log-plus-pure-projection design borrows from event sourcing in distributed systems, a pattern already pervasive in high-assurance auditing domains.
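The log-plus-projection split can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class and function names (`Event`, `EventLog`, `build_projection_prompt`, `project`), the prompt wording, and the `llm(prompt, temperature)` callable are all hypothetical. Only the section names and the single temperature-zero call come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    index: int    # position in the log, used for citations like [e3]
    payload: str  # verbatim event content (numbers and IDs untouched)


@dataclass
class EventLog:
    """Append-only log: events can be added but never edited or deleted."""
    _events: List[Event] = field(default_factory=list)

    def append(self, payload: str) -> Event:
        event = Event(index=len(self._events), payload=payload)
        self._events.append(event)
        return event

    def snapshot(self) -> Tuple[Event, ...]:
        # An immutable tuple, so callers cannot rewrite history.
        return tuple(self._events)


SECTIONS = ("RAW FACTS", "REASONING CHAINS", "COMPLIANCE NOTES")


def build_projection_prompt(events: Sequence[Event], task_spec: str) -> str:
    # Hypothetical prompt wording; the paper's actual prompt is not shown here.
    header = [
        f"Task: {task_spec}",
        "Project the events into sections: " + ", ".join(SECTIONS) + ".",
        "Cite events as [e<index>] and copy numbers and identifiers verbatim.",
    ]
    body = [f"[e{e.index}] {e.payload}" for e in events]
    return "\n".join(header + body)


def project(log: EventLog, task_spec: str,
            llm: Callable[[str, float], str]) -> str:
    # The only memory-related LLM call: one projection at temperature 0.
    return llm(build_projection_prompt(log.snapshot(), task_spec), 0.0)
```

Because nothing is consolidated while events arrive, the log is the only mutable artifact, and it only grows.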

Because DPM is stateless, everything needed for replay, audit, or tenant isolation lives in two places: (1) the event log, and (2) the projection, which is a pure function of the log, the decision specification, and the model version. This keeps the audit and operations surface minimal and enables deterministic replay modulo residual API nondeterminism, as formalized in the paper's replay-property proposition.
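One way to operationalize the replay property is to hash every input the projection depends on. The sketch below is an assumption, not the paper's mechanism: `replay_key` and `audit_record` are hypothetical helpers, and events are plain `(index, payload)` tuples. Two replays with the same `replay_key` but different `projection_sha256` isolate the residual nondeterminism to the model API rather than to hidden memory state.

```python
import hashlib
import json
from typing import Dict, Sequence, Tuple


def replay_key(events: Sequence[Tuple[int, str]], task_spec: str,
               model_version: str) -> str:
    """SHA-256 over every input the projection is assumed to depend on."""
    material = json.dumps(
        {
            "events": [[index, payload] for index, payload in events],
            "task_spec": task_spec,
            "model_version": model_version,
        },
        sort_keys=True,          # canonical key order so equal inputs hash equally
        separators=(",", ":"),   # no incidental whitespace in the hashed bytes
        ensure_ascii=False,
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def audit_record(events: Sequence[Tuple[int, str]], task_spec: str,
                 model_version: str, projection: str) -> Dict[str, str]:
    # Same replay_key with a different projection_sha256 means the drift came
    # from the model API, not from any accumulated memory state.
    return {
        "replay_key": replay_key(events, task_spec, model_version),
        "model_version": model_version,
        "projection_sha256": hashlib.sha256(projection.encode("utf-8")).hexdigest(),
    }
```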

The architecture explicitly does not address trajectories whose logs exceed current LLM context windows, nor does it aim to replace corpus-level retrieval. Its deliberately limited expressive power (no deliberative edits and no in-trajectory memory mutation) is presented as matching the regulated enterprise's priorities rather than as a weakness relative to retrieval or hierarchical memory systems.

Empirical Evaluation: Decision Quality, Cost, and Determinism

The empirical protocol compares DPM against incremental summarization-based stateful memory ("Summ-only") on ten regulated decisioning cases from LongHorizon-Bench spanning two domains, at three memory budgets, using four scoring axes: factual precision (FRP), reasoning coherence (RCS), decision accuracy (EDA), and compliance reconstruction (CRR).

The results depend on the budget. At loose and moderate budgets ($\rho = 2$ and $\rho = 5$), DPM and Summ-only are statistically indistinguishable on all axes. At the tight budget ($\rho = 20$), DPM improves FRP by $+0.52$ (Cohen's $h = 1.17$, $p = 0.0014$) and RCS by $+0.53$ ($h = 1.13$, $p = 0.0034$) under a paired permutation test with $n = 10$. The proposed explanation is compounding loss: each incremental summarization step is lossy, so losses accumulate with event count, whereas DPM applies a single lossy projection and so avoids the compounding.
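For readers reproducing this style of analysis, here is a sketch of the two statistics named above. The paper's exact permutation scheme (exact vs. resampled, one- vs. two-sided) is not described in this review; this uses the exact two-sided sign-flip variant, whose smallest attainable p-value at $n = 10$ is $2/1024 \approx 0.002$. Cohen's $h$ depends on both proportions, not only on their difference, so the reported $h$ values cannot be recomputed from the deltas alone.

```python
import itertools
import math
from typing import Sequence


def paired_permutation_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    # Under the null, each paired difference is equally likely to have either sign.
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / 2 ** n


def cohens_h(p1: float, p2: float) -> float:
    """Effect size for two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
```

With ten cases the exact test enumerates only 1,024 sign patterns, so it runs instantly; larger studies would switch to random resampling.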

Figure 2 makes the budget/architecture interaction explicit: DPM's main empirical advantage depends on compression pressure, and at loose budgets both approaches saturate.

Figure 2: Decision-alignment axes by budget; significant improvement for DPM at tight budgets.
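The compounding argument can be made concrete with a toy model. This is an illustrative assumption, not the paper's analysis: each consolidation pass independently keeps a given fact with probability $r$, and a fact logged at step $i$ of $N$ is re-summarized $N - i$ times. With $r = 0.99$ and $N = 90$, the oldest fact survives with probability $0.99^{90} \approx 0.40$ and the average fact with about 0.65. The comparison with DPM further assumes its single projection is no lossier than one consolidation pass, which the toy does not justify. At loose budgets $r \approx 1$, so compounding vanishes, consistent with both approaches saturating.

```python
def survival_incremental(per_step_retention: float, passes: int) -> float:
    """Probability a fact survives `passes` independent lossy summaries (toy model)."""
    return per_step_retention ** passes


def mean_survival_incremental(per_step_retention: float, n_events: int) -> float:
    # Fact i (0-based) passes through the remaining n_events - i consolidation
    # steps, so early facts are re-compressed many more times than late ones.
    total = sum(survival_incremental(per_step_retention, n_events - i)
                for i in range(n_events))
    return total / n_events
```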

The cost analysis has direct operational implications: at binding budgets, DPM is 7–15× faster per decision, because it replaces per-event consolidation calls with a single projection call at decision time.
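The one-versus-N structure behind both the cost and the audit-surface results can be written as a call-count model. The profiles below are assumptions for illustration (one summarization call per event for Summ-only; projection plus decision for DPM, matching the two logged calls per decision reported in the abstract). Call counts alone do not determine wall-clock time, since the single projection call reads the entire log; the paper's measured speedup is 7–15×.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallProfile:
    consolidation_calls: int   # LLM calls made while events arrive
    decision_time_calls: int   # LLM calls made when the decision is requested

    @property
    def total(self) -> int:
        return self.consolidation_calls + self.decision_time_calls


def summ_only_profile(n_events: int) -> CallProfile:
    # Assumed: one summarization call per event, then one decision call.
    return CallProfile(consolidation_calls=n_events, decision_time_calls=1)


def dpm_profile(n_events: int) -> CallProfile:
    # Independent of n_events: one projection plus one decision call.
    return CallProfile(consolidation_calls=0, decision_time_calls=2)
```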

An auxiliary determinism study replays each case 10 times at temperature zero against current commercial APIs and measures byte-level drift. Both architectures inherit residual nondeterminism from the underlying LLM, but DPM reduces the architectural exposure from $N$ compounding calls (83–97 logged calls per decision for Summ-only on LongHorizon-Bench) to a single nondeterministic projection call. Empirically, full-surface hashes are unique across replays even though output prefixes remain stable (Figure 3), and the audit surface grows linearly with trajectory length for Summ-only while staying constant for DPM at two logged calls per decision.

Figure 3: Byte-hash uniqueness and edit distance across replays; prefix stability is high but full-surface hashes are unique due to trailing nondeterminism.
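The three drift measures in Figure 3 can be approximated as follows. The exact definitions the paper uses (hash function, whether edit distance is character- or token-level, how pairs are aggregated) are not given in this review, so these are assumed choices: SHA-256 over UTF-8 bytes, character-level Levenshtein distance, and the maximum over all replay pairs.

```python
import hashlib
import os
from itertools import combinations
from typing import Sequence


def unique_hashes(outputs: Sequence[str]) -> int:
    """Number of distinct full-surface byte hashes across replays."""
    return len({hashlib.sha256(o.encode("utf-8")).hexdigest() for o in outputs})


def common_prefix_len(outputs: Sequence[str]) -> int:
    # Length of the character prefix shared by every replay (prefix stability).
    return len(os.path.commonprefix(list(outputs)))


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via the standard two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def max_pairwise_edit_distance(outputs: Sequence[str]) -> int:
    return max((levenshtein(a, b) for a, b in combinations(outputs, 2)), default=0)
```

A stable prefix with unique hashes and small edit distances is the "trailing nondeterminism" pattern the caption describes.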

Compression Scaling and Domain-Specific Heuristic

The compression-ratio scaling experiment confirms that DPM's decision-alignment advantage is concentrated in the regime where the memory budget binds. At ratios below 10, differences are negligible; above that, DPM leads on the factual and reasoning axes with effect sizes greater than one (Figure 4).

Figure 4: DPM minus Summ-only on decision-alignment axes as a function of compression ratio; DPM advantage grows at high compression.
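To make "the budget binds" concrete, the sketch below assumes the compression ratio is defined as $\rho = \text{log tokens} / \text{budget tokens}$; the paper's precise definition is not restated here, and `relevant_tokens` (the size of the decision-relevant content) is a hypothetical quantity. For a 40,000-token log, $\rho = 20$ leaves a 2,000-token budget, while $\rho = 2$ leaves 20,000.

```python
import math


def budget_tokens(log_tokens: int, rho: float) -> int:
    """Memory budget implied by compression ratio rho = log_tokens / budget."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return max(1, math.floor(log_tokens / rho))


def budget_binds(log_tokens: int, rho: float, relevant_tokens: int) -> bool:
    # The budget "binds" when the decision-relevant content no longer fits.
    return relevant_tokens > budget_tokens(log_tokens, rho)
```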

The practitioner heuristic TAMS ("Task-Adaptive Memory Selection") reduces to a simple rule: prefer DPM whenever deterministic replay, auditability, or a tight memory budget is required; otherwise either approach suffices, and operational concerns should decide.
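Encoded as a function, the rule as stated here looks like the sketch below. The names are hypothetical, and the `log_fits_context` precondition is our addition, drawn from the limitation that DPM does not address logs exceeding the model's context window; the paper's full heuristic may weigh further inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentNeeds:
    deterministic_replay: bool
    auditability: bool
    tight_budget: bool
    log_fits_context: bool = True  # our addition, from the stated limitations


def tams_select(needs: DeploymentNeeds) -> str:
    if not needs.log_fits_context:
        # DPM does not address logs that exceed the model's context window.
        return "outside DPM scope"
    if needs.deterministic_replay or needs.auditability or needs.tight_budget:
        return "DPM"
    return "either"  # choose on operational grounds
```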

Theoretical and Practical Implications

The work’s implications are both architectural and operational. Theoretically, it establishes statelessness as a load-bearing constraint for agent memory in regulated environments: operational surface area, auditability, and replay are not implementation details but should drive architecture selection. DPM is an example of a design that satisfies these system constraints by construction.

Practically, the findings suggest that organizations can avoid substantial cost and complexity by adopting stateless, projection-based memory without sacrificing decision quality, provided each trajectory's event log fits within the model's context window. These constraints, and the associated failure modes of sophisticated stateful systems (drift, replay complexity, leakage, and audit burden), are cataloged in detail and illustrated with operating costs.

Limitations and Future Directions

The DPM design is bounded by current LLM context windows, and the evaluation covers two regulated domains, ten cases, and one model family. Scalability to multi-trajectory workflows and behavior in adversarial scenarios remain open questions. The deterministic replay guarantee holds strictly only with a deterministic inference backend: DPM makes replay practical but not automatic.

Future work should extend DPM with hierarchical projections for longer horizons, pair DPM with corpus-level retrieval or verification, conduct adversarial/robustness evaluations, and quantify long-term engineering cost savings in production-scale deployments.

Conclusion

The paper’s central contribution is to recenter the agent-memory architecture debate on the dominant operational requirements of enterprise deployment: deterministic replay, auditability, multi-tenant isolation, and stateless scalability. DPM is presented as a minimal architecture that satisfies these requirements; it matches summarization-based stateful memory on decision quality at generous budgets and substantially outperforms it at high compression. For practitioners and researchers building AI agents for regulated domains, the implication is that statelessness is feasible and, under enterprise compliance constraints, preferable, and that it can be achieved without the decisioning penalty that retrieval pipelines pay.
