Dynamic Memory Trees (MemTree)

Updated 6 February 2026
  • Dynamic Memory Trees are hierarchical, online memory architectures that efficiently insert, update, and retrieve diverse memories using adaptive clustering and tree-based abstraction.
  • They utilize dynamic insertion protocols with cosine similarity and depth-adaptive thresholds to decide when to branch or merge new information.
  • Their collapsed retrieval strategy enhances long-context reasoning in large language models by combining high-level summaries with detailed memory access.

Dynamic Memory Trees (MemTree) are a class of hierarchical, online memory architectures designed to support efficient insertion and retrieval of memories—text segments, embeddings, or supervised experience samples—in scenarios where traditional flat or sequential memory representations limit scaling or reasoning over long-term dependencies. These data structures underpin improved long-context reasoning for LLMs, supervised learning, and contextual multi-label retrieval by leveraging tree-based clustering, adaptive abstraction, and sublinear access to relevant historical information. The approach is directly inspired by human cognitive schemas, organizing distinct pieces of knowledge at varying abstraction levels and maintaining interrelated clusters as new data arrive. Variants and related work include the Contextual Memory Tree (CMT), Eigen Memory Tree (EMT), and recent LLM-centric hierarchical memory controllers; each introduces distinct online learning principles, partitioning strategies, and update protocols that trade off speed, flexibility, and consistency (Rezazadeh et al., 2024, Rucker et al., 2022, Sun et al., 2018).

1. Memory Structure and Representation

MemTree represents memory as a rooted, directed tree $T = (V, E)$, where $V$ is the set of nodes and $E$ comprises parent–child edges. Each non-root node $v \in V$ carries a tuple $[c_v, e_v, p_v, \mathcal{C}_v, d_v]$:

  • $c_v$: textual content, either an atomic memory or an aggregated summary.
  • $e_v \in \mathbb{R}^d$: $d$-dimensional $\ell_2$-normalized embedding of $c_v$, computed via a pretrained model $f_{\text{emb}}(c_v)$.
  • $p_v \in V$: parent pointer.
  • $\mathcal{C}_v \subset V$: child pointers.
  • $d_v$: depth from root.

The root $v_0$ is a special node: $c_{v_0} = \emptyset$, $e_{v_0} = \emptyset$, and $d_{v_0} = 0$.
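The node tuple above can be sketched as a small data structure. This is an illustrative assumption of how such a node might be laid out (the names `MemTreeNode`, `content`, `embedding`, etc. are ours, not from the paper), with the root carrying empty content and no embedding as described:

```python
# Hypothetical sketch of a MemTree node following the tuple
# [c_v, e_v, p_v, C_v, d_v]; all names are illustrative, not the paper's API.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MemTreeNode:
    content: str                              # c_v: atomic memory or aggregated summary
    embedding: Optional[List[float]]          # e_v: l2-normalized vector (None at root)
    parent: Optional["MemTreeNode"] = None    # p_v: parent pointer
    children: List["MemTreeNode"] = field(default_factory=list)  # C_v
    depth: int = 0                            # d_v: depth from root

# The root is special: empty content, no embedding, depth 0.
root = MemTreeNode(content="", embedding=None)
leaf = MemTreeNode(content="user likes jazz",
                   embedding=[0.6, 0.8],     # assumed pre-normalized
                   parent=root, depth=1)
root.children.append(leaf)
```

Shallow nodes would hold LLM-written summaries in `content`, while deep nodes keep the original atomic text.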

This hierarchical structure supports variable levels of abstraction: shallow nodes capture high-level summaries, while deep nodes retain fine-grained detail. Aggregation at internal nodes is performed using LLMs, prompting them to blend and abstract lower-level content as children grow in number (Rezazadeh et al., 2024).

The Contextual Memory Tree (CMT) and Eigen Memory Tree (EMT) adopt related structures. EMT uses a full binary tree with internal router vectors aligned with the principal component of stored memories to enable efficient routing (Rucker et al., 2022), while CMT employs incrementally learned classifiers at each internal node for query-dependent routing (Sun et al., 2018).

2. Dynamic Insertion and Update Protocols

For each new observation $c_{\text{new}}$, MemTree:

  1. Creates a temporary node $v_{\text{new}}$ with content $c_{\text{new}}$ and computes $e_{\text{new}} = f_{\text{emb}}(c_{\text{new}})$.
  2. Traverses the tree top-down from the root. At each non-leaf node $v$, computes cosine similarities $\text{sim}(e_{\text{new}}, e_i) = e_{\text{new}} \cdot e_i$ against all child embeddings.
  3. Selects the best-scoring child $v_{\text{best}}$ with maximum similarity $s_{\max}$. If $s_{\max} \geq \theta(d)$, a depth-adaptive threshold, the process recurses under $v_{\text{best}}$; otherwise, a new leaf is created under $v$.
  4. Internal node content is aggregated using LLM prompting (Appendix A.2 in (Rezazadeh et al., 2024)).

The depth-adaptive threshold $\theta(d) = \theta_0 \cdot \exp(\lambda d)$ guides merging versus branching; the defaults are $\theta_0 = 0.4$ and $\lambda = 0.5$.
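The insertion rule can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes $\ell_2$-normalized embeddings (so cosine similarity reduces to a dot product), represents nodes as plain dicts, and omits the LLM aggregation step:

```python
# Sketch of MemTree's depth-adaptive insertion; nodes are dicts and
# embeddings are assumed l2-normalized, so cosine similarity = dot product.
import math

THETA_0, LAM = 0.4, 0.5  # defaults reported in the text

def cos_sim(a, b):
    return sum(x * y for x, y in zip(a, b))

def theta(depth):
    """Depth-adaptive threshold theta(d) = theta_0 * exp(lambda * d)."""
    return THETA_0 * math.exp(LAM * depth)

def insert(node, new_emb, depth=0):
    """Recurse under the most similar child if it clears theta(d);
    otherwise attach the new memory as a fresh leaf under `node`."""
    if not node["children"]:
        node["children"].append({"emb": new_emb, "children": []})
        return
    best = max(node["children"], key=lambda c: cos_sim(new_emb, c["emb"]))
    if cos_sim(new_emb, best["emb"]) >= theta(depth):
        insert(best, new_emb, depth + 1)
    else:
        node["children"].append({"emb": new_emb, "children": []})

root = {"emb": None, "children": []}
for emb in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]):
    insert(root, emb)
```

Because $\theta(d)$ grows with depth, a new memory must be increasingly similar to descend further, which discourages merging dissimilar content deep in the tree.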

In CMT, insertion uses routers trained as binary classifiers to select the left or right branch per input, trading off reward-driven routing against explicit subtree-size balancing via a parameter $\alpha$ (Sun et al., 2018).

In EMT, routing is determined by PCA-based split axes at internal nodes; splitting occurs when leaves reach a fixed capacity, with routers and split boundaries set via batch principal component analysis to maintain balance (Rucker et al., 2022).
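EMT's PCA-based routing can be illustrated with a small sketch. The power-iteration routine below is our stand-in for the batch principal component analysis the text describes; the function names and the left/right convention are illustrative assumptions:

```python
# Illustrative EMT-style router: the split axis at an internal node is the
# principal component of the memories stored there. Power iteration here is
# a stand-in for the batch PCA described in the text.
def principal_axis(points, iters=100):
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    v = [1.0] * dim
    for _ in range(iters):
        # Apply the (uncentered-covariance-like) operator: sum_x x (x . v)
        w = [0.0] * dim
        for x in centered:
            dot = sum(xi * vi for xi, vi in zip(x, v))
            for i in range(dim):
                w[i] += x[i] * dot
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return mean, v

def route(point, mean, axis):
    """Send a point left/right by the sign of its projection onto the axis."""
    proj = sum((pi - mi) * ai for pi, mi, ai in zip(point, mean, axis))
    return "left" if proj < 0 else "right"

pts = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
mean, axis = principal_axis(pts)
```

Because the axis is fixed once the split is made, routing is deterministic thereafter, which is what gives EMT its self-consistency guarantee discussed below.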

3. Retrieval, Summarization, and Pruning Strategies

MemTree's signature retrieval method is “collapsed tree retrieval,” analogous to RAPTOR (Rezazadeh et al., 2024). The procedure:

  1. Embeds a query $q$: $e_q = f_{\text{emb}}(q)$.
  2. Computes cosine similarity to every node $v \in V$.
  3. Filters out nodes with similarity below a threshold $\theta_{\text{retrieve}}$, sorts the remainder in descending similarity, and returns the top-$k$.
  4. This unifies high-level summaries with detailed memory, as both internal and leaf nodes participate.
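The steps above amount to flat scoring over all nodes. A minimal sketch, assuming normalized embeddings and a flat `nodes` list (both internal and leaf nodes), with illustrative names and threshold values:

```python
# Minimal sketch of collapsed-tree retrieval: score every node (internal
# summaries and leaves alike) against the query, filter, and take the top-k.
import heapq

def cos_sim(a, b):
    return sum(x * y for x, y in zip(a, b))  # embeddings assumed l2-normalized

def collapsed_retrieve(nodes, query_emb, k=2, theta_retrieve=0.3):
    scored = [(cos_sim(query_emb, n["emb"]), n["content"]) for n in nodes]
    kept = [(s, c) for s, c in scored if s >= theta_retrieve]
    return heapq.nlargest(k, kept)  # top-k by similarity, descending

nodes = [
    {"content": "summary: music preferences", "emb": [0.8, 0.6]},  # internal
    {"content": "likes jazz",                 "emb": [1.0, 0.0]},  # leaf
    {"content": "meeting at 3pm",             "emb": [0.0, 1.0]},  # leaf
]
hits = collapsed_retrieve(nodes, [1.0, 0.0], k=2)
```

Note that both the high-level summary node and the detailed leaf can appear in the result set, which is the point of collapsing the tree before scoring.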

Summarization is managed online: when a new child is added, the affected internal node's content is updated by prompting an LLM to merge old and new content, producing more abstract summaries as the number of children increases. There is no explicit loss or regularizer—compression is prompt-driven. No explicit pruning is performed, but threshold adaptivity discourages the merging of dissimilar nodes deep in the tree, which functionally isolates outliers (Rezazadeh et al., 2024). Pruning unaccessed leaves can be added post hoc if desired.
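The online summary update can be sketched as a hook on child insertion. Here `llm_merge` is a placeholder for the LLM prompt described in the text; it simply concatenates for illustration and is not the paper's prompt:

```python
# Sketch of the online summary update: when a child is added, the parent's
# content is re-summarized. `llm_merge` is a stand-in for the LLM call.
def llm_merge(old_summary, new_content):
    # Placeholder for: "Merge the existing summary with the new information."
    if not old_summary:
        return new_content
    return f"{old_summary}; {new_content}"

def add_child(parent, child):
    parent["children"].append(child)
    parent["content"] = llm_merge(parent["content"], child["content"])
    return parent

parent = {"content": "", "children": []}
add_child(parent, {"content": "likes jazz", "children": []})
add_child(parent, {"content": "plays piano", "children": []})
```

In the real system the merge would produce an increasingly abstract summary as the child count grows, rather than a concatenation.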

CMT and EMT retrieve by path-following according to routers (CMT) or fixed PCA splits (EMT), with k-best scoring performed in leaf nodes (Sun et al., 2018, Rucker et al., 2022).

4. Theoretical Properties and Algorithmic Complexity

Let $N = |V|$ (number of nodes) and $d$ the embedding dimension. For MemTree (Rezazadeh et al., 2024):

  • Insertion: $O(\log N \cdot d)$ on average for a balanced tree, since similarity is computed against siblings at each level and $O(\log N)$ aggregation calls occur per insert.
  • Retrieval (collapsed): $O(Nd + N \log k)$ to score all nodes and maintain a top-$k$ heap.
  • Summarization: $O(\log N)$ LLM calls per insertion, with content bounded by the model context.
  • Space: $O(Nd)$ for embedding storage, plus space for tokenized content over all $v$.

CMT's insertion, query, and update operations have $O((K + c)\log N)$ time complexity, where $K$ characterizes split balance and $c$ is a leaf-size multiplier (Sun et al., 2018). EMT achieves $O(\log N)$ insertion and query due to balanced PCA splitting; split operations require $O(dc)$ but are infrequent (Rucker et al., 2022).

Explicit rerouting in CMT improves “self-consistency,” ensuring that queried memories can be retrieved after insertion. EMT guarantees self-consistency by construction due to fixed PCA-based routers.

5. Empirical Evaluation and Comparative Performance

MemTree has been benchmarked across multi-turn dialogue, document question answering, and multi-hop retrieval tasks (Rezazadeh et al., 2024):

| Task/Dataset | MemTree | Flat Baseline (MemoryStream) | Offline/Hybrid Baselines |
|---|---|---|---|
| MSC (15 turns, memory only) | 84.8% acc / 79.9 ROUGE | 84.4% / 79.1 | MemGPT 70.4% / 68.6 |
| MSC-Extended (200 turns) | 82.5% | 80.7% | Naive full-history 78.0% |
| QuALITY (hard QA) | 59.8% | 43.8% | RAPTOR 59.0%, GraphRAG 62.8% |
| MultiHop RAG | 80.5% | 74.7% | RAPTOR 81.0% |

Collapsed retrieval generally outperforms traversal-based methods in coverage, matching queries with nodes at the appropriate abstraction level. MemTree significantly narrows the performance gap to offline retrieval-augmented generation (RAG) baselines, and even outperforms them under tight prompt-length budgets and on evidence-heavy queries.

CMT exhibits empirical improvements in one-shot/online multi-class, multi-label, and image–caption retrieval, dramatically accelerating candidate recall and inference versus linear-time nearest neighbors or One-Against-All baselines (Sun et al., 2018). EMT further outperforms CMT in online bandit settings, with a “no-downside” parametric+EMT hybrid giving consistent performance gains over purely parametric or tree-only approaches, even under memory constraints (Rucker et al., 2022).

6. Comparative Analysis and Limitations

MemTree, CMT, EMT, and related architectures share the goal of sublinear-cost online memory with unbounded growth, but differ in router learning policy, split mechanism, and consistency guarantees (Rucker et al., 2022, Sun et al., 2018).

  • Router Training: CMT uses classification loss for adaptive partitioning, but routers can drift as data distribution changes; EMT employs one-shot PCA per split, achieving balance and fixed partitioning but at the cost of future adaptability.
  • Scoring: CMT may use complex regressors for query–memory distance; EMT restricts to global linear scorers over absolute feature differences for self-consistency.
  • Self-Consistency: EMT and CMT with reroutes preserve the property that identical inputs are always retrieved, while pure CMT without reroutes can lose this due to router drift.

Limitations for MemTree and variants include sensitivity to embedding quality (for semantic partitioning), LLM-context limitations in prompting for summary nodes, and—especially in EMT—a static routing structure once splits occur, potentially reducing robustness to non-stationary or high-cardinality data (Rucker et al., 2022). CMT addresses data drift more flexibly, but at the possible expense of retrieval consistency.

7. Research Significance and Applications

Dynamic Memory Trees enable scalable, online memory augmentation for deep models and machine learning algorithms, providing:

  • Multi-scale, hierarchical aggregation that supports retrieval at appropriate abstraction for diverse queries.
  • Dynamic, clustering-based updates that refine knowledge schemas in step with new observations, circumventing the need for repeated global index rebuilding inherent to static RAG pipelines.
  • Demonstrated improvements in dialogue understanding, QA under long histories, extreme classification, and multi-label retrieval, with orders-of-magnitude improvement in inference speed relative to flat-memory and nearest-neighbor methods (Rezazadeh et al., 2024, Sun et al., 2018, Rucker et al., 2022).

These architectures serve as a foundation for schema-like, cognitively inspired memory controllers in LLMs and continual learning agents, facilitating efficient, adaptive handling of extended knowledge bases and conversational memory.
