Temporal-aware Matryoshka Representation Learning
- The paper introduces TMRL, which explicitly dedicates a temporal subspace in embeddings using targeted contrastive and self-distillation techniques.
- It leverages a nested Matryoshka design that enables dynamic truncation and tunable efficiency–accuracy trade-offs in retrieval and RAG applications.
- Empirical results show TMRL maintains competitive semantic retrieval while reducing storage and latency, outperforming traditional MRL approaches.
Temporal-aware Matryoshka Representation Learning (TMRL) is a framework for equipping text embedding models (TEMs) with a dedicated temporal subspace, enabling efficient, flexible retrieval of temporally relevant context—particularly in Retrieval-Augmented Generation (RAG) systems. TMRL leverages the nested structure of Matryoshka Representation Learning, explicitly reserves dimensions for temporal encoding, and integrates targeted contrastive learning and self-distillation. The approach yields competitive performance for temporal information retrieval and temporal RAG compared with prior methods, while offering controllable efficiency–accuracy trade-offs (Huynh et al., 9 Jan 2026).
1. Background: Matryoshka Embeddings and Temporal Motivation
Conventional text embedding models encode a query or passage as a single $d$-dimensional vector. Matryoshka Representation Learning (MRL) augments this paradigm by training the encoder such that any prefix of $m \le d$ dimensions forms a performant embedding:
- Full embedding: $z = f(x) \in \mathbb{R}^d$
- Truncated embedding at level $m$: $z_{1:m} = (z_1, \dots, z_m) \in \mathbb{R}^m$
Vanilla MRL relies on semantic InfoNCE losses summed across truncation levels, $\mathcal{L}_{\mathrm{MRL}} = \sum_{m \in \mathcal{M}} \mathcal{L}_{\mathrm{InfoNCE}}(z_{1:m})$, but does not guarantee any explicit temporal signal in the embedding subspaces. Temporal retrieval demands embeddings that encode both "when" and "what." TMRL addresses this by explicitly designating the first $k$ dimensions as a temporal-aware subspace.
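The prefix-truncation property above can be illustrated with a minimal sketch (toy 8-dimensional vectors stand in for real $d=768$ model outputs; the `truncate` helper is illustrative, not the paper's code):

```python
import math

def truncate(z, m):
    """Keep the first m dimensions of an embedding and L2-renormalize,
    as in Matryoshka Representation Learning (MRL)."""
    prefix = z[:m]
    norm = math.sqrt(sum(x * x for x in prefix)) or 1.0
    return [x / norm for x in prefix]

def cosine(a, b):
    # Inputs are assumed unit-norm, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dim embeddings (stand-ins for d=768 model outputs).
q = truncate([0.9, 0.1, 0.4, 0.2, 0.3, 0.1, 0.0, 0.2], 8)
p = truncate([0.8, 0.2, 0.5, 0.1, 0.2, 0.2, 0.1, 0.1], 8)

# Any prefix m <= d is itself a usable embedding.
for m in (8, 4, 2):
    sim = cosine(truncate(q, m), truncate(p, m))
    print(f"m={m}: cos = {sim:.3f}")
```

In a trained MRL model the similarity ranking stays approximately stable as $m$ shrinks; vanilla MRL, however, says nothing about which dimensions carry temporal information.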
2. TMRL Model Architecture
TMRL adapts a frozen base TEM using lightweight Low-Rank Adaptation (LoRA) and introduces a temporal projection module.
2.1 Matryoshka Embedding Split
The representation is split as $z = [\, z^{\mathrm{temp}} \,\|\, z^{\mathrm{sem}} \,]$, where the temporal component $z^{\mathrm{temp}} = z_{1:k}$ occupies the first $k$ dimensions.
For a sequence of hidden states $h_1, \dots, h_L$, TMRL identifies temporal token positions $\mathcal{T} \subseteq \{1, \dots, L\}$, as tagged by tools like SUTime. The corresponding vectors are passed through a 2-layer temporal projector $g$ and then mean-pooled: $z^{\mathrm{temp}} = \frac{1}{|\mathcal{T}|} \sum_{i \in \mathcal{T}} g(h_i)$.
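The pooling step can be sketched as follows (a minimal illustration: the identity function stands in for the paper's 2-layer projector, and the temporal positions are given directly rather than produced by SUTime):

```python
def temporal_pool(hidden_states, temporal_positions, projector):
    """Mean-pool projected hidden states at temporal token positions.
    `projector` stands in for TMRL's 2-layer temporal projection MLP;
    `temporal_positions` would come from a tagger such as SUTime."""
    projected = [projector(hidden_states[i]) for i in temporal_positions]
    dim = len(projected[0])
    n = len(projected)
    return [sum(v[j] for v in projected) / n for j in range(dim)]

# Toy example: 4 tokens with 3-dim hidden states; tokens 1 and 3 are dates.
H = [[0.1, 0.2, 0.3],
     [1.0, 0.0, 0.0],   # e.g. "1984"
     [0.2, 0.1, 0.4],
     [0.0, 1.0, 0.0]]   # e.g. "March"
identity = lambda h: h  # placeholder projector
z_temp = temporal_pool(H, [1, 3], identity)
print(z_temp)  # [0.5, 0.5, 0.0]
```

The result is a fixed-size temporal summary regardless of how many date tokens the passage contains.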
2.2 Temporal Subspace Contrastive Learning
Temporal retrieval is supervised using positive and negative queries generated through data augmentation and LLM prompting (Qwen3-4B). Contrastive InfoNCE losses in the $k$-dimensional subspace are defined for both query-to-passage and passage-to-query alignments:
- Query-to-passage: $\mathcal{L}_{q \to p} = -\log \frac{\exp(s(q, p^{+})/\tau)}{\sum_{p'} \exp(s(q, p')/\tau)}$
- Passage-to-query: $\mathcal{L}_{p \to q} = -\log \frac{\exp(s(p^{+}, q)/\tau)}{\sum_{q'} \exp(s(p^{+}, q')/\tau)}$
The full temporal contrastive loss is $\mathcal{L}_{\mathrm{temp}} = \frac{1}{2}\left(\mathcal{L}_{q \to p} + \mathcal{L}_{p \to q}\right)$, where $s(\cdot, \cdot)$ denotes cosine similarity over the temporal subspace and $\tau$ is a temperature parameter (0.02–0.05).
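The query-to-passage direction can be sketched in a few lines (a toy numerically-stable InfoNCE restricted to the first $k$ dims; the 4-dim vectors and single negative are illustrative):

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, k, tau=0.05):
    """Query-to-passage InfoNCE restricted to the first k (temporal) dims.
    tau is the temperature; TMRL tunes it in 0.02-0.05."""
    sims = [cos_sim(query[:k], p[:k]) for p in [positive] + negatives]
    logits = [s / tau for s in sims]
    mx = max(logits)  # subtract max for numerical stability
    log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return -(logits[0] - log_z)

q = [0.9, 0.1, 0.3, 0.5]
p_pos = [0.8, 0.2, -0.4, 0.1]      # temporally matching passage
p_negs = [[-0.7, 0.3, 0.2, 0.6]]   # temporally shifted passage
loss = info_nce(q, p_pos, p_negs, k=2)
print(f"{loss:.4f}")
```

Because only the first $k$ coordinates enter the loss, gradients shape the temporal subspace without disturbing the semantic remainder of the embedding.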
2.3 Self-Distillation Regularization
To enforce consistency across dimensional truncations, TMRL includes:
- Local similarity preservation: the top-$N$ neighbor similarity distribution at each truncation level is aligned (KL divergence) with that of the full embedding.
- Global geometry alignment via linear CKA: $\mathrm{CKA}(X, Y) = \frac{\|Y^{\top} X\|_F^2}{\|X^{\top} X\|_F \, \|Y^{\top} Y\|_F}$, computed between truncated and full-dimensional batch representations.
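A pure-Python sketch of the linear CKA computation (the toy 4-sample batch and the prefix-truncation at $m=2$ are illustrative):

```python
import math

def linear_cka(X, Y):
    """Linear CKA between two feature matrices (rows = samples),
    e.g. truncated vs. full-dimension embeddings of the same batch.
    Returns 1.0 when the two geometries match exactly."""
    def center(M):
        n = len(M)
        mu = [sum(r[j] for r in M) / n for j in range(len(M[0]))]
        return [[r[j] - mu[j] for j in range(len(r))] for r in M]

    def cross_frob2(A, B):
        # Squared Frobenius norm of A^T B.
        return sum(
            sum(A[t][i] * B[t][j] for t in range(len(A))) ** 2
            for i in range(len(A[0])) for j in range(len(B[0]))
        )

    Xc, Yc = center(X), center(Y)
    num = cross_frob2(Xc, Yc)
    den = math.sqrt(cross_frob2(Xc, Xc)) * math.sqrt(cross_frob2(Yc, Yc))
    return num / den

# Truncated embeddings should stay geometrically aligned with full ones:
full = [[1.0, 0.2, 0.1], [0.1, 1.0, 0.3], [0.2, 0.1, 1.0], [0.5, 0.5, 0.1]]
trunc = [row[:2] for row in full]  # m = 2 prefix
print(f"CKA = {linear_cka(full, trunc):.3f}")
```

Maximizing this score during training keeps the truncated batch geometry close to the full-dimensional one.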
2.4 Unified Training Objective
The final loss combines semantic, temporal, and regularization objectives:
$\mathcal{L} = \mathcal{L}_{\mathrm{sem}} + \lambda_{\mathrm{temp}} \mathcal{L}_{\mathrm{temp}} + \lambda_{\mathrm{local}} \mathcal{L}_{\mathrm{local}} + \lambda_{\mathrm{global}} \mathcal{L}_{\mathrm{global}}$,
with $\lambda_{\mathrm{temp}} \le 0.25$ and $\lambda_{\mathrm{local}} = \lambda_{\mathrm{global}} = 0.1$ typical.
The high-level training pipeline freezes the base TEM, applies LoRA adapters and the temporal projector, extracts temporal tokens and computes all loss terms per batch, and backpropagates only through the LoRA adapters and the projector. At convergence, the LoRA weights are merged into the base model and the projector is discarded.
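The weighted combination of the objectives can be sketched as follows (weight names are illustrative; the default values reflect the reported ranges, with the temporal weight tuned up to 0.25 and the self-distillation weights fixed at 0.1):

```python
def tmrl_loss(l_sem, l_temp, l_local, l_global,
              lam_temp=0.1, lam_local=0.1, lam_global=0.1):
    """Weighted sum of TMRL's objectives. Weight names are illustrative;
    defaults follow the ranges reported in the paper."""
    return (l_sem
            + lam_temp * l_temp
            + lam_local * l_local
            + lam_global * l_global)

# Per-batch loss terms (placeholder values):
total = tmrl_loss(l_sem=1.2, l_temp=0.8, l_local=0.3, l_global=0.2)
print(round(total, 3))  # 1.33
```

Only this scalar is backpropagated, and only the LoRA and projector parameters receive gradients.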
3. Training Protocols and Hyperparameter Choices
Optimization uses AdamW (batch size 256, four hard negatives, 5 epochs, or 1 for Nomic); the learning rate and LoRA rank and scaling factor are tuned per base model, with LoRA dropout 0.1 applied to all linear layers. The temperature $\tau$ is likewise tuned per base model, and the temporal loss weight $\lambda_{\mathrm{temp}}$ and subspace dimension $k$ are adapted per TEM and dataset (e.g., different settings for Contriever on TNP/TimeQA than for GTE). The self-distillation weights $\lambda_{\mathrm{local}}$ and $\lambda_{\mathrm{global}}$ are fixed at 0.1.
4. Evaluation Suites, Data, and Metrics
Datasets and Preprocessing
TMRL is benchmarked on:
- Temporal Nobel Prize (TNP): Paragraph-level, TemporalQA-style queries with single temporal anchors; passages split, multi-date sentences excluded, and queries augmented (explicit, implicit, temporal-answer variants) using Qwen3-4B.
- TimeQA: Single Wikipedia snapshot, chunked to paragraphs and augmented similarly.
Passage indices comprise millions of precomputed representations, queried with FAISS.
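A tiny pure-Python stand-in for the truncated-index search (FAISS performs the same inner-product search at scale; the toy 4-dim passages and $m=2$ truncation are illustrative):

```python
import math

def topk(query, index, m, k=3):
    """Exact cosine search over prefix-truncated embeddings: a minimal
    stand-in for a FAISS flat index. Truncating to m < d shrinks the
    index m/d-fold at some accuracy cost."""
    def unit_prefix(v):
        p = v[:m]
        n = math.sqrt(sum(x * x for x in p)) or 1.0
        return [x / n for x in p]

    q = unit_prefix(query)
    scored = [(sum(a * b for a, b in zip(q, unit_prefix(v))), i)
              for i, v in enumerate(index)]
    return [i for _, i in sorted(scored, reverse=True)[:k]]

# Toy passage index (d = 4); retrieve using only the first m = 2 dims.
index = [[0.9, 0.1, 0.1, 0.0],
         [0.1, 0.9, 0.0, 0.1],
         [0.8, 0.2, 0.1, 0.1]]
print(topk([1.0, 0.0, 0.1, 0.0], index, m=2, k=2))  # [0, 2]
```

In production, the same prefix-truncated vectors would be stored in a FAISS index, with $m$ chosen per deployment.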
Metrics
- Retrieval: nDCG@10, Recall@100 (TNP, TimeQA)
- Semantic Generality: nDCG@10 on BEIR NQ
- RAG Outcome: F1 for answers using Qwen3-8B (FlashRAG, top-5 context)
Baseline Comparisons
- Sparse: BM25 variants
- Zero-shot: Off-the-shelf TEMs
- Supervised Temporal: Ts-Retriever
- Inference Fusion: TempRetriever (joint semantic/temporal encoding by fusion at inference)
- Matryoshka-Adaptor (M-Adaptor), LoRA-only, LoRA-based MRL (Matryoshka w/o temp. subspace)
5. Quantitative Performance and Ablation Analyses
Table: Highlighted Empirical Findings
| Retrieval Scenario | Best TMRL nDCG@10 (Contriever, TNP) | Latency/Storage Impact |
|---|---|---|
| Full-dim (768) Retrieval | 61.26 (vs 56.91, MRL baseline) | Zero inference overhead |
| m=256 Matryoshka Truncation | F1 within 1 pp of full-dim model | storage cut |
| m=64 Matryoshka Truncation | nDCG@10 ≈ 39 (vs ≈35 for MRL only) | smaller index |
Retrieval and RAG performance is competitive or superior across all truncation levels, particularly for smaller embedding models. TMRL raises nDCG@10 at every truncation level $m$ (e.g., Contriever at $m=64$, +6 pp over MRL). Recall@100 is typically maintained or modestly reduced at the maximal dimension, a trade-off deemed acceptable for RAG. Semantic robustness (BEIR NQ) is preserved.
Ablation studies indicate:
- Temporal loss weight: values of $\lambda_{\mathrm{temp}}$ up to 0.25 are optimal; higher values favor temporal recall at the expense of semantics.
- Temporal subspace dimension $k$: Contriever benefits up to $k=128$; BGE requires at least $k=64$.
- Self-distillation regularization yields marginal gains at moderate weights (0.1); increasing them further is detrimental.
Retrieval quality directly correlates with RAG F1; at $m=256$, TMRL matches or exceeds fully fine-tuned baselines with proportional storage savings and halved latency.
6. Flexibility in Accuracy–Efficiency Trade-offs
The nested Matryoshka design allows retrieval at any truncation level $m$ without retraining:
- At $m=64$, Contriever-TMRL achieves nDCG@10 ≈ 39 on TNP (vs ≈ 35 for semantic MRL) using a 12× smaller index.
- At $m=256$, RAG F1 is within 1 percentage point of the full-dimension result.
This enables real-time or large-scale retrieval scenarios where footprint and latency are critical, while quality can be maintained by adjusting $m$. A plausible implication is that TMRL uniquely combines plug-and-play fine-tuning with explicit temporal encoding, mixed dimensionality, and strong semantic retention, all within a single, flexible model instance.
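The storage side of this trade-off is straightforward arithmetic; the sketch below assumes a flat float32 index over a hypothetical million-passage corpus and ignores index metadata overhead:

```python
def index_size_mb(n_passages, dim, bytes_per_float=4):
    """Footprint of a flat float32 vector index, in megabytes."""
    return n_passages * dim * bytes_per_float / 1e6

n = 1_000_000  # hypothetical million-passage corpus
for m in (768, 256, 64):
    print(f"m={m:>3}: {index_size_mb(n, m):,.0f} MB")
```

Truncating 768 → 256 dims cuts the index 3×, and 768 → 64 cuts it 12×, matching the ratios reported above.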
7. Distinguishing Features and Contributions
TMRL is the first model to:
- Efficiently fine-tune existing TEMs (using LoRA) to support Matryoshka truncation.
- Explicitly dedicate a $k$-dimensional subspace to temporal signals, learned via targeted contrastive objectives.
- Leverage systematically augmented positive/negative temporal training pairs.
- Retain semantic retrieval capacity, as evidenced by stable BEIR NQ results.
- Provide a continuous accuracy–efficiency frontier for temporal retrieval and RAG, with no need to retrain per configuration.
These properties make TMRL a novel, unified approach for efficient, flexible, and temporally-aware information retrieval tasks (Huynh et al., 9 Jan 2026).