GRUT: Time-Aware Generative Recommender

Updated 19 September 2025

The paper introduces GRUT, a generative recommendation paradigm that integrates user histories with explicit temporal contexts to model evolving user interests.
GRUT utilizes time-aware prompting by embedding user-level and item-level temporal signals, enabling precise modeling of sequential trends and recency effects.
Trend-aware Inference in GRUT fuses generative probabilities with item trend scores at ranking time. Overall, GRUT achieves improvements of up to 15.4% in Recall@5 and 14.3% in NDCG@5 over state-of-the-art baselines.

Generative Recommender Using Time awareness (GRUT) is a paradigm in recommender systems that uses large generative models, typically transformer architectures or LLMs, to produce item recommendations in a sequential, time-sensitive manner. Unlike conventional ranking models, GRUT explicitly encodes temporal dynamics, so that recommendations reflect evolving user preferences and recent trends. It does so through time-aware prompting, which supplies user-level and item-level temporal context, and a trend-aware inference procedure that accounts for global item trends.

1. Definition and Motivation

GRUT formulates recommendation as a generative modeling task, wherein recommendations are “generated” as sequences conditioned on user histories and contextual time signals, rather than scored via static ranking functions. The paradigm extends traditional sequential recommendation by integrating explicit temporal contexts—timestamps, intervals, and item-user transitions—so that the evolving nature of user interests and item popularity is directly modeled. This design addresses shortcomings of prior generative systems, which typically only consider sequential order and neglect fine-grained time dynamics (Lee et al., 17 Sep 2025).

2. Time-Aware Prompting

A core innovation in GRUT is time-aware prompting, which injects explicit temporal contexts into the generative process during both training and inference. Two distinct context types are used:

User-level temporal context: Encodes personalized temporal patterns, capturing timestamps and time intervals unique to each user. This enables modeling periodicity, recency effects, and historical preference drift.
Item-level transition context: Captures transition patterns between items across different users, modeling how transitions in consumption history reflect collaborative and temporal item relationships.

The context construction involves concatenating time signals (exact timestamps, intervals since last interaction, session markers) as input tokens or embeddings alongside behavioral sequences. These contexts allow the LM-based generator to condition recommendations on both static user/item attributes and dynamic temporal signals.
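The context construction described above can be sketched as a small prompt builder. This is a minimal illustration, not the authors' prompt template: the function names `format_gap` and `build_time_aware_prompt`, the `(title, unix_timestamp)` history schema, the coarse hour/day buckets, and the prompt wording are all assumptions made for the example.

```python
from datetime import datetime, timezone


def format_gap(seconds: float) -> str:
    """Bucket an interval (in seconds) into a coarse text token (assumed scheme)."""
    if seconds < 3600:
        return "<1h"
    if seconds < 86400:
        return f"{int(seconds // 3600)}h"
    return f"{int(seconds // 86400)}d"


def build_time_aware_prompt(history, now):
    """Build a text prompt from a user's history plus explicit time signals.

    history: non-empty list of (item_title, unix_timestamp), sorted by time.
    now: unix timestamp at which the recommendation is requested.
    Both the schema and the prompt layout are hypothetical.
    """
    if not history:
        raise ValueError("history must contain at least one interaction")

    lines = ["User history:"]
    prev_ts = None
    for title, ts in history:
        date = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        # Interval since the previous interaction is a user-level temporal signal.
        gap = "first interaction" if prev_ts is None else f"{format_gap(ts - prev_ts)} after previous"
        lines.append(f"[{date}] {title} ({gap})")
        prev_ts = ts

    today = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y-%m-%d")
    # Recency signal: how long the user has been inactive at request time.
    lines.append(f"Current date: {today}; {format_gap(now - prev_ts)} since last interaction.")
    lines.append("Next item:")
    return "\n".join(lines)
```

Rendering timestamps and intervals as text keeps the sketch compatible with an off-the-shelf language model tokenizer. A system that uses learned time embeddings instead would replace the string formatting with an embedding lookup.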

3. Trend-aware Inference

GRUT introduces a training-free, post-hoc ranking enhancement called Trend-aware Inference. When ranking candidate items, the method incorporates global item trend information, such as recent spikes in item popularity or interaction rates. Rather than relying solely on the model’s generation likelihood, it jointly considers the generative probability $p(i)$ for item $i$ and a trend score $t(i)$, which reflects the item's recent activity or “momentum”.

The final recommendation ranking is determined by maximizing a composite score:

$\text{Score}(i) = p(i) \cdot t(i)$

where $t(i)$ can be constructed from moving averages, frequency counts, or time-decayed interaction rates, depending on availability. This mechanism enables the ranker to elevate items currently experiencing surges in demand, addressing shifting global trends and volatility in user interests.
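As a rough illustration of the composite score, the sketch below builds $t(i)$ from time-decayed interaction counts and multiplies it by $p(i)$. The exponential half-life decay, the additive `smoothing` term (which keeps items with no recent activity from being zeroed out), and the function names are assumptions for the example; the source does not specify how $t(i)$ is computed.

```python
import math


def trend_scores(interactions, now, half_life):
    """Time-decayed interaction count per item.

    interactions: iterable of (item_id, timestamp) across all users.
    An interaction that is `half_life` time units old contributes 0.5.
    """
    decay = math.log(2) / half_life
    scores = {}
    for item, ts in interactions:
        if ts > now:
            continue  # ignore events from the future relative to `now`
        scores[item] = scores.get(item, 0.0) + math.exp(-decay * (now - ts))
    return scores


def trend_aware_rerank(gen_probs, trends, smoothing=1.0, top_k=5):
    """Rank candidates by Score(i) = p(i) * t(i).

    Here t(i) = smoothing + decayed count, an assumed form chosen so that
    items absent from `trends` keep a non-zero score.
    """
    composite = {
        item: p * (smoothing + trends.get(item, 0.0))
        for item, p in gen_probs.items()
    }
    return sorted(composite, key=composite.get, reverse=True)[:top_k]
```

Because the reranking only touches candidate scores after generation, it needs no retraining, which is consistent with the training-free description above.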

4. Model Architecture and Methodological Components

GRUT models are typically based on autoregressive transformers with adaptations for time-awareness:

Sequential generation: Items are produced one at a time, conditioning each step on the full history of previously generated items and the time-aware context. Formally:

$P(i_{t+1:t+K} \mid i_{1:t}, C_{\text{time}}) = \prod_{k=1}^{K} P(i_{t+k} \mid i_{1:t+k-1}, C_{\text{time}})$

where $C_{\text{time}}$ represents temporal embedding vectors constructed as described above.

Decoder mechanisms: Both greedy decoding and stochastic sampling (e.g., temperature or top-k sampling) can be used, with multi-sequence inference strategies (e.g., reciprocal rank aggregation) being effective for forecasting longer horizons (Volodkevich et al., 2024).
Prompt design: Context windows and input tokenization are constructed to include explicit time features, facilitating downstream time-sensitive generation.
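The chain-rule factorization and the multi-sequence aggregation mentioned above can be illustrated together. In this sketch, `next_item_probs` is a hypothetical callable standing in for the model (it is assumed to have the time context $C_{\text{time}}$ already bound in), and the reciprocal rank aggregation shown is one simple variant; the exact scheme in Volodkevich et al. may differ.

```python
import math
from collections import defaultdict


def sequence_log_prob(next_item_probs, history, continuation):
    """Log-probability of a continuation under the autoregressive factorization.

    next_item_probs(prefix) -> dict mapping item -> P(item | prefix, C_time).
    Sums log P(i_{t+k} | i_{1:t+k-1}, C_time) for k = 1..K.
    """
    prefix = list(history)
    total = 0.0
    for item in continuation:
        total += math.log(next_item_probs(prefix)[item])
        prefix.append(item)  # condition the next step on the generated item
    return total


def reciprocal_rank_aggregate(sequences, top_k=5):
    """Merge several generated sequences into one ranking.

    Each item scores 1/rank in every sequence where it appears (first
    occurrence only); ties are broken by item id for determinism.
    """
    scores = defaultdict(float)
    for seq in sequences:
        seen = set()
        for rank, item in enumerate(seq, start=1):
            if item not in seen:
                scores[item] += 1.0 / rank
                seen.add(item)
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]
```

For example, aggregating the three sampled sequences `["a", "b", "c"]`, `["b", "a", "d"]`, and `["b", "c", "a"]` favors `"b"`, which appears at or near the top of every sequence.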

5. Empirical Performance and Evaluation Results

GRUT systems have demonstrated substantive improvements on standard benchmarks:

Recall@5 and NDCG@5 gains: Experimental evaluations on four datasets showcase improvements of up to 15.4% Recall@5 and 14.3% NDCG@5 over state-of-the-art baselines (Lee et al., 17 Sep 2025). These metrics reflect both the accuracy and ranking quality of the top recommended items.
Temporal robustness: Empirical studies indicate that GRUT models more effectively capture long-term trends and adapt to evolving user interests compared to classical transformer or RNN-based sequential recommenders.
Source code availability: The official implementation and evaluation details for GRUT are available at https://github.com/skleee/GRUT, facilitating reproducibility and integration with existing systems.
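For reference, Recall@K and NDCG@K can be computed as follows under the common next-item protocol with a single held-out target per user. That protocol is an assumption here; the source does not state the exact evaluation setup, and the function names are illustrative.

```python
import math


def recall_at_k(ranked, target, k=5):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k=5):
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    for pos, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(pos + 2)  # pos is 0-based, rank = pos + 1
    return 0.0


def evaluate(predictions, targets, k=5):
    """Average Recall@k and NDCG@k over users.

    predictions: one ranked item list per user.
    targets: one held-out item per user, in the same order.
    """
    n = len(targets)
    recall = sum(recall_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    ndcg = sum(ndcg_at_k(p, t, k) for p, t in zip(predictions, targets)) / n
    return recall, ndcg
```

Recall@K only checks whether the target is in the top K, while NDCG@K also rewards placing it higher in the list, which is why the two metrics are usually reported together.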

6. Comparative Analysis and Position in Literature

GRUT advances over prior generative and sequential recommender models by:

Explicit time modeling: Where standard autoregressive or attention-based recommenders primarily model sequence order, GRUT’s contextual embeddings and prompting mechanisms encode real temporal properties, enriching the generative conditioning and improving adaptation to evolving behavior.
Trend sensitivity: The post-hoc Trend-aware Inference method can dynamically react to shifting item popularity, mitigating lag in recommendation performance due to concept drift.
Integration with recent advances: Techniques such as journey-aware sparse attention (Ma et al., 19 Jul 2025), spatiotemporal encoding (Lin et al., 22 Aug 2025), and foundation model-based generation (Huang et al., 23 Apr 2025) can be adapted to GRUT for domain-specific extensions (e.g., POI or multi-behavior recommendation), further enhancing time sensitivity and scalability.

7. Challenges, Limitations, and Future Directions

Key challenges for GRUT include:

Context window limitations: Foundation models have fixed prompt lengths, necessitating efficient summarization or retrieval-augmented generation from long histories (Huang et al., 23 Apr 2025).
Temporal alignment and fairness: Time tokens and trend signals must be properly aligned in the latent space, and recommendations should not unduly privilege recent or trending items at the expense of diversity or long-tail content.
Integration with multimodal and agentic paradigms: Extending GRUT to blend time with multimodal data (e.g., images, text, audio) and interactive agent frameworks remains an open research direction, potentially leveraging memory augmentation and causal inference for deeper temporal reasoning.

Continued research in time-aware generative recommendation, especially with advances in large language and transformer models, is expected to further enhance personalization by directly modeling and anticipating the temporal evolution of user needs and interests.