Time Aggregation Features for XGBoost
- The paper demonstrates that trailing-window aggregation improves ROC AUC and PR AUC, highlighting its effectiveness over time-aware encoding alone.
- It details the use of log-impression counts and smoothed event rates to transform sequential event histories while rigorously preventing data leakage.
- The study compares various windowing schemes and recommends multi-scale trailing windows as the default for tasks like CTR estimation and churn prediction.
Time aggregation features for XGBoost are engineered variables that summarize sequential or timestamped event histories into structured, fixed-dimensional representations suitable for tree-based learning. These features are critical in temporal domains where recency, frequency, and distribution of past events influence predictions, including click-through rate (CTR) estimation, churn prediction, and sequential behavior modeling. Methods for constructing time aggregation features adapt sliding-window statistics, event-count cutoffs, and window shape controls to encode recency information, while rigorously enforcing non-leakage constraints in temporally split training regimes.
1. Formal Definitions of Time Aggregation Features
Let $t$ represent a discrete time index (e.g., hour of day), $k$ an entity key (such as device_id or app_id), and $v$ a specific entity value. For a specified window $W$—characterized by a lookback length or condition—define the impression count $N_{k,v}(t, W)$ and event count $E_{k,v}(t, W)$ as the numbers of impressions and positive events, respectively, observed for the pair $(k, v)$ within the window.
The canonical time aggregation features include:
- Log-impression count: $\log\bigl(1 + N_{k,v}(t, W)\bigr)$
- Smoothed event (e.g., click) rate: $\dfrac{E_{k,v}(t, W) + \alpha}{N_{k,v}(t, W) + \alpha + \beta}$,
where $\alpha$, $\beta > 0$ are smoothing parameters. These features strictly use only data from times $t' < t$ ("no-lookahead") (Pinchuk, 15 Jan 2026).
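The two canonical features above can be computed directly. A minimal sketch follows; the $\alpha = \beta = 1$ defaults here are an illustrative assumption, not the source's values:

```python
import math

def log_impressions(n_impressions: int) -> float:
    """Log-impression count: log(1 + N)."""
    return math.log1p(n_impressions)

def smoothed_rate(n_events: int, n_impressions: int,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Beta-smoothed event rate: (E + alpha) / (N + alpha + beta)."""
    return (n_events + alpha) / (n_impressions + alpha + beta)

# An unseen (key, value) pair falls back to the prior alpha / (alpha + beta),
# avoiding the degenerate 0/0 rate of an unsmoothed estimate.
```

The smoothing pulls low-count rates toward the prior, which is what makes the rate feature stable for rare entity values.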
2. Windowing Schemes for Aggregation
Several temporal aggregation window schemes are systematically compared:
- Trailing Windows: Half-open intervals $[t - W, t)$, parameterized by a lookback length $W$ (hours, days, etc.). Each window summarizes the recency and density of activity immediately prior to $t$.
- Event-count Windows (“event”): Given a count $n$ (e.g., $n = 50$ for “event50”), construct the aggregation window as the most recent $n$ impressions for $(k, v)$ before $t$; compute the impression count and corresponding summed events over that window.
- Gap Windows (“gap1”): Exclude a gap interval immediately preceding $t$; e.g., for window length $W$ and exclusion length $g$, the window is $[t - g - W, t - g)$.
- Bucketized Windows: Split history into disjoint intervals using predefined edges.
- Calendar-aligned Windows: Use full prior-day or week boundaries, especially for coarse (day-scale or longer) windows (Pinchuk, 15 Jan 2026, Gregory, 2018).
Typical window length tuples for multi-scale recency aggregation are $(1, 6, 24, 48, 168)$ hours for CTR, or multi-day tuples in churn contexts (Pinchuk, 15 Jan 2026, Gregory, 2018).
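As an illustration of the trailing scheme, a brute-force sketch over an hour-indexed event log (the tuple layout and helper names are hypothetical, not from the source):

```python
def trailing_window_counts(event_log, key, t, window):
    """Impressions and events for `key` in the half-open window [t - window, t).

    `event_log` is an iterable of (hour, key, clicked) tuples. A linear scan
    is used for clarity; a production pipeline would precompute sorted or
    cumulative per-key histories instead.
    """
    n_impressions = n_events = 0
    for hour, k, clicked in event_log:
        if k == key and t - window <= hour < t:
            n_impressions += 1
            n_events += int(clicked)
    return n_impressions, n_events

def multi_scale_counts(event_log, key, t, windows=(1, 6, 24, 48, 168)):
    """Stack several trailing windows to form multi-scale recency features."""
    return [trailing_window_counts(event_log, key, t, w) for w in windows]
```

The half-open bound `hour < t` is what keeps the window strictly historical with respect to the prediction time.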
3. Construction and Implementation of Time Aggregation Features
For each instance at prediction time and each selected entity key:
- For all specified windows, compute log-impression and smoothed-rate features.
- Concatenate resulting features from all keys and windows.
- Optionally include trend terms (short-window vs. long-window statistics), exponential-decay features (weighted sums/means with decay rate $\lambda$), and calendar/fixed-window statistics to disentangle seasonality from short-term effects.
Data leakage is prevented by enforcing strict data usage and temporal splits between training/validation/test (Pinchuk, 15 Jan 2026, Gregory, 2018).
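The construction steps above can be sketched as a single leakage-safe pass: features for each row are computed before its own outcome enters the per-key history. Function name, defaults, and the $\alpha = \beta = 1$ smoothing are illustrative assumptions:

```python
import math
from collections import defaultdict

def build_features(rows, windows=(1, 6, 24), alpha=1.0, beta=1.0):
    """Leakage-safe trailing-window features.

    `rows` holds time-ordered (hour, key, clicked) tuples. For each row we
    emit log-impression and smoothed-rate features computed strictly from
    earlier rows, then append the row to the per-key history -- this
    ordering within one pass enforces the no-lookahead rule.
    """
    history = defaultdict(list)  # key -> list of past (hour, clicked)
    out = []
    for hour, key, clicked in rows:
        feats = []
        past = history[key]
        for w in windows:
            n = e = 0
            for h, c in past:
                if hour - w <= h < hour:
                    n += 1
                    e += c
            feats.append(math.log1p(n))
            feats.append((e + alpha) / (n + alpha + beta))
        out.append(feats)
        past.append((hour, clicked))  # update history *after* emitting features
    return out
```

Concatenating the per-window pairs across all entity keys yields the fixed-dimensional feature vector fed to XGBoost.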
4. Empirical Performance and Comparative Analysis
Experimental results on the Avazu CTR dataset under robust out-of-time protocols demonstrate:
- Addition of trailing-window aggregation (on top of time-aware target encoding) yields consistent ROC AUC (+0.0066–0.0082) and PR AUC (+0.0084–0.0094) gains over encoding alone.
- Event-count windows provide a further, but small, improvement (ROC AUC ~+0.0004).
- Gap and bucketized windows underperform trailing windows in this protocol.
- Calendar windows show mixed and negligible effects compared to trailing windows.
These findings are summarized in the following table:
| Feature Specification | ROC AUC (mean) | PR AUC (mean) |
|---|---|---|
| Time-aware encoding (TE) | 0.74278 | 0.35863 |
| TE + trailing (1,6,24,48,168) | 0.75014 | 0.36752 |
| + event50 | 0.75054 | 0.36794 |
Source: (Pinchuk, 15 Jan 2026); means over two rolling folds.
A plausible implication is that multi-scale trailing windows should be the default, with event-based windows added only when marginal ROC improvements are critical. Window shape tuning (gap, bucket, calendar) does not yield superior results in these settings.
5. Cross-Domain Applications and Variants
Time aggregation design principles translate to other temporal domains:
- Customer churn: Sliding, calendar, and exponential-decay windows over user activity or transaction metrics; trend features for acceleration/deceleration detection (Gregory, 2018).
- Acoustic event detection: Non-overlapping sliding windows, Fourier/statistical descriptors aggregated per window, and adaptive feature engineering (aggregation and crosses) (Sha et al., 2022).
- Transformer fault diagnosis: Entire signal (or decomposed intrinsic time-scale components) per device treated as features, not summary statistics, after ranking by discriminative power (Sami et al., 2021).
- Video quality: Mean pooling over frames; no temporal windows, but fusion of temporal means and framewise residuals as aggregate features (Premkumar et al., 11 Jun 2025).
The windowing mechanism, summary statistic type, and normalization must be adapted to domain dynamics, sampling rates, and leakage constraints.
6. Practical Guidelines and Limitations
- Default recommendation: Trailing windows on relevant entity keys with window lengths $(1, 6, 24, 48, 168)$ hours for high-frequency events. Use log-transformed counts and smoothed rates.
- Adaptive augmentation: Add event-count windows (e.g., "event50") if ROC AUC gains on the order of $0.0004$ are practically valuable.
- Do not use gap or bucket windows when strictly enforcing no-lookahead and temporal generalization; they exhibit systematically degraded performance.
- Cross-validation: Perform paired-delta AUC tests to confirm true performance gains. Feature set expansions should be justified by mean improvement and statistical significance.
- Interpretability: Tree-based models handle collinear and overcomplete feature sets well but benefit from redundancy pruning for speed and interpretability.
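The paired-delta check can be sketched from per-fold AUC pairs; a plain paired t-statistic is used here as one reasonable choice, since the source's exact test is not specified:

```python
import math
import statistics

def paired_delta_summary(auc_base, auc_new):
    """Per-fold AUC deltas with a paired t-statistic.

    `auc_base` / `auc_new`: AUC per fold for the baseline and augmented
    feature sets, aligned by fold. A feature expansion is justified when
    the mean delta is positive and the statistic clears a chosen threshold.
    """
    deltas = [new - base for base, new in zip(auc_base, auc_new)]
    mean = statistics.mean(deltas)
    if len(deltas) > 1:
        sd = statistics.stdev(deltas)
        t = mean / (sd / math.sqrt(len(deltas))) if sd > 0 else float("inf")
    else:
        t = float("nan")  # a single fold cannot support a paired test
    return mean, t
```

Pairing by fold removes the shared fold-to-fold variance, so small but consistent gains (like the ~+0.0004 from event-count windows) become detectable.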
Code and experiment templates are available in the referenced repositories (Pinchuk, 15 Jan 2026).
7. References
- "Time Aggregation Features for XGBoost Models" (Pinchuk, 15 Jan 2026)
- "Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data" (Gregory, 2018)
- "An acoustic signal cavitation detection framework based on XGBoost with adaptive selection feature engineering" (Sha et al., 2022)
- "Power Transformer Fault Diagnosis with Intrinsic Time-scale Decomposition and XGBoost Classifier" (Sami et al., 2021)
- "Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment" (Premkumar et al., 11 Jun 2025)