UMAP Mixup for On-Manifold Data Augmentation

Updated 22 January 2026
  • The paper introduces an on-manifold augmentation scheme combining UMAP embeddings with Mixup to confine synthetic samples to plausible data regions.
  • The methodology leverages a UMAP-based embedding to perform Mixup in latent space, thereby enhancing regularization and reducing off-manifold interpolation.
  • Experimental evaluations demonstrate lower RMSE on tabular and time-series regression tasks compared to standard Mixup and manifold Mixup approaches.

UMAP Mixup is a data augmentation scheme designed for deep learning regression models, in which synthetic examples are generated through convex combinations of embedded representations constrained to remain close to the data manifold. The method leverages a UMAP-based embedding to perform Mixup operations "on-manifold," addressing failures of conventional Mixup that may produce implausible samples by interpolating beyond the support of the data distribution. UMAP Mixup seeks to strike a balance between enhanced regularization from Mixup and geometric fidelity via manifold learning, leading to improved generalization, especially in tabular and time-series domains (El-Laham et al., 2023).

1. Origins and Motivation

Standard Mixup, as proposed by Zhang et al. (2017), replaces input-label pairs $(x_i, y_i)$ and $(x_j, y_j)$ with convex interpolations:

$$\tilde{x} = \lambda x_i + (1-\lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1-\lambda) y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha)$$

Mixup encourages models to exhibit local linearity in the vicinity of training samples, thereby reducing overfitting and enhancing test-time robustness. However, when $x$ and $y$ do not admit a globally linear structure (common in non-visual or structured data), this process can generate samples lying outside the true data manifold (i.e., in "off-manifold" regions), leading to manifold intrusion and implausibility.
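For concreteness, a minimal NumPy sketch of input-space Mixup, mirroring the equation above (the function name and batch layout are illustrative, not from the paper):

```python
import numpy as np

def mixup_batch(x, y, alpha=2.0, rng=np.random.default_rng()):
    """Standard input-space Mixup: convex combinations of shuffled pairs."""
    lam = rng.beta(alpha, alpha)               # interpolation weight
    perm = rng.permutation(len(x))             # random partner for each sample
    x_tilde = lam * x + (1 - lam) * x[perm]    # mixed inputs
    y_tilde = lam * y + (1 - lam) * y[perm]    # mixed regression targets
    return x_tilde, y_tilde
```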

UMAP (Uniform Manifold Approximation and Projection) is a nonlinear dimensionality reduction technique that constructs an underlying data graph $P$ encoding local neighborhood relationships $p_{i,j}$, then seeks an embedding-space graph $Q_\phi$ that preserves these relationships by minimizing the fuzzy cross-entropy:

$$C(P, Q_\phi) = \sum_{i\neq j} \left\{ p_{i,j} \log \frac{p_{i,j}}{q_{i,j}} + (1-p_{i,j}) \log \frac{1-p_{i,j}}{1-q_{i,j}} \right\}$$

By embedding data via a learned map $h_{\theta_1}$, UMAP regularization enforces geometric preservation in intermediate representations. UMAP Mixup leverages this property, executing Mixup in embedding space to restrict interpolations to locations supported by the underlying data geometry.
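The cost above can be computed term by term when $P$ and $Q$ are available as dense edge-probability matrices (a simplifying assumption for clarity; practical implementations use sparse graphs with negative sampling, as in Section 2):

```python
import numpy as np

def umap_cross_entropy(p, q, eps=1e-9):
    """Fuzzy cross-entropy C(P, Q) between the data graph P and the
    embedding graph Q, both given as dense edge-probability matrices."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    attract = p * np.log(p / q)                   # pulls linked points together
    repel = (1 - p) * np.log((1 - p) / (1 - q))   # pushes unlinked points apart
    np.fill_diagonal(attract, 0.0)                # the sum runs over i != j
    np.fill_diagonal(repel, 0.0)
    return float(attract.sum() + repel.sum())
```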

2. Methodology and Algorithm

The UMAP Mixup approach parameterizes the predictor as $f_\theta(x) = g_{\theta_2}(h_{\theta_1}(x))$, with $h_{\theta_1}: \mathbb{R}^{d_x} \rightarrow \mathbb{R}^{d_z}$ serving as the UMAP-inspired embedding layer and $g_{\theta_2}: \mathbb{R}^{d_z} \rightarrow \mathcal{Y}$ the regressor.
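One plausible PyTorch rendering of this factorization (layer widths, depths, and the class name are illustrative assumptions, not specified by the paper):

```python
import torch.nn as nn

class UMAPMixupNet(nn.Module):
    """Predictor f(x) = g(h(x)): a UMAP-regularized embedding h_{theta_1}
    followed by a regression head g_{theta_2}."""
    def __init__(self, d_x, d_z=16, d_hidden=64):
        super().__init__()
        self.h = nn.Sequential(                 # embedding map h_{theta_1}
            nn.Linear(d_x, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_z),
        )
        self.g = nn.Sequential(                 # regressor g_{theta_2}
            nn.Linear(d_z, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, x):
        return self.g(self.h(x))
```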

Single-iteration workflow (a code sketch follows the list):

  1. Sample a positive edge $(i, j)$ from the UMAP data graph $P$ (with probability $p_{i,j}$).
  2. Draw $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$.
  3. Compute embeddings $z_i = h_{\theta_1}(x_i)$, $z_j = h_{\theta_1}(x_j)$.
  4. Interpolate: $\tilde{z} = \lambda z_i + (1-\lambda) z_j$.
  5. Predict: $\tilde{y} = g_{\theta_2}(\tilde{z})$.
  6. Target: $y_{\text{mix}} = \lambda y_i + (1-\lambda) y_j$.
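A compact sketch of steps 2–6 for one sampled edge, assuming a model with submodules `.h` and `.g` as in the earlier UMAPMixupNet sketch (the function name is hypothetical):

```python
import torch
import torch.nn.functional as F

def umap_mixup_iteration(model, x_i, x_j, y_i, y_j, alpha=2.0):
    """One UMAP Mixup iteration for a sampled positive edge (i, j)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()  # step 2
    z_i, z_j = model.h(x_i), model.h(x_j)                  # step 3: embed
    z_mix = lam * z_i + (1 - lam) * z_j                    # step 4: interpolate
    y_hat = model.g(z_mix)                                 # step 5: predict
    y_mix = lam * y_i + (1 - lam) * y_j                    # step 6: target
    return F.mse_loss(y_hat, y_mix)
```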

The batchwise Mixup loss is $\ell_\theta^{\text{mix}} = \ell(\tilde{y}, y_{\text{mix}})$, typically with $\ell$ the mean-squared error. The UMAP regularizer is incorporated into the training objective:

$$\widehat{\theta} = \arg\min_{\theta}\; \mathbb{E}_{(i,j)\sim P,\;\lambda}\left[\ell_\theta^{\text{mix}}\right] + \gamma\, C\left(P, Q_{\theta_1}\right)$$

In mini-batch training, positive edges $E_b^+$ and negative edges $E_b^-$ are sampled, yielding the practical estimates

$$\widehat{C} = \frac{1}{|E_b^+| + |E_b^-|}\left\{ \sum_{e_{i,j}\in E_b^+}\log\frac{p_{i,j}}{q_{i,j}} + \sum_{e_{k,\ell}\in E_b^-}\log\frac{1-p_{k,\ell}}{1-q_{k,\ell}} \right\}$$

$$\widehat{\mathbb{E}}\left[\ell^{\text{mix}}\right] = \frac{1}{|E_b^+|} \sum_{(i,j)\in E_b^+} \ell\left( g_{\theta_2}(\lambda z_i + (1-\lambda) z_j),\; \lambda y_i + (1-\lambda) y_j \right)$$
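A hedged sketch of the regularizer estimate $\widehat{C}$, assuming the $q$ values come from UMAP's low-dimensional kernel $q = 1/(1 + a\,d^{2b})$ (shown with $a = b = 1$ for simplicity; the exact kernel constants are an assumption here):

```python
import torch

def q_embed(z_i, z_j, a=1.0, b=1.0):
    """UMAP low-dimensional similarity q_ij = 1 / (1 + a * d^(2b))."""
    d2 = ((z_i - z_j) ** 2).sum(-1)
    return 1.0 / (1.0 + a * d2 ** b)

def umap_reg_estimate(p_pos, q_pos, p_neg, q_neg, eps=1e-6):
    """Mini-batch estimate of C over positive edges E_b^+ and negatives E_b^-."""
    attract = torch.log(p_pos.clamp(eps, 1.0) / q_pos.clamp(eps, 1 - eps))
    repel = torch.log((1 - p_neg).clamp(eps, 1.0) / (1 - q_neg).clamp(eps, 1.0))
    return (attract.sum() + repel.sum()) / (len(p_pos) + len(p_neg))
```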

Algorithmic skeleton:

```
Input: D = {(x_i, y_i)}, UMAP parameters, Mixup α, regularizer weight γ
Initialize θ₁, θ₂
Precompute or update edge probabilities p_{i,j} (via UMAP)
repeat (one pass per epoch):
    Sample a minibatch of positive edges E⁺ and negative edges E⁻
    Compute the UMAP regularizer Ĉ(P, Q_{θ₁}) over E⁺ ∪ E⁻
    for each (i, j) in E⁺:
        λ ← Beta(α, α)
        z_i ← h_{θ₁}(x_i),  z_j ← h_{θ₁}(x_j)
        z̃ ← λ·z_i + (1 − λ)·z_j
        ỹ ← g_{θ₂}(z̃),  y_mix ← λ·y_i + (1 − λ)·y_j
        L_mix += ℓ(ỹ, y_mix)
    Total loss L ← (L_mix / |E⁺|) + γ·Ĉ
    θ ← θ − η·∇_θ L
until convergence
```
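Putting the pieces together, a self-contained PyTorch rendering of this skeleton (the toy dataset, the k-NN graph standing in for UMAP's fuzzy simplicial set, layer sizes, and sampling rates are all illustrative assumptions):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Toy regression data; a k-NN graph stands in for UMAP's fuzzy graph.
X = torch.randn(512, 8)
Y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)
with torch.no_grad():
    knn = torch.cdist(X, X).topk(16, largest=False).indices[:, 1:]  # 15 NNs

h = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))  # h_{θ1}
g = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))  # g_{θ2}
opt = torch.optim.Adam(list(h.parameters()) + list(g.parameters()), lr=1e-3)
alpha, gamma, n_pos, n_neg = 2.0, 0.1, 128, 128

def q_of(zi, zj):  # UMAP low-dimensional kernel with a = b = 1
    return 1.0 / (1.0 + ((zi - zj) ** 2).sum(-1))

for step in range(200):
    i = torch.randint(0, len(X), (n_pos,))          # positive edges (i, j)
    j = knn[i, torch.randint(0, 15, (n_pos,))]
    k, l = torch.randint(0, len(X), (2, n_neg))     # random negative pairs

    z = h(X)  # the dataset is small enough to embed in full

    # Ĉ: attract positives (p ≈ 1 ⇒ log p/q ≈ -log q),
    #    repel negatives  (p ≈ 0 ⇒ log (1-p)/(1-q) ≈ -log (1-q))
    q_pos = q_of(z[i], z[j]).clamp(1e-6, 1 - 1e-6)
    q_neg = q_of(z[k], z[l]).clamp(1e-6, 1 - 1e-6)
    C_hat = (-torch.log(q_pos).sum() - torch.log(1 - q_neg).sum()) / (n_pos + n_neg)

    # Mixup in embedding space over the positive edges
    lam = float(rng.beta(alpha, alpha))
    loss = F.mse_loss(g(lam * z[i] + (1 - lam) * z[j]),
                      lam * Y[i] + (1 - lam) * Y[j]) + gamma * C_hat

    opt.zero_grad(); loss.backward(); opt.step()
```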

3. Theoretical Rationale

UMAP Mixup regularizes the model under the vicinal risk minimization (VRM) framework, specifically addressing limitations where standard Mixup may interpolate outside the high-density region of the data. By training $h_{\theta_1}$ under the UMAP regularizer, the latent $Z$-space preserves local neighborhoods and global topology, contingent on the data satisfying mild manifold assumptions.

Mixup in $Z$-space thus approximates interpolation on a "coordinate chart" that closely tracks the data manifold, minimizing the selection of off-manifold synthetic samples. This constraint helps prevent the model from fitting spurious modes and enhances generalization. VRM theory supports the view that restricting synthetic sample generation to plausible regions tightens generalization error bounds. Empirical observations corroborate improved model performance in settings where the manifold assumption is satisfied (tabular UCI and time-series forecasting tasks) (El-Laham et al., 2023).

4. Experimental Evaluation

UMAP Mixup was evaluated across multiple regression benchmarks:

  • Tabular UCI regression: Boston Housing (13 features → house price), Concrete compressive strength (8 features → strength), Yacht hydrodynamics (6 features → residual resistance). Model: a 2-layer MLP trained with the Adam optimizer.
  • Time-Series Forecasting: one-step-ahead forecasting with a 60-day lookback using an LSTM; datasets include GOOG (stable regime), RCL (distributional shift), and GME (high volatility).

The principal evaluation metric was RMSE (Root-Mean-Squared Error) on held-out folds, with the following comparative summary:

| Dataset | ERM | Mixup | Manifold Mixup | UMAP Mixup |
|---|---|---|---|---|
| Boston Housing | 3.14 ± 0.67 | 3.01 ± 0.71 | 3.10 ± 0.76 | 3.27 ± 0.66 |
| Concrete | 5.11 ± 0.59 | 5.92 ± 0.55 | 5.08 ± 0.62 | 4.83 ± 0.79 |
| Yacht | 0.91 ± 0.34 | 4.19 ± 0.63 | 0.80 ± 0.24 | 0.71 ± 0.21 |
| GOOG | 2.47 ± 0.05 | 2.47 ± 0.03 | 2.50 ± 0.03 | 2.43 ± 0.04 |
| RCL | 4.74 ± 0.69 | 4.07 ± 0.43 | 4.30 ± 0.60 | 3.13 ± 0.61 |
| GME | 3.66 ± 0.33 | 2.77 ± 0.49 | 3.83 ± 0.47 | 2.73 ± 0.37 |

UMAP Mixup achieves best or near-best test RMSE in the majority of experiments, with the most pronounced gains under regime shifts or distributional perturbations (e.g., RCL and GME time series).

5. Hyperparameter Configuration

  • Mixup parameter $\alpha$: optimized by cross-validation, with $\alpha = 2$ delivering robust performance.
  • UMAP regularizer weight $\gamma$: cross-validated; $\gamma = 0.1$ used by default.
  • UMAP graph construction:
    • Number of neighbors $K$: 15–50, selected by grid search.
    • Metric: Euclidean; min_dist = 0.1 (default).
    • Graph edges are computed prior to training and sampled in mini-batches.

Default configuration values are $\alpha = 2$, $\gamma = 0.1$, $K = 15$, and latent dimension $d_z \approx 10$–$20$.
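For reference, these defaults can be collected in a small configuration object (a sketch; the field names are ours, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class UMAPMixupConfig:
    alpha: float = 2.0          # Mixup Beta(alpha, alpha) parameter
    gamma: float = 0.1          # UMAP regularizer weight
    n_neighbors: int = 15       # K for UMAP graph construction
    min_dist: float = 0.1       # UMAP min_dist
    metric: str = "euclidean"   # UMAP distance metric
    d_z: int = 16               # latent dimension (roughly 10-20)
```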

6. Practical and Implementation Considerations

UMAP Mixup is well-suited to data believed to inhabit a low-dimensional manifold, including tabular, time-series, and medical-signal modalities. The primary computational cost arises from maintaining and sampling UMAP edges and computing the regularizer, which with mini-batch sampling is $\mathcal{O}(N)$ per epoch and compatible with GPU acceleration.

Under computational constraints, the method is amenable to warm-starting from a pretrained UMAP embedding or to applying the UMAP regularizer on a reduced schedule (e.g., every 2–5 steps). Visualization of the latent $Z$-space (e.g., with t-SNE) is recommended to confirm that local neighborhoods are preserved and that Mixup paths lie within high-density regions.
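One plausible warm-start recipe (an assumption, not prescribed by the paper) is to fit umap-learn offline with the Section 5 defaults, then pretrain the embedding layer to reproduce the fitted coordinates before joint training:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import umap  # umap-learn

X = np.random.default_rng(0).normal(size=(512, 8)).astype("float32")

# Offline UMAP fit (K = 15, min_dist = 0.1, Euclidean metric, d_z = 16)
Z = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=16,
              metric="euclidean").fit_transform(X)

h = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(h.parameters(), lr=1e-3)
X_t, Z_t = torch.from_numpy(X), torch.from_numpy(Z.astype("float32"))

for _ in range(500):  # pretrain h to mimic the UMAP coordinates
    loss = F.mse_loss(h(X_t), Z_t)
    opt.zero_grad(); loss.backward(); opt.step()
```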

UMAP Mixup operationalizes the principle of "on-manifold" Mixup through explicit topological regularization in embedding space, synthesizing training examples that adhere more closely to the data’s intrinsic geometry and, in practice, yielding improved generalization in non-visual regression contexts (El-Laham et al., 2023).
