
Locality Preserving Loss in Representation Learning

Updated 28 January 2026
  • Locality Preserving Loss (LPL) is a regularization approach that preserves local geometric and affinity structures by leveraging graph Laplacian formulations.
  • It extends to linear mappings, kernel methods, and deep learning architectures, enhancing latent space representations and unsupervised clustering performance.
  • Empirical studies show that LPL improves manifold alignment and topological consistency, making it valuable for both dimensionality reduction and embedding alignment.

Locality Preserving Loss (LPL) refers to a class of regularization and objective functions that explicitly promote the preservation of local geometric or affinity structure in mapping, embedding, and representation learning tasks. LPL has its earliest mathematical foundations in graph Laplacian-based methods such as Laplacian Eigenmaps and Locality Preserving Projection (LPP), but has since been adapted to a variety of deep learning and manifold learning contexts, including deep autoencoders, variational autoencoders, cross-manifold alignment, and representation learning for high-dimensional data.

1. Foundational Formulation: Graph-Based Locality Preserving Loss

The original instantiation of LPL arises from Laplacian-based dimensionality reduction and spectral learning frameworks. Given a dataset $\{x_1, \ldots, x_n\} \subset \mathbb{R}^d$, a sparse nearest-neighbor graph (constructed via an $\epsilon$-ball or $k$-NN rule) with adjacency (affinity) matrix $S \in \mathbb{R}^{n \times n}$ is built, where a prototypical choice is $S_{ij} = \exp\left( -\|x_i - x_j\|^2 / 2\sigma^2 \right)$ if $j$ is a neighbor of $i$, and $0$ otherwise. The corresponding degree matrix $D$ and unnormalized Laplacian $L = D - S$ are then defined, with $L$ positive semi-definite and its rows summing to zero.
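As a concrete illustration (this snippet, including the function names `knn_heat_affinity` and `graph_laplacian`, is my own sketch, not code from the cited papers), the heat-kernel $k$-NN affinity and the unnormalized Laplacian can be built in NumPy:

```python
import numpy as np

def knn_heat_affinity(X, k=5, sigma=1.0):
    """Symmetric k-NN affinity with heat-kernel weights (a prototypical choice)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # k nearest neighbors, skipping self
        S[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(S, S.T)  # symmetrize the directed k-NN graph

def graph_laplacian(S):
    """Degree matrix D and unnormalized Laplacian L = D - S."""
    D = np.diag(S.sum(axis=1))
    return D - S, D

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
S = knn_heat_affinity(X, k=5, sigma=1.0)
L, D = graph_laplacian(S)
# Rows of L sum to zero and L is positive semi-definite, as stated above
assert np.allclose(L.sum(axis=1), 0)
assert np.min(np.linalg.eigvalsh(L)) > -1e-9
```

The final assertions check the two structural properties of $L$ mentioned in the text.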

The Locality Preserving Loss is

$$\mathcal{L}_{\mathrm{LPL}} = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n S_{ij} \| y_i - y_j \|^2 = \operatorname{tr}(Y^\top L Y)$$

where $y_i \in \mathbb{R}^p$ are the low-dimensional codes and $Y$ is the $n \times p$ matrix stacking them as rows. Minimization is subject to a normalization constraint ($Y^\top Y = I_p$ or $Y^\top D Y = I_p$) to avoid the trivial solution. The solution is equivalently the Laplacian eigenmap: the data are embedded according to the $p$ nontrivial smallest eigenvectors of $L$ (Ghojogh et al., 2021).
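The generalized eigenproblem $L y = \lambda D y$ can be reduced to a standard symmetric one via $D^{-1/2} L D^{-1/2}$. The sketch below (an illustration under my own conventions, not reference code) does this and numerically verifies the quadratic-form identity $\frac{1}{2}\sum_{ij} S_{ij}\|y_i - y_j\|^2 = \operatorname{tr}(Y^\top L Y)$:

```python
import numpy as np

def laplacian_eigenmap(S, p=2):
    """Embed via the p smallest nontrivial generalized eigenvectors of L y = lam * D y."""
    D = S.sum(axis=1)
    L = np.diag(D) - S
    Dm12 = np.diag(1.0 / np.sqrt(D))
    # Equivalent symmetric problem: D^{-1/2} L D^{-1/2} v = lam * v, with y = D^{-1/2} v
    vals, vecs = np.linalg.eigh(Dm12 @ L @ Dm12)
    return Dm12 @ vecs[:, 1:p + 1]  # drop the trivial constant eigenvector

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
S = np.exp(-d2)           # dense heat-kernel affinity for simplicity
np.fill_diagonal(S, 0)
Y = laplacian_eigenmap(S, p=2)
L = np.diag(S.sum(1)) - S
# Quadratic-form identity: 0.5 * sum_ij S_ij ||y_i - y_j||^2 == tr(Y^T L Y)
lhs = 0.5 * sum(S[i, j] * np.sum((Y[i] - Y[j]) ** 2)
                for i in range(30) for j in range(30))
assert np.isclose(lhs, np.trace(Y.T @ L @ Y))
```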

2. Linear and Kernel Extensions: Locality Preserving Projection

LPL extends directly to linear mappings, yielding Locality Preserving Projection (LPP):

  • Linear case: $y_i = U^\top x_i$ with $U \in \mathbb{R}^{d \times p}$, so $Y = U^\top X$. The loss becomes $\operatorname{tr}(U^\top X L X^\top U)$, subject to $U^\top X D X^\top U = I_p$. The solution is given by the generalized eigenproblem $(X L X^\top) U = (X D X^\top) U \Lambda$.
  • Kernel case: with a feature map $\Phi: \mathbb{R}^d \to \mathcal{H}$ and Gram matrix $K = \Phi(X)^\top \Phi(X)$, one solves $(K L K)\,\Theta = (K D K)\,\Theta\,\Lambda$ for $\Theta$, with the embedding $Y = \Theta^\top K$ (Ghojogh et al., 2021).

Out-of-sample extensions differ: linear LPP maps a new point as $y(x) = U^\top x$, while kernel LPP computes $y(x) = \Theta^\top k_t$ with $k_t = [k(x_i, x)]_i$.
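The linear case above can be sketched as follows (an illustration only; the reduction of the generalized eigenproblem via $B^{-1/2}$ and the function name `lpp` are my own choices, not the survey's code). Here $X$ is $d \times n$ with samples as columns, matching the $X L X^\top$ form:

```python
import numpy as np

def lpp(X, S, p=2):
    """Linear LPP: solve (X L X^T) U = (X D X^T) U Lambda for the p smallest eigenpairs."""
    D = np.diag(S.sum(axis=1))
    L = D - S
    A = X @ L @ X.T   # d x d
    B = X @ D @ X.T   # d x d, positive definite for full-row-rank X
    # Reduce to a standard symmetric problem via the inverse square root of B
    w, V = np.linalg.eigh(B)
    Bm12 = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    vals, vecs = np.linalg.eigh(Bm12 @ A @ Bm12)
    return Bm12 @ vecs[:, :p]  # smallest-eigenvalue directions

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 40))  # d = 5 features, n = 40 samples (as columns)
d2 = ((X.T[:, None] - X.T[None]) ** 2).sum(-1)
S = np.exp(-d2)
np.fill_diagonal(S, 0)
U = lpp(X, S, p=2)
# The normalization constraint U^T (X D X^T) U = I_p holds by construction
D = np.diag(S.sum(1))
assert np.allclose(U.T @ X @ D @ X.T @ U, np.eye(2), atol=1e-6)
```

New points are then embedded with $y(x) = U^\top x$, i.e. `U.T @ x_new`.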

3. Locality Preserving Loss in Deep Learning and Autoencoders

Recent methods generalize LPL to deep representation learning, integrating it with autoencoder frameworks:

$$L_{\mathrm{locality}} = \sum_{i=1}^n \sum_{j=1}^n \| z_i - z_j \|^2 a_{ij}$$

where $z_i$ are latent encodings and $a_{ij}$ are affinities constructed from pretrained (autoencoder) latents. The prior affinity matrix $\tilde{A}$ is built column by column by minimizing $\sum_i \| \tilde{z}_i - \tilde{z}_j \|^2 a_{ij} + \lambda \sum_i a_{ij}^2$ subject to $a_{ij} \ge 0$ and $\sum_i a_{ij} = 1$, yielding a sparse $k$-NN structure. LPL is incorporated into the end-to-end fine-tuning loss as

$$L(\Theta_e, W, \Theta_d) = L_{\mathrm{reconstruction}} + \alpha L_{\mathrm{affinity}} + \gamma L_{\mathrm{locality}}$$

with $\gamma$ weighting the LPL term. Empirical ablations demonstrate substantially improved unsupervised clustering performance (ACC gains of 10–15%) when LPL is included.
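A minimal NumPy sketch of the locality term (illustrative only; the papers train this term with minibatch gradient descent inside the autoencoder objective, and `locality_loss` is my own name):

```python
import numpy as np

def locality_loss(Z, A):
    """L_locality = sum_ij a_ij * ||z_i - z_j||^2, computed in a vectorized form.
    In the combined objective this term is weighted by gamma."""
    d2 = ((Z[:, None] - Z[None]) ** 2).sum(-1)  # pairwise squared latent distances
    return float((A * d2).sum())

rng = np.random.default_rng(3)
Z = rng.normal(size=(20, 4))                    # latent codes
A = np.abs(rng.normal(size=(20, 20)))           # stand-in for the prior affinities
A = (A + A.T) / 2
np.fill_diagonal(A, 0)
l1 = locality_loss(Z, A)
l2 = locality_loss(0.5 * Z, A)
# The loss is quadratic in the embedding scale: shrinking Z by 1/2 quarters it
assert np.isclose(l2, 0.25 * l1)
```

The scaling check illustrates why a normalization or a reconstruction term is needed alongside LPL: the term alone is trivially minimized by collapsing all latents to a point.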

  • In (Chen et al., 2022), LPL is formulated via a continuous $k$-NN graph (CkNN), considering both data-space and latent-space graphs. The loss is

$$L_{\mathrm{LPL}}(\phi; X_b) = \sum_{i < j} \left( W^X_{ij} + W^Z_{ij} \right) \left[ d_X(x_i, x_j) - \gamma\, d_Z(z_i, z_j) \right]^2$$

where $W^X$ and $W^Z$ are adjacency matrices on the data and latent spaces, and $\gamma$ is a learned scaling parameter. The algorithm treats LPL as the primary objective, with reconstruction as a constraint, and extends to hierarchical VAEs.
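The CkNN adjacency rule ($W_{ij} = 1$ iff $d^2(x_i, x_j) \leq \delta^2 r_i r_j$, with $r_i$ the distance from $x_i$ to its $k$-th nearest neighbour) can be sketched as follows; the function name and defaults are assumptions, not the authors' code:

```python
import numpy as np

def cknn_graph(X, k=5, delta=1.0):
    """Continuous k-NN: connect i, j iff d(x_i, x_j)^2 <= delta^2 * r_i * r_j,
    where r_i is the distance from x_i to its k-th nearest neighbour."""
    d = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    r = np.sort(d, axis=1)[:, k]  # column 0 is the zero self-distance
    W = (d ** 2 <= delta ** 2 * np.outer(r, r)).astype(float)
    np.fill_diagonal(W, 0)
    return W

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
W = cknn_graph(X, k=5, delta=1.0)
# The criterion is symmetric in i and j, so the graph is undirected by construction
assert np.allclose(W, W.T)
```

The local scales $r_i r_j$ make the threshold density-adaptive: sparse regions get proportionally larger connection radii than dense ones.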

4. Locality Preserving Loss in Embedding Alignment

(Ganesan et al., 2020) introduces an LPL for supervised or semi-supervised alignment of vector space manifolds (e.g., cross-lingual embeddings):

  • For source embeddings $M^s$ and target embeddings $M^t$, with paired anchors $V^p$, a mapping $f_{\theta}$ is trained to minimize an alignment MSE together with

$$\mathcal{L}_{\mathrm{lpl}}(\theta, W) = \sum_{(m^s_i, m^t_i) \in V^p} \Big\| f(m^s_i; \theta) - \sum_{m^s_j \in N_k(m^s_i)} W_{ij}\, f(m^s_j; \theta) \Big\|^2$$

where $W_{ij}$ are locally linear reconstruction weights (as in Locally Linear Embedding) for reconstructing $m^s_i$ from its neighbors. The total objective combines MSE (for alignment), LPL (for locality preservation), an LLE term (for learning $W$), and an orthogonality regularizer (for stability in linear mappings). LPL is empirically shown to improve alignment, particularly under limited supervision, by increasing effective training-sample utilization and providing graph-Laplacian-like smoothness regularization.
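The LLE weights $W_{ij}$ solve a small constrained least-squares problem per point: minimize $\|x - \sum_j w_j n_j\|^2$ subject to $\sum_j w_j = 1$. A standard sketch (the regularization constant and function name are my own assumptions):

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Locally linear reconstruction weights for x from its neighbours."""
    G = (neighbors - x) @ (neighbors - x).T          # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(neighbors))  # regularize (G may be singular)
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()                               # enforce sum-to-one constraint

rng = np.random.default_rng(5)
x = rng.normal(size=3)
N = rng.normal(size=(4, 3))  # 4 neighbours in R^3
w = lle_weights(x, N)
assert np.isclose(w.sum(), 1.0)  # weights sum to one by construction
```

These per-point weights, stacked into $W$, are what the LPL term above asks $f_\theta$ to preserve after mapping.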

5. Graph Construction: Affinity and Topology Preservation

Across all applications, the construction of the affinity/adjacency structure (whether via classic $k$-NN, a heat kernel, or CkNN) is fundamental:

| Paper | Graph Construction | Affinity Matrix |
| --- | --- | --- |
| (Ghojogh et al., 2021) | $\epsilon$-ball or $k$-NN | $S_{ij} = \exp\left( -\lVert x_i - x_j \rVert^2 / 2\sigma^2 \right)$ or binary $\{0, 1\}$ |
| (Chen et al., 2019) | $k$-NN on pretrained latents | $a_{ij}$ via a local quadratic program, $\sum_i a_{ij} = 1$ |
| (Chen et al., 2022) | CkNN (density-adaptive $k$-NN) | $W_{ij} = 1$ iff $d^2(x_i, x_j) \leq \delta^2 r_i r_j$ |

The CkNN affords spectral convergence to the Laplace–Beltrami operator, ensuring that the induced graph accurately reflects the intrinsic topology of the underlying data manifold, including homological features such as connected components and cycles (Chen et al., 2022).

6. Theoretical Motivation and Guarantees

The theoretical grounding of LPL is rooted in spectral graph theory and manifold learning:

  • The LPL objective is equivalent to minimizing a quadratic form in the graph Laplacian, penalizing embeddings that separate graph neighbors.
  • In deep learning extensions, LPL acts as a regularizer that aligns the learned manifold structure with a precomputed local geometry, or ensures that encoder/decoder mappings do not collapse or distort local metric neighborhoods.
  • In the CkNN setting, the adjacency graph is guaranteed (in large-sample limits) to yield a Laplacian converging to the manifold's Laplace–Beltrami operator, underpinning homological/topological consistency (Chen et al., 2022).
  • In alignment contexts, LPL effectively expands the annotated training set by manifold-based interpolation, reducing overfitting and encouraging locally smooth mappings (Ganesan et al., 2020).

7. Practical Implementation and Empirical Impact

Algorithmic strategies differ per application:

  • Laplacian eigenmaps and LPP involve solving (generalized) eigenproblems at $O(n^3)$ or $O(d^3)$ cost, but these can be handled efficiently for sparse Laplacian matrices (Ghojogh et al., 2021).
  • Deep autoencoder training with LPL integrates local graph construction (potentially in minibatch), gradient-based optimization, and, in CkNN, adaptive neighborhood thresholds (Chen et al., 2022).
  • Affinity matrices may be fixed (built from pretrained representations) or dynamic (rebuilt per iteration/batch).
  • Hyperparameters such as $k$ (neighborhood size), $\sigma$ (kernel width), $\gamma$ (relative LPL weight), and $\delta$ (CkNN scale) are routinely cross-validated; improper choices can induce graph disconnectivity or wash out locality (Ghojogh et al., 2021).

Empirically, LPL consistently improves the preservation of local geometric structure in the latent space, as assessed by trustworthiness, continuity, MRRE, clustering accuracy, or alignment benchmarks, with pronounced gains in data-scarce or high-complexity regimes (Chen et al., 2019, Ganesan et al., 2020, Chen et al., 2022).


References

  • "Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey" (Ghojogh et al., 2021).
  • "Generative approach to unsupervised deep local learning" (Chen et al., 2019).
  • "Locality Preserving Loss: Neighbors that Live together, Align together" (Ganesan et al., 2020).
  • "Local Distance Preserving Auto-encoders using Continuous k-Nearest Neighbours Graphs" (Chen et al., 2022).
