BPASGM: Sparse Graph Model & Best-Path Selection

Updated 10 February 2026

BPASGM is a machine learning framework that integrates sparse graphical model discovery with mutual information and regression principles for high-dimensional variable selection and portfolio construction.
It employs a best-path search algorithm to identify predictor sets by maximizing mutual information and minimizing redundancy, resulting in more interpretable models.
BPASGM enables robust portfolio optimization and variable selection by reducing model dimension; extensive empirical validation reports improved risk-return metrics.

The Best-Path Algorithm Sparse Graphical Model (BPASGM) is a class of machine learning frameworks designed for high-dimensional variable selection and portfolio construction, integrating sparse graphical model discovery with information-theoretic and regression-based principles. BPASGM extends the Best-Path Algorithm (BPA) to exploit both linear and nonlinear dependencies, constructing directed or undirected sparse graphs over candidate variables or assets, and systematically identifying maximally informative, yet minimally redundant, predictor sets. The framework is particularly suited for domains involving large numbers of variables with complex dependence structures, such as quantitative finance and high-dimensional regression, and has demonstrated strong empirical performance compared to both LASSO and precision-matrix-based selection approaches (Matteo et al., 3 Feb 2026, Riso et al., 2022, Riso, 2021).

1. Mathematical Foundations

BPASGM is rooted in probabilistic graphical models and the use of mutual information for quantifying dependence. For a variable set $X = (X_1, \dots, X_p)$ (interpreted as asset returns, covariates, or features depending on context), BPASGM encodes conditional dependencies as a sparse graph $G=(V,E)$. In asset allocation applications, $G$ is a mixed directed graph with one node per asset and two types of edges: directed ($i \rightarrow j$) and bi-directed ($i \leftrightarrow j$), excluding undirected edges (Matteo et al., 3 Feb 2026). The key adjacency structure is a binary matrix $\Theta \in \{0,1\}^{p \times p}$, where $\theta_{j,i} = 1$ indicates that $X_j$ is in the Best-Path predictor set of $X_i$.
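As a minimal sketch of how such a matrix can be read, the snippet below extracts each variable's predictor set (the indices $j$ with $\theta_{j,i}=1$) and separates one-way from two-way links. The helper names and the 4-variable matrix are hypothetical, and treating $\theta_{j,i}=1$ as an edge $j \rightarrow i$ is an assumed direction convention, not one confirmed by the source.

```python
from typing import List, Set, Tuple


def predictor_sets(theta: List[List[int]]) -> List[Set[int]]:
    """Column i of theta lists the predictors of X_i: {j : theta[j][i] == 1}."""
    p = len(theta)
    return [{j for j in range(p) if theta[j][i] == 1} for i in range(p)]


def classify_edges(
    theta: List[List[int]],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Split links into directed (j -> i) and bi-directed (j <-> i) pairs.

    Assumed convention: theta[j][i] == 1 is drawn as an edge j -> i.
    """
    p = len(theta)
    directed, bidirected = [], []
    for j in range(p):
        for i in range(p):
            if i == j or theta[j][i] != 1:
                continue
            if theta[i][j] == 1:
                if j < i:  # record each mutual pair once
                    bidirected.append((j, i))
            else:
                directed.append((j, i))
    return directed, bidirected


# Hypothetical 4-variable example (rows: predictor j, columns: target i).
theta = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
ps = predictor_sets(theta)
directed, bidirected = classify_edges(theta)
print(ps, directed, bidirected)
```

In this toy matrix, variables 0 and 1 predict each other (a bi-directed link), while variable 2 is predicted by 1 and 3 without predicting them back.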

The structural Markov property is central: for each node $X_i$, with $\mathrm{ps}_i = \{j: \theta_{j,i}=1\}$, BPASGM enforces

$X_i \perp X \setminus \{X_i \cup \mathrm{ps}_i\} \mid \mathrm{ps}_i,$

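Because conditional independence is equivalent to zero conditional mutual information, this amounts, approximately for an estimated graph, to $I(X_i; X \setminus \{X_i \cup \mathrm{ps}_i\} | \mathrm{ps}_i) \approx 0$ in information-theoretic terms (Matteo et al., 3 Feb 2026, Riso et al., 2022).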

Dependencies are weighted using mutual information (MI) rather than solely covariance, enabling modeling of both linear and nonlinear relationships. For the undirected variant, BPASGM constructs a maximum spanning forest on the variable set, scoring edges by penalized pairwise MI (e.g., AIC or BIC penalties) and yielding a decomposable graph structure (Riso, 2021, Riso et al., 2022).
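The undirected construction can be illustrated with a small sketch, under several assumptions: the data are discrete, MI is estimated with a plug-in (frequency-count) estimator, and the penalized edge score is taken as $n \cdot \mathrm{MI}$ minus a per-parameter cost of $\log(n)/2$ (BIC) or $1$ (AIC) times $(|x|-1)(|y|-1)$ degrees of freedom. The function names are hypothetical, and the exact scoring used in the cited papers may differ.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items()
    )


def max_spanning_forest(columns, penalty="bic"):
    """Kruskal's algorithm on penalized MI scores.

    Edges whose penalized score is not positive are never added, so the
    result can be a forest rather than a single spanning tree.
    """
    p, n = len(columns), len(columns[0])
    levels = [len(set(col)) for col in columns]
    unit = math.log(n) / 2 if penalty == "bic" else 1.0  # assumed penalty scale
    scored = []
    for i in range(p):
        for j in range(i + 1, p):
            dof = (levels[i] - 1) * (levels[j] - 1)
            score = n * mutual_information(columns[i], columns[j]) - unit * dof
            if score > 0:
                scored.append((score, i, j))
    scored.sort(reverse=True)  # strongest edges first

    parent = list(range(p))  # union-find structure for cycle detection

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not close a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest


# Toy data: x1 copies x0, x2 is unrelated to both.
x0 = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = list(x0)
x2 = [0, 1, 0, 1, 0, 1, 0, 1]
forest = max_spanning_forest([x0, x1, x2])
print(forest)
```

Here only the edge between the two identical variables survives the penalty, leaving the third variable as an isolated node of the forest.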

2. Best-Path Search and Variable Set Selection

The core selection mechanism in BPASGM is the identification of Best-Path predictor sets through a radius-based search in the graph $G$. For a target node $Y$ (or asset $X_i$), the method enumerates path-steps: for order $k$,

$w_k = \{ X_j \in V_T : \mathrm{dist}_G(Y, X_j) \leq k \},$

where $V_T$ denotes the connected component containing $Y$ and $\mathrm{dist}_G$ the graph distance. The optimal path-step $w^*$ is chosen to maximize the overall MI between $Y$ and the members of $w_k$, possibly adjusted using the entropy coefficient of determination (Riso et al., 2022), or, in regression contexts, to maximize cross-validated adjusted $R^2$ (Riso, 2021).
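The path-step enumeration can be sketched with a breadth-first search. In the snippet below the graph is an adjacency dictionary, the target itself is left out of each $w_k$ (an assumption, since predictors exclude the response), and `score` is a hypothetical stand-in for whichever criterion is used (overall MI, entropy coefficient, or cross-validated adjusted $R^2$).

```python
from collections import deque


def graph_distances(adj, target):
    """Breadth-first distances from target within its connected component."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_steps(adj, target):
    """Return [w_1, w_2, ...] with w_k = nodes at distance <= k from target."""
    dist = graph_distances(adj, target)
    max_k = max(dist.values())
    return [
        {v for v, d in dist.items() if 0 < d <= k} for k in range(1, max_k + 1)
    ]


def best_path_step(adj, target, score):
    """Pick the path-step with the highest score (empty set if target is isolated)."""
    steps = path_steps(adj, target)
    return max(steps, key=score) if steps else set()


# Toy forest: Y - A - B - C, Y - D, plus a separate component E - F.
adj = {
    "Y": ["A", "D"], "A": ["Y", "B"], "B": ["A", "C"], "C": ["B"],
    "D": ["Y"], "E": ["F"], "F": ["E"],
}
# Hypothetical score: made-up per-variable gains minus a size penalty.
gain = {"A": 0.5, "D": 0.3, "B": 0.2, "C": 0.05}
score = lambda w: sum(gain[v] for v in w) - 0.1 * len(w)

steps = path_steps(adj, "Y")
best = best_path_step(adj, "Y", score)
print(steps, best)
```

Nodes E and F never appear in any path-step because they lie outside the target's connected component.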

Within each candidate set, further sparsification is typically performed. Nonsignificant predictors (e.g., with insufficient MI or low $t$-statistics in OLS regression) are dropped (Riso et al., 2022, Riso, 2021).
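One way to make "insufficient MI" operational is a permutation test, sketched below. This is a swapped-in illustration rather than the test used in the cited papers: the plug-in MI estimator, the number of permutations, the level `alpha`, and all function names are assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items()
    )


def mi_p_value(x, y, n_perm=200, seed=0):
    """Permutation p-value for the null hypothesis that x and y are independent."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any dependence between x and y
        if mutual_information(x, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def drop_nonsignificant(predictors, y, alpha=0.05):
    """Keep only predictors whose MI with y is significant at level alpha."""
    return [name for name, x in predictors.items() if mi_p_value(x, y) <= alpha]


# Toy data: x1 determines y exactly, x2 is balanced across both values of y.
y = [0] * 20 + [1] * 20
predictors = {"x1": list(y), "x2": [0, 1] * 20}
kept = drop_nonsignificant(predictors, y)
print(kept)
```

In regression settings the same filtering role would be played by coefficient $t$-statistics instead of an MI test.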

3. Dependence Screening and Edge Classification

In asset selection, BPASGM introduces a specific three-step dependence-driven screening process (Matteo et al., 3 Feb 2026). Given a signed adjacency matrix $\Theta_s = \Theta \odot 1_{\Sigma>0}$ (retaining only positively correlated or redundant links), assets are iteratively screened relative to a pivot (typically the asset with the highest Sharpe or Sortino ratio):

Direct link removal: Remove assets directly connected to the pivot via positive links.
Redundant (feedback) link removal: Remove one of each mutually connected asset pair (retaining the asset with better performance).
Closed-chain and spurious link removal: Eliminate one of each asset pair with indirect but positive linkage.

After the three stages, only assets that are independent of or negatively correlated with the pivot remain. The edge types (direct or feedback, indirect or closed-chain, and simple or spurious) are formally defined and distinguished using Boolean matrix operations and graph powers on $\Theta$ (Matteo et al., 3 Feb 2026).
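The three screening stages can be approximated by the sketch below. It is one possible reading, not the paper's formal procedure: the formal edge-type definitions are not reproduced here, "indirect linkage" is assumed to mean a two-step connection found with a Boolean square of the symmetrized signed adjacency, and the function names and toy inputs are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is 1 if some k links i to j."""
    p = len(a)
    return [
        [int(any(a[i][k] and b[k][j] for k in range(p))) for j in range(p)]
        for i in range(p)
    ]


def screen_assets(theta, corr, perf):
    """Return (pivot, kept assets) after three dependence-driven screening steps."""
    p = len(theta)
    # Signed adjacency: keep a link only where the correlation is positive.
    ts = [[theta[i][j] if corr[i][j] > 0 else 0 for j in range(p)] for i in range(p)]
    sym = [[int(ts[i][j] or ts[j][i]) for j in range(p)] for i in range(p)]
    pivot = max(range(p), key=lambda i: perf[i])  # best performer
    keep = set(range(p))

    def drop_worse(i, j):
        """Remove the weaker performer of a pair, if both are still present."""
        if i in keep and j in keep:
            keep.discard(i if perf[i] < perf[j] else j)

    # Step 1: assets with a direct positive link to the pivot.
    keep -= {j for j in range(p) if j != pivot and sym[pivot][j]}
    # Step 2: mutually connected (feedback) pairs.
    for i in range(p):
        for j in range(i + 1, p):
            if ts[i][j] and ts[j][i]:
                drop_worse(i, j)
    # Step 3: pairs joined by a two-step path but no direct link (assumed reading).
    two_step = bool_matmul(sym, sym)
    for i in range(p):
        for j in range(i + 1, p):
            if two_step[i][j] and not sym[i][j]:
                drop_worse(i, j)
    return pivot, sorted(keep)


# Toy universe of 5 assets; perf could be Sharpe or Sortino ratios.
theta = [
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
corr = [
    [1.0, 0.6, 0.2, -0.3, 0.0],
    [0.6, 1.0, 0.5, 0.0, 0.0],
    [0.2, 0.5, 1.0, 0.0, 0.0],
    [-0.3, 0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.0, 0.7, 1.0],
]
perf = [1.2, 0.9, 0.8, 0.5, 0.4]
pivot, kept = screen_assets(theta, corr, perf)
print(pivot, kept)
```

In this toy case asset 1 is removed as a direct positive link of the pivot, asset 4 as the weaker half of a feedback pair, and asset 2 as an indirect link, leaving the pivot together with the negatively correlated asset 3.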

4. Portfolio Optimization and Error Control

In financial applications, BPASGM reduces the investment universe from $p$ to $g$ assets. Portfolio weights are then computed by standard Markowitz mean-variance optimization on the selected subset, using the estimated mean vector $\hat\mu_g$ and covariance matrix $\hat\Sigma_g$:

$\min_{w\in\mathbb{R}^g} w^T \hat\Sigma_g w - \xi w^T \hat\mu_g \;\;\text{subject to}\;\; \sum_{i=1}^g w_i=1, \; w_i\geq 0,$

where $\xi$ weights expected return against variance, or, equivalently, by minimizing variance for a target return. The procedure does not attempt to improve on the theoretically optimal mean-variance portfolio; instead, it achieves superior realized out-of-sample performance because $\hat\Sigma_g$ is estimated with lower error (Matteo et al., 3 Feb 2026).
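For illustration, the long-only mean-variance problem above can be solved on a small subset with projected gradient descent, as in the sketch below. The solver choice, step size rule, iteration count, and toy inputs are assumptions made for a dependency-free example; a dedicated quadratic programming solver would normally be preferred in practice.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, shift = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0:  # holds for a prefix of the sorted values
            shift = t
    return [max(x - shift, 0.0) for x in v]


def mean_variance_weights(mu, sigma, xi=1.0, steps=2000):
    """Minimize w' Sigma w - xi * w' mu over the simplex by projected gradient."""
    g = len(mu)
    # Step size 1/L, where L = 2 * (max absolute row sum) bounds the
    # Lipschitz constant of the gradient 2 * Sigma * w - xi * mu.
    lr = 1.0 / (2.0 * max(sum(abs(s) for s in row) for row in sigma))
    w = [1.0 / g] * g  # start from equal weights
    for _ in range(steps):
        grad = [
            2.0 * sum(sigma[i][j] * w[j] for j in range(g)) - xi * mu[i]
            for i in range(g)
        ]
        w = project_to_simplex([w[i] - lr * grad[i] for i in range(g)])
    return w


# Toy inputs: two uncorrelated assets with volatilities 20% and 10%.
sigma = [[0.04, 0.0], [0.0, 0.01]]
mu = [0.10, 0.05]
w_min_var = mean_variance_weights(mu, sigma, xi=0.0)  # pure variance minimization
w_tradeoff = mean_variance_weights(mu, sigma, xi=0.5)
print(w_min_var, w_tradeoff)
```

With `xi=0` the weights are inversely proportional to the variances; increasing `xi` shifts weight toward the asset with the higher expected return.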

Key risk metrics include realized volatility, Sharpe and Sortino ratios, and the diversification ratio (DR). By discarding positively or redundantly correlated assets, BPASGM drives down the average asset correlation $\bar\rho$, raising the DR and improving frontier stability. The reduction in dimension also ameliorates the ill-conditioning of $\hat\Sigma_g$, enhancing robustness (Matteo et al., 3 Feb 2026).
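The link between correlation and the diversification ratio can be checked numerically. The sketch below assumes the usual definitions, which the source does not spell out: DR as the weighted average volatility divided by portfolio volatility, Sharpe as mean excess return over sample standard deviation, and Sortino as mean excess return over downside deviation. Function names and inputs are hypothetical.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation (needs a below-target return)."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return mean / downside


def diversification_ratio(weights, sigma):
    """Weighted average asset volatility over portfolio volatility."""
    g = len(weights)
    weighted_vol = sum(weights[i] * math.sqrt(sigma[i][i]) for i in range(g))
    port_var = sum(
        weights[i] * sigma[i][j] * weights[j] for i in range(g) for j in range(g)
    )
    return weighted_vol / math.sqrt(port_var)


def two_asset_cov(vol, rho):
    """Covariance matrix of two assets with equal volatility and correlation rho."""
    return [[vol**2, rho * vol**2], [rho * vol**2, vol**2]]


weights = [0.5, 0.5]
dr_high_corr = diversification_ratio(weights, two_asset_cov(0.2, 0.8))
dr_low_corr = diversification_ratio(weights, two_asset_cov(0.2, -0.2))
returns = [0.01, 0.03, -0.01, 0.02]
print(dr_high_corr, dr_low_corr, sharpe_ratio(returns), sortino_ratio(returns))
```

For two equally weighted assets with equal volatility, DR reduces to $\sqrt{2/(1+\rho)}$, so it grows as the correlation $\rho$ falls.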

5. Simulation and Empirical Evaluation

BPASGM has been benchmarked via Monte Carlo simulations and extensive empirical backtesting. Simulations using $p=12$ assets with prescribed dependencies show that BPASGM-selected portfolios consistently provide superior risk-return profiles versus random same-size portfolios. The empirical volatility and Sharpe metrics improve monotonically as positively coupled assets are removed, and the slope of the mean–volatility frontier stabilizes (regression $R^2$ increases from $0.04$ to $0.60$ as the final subset is selected) (Matteo et al., 3 Feb 2026).

Real-world backtesting over a large asset universe ($p=358$, including 333 U.S. equities, 16 indices, and 9 FX rates, spanning 1990–2025) demonstrates that BPASGM reduces portfolio cardinality ($g \approx 3$) while further lowering realized volatility (both in-sample and out-of-sample DCC-GARCH) below the hypothetical “independent-assets” benchmark. Time-varying correlations among selected assets are predominantly negative, and risk-adjusted returns rise across screening steps (Matteo et al., 3 Feb 2026).

For variable selection, BPASGM outperforms LASSO and Elastic Net in numerous regression benchmarks, yielding more compact variable sets, higher adjusted $R^2$, and substantially lower mean squared error (MSE), including an up to 93-fold MSE reduction on the “Communities and Crime” dataset (Riso et al., 2022, Riso, 2021).

6. Algorithmic Complexity and Practical Implementation

BPASGM maintains computational tractability in high-dimensional settings. For $p$ variables/assets and $n$ samples, pairwise mutual information estimation dominates the cost, at $O(p^2 n)$ for naive algorithms, with a further $O(p^3)$ cost for indirect link discovery in some applications. Scaling benefits from the sparse graph structure and supports asset-wise parallelization. Kruskal's or Prim's algorithm is used to build the spanning forests, and k-NN-based MI estimators allow further acceleration (Matteo et al., 3 Feb 2026, Riso et al., 2022, Riso, 2021).

Key practical recommendations include:

Precomputing pairwise MI using fast estimators for continuous or mixed data.
Storing adjacency matrices as sparse objects.
Using cross-validated density estimation for entropy-based selection and cross-validation folds for regression evaluation.
Parallelizing independent asset and predictor computations.
Setting hyperparameters: penalty (AIC/BIC), MI test threshold ($\alpha$), performance criteria (Sharpe/Sortino), and CV fold count.
Working in R (notably the gRapHD package) or Python (notably networkx), the recommended environments (Riso et al., 2022).

7. Theoretical Context and Distinctions

Unlike LASSO, which imposes $\ell_1$ sparsity through penalized convex optimization but ignores conditional dependence structure, BPASGM incorporates graphical and information-theoretic constraints. Selection in BPASGM is interpretable in terms of conditional independence, Markov property, and information propagation in the variable graph. The framework generalizes the Chow–Liu model for tree-structured graphs and is extendible to regression, classification, and forecast model selection. It supports both marginal and conditional dependence-based variable selection—MI for arbitrary dependence, entropy coefficient for multi-variable influence, and regression coefficient significance for standard linear models (Riso et al., 2022, Riso, 2021).

BPASGM does not attempt model-centric optimality under exact data-generating assumptions; rather, it is tailored for finite-sample scenarios where estimation error and redundant dependence pose signal amplification risks. The observed empirical gains provide evidence that dependence-aware selection can substantially improve downstream interpretability and out-of-sample performance across both regression and portfolio construction domains.

References:

(Matteo et al., 3 Feb 2026, Riso et al., 2022, Riso, 2021)