
Probabilistic PROTES Method

Updated 2 February 2026
  • The Probabilistic PROTES method is a black-box optimization framework that leverages tensor-train representations to efficiently explore vast discrete spaces.
  • It models the search distribution using low-parametric TT formats, enabling tractable sampling and gradient updates without exhaustively enumerating the combinatorial grid.
  • Empirical results show that PROTES outperforms classical discrete optimizers on challenging benchmarks such as QUBO and binary control problems, effectively mitigating the curse of dimensionality.

The Probabilistic PROTES method (PROTES: Probabilistic Optimization with Tensor Sampling) is a black-box optimization approach targeting extremely high-dimensional discrete spaces by leveraging probabilistic sampling from low-parametric tensor-train (TT) representations. PROTES is specifically designed for minimizing an objective function defined on a Cartesian product grid, efficiently handling settings with up to $2^{100}$ candidates, such as binary optimization and discretized control problems. The core innovation is expressing and manipulating the search distribution in TT format to bypass the curse of dimensionality, enabling effective exploration and exploitation in combinatorial and control domains (Batsheva et al., 2023).

1. Optimization Problem and Motivation

PROTES addresses the black-box minimization problem

$$\min_{x} f(x), \quad x=(n_1, \ldots, n_d), \quad n_i \in \{1, \ldots, N_i\},$$

where $f$ is an expensive black-box function and the search space forms a $d$-dimensional grid of size $N_1 \times \cdots \times N_d$. As this product explodes combinatorially for large $d$ or $N_i$, brute-force search is infeasible. Existing heuristics such as evolutionary algorithms, PSO, or CMA-ES become ineffective or inapplicable due to the extreme dimensionality and discrete structure. PROTES overcomes these limitations by parametrizing an adaptable probability distribution $P(x)$ via a compact TT decomposition, enabling tractable sampling and distribution updates even when $\prod_i N_i$ is astronomically large (Batsheva et al., 2023).

2. Tensor-Train Representation of Discrete Distributions

The search distribution $P \in \mathbb{R}_+^{N_1 \times \cdots \times N_d}$ is modeled in the TT format

$$P[n_1, \ldots, n_d] = \sum_{r_1=1}^{R_1} \cdots \sum_{r_{d-1}=1}^{R_{d-1}} G_1[1, n_1, r_1]\, G_2[r_1, n_2, r_2] \cdots G_d[r_{d-1}, n_d, 1],$$

where each core tensor $G_k \in \mathbb{R}^{R_{k-1} \times N_k \times R_k}$ controls the $k$th dimension and the $R_k$ are TT ranks with $R_0 = R_d = 1$. The number of parameters grows only as $O(d N R^2)$ for uniform ranks $R$. This format enables efficient storage and scalable manipulation of $P(x)$, which would otherwise be intractable for large $d$. In practice, sampling and updates operate directly on the TT factorization, sidestepping explicit enumeration over the hyper-exponential number of grid points (Batsheva et al., 2023).
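Concretely, evaluating a single entry of $P$ reduces to a chain of small matrix products over the cores. A minimal NumPy sketch (the core shapes in the example are illustrative, not taken from the paper):

```python
import numpy as np

def tt_value(cores, idx):
    """Evaluate P[n_1, ..., n_d] as a product of TT-core slices.

    cores[k] has shape (R_{k-1}, N_k, R_k) with R_0 = R_d = 1;
    idx holds the 0-based mode indices (n_1, ..., n_d).
    """
    v = np.ones((1, 1))            # running 1 x R_k contraction
    for G, n in zip(cores, idx):
        v = v @ G[:, n, :]         # absorb core k's slice for index n_k
    return float(v[0, 0])

# Hypothetical toy example: d = 3 binary modes, uniform rank R = 2.
rng = np.random.default_rng(0)
cores = [rng.random((1, 2, 2)), rng.random((2, 2, 2)), rng.random((2, 2, 1))]
p = tt_value(cores, (0, 1, 0))     # O(d * R^2) work, no full tensor formed
```

The cost is a handful of $R \times R$ matrix-vector products, which is why the factorized representation scales to grids with $2^{100}$ points.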

3. Sampling and Update Algorithm

PROTES repeatedly draws samples from the current TT-modeled distribution, using a sequential conditional algorithm adapted from Dolgov & Savostyanov (2020):

  • Forward/backward message computation: for each TT core, partial contractions ("messages") $\alpha_k(r_{k-1})$ (forward) and $\beta_k(r_{k-1})$ (backward) are evaluated to obtain the marginals and conditional probabilities needed to sample each coordinate in turn.
  • Sequential sampling: each $n_k$ is sampled conditional on the previously chosen coordinates, with explicit distributions computed from the TT structure and the current $\alpha_k$, $\beta_{k+1}$ values.
  • Batch sampling: $K$ independent samples $x^{(j)}$ are generated per iteration. The computational cost is $O(K\, d\,(N + R)R + K\, d\, \alpha(N))$, with $N = \max_i N_i$ and $\alpha(N)$ the cost of categorical sampling over $N$ values.
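The sequential conditional scheme above can be sketched as follows. This is a simplified NumPy illustration (backward messages recomputed from scratch, no caching), not the reference implementation:

```python
import numpy as np

def tt_sample(cores, rng):
    """Draw one index tuple from a non-negative TT tensor by sampling
    each coordinate conditionally on the prefix chosen so far."""
    # Backward messages: beta[k] is the total mass of cores k..d-1,
    # summed over their mode indices; shape (R_{k-1},), beta[d] = [1].
    beta = [np.ones(1)]
    for G in reversed(cores):
        beta.append(G.sum(axis=1) @ beta[-1])
    beta = beta[::-1]
    x, prefix = [], np.ones(1)
    for k, G in enumerate(cores):
        # Unnormalized conditional P(n_k | n_1, ..., n_{k-1}).
        w = np.maximum(np.einsum('r,rns,s->n', prefix, G, beta[k + 1]), 0.0)
        n = int(rng.choice(len(w), p=w / w.sum()))
        x.append(n)
        prefix = prefix @ G[:, n, :]   # extend the forward contraction
    return x
```

Each coordinate costs a few rank-sized contractions plus one categorical draw, matching the per-sample $O(d(N+R)R + d\,\alpha(N))$ bound quoted above.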

After batch evaluation,

  • The top-$k$ sample indices with the lowest $f(x^{(j)})$ are selected as the elite set $\mathcal{S}$.
  • The TT parameters $G_k$ are updated via $k_{gd}$ steps of Adam (or any gradient optimizer) on the loss

$$L(G) = - \sum_{j \in \mathcal{S}} \log P(x^{(j)}).$$

This corresponds to a REINFORCE-style policy gradient, weighted by elite selection (Batsheva et al., 2023).
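Because $P$ is multilinear in each core, the gradient of $\log P(x)$ with respect to the slice $G_k[:, n_k, :]$ has a closed form: the outer product of the prefix and suffix contractions divided by $P(x)$. A plain-gradient sketch of one update (the paper uses Adam; `lr` here is an illustrative step size):

```python
import numpy as np

def logp_grad_step(cores, x, lr=0.1):
    """One ascent step on log P(x) w.r.t. the TT cores (sketch; plain SGD
    in place of Adam).  cores[k] has shape (R_{k-1}, N_k, R_k)."""
    pre = [np.ones(1)]                 # pre[k]: contraction of cores before k
    for G, n in zip(cores, x):
        pre.append(pre[-1] @ G[:, n, :])
    suf = [np.ones(1)]                 # suffix contractions, built backward
    for G, n in zip(reversed(cores), reversed(x)):
        suf.append(G[:, n, :] @ suf[-1])
    suf = suf[::-1]                    # suf[k+1]: contraction of cores after k
    p = float(pre[-1][0])              # P(x)
    for k, (G, n) in enumerate(zip(cores, x)):
        # d log P / d G_k[:, n_k, :] = outer(pre_k, suf_{k+1}) / P(x)
        G[:, n, :] += lr * np.outer(pre[k], suf[k + 1]) / p
        np.maximum(G, 1e-8, out=G)     # keep the tensor entries positive
    return p
```

Looping this over the elite set $\mathcal{S}$ concentrates probability mass on the slices visited by the best samples, which is exactly the effect of minimizing $L(G)$.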

Complete Iterative Scheme (Pseudocode)

Initialize TT cores G_1...G_d randomly in (0,1)
Repeat (until evaluation budget exhausted):
    1. Draw K samples x^{(j)} from TT(G_1...G_d)
    2. Evaluate f(x^{(j)}) for all samples
    3. Select the indices of the k smallest f(x^{(j)})
    4. If min_j f(x^{(j)}) improves on the best value so far, record it
    5. Update G_1...G_d via k_gd Adam steps minimizing -sum_{top-k} log P(x^{(j)})
Return the best found x^* and f(x^*)

This process requires only black-box function evaluations; additional structural constraints can be enforced by restricting the support of the initial TT cores (Batsheva et al., 2023).

4. Computational Complexity and Scalability

Each PROTES iteration consists of:

  • Sampling: $O(K d (N + R) R + K d \alpha(N))$
  • Gradient steps: $O(d R^2)$ per step, i.e. $O(k\, k_{gd}\, d\, R^2)$ per iteration
  • Total cost over $M$ function evaluations: $O(M\, d\, [(N+R)R + \alpha(N)] + M\, \frac{k}{K}\, k_{gd}\, d\, R^2)$

This scaling is essentially linear in the dimension $d$ (for fixed rank $R$) and polynomial in $N$ and $R$, a dramatic reduction from the exponential growth of naive discrete optimization. For moderate to large $d$, PROTES remains computationally tractable where other discrete optimizers are not (Batsheva et al., 2023).
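To make the scaling concrete, compare the storage of an explicit probability tensor with the $O(dNR^2)$ TT bound for the paper's largest setting (binary control with $d=100$, $N=2$, and the default rank $R=5$):

```python
# Parameter count: full tensor vs. TT parametrization for the 2^100
# binary setting (d = 100, N = 2, R = 5).
d, N, R = 100, 2, 5
full_entries = N ** d          # explicit tensor: 2^100 entries
tt_params = d * N * R * R      # O(d N R^2) upper bound on TT parameters
print(full_entries)            # 1267650600228229401496703205376
print(tt_params)               # 5000
```

The boundary cores ($R_0 = R_d = 1$) make the true count slightly smaller than the bound, but the contrast of roughly $10^{30}$ entries versus a few thousand parameters is unchanged.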

5. Theoretical Foundations and Relation to Policy-Gradient Methods

The update rule of PROTES can be derived from maximizing the expected reward $\mathbb{E}_{x \sim p_\theta}[F(f(x))]$, with $F$ a Fermi–Dirac (sharpened) function,

$$F(f) = \frac{1}{\exp((f - y_{\min} - E)/T) + 1}.$$

In the zero-temperature limit ($T \to 0$), $F(f)$ selects only samples close to the empirical minimum, yielding the empirical top-$k$ aggregation. The gradient update thus becomes a hard-selection analogue of the REINFORCE estimator, concentrating probability mass on promising regions while maintaining exploration. This perspective clarifies why the TT-parameterized search distribution is suitable for black-box optimization with no derivative information about $f$ (Batsheva et al., 2023).
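The sharpening toward hard selection can be checked numerically. A small sketch (the batch of objective values, the margin $E$, and the temperatures are illustrative choices, not from the paper):

```python
import numpy as np

def fermi_weight(f, y_min, E, T):
    """F(f) = 1 / (exp((f - y_min - E)/T) + 1), clipped for stability."""
    z = np.clip((f - y_min - E) / T, -50.0, 50.0)
    return 1.0 / (np.exp(z) + 1.0)

f = np.array([1.0, 1.2, 3.0, 5.0])   # hypothetical batch of objective values
for T in (1.0, 0.1, 0.001):
    print(T, np.round(fermi_weight(f, f.min(), E=0.5, T=T), 3))
# As T -> 0 the weights approach {1, 1, 0, 0}: only samples within E of the
# empirical minimum keep weight, i.e. a hard elite selection.
```

At $T=1$ all four samples contribute with smoothly decaying weights; by $T=0.001$ the weights are effectively binary, reproducing the top-$k$ aggregation used in the algorithm.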

6. Empirical Results and Performance

In comprehensive experiments, PROTES was evaluated on:

  • Analytic 7D benchmark functions (Ackley, Rastrigin, Schwefel) on grids of up to $16^7$ points
  • Four $50$-bit QUBO instances (Max-Cut, Vertex Cover, Knapsack)
  • Binary optimal control problems for $T = 25, 50, 100$ (search space up to $2^{100}$)
  • Constrained binary control (e.g., "at least three ones," encoded via an indicator TT)

With hyperparameters $K=100$, $k=10$, $k_{gd}=1$, $\lambda=0.05$, $R=5$, $M=10^4$, PROTES found the minimal known value in $19$ out of $20$ cases, consistently outperforming both TT-based optimizers (TTOpt, Optima-TT) and classical discrete optimizers from the nevergrad suite (PSO, CMA-ES, Differential Evolution, NoisyBandit, Portfolio). Convergence was typically faster in terms of $f_{\min}$ versus the number of objective evaluations (Batsheva et al., 2023).

7. Strengths, Limitations, and Application Domains

Strengths:

  • Bypasses the curse of dimensionality for large discrete domains by TT factorization
  • No need for objective gradients; only black-box evaluations required
  • Structured constraints (e.g., combinatorial restrictions) easily incorporated by modifying initial TT support
  • Demonstrated robust performance on combinatorial, control, and synthetic problems

Limitations:

  • The TT rank $R$, sample size $K$, and elite size $k$ may need tuning per problem
  • Sampling and autodifferentiation through the TT are nontrivial for very large $d$ and/or $R$; more advanced manifold optimization (e.g., Riemannian methods) may be preferable for $d \ge 1000$
  • The method is more expensive per iteration than simpler heuristics when $d$ or $R$ grow large

Applications:

PROTES is applicable to high-dimensional combinatorial optimization (QUBO, graph partitioning), black-box parameter tuning in machine learning, discrete control, resource allocation, and any setting with Cartesian product structure and latent low-rank solution geometry (Batsheva et al., 2023).

References

  • Batsheva et al. (2023). PROTES: Probabilistic Optimization with Tensor Sampling.
