
Probabilistic PROTES Method

Updated 2 February 2026
  • The Probabilistic PROTES method is a black-box optimization framework that leverages tensor-train representations to efficiently explore vast discrete spaces.
  • It models the search distribution using low-parametric TT formats, enabling tractable sampling and gradient updates without exhaustively enumerating the combinatorial grid.
  • Empirical results show that PROTES outperforms classical discrete optimizers on challenging benchmarks such as QUBO and binary control problems, effectively mitigating the curse of dimensionality.

The Probabilistic PROTES method (PROTES: Probabilistic Optimization with Tensor Sampling) is a black-box optimization approach targeting extremely high-dimensional discrete spaces by leveraging probabilistic sampling from low-parametric tensor-train (TT) representations. PROTES is specifically designed for minimizing an objective function defined on a Cartesian product grid, efficiently handling settings with up to $2^{100}$ candidates, such as binary optimization and discretized control problems. The core innovation is expressing and manipulating the search distribution in TT format to bypass the curse of dimensionality, enabling effective exploration and exploitation in combinatorial and control domains (Batsheva et al., 2023).

1. Optimization Problem and Motivation

PROTES addresses the black-box minimization problem

$$\min_{x} f(x), \quad x=(n_1, \ldots, n_d), \quad n_i \in \{1, \ldots, N_i\},$$

where $f$ is an expensive black-box function and the search space forms a $d$-dimensional grid of size $N_1 \times \cdots \times N_d$. As this product explodes combinatorially for large $d$ or $N_i$, brute-force search is infeasible. Existing heuristics such as evolutionary algorithms, PSO, or CMA-ES become ineffective or inapplicable due to the extreme dimensionality and discrete structure. PROTES overcomes these limitations by parametrizing an adaptable probability distribution $P(x)$ via a compact TT decomposition, enabling tractable sampling and distribution updates even when $\prod_i N_i$ is astronomically large (Batsheva et al., 2023).

2. Tensor-Train Representation of Discrete Distributions

The search distribution $P \in \mathbb{R}_+^{N_1 \times \cdots \times N_d}$ is modeled in the TT format

$$P[n_1, \ldots, n_d] = \sum_{r_1=1}^{R_1} \cdots \sum_{r_{d-1}=1}^{R_{d-1}} G_1[1, n_1, r_1]\, G_2[r_1, n_2, r_2] \cdots G_d[r_{d-1}, n_d, 1],$$

where each core tensor $G_k \in \mathbb{R}^{R_{k-1} \times N_k \times R_k}$ controls the $k$th dimension and the $R_k$ are TT ranks with $R_0 = R_d = 1$. The number of parameters grows only as $O(d N R^2)$ for uniform ranks $R$. This format enables efficient storage and scalable manipulation of $P(x)$, which would otherwise be intractable for large $d$. In practice, sampling and updates operate directly on the TT factorization, sidestepping explicit enumeration over the hyper-exponential number of grid points (Batsheva et al., 2023).
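Concretely, evaluating a single entry of $P$ reduces to a chain of small matrix products over the cores. A minimal NumPy sketch (the core shapes in the example are illustrative, not taken from the paper):

```python
import numpy as np

def tt_value(cores, idx):
    """Evaluate P[n_1, ..., n_d] as a product of TT-core slices.

    cores[k] has shape (R_{k-1}, N_k, R_k) with R_0 = R_d = 1;
    idx holds the 0-based mode indices (n_1, ..., n_d).
    """
    v = np.ones((1, 1))            # running 1 x R_k contraction
    for G, n in zip(cores, idx):
        v = v @ G[:, n, :]         # absorb core k's slice for index n_k
    return float(v[0, 0])

# Hypothetical toy example: d = 3 binary modes, uniform rank R = 2.
rng = np.random.default_rng(0)
cores = [rng.random((1, 2, 2)), rng.random((2, 2, 2)), rng.random((2, 2, 1))]
p = tt_value(cores, (0, 1, 0))     # O(d * R^2) work, no full tensor formed
```

The cost is a handful of $R \times R$ matrix-vector products, which is why the factorized representation scales to grids with $2^{100}$ points.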

3. Sampling and Update Algorithm

PROTES repeatedly draws samples from the current TT-modeled distribution, using a sequential conditional algorithm adapted from Dolgov & Savostyanov (2020):

  • Forward/backward message computation: for each TT core, partial contractions ("messages") $\alpha_k(r_{k-1})$ (forward) and $\beta_k(r_{k-1})$ (backward) are evaluated to obtain the marginals and conditional probabilities needed to sample each coordinate in turn.
  • Sequential sampling: each $n_k$ is sampled conditional on the previously chosen coordinates, with explicit distributions computed from the TT structure and the current $\alpha_k$, $\beta_{k+1}$ values.
  • Batch sampling: $K$ independent samples $x^{(j)}$ are generated per iteration. The computational cost is $O(K\, d\,(N + R)R + K\, d\, \alpha(N))$, with $N = \max_i N_i$ and $\alpha(N)$ the cost of categorical sampling over $N$ values.
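The sequential conditional scheme above can be sketched as follows. This is a simplified NumPy illustration (backward messages recomputed from scratch, no caching), not the reference implementation:

```python
import numpy as np

def tt_sample(cores, rng):
    """Draw one index tuple from a non-negative TT tensor by sampling
    each coordinate conditionally on the prefix chosen so far."""
    # Backward messages: beta[k] is the total mass of cores k..d-1,
    # summed over their mode indices; shape (R_{k-1},), beta[d] = [1].
    beta = [np.ones(1)]
    for G in reversed(cores):
        beta.append(G.sum(axis=1) @ beta[-1])
    beta = beta[::-1]
    x, prefix = [], np.ones(1)
    for k, G in enumerate(cores):
        # Unnormalized conditional P(n_k | n_1, ..., n_{k-1}).
        w = np.maximum(np.einsum('r,rns,s->n', prefix, G, beta[k + 1]), 0.0)
        n = int(rng.choice(len(w), p=w / w.sum()))
        x.append(n)
        prefix = prefix @ G[:, n, :]   # extend the forward contraction
    return x
```

Each coordinate costs a few rank-sized contractions plus one categorical draw, matching the per-sample $O(d(N+R)R + d\,\alpha(N))$ bound quoted above.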

After batch evaluation,

  • The top-$k$ sample indices with the lowest $f(x^{(j)})$ are selected as the elite set $\mathcal{S}$.
  • The TT parameters $G_k$ are updated via $k_{gd}$ steps of Adam (or any gradient optimizer) on the loss

$$L(G) = - \sum_{j \in \mathcal{S}} \log P(x^{(j)}).$$

This corresponds to a REINFORCE-style policy gradient, weighted by elite selection (Batsheva et al., 2023).
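Because $P$ is multilinear in each core, the gradient of $\log P(x)$ with respect to the slice $G_k[:, n_k, :]$ has a closed form: the outer product of the prefix and suffix contractions divided by $P(x)$. A plain-gradient sketch of one update (the paper uses Adam; `lr` here is an illustrative step size):

```python
import numpy as np

def logp_grad_step(cores, x, lr=0.1):
    """One ascent step on log P(x) w.r.t. the TT cores (sketch; plain SGD
    in place of Adam).  cores[k] has shape (R_{k-1}, N_k, R_k)."""
    pre = [np.ones(1)]                 # pre[k]: contraction of cores before k
    for G, n in zip(cores, x):
        pre.append(pre[-1] @ G[:, n, :])
    suf = [np.ones(1)]                 # suffix contractions, built backward
    for G, n in zip(reversed(cores), reversed(x)):
        suf.append(G[:, n, :] @ suf[-1])
    suf = suf[::-1]                    # suf[k+1]: contraction of cores after k
    p = float(pre[-1][0])              # P(x)
    for k, (G, n) in enumerate(zip(cores, x)):
        # d log P / d G_k[:, n_k, :] = outer(pre_k, suf_{k+1}) / P(x)
        G[:, n, :] += lr * np.outer(pre[k], suf[k + 1]) / p
        np.maximum(G, 1e-8, out=G)     # keep the tensor entries positive
    return p
```

Looping this over the elite set $\mathcal{S}$ concentrates probability mass on the slices visited by the best samples, which is exactly the effect of minimizing $L(G)$.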

Complete Iterative Scheme (Pseudocode)

Initialize TT cores G_1...G_d randomly in (0,1)
Repeat (until evaluation budget exhausted):
    1. Draw K samples x^{(j)} from TT(G_1...G_d)
    2. Evaluate f(x^{(j)}) for all samples
    3. Select the indices of the k smallest f(x^{(j)})
    4. If min_j f(x^{(j)}) improves on the best value so far, record it
    5. Update G_1...G_d via k_gd Adam steps minimizing -sum_{top-k} log P(x^{(j)})
Return the best found x^* and f(x^*)

This process requires only black-box function evaluations; additional structural constraints can be enforced by restricting the support of the initial TT cores (Batsheva et al., 2023).

4. Computational Complexity and Scalability

Each PROTES iteration consists of:

  • Sampling: $O(K d (N + R) R + K d \alpha(N))$
  • Gradient steps: $O(d R^2)$ per step, i.e. $O(k\, k_{gd}\, d\, R^2)$ per iteration
  • Total cost over $M$ function evaluations: $O(M\, d\, [(N+R)R + \alpha(N)] + M\, \frac{k}{K}\, k_{gd}\, d\, R^2)$

This scaling is essentially linear in the dimension $d$ (for fixed rank $R$) and polynomial in $N$ and $R$, a dramatic reduction from the exponential growth of naive discrete optimization. For moderate to large $d$, PROTES remains computationally tractable where other discrete optimizers are not (Batsheva et al., 2023).
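To make the scaling concrete, compare the storage of an explicit probability tensor with the $O(dNR^2)$ TT bound for the paper's largest setting (binary control with $d=100$, $N=2$, and the default rank $R=5$):

```python
# Parameter count: full tensor vs. TT parametrization for the 2^100
# binary setting (d = 100, N = 2, R = 5).
d, N, R = 100, 2, 5
full_entries = N ** d          # explicit tensor: 2^100 entries
tt_params = d * N * R * R      # O(d N R^2) upper bound on TT parameters
print(full_entries)            # 1267650600228229401496703205376
print(tt_params)               # 5000
```

The boundary cores ($R_0 = R_d = 1$) make the true count slightly smaller than the bound, but the contrast of roughly $10^{30}$ entries versus a few thousand parameters is unchanged.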

5. Theoretical Foundations and Relation to Policy-Gradient Methods

The update rule of PROTES can be derived from maximizing the expected reward $\mathbb{E}_{x \sim p_\theta}[F(f(x))]$, with $F$ a Fermi–Dirac (sharpened) function,

$$F(f) = \frac{1}{\exp((f - y_{\min} - E)/T) + 1}.$$

In the zero-temperature limit ($T \to 0$), $F(f)$ selects only samples close to the empirical minimum, yielding the empirical top-$k$ aggregation. The gradient update thus becomes a hard-selection analogue of the REINFORCE estimator, concentrating probability mass on promising regions while maintaining exploration. This perspective clarifies why the TT-parameterized search distribution is suitable for black-box optimization with no derivative information about $f$ (Batsheva et al., 2023).
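The sharpening toward hard selection can be checked numerically. A small sketch (the batch of objective values, the margin $E$, and the temperatures are illustrative choices, not from the paper):

```python
import numpy as np

def fermi_weight(f, y_min, E, T):
    """F(f) = 1 / (exp((f - y_min - E)/T) + 1), clipped for stability."""
    z = np.clip((f - y_min - E) / T, -50.0, 50.0)
    return 1.0 / (np.exp(z) + 1.0)

f = np.array([1.0, 1.2, 3.0, 5.0])   # hypothetical batch of objective values
for T in (1.0, 0.1, 0.001):
    print(T, np.round(fermi_weight(f, f.min(), E=0.5, T=T), 3))
# As T -> 0 the weights approach {1, 1, 0, 0}: only samples within E of the
# empirical minimum keep weight, i.e. a hard elite selection.
```

At $T=1$ all four samples contribute with smoothly decaying weights; by $T=0.001$ the weights are effectively binary, reproducing the top-$k$ aggregation used in the algorithm.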

6. Empirical Results and Performance

In comprehensive experiments, PROTES was evaluated on:

  • Analytic 7D benchmark functions (Ackley, Rastrigin, Schwefel) on grids of up to $16^7$ points
  • Four $50$-bit QUBO instances (Max-Cut, Vertex Cover, Knapsack)
  • Binary optimal control problems for $T = 25, 50, 100$ (search space up to $2^{100}$)
  • Constrained binary control (e.g., "at least three ones," encoded via an indicator TT)

With hyperparameters $K=100$, $k=10$, $k_{gd}=1$, $\lambda=0.05$, $R=5$, $M=10^4$, PROTES found the minimal known value in $19$ out of $20$ cases, consistently outperforming both TT-based optimizers (TTOpt, Optima-TT) and classical discrete optimizers from the nevergrad suite (PSO, CMA-ES, Differential Evolution, NoisyBandit, Portfolio). Convergence was typically faster in terms of $f_{\min}$ versus the number of objective evaluations (Batsheva et al., 2023).

7. Strengths, Limitations, and Application Domains

Strengths:

  • Bypasses the curse of dimensionality for large discrete domains by TT factorization
  • No need for objective gradients; only black-box evaluations required
  • Structured constraints (e.g., combinatorial restrictions) easily incorporated by modifying initial TT support
  • Demonstrated robust performance on combinatorial, control, and synthetic problems

Limitations:

  • The TT rank $R$, sample size $K$, and elite size $k$ may need tuning per problem
  • Sampling and autodifferentiation through the TT are nontrivial for very large $d$ and/or $R$; more advanced manifold optimization (e.g., Riemannian methods) may be preferable for $d \ge 1000$
  • The method is more expensive per iteration than simpler heuristics when $d$ or $R$ grow large

Applications:

PROTES is applicable to high-dimensional combinatorial optimization (QUBO, graph partitioning), black-box parameter tuning in machine learning, discrete control, resource allocation, and any setting with Cartesian product structure and latent low-rank solution geometry (Batsheva et al., 2023).

References

  • Batsheva et al. (2023). PROTES: Probabilistic Optimization with Tensor Sampling.
