Probabilistic PROTES Method
- The Probabilistic PROTES method is a black-box optimization framework that leverages tensor-train representations to efficiently explore vast discrete spaces.
- It models the search distribution using low-parametric TT formats, enabling tractable sampling and gradient updates without exhaustively enumerating the combinatorial grid.
- Empirical results show that PROTES outperforms classical discrete optimizers on challenging benchmarks like QUBO and binary control problems, effectively reducing the curse of dimensionality.
The Probabilistic PROTES method (PROTES: Probabilistic Optimization with Tensor Sampling) is a black-box optimization approach targeting extremely high-dimensional discrete spaces by sampling from a probability distribution stored in low-parametric tensor-train (TT) format. PROTES is designed for minimizing an objective function defined on a Cartesian product grid and efficiently handles search spaces whose cardinality is far beyond exhaustive enumeration, such as binary optimization and discretized control problems. The core innovation is expressing and manipulating the search distribution in TT format to bypass the curse of dimensionality, enabling effective exploration and exploitation in combinatorial and control domains (Batsheva et al., 2023).
1. Optimization Problem and Motivation
PROTES addresses the black-box minimization problem $\min_{x \in X} f(x)$ with $X = X_1 \times X_2 \times \cdots \times X_d$, where $f : X \to \mathbb{R}$ is an expensive black-box function and the search space $X$ forms a $d$-dimensional grid of size $n_1 \times n_2 \times \cdots \times n_d$. Since the cardinality $|X| = \prod_k n_k$ explodes combinatorially for large $d$ or $n_k$, brute-force search is infeasible. Existing heuristics such as evolutionary algorithms, PSO, or CMA-ES become ineffective or inapplicable due to the extreme dimensionality and discrete structure. PROTES overcomes these limitations by parametrizing an adaptable probability distribution $P_\theta(x)$ via a compact TT decomposition, enabling tractable sampling and distribution updates even when $|X|$ is astronomically large (Batsheva et al., 2023).
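To make the combinatorial explosion concrete, a two-line Python sketch of the grid cardinality (the function name and numbers are illustrative, not from the paper):

```python
# Size of a d-dimensional product grid with n points per axis.
# Even modest d and n make exhaustive enumeration impossible.
def grid_size(d, n):
    return n ** d

# A 100-dimensional binary grid already has 2^100 candidates:
print(grid_size(100, 2))  # 1267650600228229401496703205376
```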
2. Tensor-Train Representation of Discrete Distributions
The search distribution is modeled in the TT format $P(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d]$, where each core tensor $G_k \in \mathbb{R}^{r_{k-1} \times n_k \times r_k}$ controls the $k$th dimension, $r_1, \ldots, r_{d-1}$ are TT ranks, and $r_0 = r_d = 1$. The number of parameters grows only as $O(d\, n\, r^2)$ for uniform ranks $r_k = r$ and mode sizes $n_k = n$. This format enables efficient storage and scalable manipulation of $P$, which would otherwise be intractable for large $d$. In practice, sampling and updates operate directly on this TT factorization, sidestepping explicit enumeration over the hyper-exponential cardinality (Batsheva et al., 2023).
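As a minimal illustration (not the paper's implementation), evaluating one entry of a TT tensor is just a chain of small matrix products; `tt_value` and the toy cores below are assumptions for this sketch:

```python
import numpy as np

# One entry of a TT tensor: contract the selected slice of each core.
# Cores G_k have shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1; for
# nonnegative cores the values are unnormalized probabilities.
def tt_value(cores, index):
    v = np.ones(1)                    # row vector of size r_0 = 1
    for G, i in zip(cores, index):
        v = v @ G[:, i, :]            # contract with the (r_{k-1} x r_k) slice
    return float(v[0])                # scalar, since r_d = 1

# Example: d = 3 binary modes, uniform rank r = 2.
rng = np.random.default_rng(0)
cores = [rng.uniform(size=(1, 2, 2)),
         rng.uniform(size=(2, 2, 2)),
         rng.uniform(size=(2, 2, 1))]
p = tt_value(cores, (0, 1, 0))        # unnormalized probability of one index
```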
3. Sampling and Update Algorithm
PROTES repeatedly draws samples from the current TT-modeled distribution, using a sequential conditional algorithm adapted from Dolgov & Savostyanov (2020):
- Forward/backward message computation: For each TT core, partial contractions ("messages") $\Phi_k$ (forward) and $\Psi_k$ (backward) are evaluated to obtain the marginal and conditional probabilities needed to sample each coordinate sequentially and efficiently.
- Sequential sampling: Each coordinate $i_k$ is sampled conditional on the previously chosen coordinates $i_1, \ldots, i_{k-1}$, with the explicit conditional distribution computed from the TT cores and the current $\Phi$, $\Psi$ values.
- Batch sampling: $K$ independent samples are generated per iteration. The computational cost scales as $O(K\, d\, n\, r^2)$, plus the cost of $K d$ categorical draws over $n$ values.
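The forward/backward scheme above can be sketched in NumPy for a nonnegative TT tensor; function and variable names are illustrative, not from the reference implementation:

```python
import numpy as np

# Draw one index from a nonnegative TT tensor by sequential
# conditional sampling (sketch; assumes cores have nonnegative entries).
def tt_sample(cores, rng):
    d = len(cores)
    # Backward messages: psi[k] sums the product of cores k+1..d over
    # their mode indices; a vector of length r_k, with psi[d] = [1].
    psi = [None] * (d + 1)
    psi[d] = np.ones(1)
    for k in range(d - 1, -1, -1):
        psi[k] = cores[k].sum(axis=1) @ psi[k + 1]
    index = []
    phi = np.ones(1)                       # forward message, length r_{k-1}
    for k in range(d):
        # Unnormalized conditional distribution of i_k given i_1..i_{k-1}.
        probs = np.einsum('a,aib,b->i', phi, cores[k], psi[k + 1])
        probs = np.maximum(probs, 0.0)
        probs /= probs.sum()
        i = rng.choice(len(probs), p=probs)
        index.append(int(i))
        phi = phi @ cores[k][:, i, :]      # absorb the chosen slice
    return tuple(index)
```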
After batch evaluation,
- The top-$k$ sample indices with the lowest objective values $f(x^{(j)})$ are selected as the elite set.
- The TT parameters $\theta = \{G_k\}$ are updated via several steps of Adam (or any gradient optimizer) on the loss $\mathcal{L}(\theta) = -\sum_{j \in \text{top-}k} \log P_\theta(x^{(j)})$.
This corresponds to a REINFORCE-style policy gradient, weighted by elite selection (Batsheva et al., 2023).
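A sketch of this elite-set loss (the helper names are hypothetical; in practice the gradient with respect to the cores is obtained by automatic differentiation):

```python
import numpy as np

def tt_unnormalized(cores, index):
    # Unnormalized TT value of one index: chain of slice products.
    v = np.ones(1)
    for G, i in zip(cores, index):
        v = v @ G[:, i, :]
    return float(v[0])

def tt_normalizer(cores):
    # Sum of the TT tensor over all indices: sum each core over its mode.
    v = np.ones(1)
    for G in cores:
        v = v @ G.sum(axis=1)
    return float(v[0])

def elite_loss(cores, samples, values, k):
    # Negative log-likelihood of the k samples with lowest f values.
    Z = tt_normalizer(cores)
    elite = sorted(range(len(samples)), key=lambda j: values[j])[:k]
    return -sum(np.log(tt_unnormalized(cores, samples[j]) / Z) for j in elite)
```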
Complete Iterative Scheme (Pseudocode)
Initialize TT cores G_1 ... G_d randomly in (0, 1)
Repeat (until evaluation budget exhausted):
  1. Draw K samples x^{(1)} ... x^{(K)} from TT(G_1 ... G_d)
  2. Evaluate f(x^{(j)}) for all samples
  3. Select the indices of the k smallest values f(x^{(j)})
  4. If min_j f(x^{(j)}) improves on the best value so far, record it
  5. Update G_1 ... G_d by Adam steps minimizing -sum_{top-k} log P(x^{(j)})
Return the best found x^* and f(x^*)
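A runnable toy version of this loop, simplified to the rank-1 TT case where the distribution factorizes into independent per-coordinate categoricals, and with a smoothed cross-entropy-style elite update in place of Adam on the log-likelihood (so it illustrates the loop structure, not the paper's exact method):

```python
import numpy as np

def protes_rank1(f, d, n, budget=2000, K=50, k=5, lr=0.3, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.full((d, n), 1.0 / n)      # one categorical per coordinate
    best_x, best_f = None, np.inf
    for _ in range(budget // K):
        # 1. Draw K samples, one coordinate at a time (shape: K x d).
        X = np.stack([rng.choice(n, size=K, p=probs[j]) for j in range(d)],
                     axis=1)
        # 2. Evaluate the black-box objective on the batch.
        y = np.array([f(x) for x in X])
        # 3-4. Track the best point seen so far.
        j_best = int(np.argmin(y))
        if y[j_best] < best_f:
            best_f, best_x = float(y[j_best]), X[j_best].copy()
        # 5. Move each marginal toward the empirical elite frequencies.
        elite = X[np.argsort(y)[:k]]
        for j in range(d):
            counts = np.bincount(elite[:, j], minlength=n) / k
            probs[j] = (1 - lr) * probs[j] + lr * counts
    return best_x, best_f

# Toy objective: number of ones in a binary string (minimum 0 at all-zeros).
x, fx = protes_rank1(lambda v: int(v.sum()), d=20, n=2)
```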
4. Computational Complexity and Scalability
Each PROTES iteration consists of:
- Sampling: $O(K\, d\, n\, r^2)$ per batch of $K$ samples
- Gradient steps: $O(k\, d\, n\, r^2)$ per Adam step on the $k$ elite samples, repeated for $k_{gd}$ steps per iteration
- Total cost over $M$ function evaluations: $O\bigl((M + \tfrac{M}{K} k_{gd}\, k)\, d\, n\, r^2\bigr)$
This scaling is essentially linear in the dimension $d$ (assuming a fixed rank $r$) and polynomial in $r$: a dramatic reduction from the exponential growth of naive discrete optimization. For moderate to large $d$, PROTES remains computationally tractable where other discrete optimizers are not (Batsheva et al., 2023).
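A quick arithmetic check of the storage gap, with illustrative numbers:

```python
# Parameter count of a uniform-rank TT distribution vs. the full joint table.
d, n, r = 100, 2, 5
tt_params = d * n * r * r    # O(d n r^2); the boundary cores are even smaller
full_params = n ** d         # explicit joint distribution over the whole grid
# 5000 TT parameters stand in for a table of 2^100 entries.
```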
5. Theoretical Foundations and Relation to Policy-Gradient Methods
The update rule of PROTES can be derived from maximizing the expected reward $\mathbb{E}_{x \sim P_\theta}[w(f(x))]$, with $w$ a Fermi–Dirac (sharpened sigmoid) weighting of the objective, $w(f) = \bigl(1 + e^{(f - \lambda)/T}\bigr)^{-1}$.
In the zero-temperature limit ($T \to 0$), $w$ selects only samples close to the empirical minimum, yielding the empirical top-$k$ aggregation. The gradient update thus becomes a hard-selection analogue of the REINFORCE estimator, concentrating probability mass on promising regions while maintaining exploration. This perspective clarifies why the TT-parameterized search distribution is suitable for black-box optimization with no derivative information about $f$ (Batsheva et al., 2023).
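The REINFORCE connection can be written out explicitly. For a fixed weighting $w$ of the objective, the log-derivative trick gives

```latex
\nabla_\theta \,\mathbb{E}_{x \sim P_\theta}\bigl[w(f(x))\bigr]
  = \mathbb{E}_{x \sim P_\theta}\bigl[w(f(x))\,\nabla_\theta \log P_\theta(x)\bigr]
  \approx \frac{1}{K}\sum_{j=1}^{K} w\bigl(f(x^{(j)})\bigr)\,
    \nabla_\theta \log P_\theta\bigl(x^{(j)}\bigr),
```

so the zero-temperature weight (equal to 1 on the top-$k$ samples and 0 elsewhere) recovers the elite-set gradient used in the update step of Section 3.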
6. Empirical Results and Performance
In comprehensive experiments, PROTES was evaluated on:
- Analytic 7D benchmark functions (Ackley, Rastrigin, Schwefel) discretized on fine product grids
- Four 50-bit QUBO instances (including Max-Cut, Vertex Cover, and Knapsack)
- Binary optimal control problems with high-dimensional binary switching sequences
- Constrained binary control (e.g., "at least three ones," encoded via an indicator TT)
Using a single fixed set of hyperparameters across all tasks (batch size $K$, elite size $k$, TT rank $r$, number of Adam steps $k_{gd}$, and learning rate), PROTES found the minimal known value in 19 out of 20 cases, consistently outperforming both TT-based optimizers (TTOpt, Optima-TT) and classical discrete optimizers from the nevergrad suite (PSO, CMA-ES, Differential Evolution, NoisyBandit, Portfolio). Convergence was typically faster in terms of $f_{\min}$ versus the number of objective evaluations (Batsheva et al., 2023).
7. Strengths, Limitations, and Application Domains
Strengths:
- Bypasses the curse of dimensionality for large discrete domains by TT factorization
- No need for objective gradients; only black-box evaluations required
- Structured constraints (e.g., combinatorial restrictions) easily incorporated by modifying initial TT support
- Demonstrated robust performance on combinatorial, control, and synthetic problems
Limitations:
- TT rank $r$, batch size $K$, and elite size $k$ may need tuning per problem
- Sampling and automatic differentiation through the TT format are nontrivial for very large $d$ and/or $n$; more advanced manifold optimization (e.g., Riemannian methods) may be preferable for large TT ranks
- The method is more expensive per iteration than simpler heuristics when $r$ or $K$ grows large
Applications:
PROTES is applicable to high-dimensional combinatorial optimization (QUBO, graph partitioning), black-box parameter tuning in machine learning, discrete control, resource allocation, and any setting with Cartesian product structure and latent low-rank solution geometry (Batsheva et al., 2023).