FMQA: Factorization Machine with Quadratic Optimization Annealing

Updated 12 January 2026
  • The paper introduces FMQA, a framework that uses a trained factorization machine as a quadratic surrogate (QUBO) to reduce the number of expensive objective evaluations.
  • It combines gradient-trained FM regression with annealing optimizers, such as quantum and simulated annealers, to solve hard combinatorial and binary-variable problems.
  • FMQA demonstrates significant improvements in applications such as materials discovery, genetic studies, and nanophotonic design, reducing computational cost and accelerating convergence.

Factorization Machine with Quadratic Optimization Annealing (FMQA) is a surrogate-model-based black-box optimization framework integrating quadratic factorization machine (FM) regression with quantum or classical Ising machine optimization. The central principle is to use a trained FM as a quadratic surrogate (QUBO) for an expensive objective, then rapidly propose low-cost candidates using annealing to optimize the surrogate. FMQA is well-suited to combinatorial, binary, integer, and discrete variables, is scalable to high dimensions, and underpins a wide range of applications in black-box optimization across physics, chemistry, biology, and engineering (Tamura et al., 24 Jul 2025).

1. Mathematical Framework and Surrogate Construction

The FMQA method is built around the second-order factorization machine, parameterized as follows for binary inputs $z \in \{0,1\}^N$:

$f_{\mathrm{FM}}(z) = w_0 + \sum_{i=1}^N w_i z_i + \sum_{1 \leq i < j \leq N} \langle v_i, v_j \rangle z_i z_j$

with $v_i \in \mathbb{R}^K$ the latent factor vectors ($K \ll N$), $w_0$ the bias, $w_i$ the linear weights, and $\langle v_i, v_j \rangle = \sum_{k=1}^K v_{i,k} v_{j,k}$ the factorized quadratic couplings. The FM is fitted to a dataset $D = \{(z^{(m)}, y^{(m)})\}_{m=1}^M$ by minimizing squared error or a regularized loss (Tamura et al., 24 Jul 2025, Couzinie et al., 2024, Nakano et al., 28 Jul 2025):

$L(w_0, w, V) = \frac{1}{M} \sum_{m=1}^M \left( f_{\mathrm{FM}}(z^{(m)}) - y^{(m)} \right)^2$

Hyperparameters include the FM rank KK, regularization coefficients, and learning rate for the gradient-based optimizer.
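
As a concrete illustration, the model and its squared-error training can be sketched in a few lines of NumPy. The problem size, rank, learning rate, and toy target below are hypothetical stand-ins; a practical run would use an FM library with regularization and tuned hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3                      # number of binary variables, FM rank (toy values)

# FM parameters: bias w0, linear weights w, latent factors V (one row v_i per bit)
w0, w, V = 0.0, np.zeros(N), 0.1 * rng.standard_normal((N, K))

def fm_predict(Z, w0, w, V):
    """f_FM(z) = w0 + sum_i w_i z_i + sum_{i<j} <v_i,v_j> z_i z_j, computed via
    the O(NK) identity: sum_{i<j} <v_i,v_j> z_i z_j = 0.5*(||zV||^2 - sum_ik z_i v_ik^2)."""
    ZV = Z @ V
    return w0 + Z @ w + 0.5 * ((ZV ** 2).sum(1) - (Z @ V ** 2).sum(1))

# Toy dataset: y mixes one pairwise interaction with one linear term
Z = rng.integers(0, 2, size=(64, N)).astype(float)
y = Z[:, 0] * Z[:, 1] - Z[:, 2]

lr = 0.05
mse0 = np.mean((fm_predict(Z, w0, w, V) - y) ** 2)
for _ in range(2000):            # full-batch gradient descent on squared error
    ZV = Z @ V
    r = (fm_predict(Z, w0, w, V) - y) / len(y)
    w0 -= lr * r.sum()
    w -= lr * Z.T @ r
    V -= lr * (Z.T @ (r[:, None] * ZV) - (Z.T @ r)[:, None] * V)
mse = np.mean((fm_predict(Z, w0, w, V) - y) ** 2)
print(mse0, "->", mse)           # the loss should drop substantially
```

The gradient for $V$ uses the same $O(NK)$ factorization as the forward pass, which is what keeps FM training cheap relative to a dense quadratic model.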

After training, the surrogate output is in QUBO (quadratic unconstrained binary optimization) form:

$H_{\mathrm{QUBO}}(z) = c + \sum_{i=1}^N Q_{ii} z_i + \sum_{1 \leq i < j \leq N} Q_{ij} z_i z_j$

where $Q_{ii} = w_i$, $Q_{ij} = \langle v_i, v_j \rangle$, and the constant $c = w_0$ (typically omitted from the QUBO, since it does not affect the minimizer) (Tamura et al., 24 Jul 2025).
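
Reading off the QUBO from the trained FM is a direct matrix operation. A minimal sketch, using small hypothetical trained parameter values:

```python
import numpy as np

# Hypothetical trained FM parameters for N = 4 bits, rank K = 2
w = np.array([0.5, -1.0, 0.2, 0.0])          # linear weights -> Q diagonal
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [-1.0, 0.5]])                  # latent factors -> Q off-diagonal

# Q_ii = w_i, Q_ij = <v_i, v_j> for i < j; the constant c = w0 only shifts
# the energy, so it is dropped from the QUBO.
Q = np.triu(V @ V.T, k=1) + np.diag(w)

def qubo_energy(z, Q):
    return z @ Q @ z        # for binary z, z_i^2 = z_i picks up the diagonal

z = np.array([1, 0, 1, 0])
print(qubo_energy(z, Q))    # 0.5 + 0.2 + <v_0, v_2> = 1.7
```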

2. QUBO Mapping and Annealing Optimization

Optimization is performed over the FM surrogate by solving the QUBO problem:

$z^* = \arg\min_{z \in \{0,1\}^N} H_{\mathrm{QUBO}}(z)$

using physical quantum annealers (e.g., D-Wave), simulated annealing, or digital Ising machines (Tamura et al., 24 Jul 2025, Nakano et al., 28 Jul 2025, Couzinie et al., 2024). When $z$ encodes constraint structure (such as a fixed Hamming weight or one-hot blocks), quadratic penalty terms of the form $\lambda \left( \sum_i z_i - K \right)^2$ or $\alpha_{\text{uh}} \sum_{v} \left( \sum_{i \in B_v} z_i - 1 \right)^2$ are added to $H_{\mathrm{QUBO}}(z)$ (Endo et al., 2024, Kikuchi et al., 5 Jan 2026). For quantum annealing, the QUBO is mapped to an Ising Hamiltonian via the change of variables $s_i = 2z_i - 1$, giving $H_{\text{Ising}}(s) = \sum_i h_i s_i + \sum_{i < j} J_{ij} s_i s_j$ (Tamura et al., 24 Jul 2025, Wang et al., 2 Jul 2025).

The solver typically performs many sampling runs with a progressively decreasing temperature (classical or quantum schedule), each run proposing a candidate $z$ that minimizes the surrogate. Constraints are either strictly enforced (hard penalty terms, feasible-region encoding) or softly imposed (Couzinie et al., 2024, Kikuchi et al., 5 Jan 2026).
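
The cooling-schedule sampling and the penalty folding can be sketched with a plain single-bit-flip simulated annealer; the random QUBO, penalty weight, and schedule below are illustrative, not values from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12

# Toy surrogate QUBO (diagonal + upper-triangular couplings)
Q = np.triu(rng.standard_normal((N, N)), k=1)
Q += np.diag(rng.standard_normal(N))

# Fold the Hamming-weight penalty lam*(sum_i z_i - Kc)^2 into the QUBO:
# the diagonal gains lam*(1 - 2*Kc), off-diagonals gain 2*lam, and the
# constant lam*Kc^2 is dropped (it does not affect the argmin).
Kc, lam = 4, 8.0
Qp = Q + np.triu(np.full((N, N), 2 * lam), k=1) + lam * (1 - 2 * Kc) * np.eye(N)

def energy(z, Q):
    return z @ Q @ z

def anneal(Q, steps=20000, T0=5.0, T1=0.01):
    z = rng.integers(0, 2, N)
    best, ebest = z.copy(), energy(z, Q)
    for t in range(steps):
        T = T0 * (T1 / T0) ** (t / steps)    # geometric cooling schedule
        i = rng.integers(N)
        z2 = z.copy(); z2[i] ^= 1            # single-bit-flip proposal
        d = energy(z2, Q) - energy(z, Q)
        if d < 0 or rng.random() < np.exp(-d / T):
            z = z2
            if energy(z, Q) < ebest:
                best, ebest = z.copy(), energy(z, Q)
    return best, ebest

zstar, estar = anneal(Qp)
print(zstar, zstar.sum())   # with a large enough lam, the sum should equal Kc
```

Production runs would instead hand `Qp` to a hardware or SDK sampler, but the penalty-folding algebra is the same.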

3. Iterative Optimization Loop and Data Management

The FMQA procedure is iterative, combining surrogate updates with annealing-based candidate proposals and true function evaluations. A typical workflow is as follows (Tamura et al., 24 Jul 2025, Nakano et al., 28 Jul 2025, Kikuchi et al., 5 Jan 2026):

  1. Initialization: Generate an initial set of random valid samples $\{z^{(m)}\}$, evaluate $y^{(m)} = f(z^{(m)})$, and construct the initial dataset $D$.
  2. Training: Fit the FM parameters to $D$.
  3. QUBO Extraction: Form the updated QUBO matrix from the FM coefficients.
  4. Annealing Step: Solve the QUBO (Ising/annealer) to generate low-cost candidate solutions.
  5. Evaluation & Data Update: Evaluate the black-box function at the new candidates; augment $D$ with the new $(z, y)$ pairs.
  6. Dataset Management: Optionally, restrict $D$ to the $M_{\text{latest}}$ most recent samples to avoid dilution of new information; this limited-memory FIFO update improves convergence speed versus training on all data (Nakano et al., 28 Jul 2025).
  7. Convergence Check: Continue until a stopping criterion is met (maximum iterations, stagnation, or target attained).

This loop is summarized in the following schematic table:

| Step | Action | Computational Aspect |
| --- | --- | --- |
| Initialization | Generate, evaluate initial points | $M_0$ black-box calls |
| Training | FM fit via gradient-based optimizer | $O(|D| N K)$ per epoch |
| QUBO Extraction | Read off quadratic coefficients | $O(N^2)$ |
| Annealing | Optimize surrogate on Ising machine/annealer | Annealing time per solve |
| Evaluation | Evaluate black-box at new solutions | Costly; main bottleneck |
| Data Update | Add new data (optionally FIFO/truncate) | $O(1)$ per step |

By carefully managing the dataset size, FMQA avoids stagnation arising from the diminishing influence of new data on the surrogate’s loss landscape (Nakano et al., 28 Jul 2025).
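
The loop above can be sketched end to end. In this toy version the black-box is a cheap stand-in, the annealer is replaced by exhaustive QUBO minimization (feasible here because $N$ is tiny), and the FM fit is a bare-bones gradient descent rather than a tuned library call.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
N, K, M_latest = 6, 2, 30

def blackbox(z):                                 # hypothetical expensive objective
    return float((z.sum() - 2) ** 2 - 3 * z[0] * z[3])

def fit_fm(Z, y, iters=800, lr=0.05):
    w0, w, V = 0.0, np.zeros(N), 0.1 * rng.standard_normal((N, K))
    for _ in range(iters):                       # squared-error gradient descent
        ZV = Z @ V
        pred = w0 + Z @ w + 0.5 * ((ZV ** 2).sum(1) - (Z @ V ** 2).sum(1))
        r = (pred - y) / len(y)
        w0 -= lr * r.sum()
        w -= lr * Z.T @ r
        V -= lr * (Z.T @ (r[:, None] * ZV) - (Z.T @ r)[:, None] * V)
    return w, V

def minimize_qubo(Q):                            # brute-force stand-in for an annealer
    cands = [np.array(b) for b in product([0, 1], repeat=N)]
    return min(cands, key=lambda z: z @ Q @ z)

# 1. Initialization
Z = rng.integers(0, 2, (10, N)).astype(float)
y = np.array([blackbox(z) for z in Z])

for it in range(15):
    w, V = fit_fm(Z, y)                          # 2. train FM on current data
    Q = np.triu(V @ V.T, k=1) + np.diag(w)       # 3. QUBO extraction
    z_new = minimize_qubo(Q).astype(float)       # 4. "annealing" step on surrogate
    y_new = blackbox(z_new)                      # 5. expensive evaluation
    Z = np.vstack([Z, z_new])[-M_latest:]        # 6. FIFO-limited dataset
    y = np.append(y, y_new)[-M_latest:]

print(y.min())                                   # best objective value seen so far
```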

4. Extensions: Initialization, Smoothing, and Higher-Order Interactions

Low-Rank Warm-Start Initialization

Systematic initialization of the FM parameters via a low-rank approximation of a known or approximate Ising model dramatically improves early convergence. Given a symmetric coupling matrix $J$, the procedure shifts $J \to J' = J - \lambda_N I$ (where $\lambda_N$ is the smallest eigenvalue), truncates to rank $K$, and encodes $J' \approx U_K \Sigma'_K U_K^T$ by assigning the rows of $U_K (\Sigma'_K)^{1/2}$ to the FM's factor vectors $v_i$. Random-matrix theory predicts the FM rank $K^*$ required for a target accuracy, and this method yields a surrogate whose error rapidly vanishes as $K \to K^*$ (Seki et al., 2024).
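
Under the stated procedure, the warm start reduces to an eigendecomposition; the coupling matrix below is a random stand-in for a known Ising model.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 10, 4

# A known symmetric coupling matrix J (random stand-in)
A = rng.standard_normal((N, N))
J = (A + A.T) / 2

# Shift by the smallest eigenvalue so J' is positive semidefinite,
# then keep the top-K eigenpairs and factor J' ~= W W^T
lam, U = np.linalg.eigh(J)          # eigenvalues in ascending order
lam_p = lam - lam[0]                # spectrum of J' = J - lambda_N * I
Jp = J - lam[0] * np.eye(N)
top = np.argsort(lam_p)[::-1][:K]
W = U[:, top] * np.sqrt(lam_p[top]) # row i of W initializes factor vector v_i

err = np.linalg.norm(Jp - W @ W.T) / np.linalg.norm(Jp)
print(err)                          # shrinks toward 0 as K approaches N
```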

Function Smoothing for Continuous/Binary Encodings

When continuous variables are discretized to binary via one-hot encoding, FM training can leave "dead" features (bits never activated in the data) with effectively random coefficients, introducing significant noise into the surrogate surface. Adding a function-smoothing regularizer that penalizes differences in $w_i$ and $v_i$ between neighboring bits restores smoothness and regularizes the surrogate:

$L' = L + \lambda_{\mathrm{SR}} \sum_{(p, q) \in \mathcal{A}} \left( \| v_p - v_q \|^2 + (w_p - w_q)^2 \right)$

This dramatically improves sample efficiency, especially in low-data regimes for parameter-fitting tasks (Endo et al., 2024).
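
The regularizer itself is just a sum over an adjacency set of neighboring bits. A minimal sketch, with a simple chain adjacency standing in for the neighbor structure of a discretized variable:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 6, 3
w = rng.standard_normal(N)             # linear weights of a (hypothetical) FM
V = rng.standard_normal((N, K))        # latent factor vectors

# Adjacency set A: consecutive bits of a discretized continuous variable
pairs = [(i, i + 1) for i in range(N - 1)]
lam_sr = 0.1

def smoothing_penalty(w, V, pairs, lam_sr):
    """lam_SR * sum over (p, q) in A of ||v_p - v_q||^2 + (w_p - w_q)^2."""
    return lam_sr * sum(np.sum((V[p] - V[q]) ** 2) + (w[p] - w[q]) ** 2
                        for p, q in pairs)

# The regularized loss L' adds this penalty to the data-fit term L, so
# neighboring bits (including "dead" ones) are pulled toward similar coefficients.
print(smoothing_penalty(w, V, pairs, lam_sr))
```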

Higher-Order Interactions via Slack Variables

To capture higher-order (beyond quadratic) interactions efficiently, FMQA can be extended with slack variables: $z = [x; s] \in \{0,1\}^{n+m}$. These variables, introduced as additional input features and optimized alongside $x$ by the annealer, enhance the FM surrogate's expressiveness, effectively approximating higher-order terms while keeping the optimization quadratic. Empirical results confirm improved surrogate accuracy and data efficiency for moderate $m$ (Wang et al., 2 Jul 2025).
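
Structurally, the extension only changes the input encoding: slack bits are appended to the decision bits and treated as ordinary FM features, so the surrogate stays quadratic in the extended variable. A sketch (the sizes and random factors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, K = 5, 2, 2                 # decision bits, slack bits, FM rank

x = rng.integers(0, 2, n)         # decision variables
s = rng.integers(0, 2, m)         # slack variables, set by the annealer
z = np.concatenate([x, s])        # extended FM input z = [x; s]

# The FM surrogate is quadratic over all n+m bits, so couplings between x
# and s let the annealer drive each slack bit toward a function of the
# decision bits (e.g., an AND of two of them), emulating higher-order terms.
V = rng.standard_normal((n + m, K))
w = rng.standard_normal(n + m)
Q = np.triu(V @ V.T, k=1) + np.diag(w)

print(Q.shape, z @ Q @ z)         # QUBO over the extended (n+m)-bit space
```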

5. Applications in Scientific Computing and Engineering

FMQA supports optimization in domains with expensive, black-box objective functions and high-dimensional, combinatorial search spaces:

  • Materials discovery and crystal structure prediction: Encodes atomic arrangements as binary variables; efficiently samples ground and metastable states for Lennard-Jones, Stillinger–Weber, and EAM potentials with sub-linear numbers of energy evaluations. Empirically, FMQA finds ground states within 2–20 iterations where brute-force search would incur orders-of-magnitude higher cost. Ranking of local minima by the FM surrogate's energy correlates well with the true energetics (Kendall $\tau$ often $> 0.7$) (Couzinie et al., 2024).
  • Combinatorial genetic studies: High-order epistasis detection is recast as subset selection under MDR-based CER evaluation. FMQA identifies true epistatic sets with up to $N = 1000$, $d = 5$ loci (combinatorial search spaces $> 10^{12}$) using only $O(10^2)$–$O(10^3)$ expensive MDR calls, a reduction of many orders of magnitude compared to exhaustive search (Kikuchi et al., 5 Jan 2026).
  • Nanophotonic and metasurface design: A binary VAE maps continuous designs to latent codes; the FM surrogate is trained over the latent space, rapidly proposing out-of-sample device designs that outperform the training-set best in efficiency and diffraction, with gains of several percentage points after 30–50 iterations (Wilson et al., 2021).
  • Drug combination effect prediction: Augmented FMQA with slack variables achieves higher accuracy and lower variance in high-sparsity dose-response matrix reconstruction and pair prediction (Wang et al., 2 Jul 2025).
  • Hyperparameter tuning, circuit design, traffic signal optimization, molecule and peptide generation: FMQA’s applicability is broad, covering discrete or binary-encoded scientific and engineering optimization (Tamura et al., 24 Jul 2025).

Key performance metrics across domains include success rate for ground-truth identification, mean number of expensive evaluations required, solution ranking accuracy (Spearman/Kendall), and convergence to known minima.

6. Algorithmic Variants, Complexity, and Solver Landscape

The core FMQA loop is agnostic to the underlying annealing solver: quantum annealers, simulated annealing, digital annealers, and classical Ising machines are all supported. Solver-specific workflows involve:

  • QUBO assembly: $O(N^2)$ time to build from the trained FM.
  • Annealing/sampling: Hardware-dependent; D-Wave/Ocean samplers, Fixstars Amplify AE, and the Fujitsu Digital Annealer are typical (Tamura et al., 24 Jul 2025).
  • FM training: $O(MNK)$ per epoch; efficiency scales with the latent dimension $K$ and dataset size.
  • Data handling: Memory-limited updates (keeping the $M_{\text{latest}}$ most recent samples) improve convergence speed and reduce stagnation (Nakano et al., 28 Jul 2025).

Python packages such as fmqa, Amplify SDK (Amplify-BBOpt), D-Wave’s dimod, and PyTorch-based FM libraries are available for practical implementation; these enable domain experts to encode variables, initialize the loop, and interface with various annealing engines (Tamura et al., 24 Jul 2025).

7. Limitations and Open Directions

While FMQA achieves notable data efficiency and scales to very large binary variable spaces (up to hundreds of thousands of bits), several challenges remain:

  • Surrogate accuracy: The ranking and smoothness of the FM’s QUBO can be sensitive to training data coverage and rank KK; smoothing regularization and low-rank initialization partially mitigate these effects (Seki et al., 2024, Endo et al., 2024). Dead features in high-dimensional or sparse encodings can still degrade surrogate quality.
  • Higher-order interactions: Standard FM models only quadratic (pairwise) feature interactions. The addition of slack bits enables compact approximate higher-order modeling, at the expense of increased QUBO size and mixed-integer complexity (Wang et al., 2 Jul 2025).
  • Annealer and solver limits: Physical quantum annealers and digital Ising machines have finite embedding capacities, imposing upper bounds on NN; solution quality can depend on chain lengths (minor embedding), noise, and annealing schedule specifics.
  • Convergence guarantees: The nonconvexity of the surrogate landscape and finite QUBO solver fidelity can lead to suboptimal convergence; no formal convergence guarantees are provided for arbitrary black-box functions.

Continued advances are anticipated by integrating more powerful surrogate models (e.g., hybrid deep-factored FM architectures), further optimizing data selection and regularization, and scaling up hardware and constrained sampler capabilities. Extensions are likely in domains requiring efficient black-box optimization under combinatorial constraints, such as quantum device layout, network design, and synthetic biology (Tamura et al., 24 Jul 2025, Kikuchi et al., 5 Jan 2026).
