
Dueling-Bandit Selection: Sparse and Efficient Methods

Updated 13 February 2026
  • Dueling-bandit selection is the process of identifying optimal items through noisy pairwise duels, emphasizing relative feedback over direct rewards.
  • Algorithms employ winner criteria such as Borda, Copeland, and Condorcet, with methods like SECS exploiting sparsity to efficiently eliminate suboptimal choices.
  • Theoretical guarantees and empirical evaluations demonstrate reduced sample complexity and robust performance in both synthetic simulations and real-world ranking tasks.

Dueling-bandit selection is the problem of sequentially identifying optimal or near-optimal items from a finite set based solely on noisy pairwise comparisons, rather than absolute reward signals. This setting is fundamental in interactive learning, online ranking, and preference elicitation, where only relative (comparative) feedback is available. The field covers sample-optimal pure-exploration and regret-minimization algorithms under a variety of winner criteria, including the Borda, Copeland, and Condorcet notions, with specialized methodologies leveraging structure such as sparsity or transitivity in the underlying preference matrix.

1. Problem Formulation and Winner Criteria

Consider $K$ arms, where each action is a pairwise "duel" $(i, j)$ between distinct arms, yielding a random outcome $Z^{(t)}_{i,j} \in \{0,1\}$ with $P[Z^{(t)}_{i,j} = 1] = p_{i,j}$ and $p_{j,i} = 1 - p_{i,j}$. The collection $P = (p_{i,j})$ is called the preference matrix; it is generally unknown and fixed, but possibly subject to regularity constraints such as $p_{i,j} \in [3/8, 5/8]$ for identification purposes (Jamieson et al., 2015).

Winner Criteria:

  • Borda Winner: The unique arm $i^*$ maximizing the Borda score

$$s_i = \frac{1}{K-1}\sum_{j \neq i} p_{i,j}.$$

Arm $i^*$ is the Borda winner if $s_{i^*} > s_i$ for all $i \neq i^*$.

  • Copeland Winner: For each arm $i$, define the Copeland score as the fraction of arms $j$ that $i$ beats (i.e., $p_{i,j} > 1/2$), normalized by $K-1$. The Copeland winner(s) maximize this score. Unlike the Condorcet winner, the Copeland winner always exists.
  • Condorcet Winner: An arm $i^*$ such that $p_{i^*,j} > 1/2$ for all $j \neq i^*$. A Condorcet winner may not exist, but a Borda or Copeland winner exists under mild assumptions (Zoghi et al., 2015).
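As a concrete illustration, all three winner criteria can be computed directly when the preference matrix is known. The matrix below is a hypothetical example, not taken from the cited papers:

```python
import numpy as np

# Hypothetical 4-arm preference matrix: P[i, j] = p_{i,j}, the probability
# that arm i beats arm j, with P[j, i] = 1 - P[i, j] and 0.5 on the diagonal.
P = np.array([
    [0.50, 0.60, 0.55, 0.70],
    [0.40, 0.50, 0.60, 0.65],
    [0.45, 0.40, 0.50, 0.60],
    [0.30, 0.35, 0.40, 0.50],
])
K = P.shape[0]
off_diag = ~np.eye(K, dtype=bool)

# Borda score: mean win probability against the other K-1 arms
# (subtract the diagonal 0.5 before averaging).
borda = (P.sum(axis=1) - 0.5) / (K - 1)

# Copeland score: fraction of the K-1 opponents beaten outright.
copeland = ((P > 0.5) & off_diag).sum(axis=1) / (K - 1)

# Condorcet winner: an arm that beats every other arm; may not exist.
condorcet = [i for i in range(K)
             if all(P[i, j] > 0.5 for j in range(K) if j != i)]

print("Borda winner:    ", int(np.argmax(borda)))
print("Copeland winner: ", int(np.argmax(copeland)))
print("Condorcet winner:", condorcet if condorcet else "none")
```

For this matrix the three criteria agree on arm 0; in general they can disagree, which is why the choice of criterion matters.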

The goal can be pure exploration (best-arm identification with high confidence and minimal samples) or regret minimization (minimize cumulative gap to optimality over time).

2. Sample Complexity and Lower Bounds

Borda Selection: Minimax Rates and Optimality

In the absence of special structure, achieving $\delta$-PAC selection of the Borda winner requires

$$N_{\rm UB} = O\left( \sum_{i \neq i^*} \frac{1}{(s_{i^*} - s_i)^2} \log \frac{\log[(s_{i^*} - s_i)^{-2}]}{\delta} \right)$$

duels, where $\Delta_i = s_{i^*} - s_i$ (Jamieson et al., 2015). The lower bound matches this up to constant and lower-order logarithmic terms:

$$\mathbb{E}[\tau] \geq C \log \frac{1}{2\delta} \sum_{i \neq i^*}\frac{1}{\Delta_i^2}, \quad C = 1/90.$$

Thus, in the generic (non-structured) case, the minimax sample complexity is $\Theta\bigl(\sum_{i \neq i^*} 1/\Delta_i^2 \cdot \log(1/\delta)\bigr)$.
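To make the scaling concrete, the leading-order budget can be evaluated numerically. The Borda scores below are hypothetical, and constants plus the lower-order log-log factor are omitted:

```python
import math

# Hypothetical Borda scores; arm 0 is the Borda winner.
scores = [0.62, 0.55, 0.48, 0.35]
delta = 0.05  # PAC failure probability

# Gaps Delta_i = s_{i*} - s_i for the suboptimal arms.
gaps = [scores[0] - s for s in scores[1:]]

# Leading-order duel budget: sum_i (1/Delta_i^2) * log(1/delta),
# with constants and lower-order terms dropped.
budget = sum(math.log(1 / delta) / g ** 2 for g in gaps)
print(f"gaps = {[round(g, 2) for g in gaps]}, budget ~ {budget:.0f} duels")
```

Note how the arm closest in score (smallest gap) dominates the total budget, which is characteristic of gap-dependent bounds.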

Impact of Sparsity

A notable advance is the exploitation of "sparsity" in the comparison matrix. Let $g_i(\omega) = p_{1,\omega} - p_{i,\omega}$ for each suboptimal arm $i$ (assuming arm 1 is the unique Borda winner). If, for each $i$, $g_i$ is $(\gamma, k)$-approximately sparse—i.e., the mass of the gap vector outside its $k$ largest entries is at most $\gamma$ times the partial gap over those $k$ indices—then the leading $1/\Delta_i^2$ term in the sample complexity can be reduced by a factor of $K/k$ (Jamieson et al., 2015). That is, only $O(k/\Delta_i^2)$ duels are needed per arm.
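A minimal sketch of checking this condition, under the assumption that "mass" outside the top entries means the sum of absolute gap values; the helper name and example vectors are hypothetical:

```python
import numpy as np

def is_approx_sparse(g, k, gamma):
    """Sketch of the (gamma, k)-approximate-sparsity check: the absolute
    mass of the gap vector g outside its k largest entries must be at
    most gamma times the partial gap over those k entries."""
    g = np.asarray(g, dtype=float)
    order = np.argsort(g)[::-1]           # indices sorted by gap, largest first
    top_k = g[order[:k]].sum()            # partial gap on the k best indices
    tail = np.abs(g[order[k:]]).sum()     # residual mass outside the top k
    return tail <= gamma * top_k

# Hypothetical gap vector with one dominant discriminative coordinate.
g_sparse = [0.12, 0.01, -0.005, 0.0, 0.004]
print(is_approx_sparse(g_sparse, k=1, gamma=1/3))  # tail 0.019 vs budget 0.04

# A flat (dense) gap vector fails the same test.
g_dense = [0.05, 0.05, 0.05, 0.05, 0.05]
print(is_approx_sparse(g_dense, k=1, gamma=1/3))
```

In the sparse case almost all of the Borda gap lives on one coordinate, so duels against that single arm suffice to separate the candidates.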

3. Algorithmic Approaches: SECS and Successive Elimination

The SECS Algorithm

Successive Elimination with Comparison Sparsity (SECS) augments classical elimination with a sparse test. At each round, the algorithm either:

  1. Applies standard Borda elimination: if the empirical Borda gap for $(i, j)$ is large enough, eliminate $j$.
  2. Applies a sparse partial-gap test: evaluates the empirical pairwise gaps on the $k$ most discriminative comparison arms for $(i, j)$; if the partial gap exceeds a threshold, eliminate $j$.

In outline:

  • Input: sparsity $k$, confidence $\delta$, (optionally) a time-gate $T_0$.
  • Maintain an active set $A$; at each time step:
    • For each $j \in A$, sample a duel $(j, I_t)$ against a uniformly random arm $I_t$.
    • Eliminate $j \in A$ if there exists an arm $i$ such that either:
      • (a) $t > T_0$ and the partial empirical gap over the $k$ maximal-discrepancy arms exceeds $6(k+1)C_t$, or
      • (b) the full empirical Borda gap is large enough: $\hat{s}_{i,t} > \hat{s}_{j,t} + \frac{K}{K-1}\sqrt{2\log(4Kt^2/\delta)/t}$.
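The Borda-elimination rule (b) above can be sketched as follows; the sparse partial-gap test (a) is omitted for brevity, and the preference matrix and function names are illustrative:

```python
import math
import random

def borda_successive_elimination(sample_duel, K, delta, max_t=100_000):
    """Sketch of successive elimination via rule (b): each active arm
    duels a uniformly random opponent once per round, and an arm is
    dropped once its empirical Borda score trails the leader by more
    than the confidence width.  sample_duel(i, j) -> 1 if i beats j."""
    active = set(range(K))
    wins = [0.0] * K                       # cumulative duel outcomes per arm
    for t in range(1, max_t + 1):
        for j in active:
            opp = random.choice([a for a in range(K) if a != j])
            wins[j] += sample_duel(j, opp)
        s_hat = {j: wins[j] / t for j in active}   # empirical Borda scores
        width = (K / (K - 1)) * math.sqrt(
            2 * math.log(4 * K * t * t / delta) / t)
        leader = max(s_hat.values())
        # Keep j only while it is still within one confidence width of
        # the leader; this is rule (b) applied symmetrically.
        active = {j for j in active if s_hat[j] + width > leader}
        if len(active) == 1:
            return active.pop()
    return max(active, key=lambda j: wins[j])

# Hypothetical 3-arm preference matrix; arm 0 is the Borda winner.
P = [[0.5, 0.7, 0.7],
     [0.3, 0.5, 0.6],
     [0.3, 0.4, 0.5]]
random.seed(0)
winner = borda_successive_elimination(
    lambda i, j: int(random.random() < P[i][j]), K=3, delta=0.05)
print("identified winner:", winner)
```

Keeping $j$ whenever $\hat{s}_{j,t} + \text{width} > \max_i \hat{s}_{i,t}$ is equivalent to eliminating $j$ as soon as some $i$ satisfies rule (b); adding the sparse test (a) on top of this loop is what distinguishes SECS from the plain Borda reduction.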

Theoretical Guarantees

Under the $(\gamma, k)$-sparsity condition (e.g., with $\gamma = 1/3$), SECS identifies the Borda winner with high probability using at most

$$O\left(\sum_{j>1} \min\left\{ \max\left\{ \frac{1}{R^2}\log\frac{K/\delta}{R^2},\ \frac{(k+1)^2/K}{\Delta_j^2}\log\frac{K/\delta}{\Delta_j^2} \right\},\ \frac{1}{\Delta_j^2}\log\frac{K/\delta}{\Delta_j^2} \right\}\right)$$

samples (Jamieson et al., 2015). For $k \ll K$ and moderate gaps, this is a $K/k$ improvement in sample complexity relative to the non-sparse baseline.

As $k \to K$, SECS reduces to standard successive elimination (the full Borda reduction), matching known minimax rates (Jamieson et al., 2015).

4. Empirical Evaluation and Practical Impact

Synthetic and Real-World Data

  • Synthetic case (P1 matrix): For $K = n$ arms, with a distinguished top-2 group differing only on one other arm ($k = 1$ sparsity), SECS achieves $\Theta(n \log n)$ sample complexity, versus $\Theta(n^2)$ for standard Borda reduction. The corresponding log-log slopes are 2 (standard) vs. 1 (SECS).
  • Real data (MSLR-WEB10k, MQ2008): In web-ranking settings (arms as rankers/features), SECS with small $k$ (e.g., $k = 5$) reduced the number of necessary duels by approximately 50% compared to standard algorithms, with performance degrading smoothly to the full Borda cost as $k \to K$. In all cases with a unique Borda winner, SECS reliably identified the correct arm.

Practical Guidance

  • When to use sparsity-based methods: If top candidates are only separated on a small number of discriminative arms/attributes, SECS yields substantial gains.
  • Parameter tuning: Small $k$ (e.g., $k = 5$) suffices for many real datasets; setting $T_0 = 0$ is often acceptable; the confidence $\delta$ is set as in any PAC algorithm.
  • Otherwise: Use the standard Borda reduction—successive elimination based on Borda scores—if no structure is apparent.

5. Extensions to Other Winner Criteria and Problem Structures

  • Copeland and Condorcet settings: Algorithms such as Copeland Confidence Bound (CCB) and Scalable Copeland Bandits (SCB) generalize to broader winner criteria and retain $O(K \log T)$ regret without Condorcet assumptions (Zoghi et al., 2015).
  • Multi-dueling and Battling Bandits: Recent frameworks allow comparing larger subsets at each timestep and exploit richer feedback; however, fundamental sample complexity for winner-of-subset feedback is not improved unless richer ranking information is available (Saha et al., 2018). Under pure winner-information feedback, the lower bound matches the ordinary dueling-bandit setting.
  • Combinatorial Structures: In combinatorial pure exploration settings (e.g., best matching in a bipartite graph), Borda-type objectives admit reductions to CPE-MAB settings, and Condorcet objectives require min-max combinatorial strategies (Chen et al., 2020).

Algorithmic Summary Table

Method                | Structure Leveraged     | Sample Complexity / Regret
Borda Reduction       | None                    | $\Theta(\sum 1/\Delta^2 \cdot \log(1/\delta))$
SECS (Sparse)         | $(\gamma, k)$-sparsity  | $O((k/K) \sum 1/\Delta^2 \cdot \log(1/\delta))$
CCB/SCB (Copeland)    | None/minimal            | $O(K \log T)$ under mild assumptions
PAC-Battling Bandits  | PL/subset structure     | Winner info: $O(n/\epsilon^2)$; Top-$m$: $O(n/(m\epsilon^2))$

6. Theoretical and Practical Implications

Sparsity-aware algorithms such as SECS highlight that structural properties of the preference matrix can be exploited for substantial reductions in sample requirements, yielding near-optimal practical and theoretical performance in large-scale systems with moderate discriminative structure. In the absence of such structure, classical reduction to multi-armed bandit techniques yields minimax-optimal bounds. The study of dueling-bandit selection serves as both a methodological foundation and a unifying paradigm for relative-feedback online decision-making in a wide range of applications.

7. References

These works collectively establish both tight theoretical rates and practical guidelines for sample-efficient selection in preference-based online learning frameworks.
