Dueling-Bandit Selection: Sparse and Efficient Methods
- Dueling-bandit selection is the process of identifying optimal items through noisy pairwise duels, emphasizing relative feedback over direct rewards.
- Algorithms employ winner criteria such as Borda, Copeland, and Condorcet, with methods like SECS exploiting sparsity to efficiently eliminate suboptimal choices.
- Theoretical guarantees and empirical evaluations demonstrate reduced sample complexity and robust performance in both synthetic simulations and real-world ranking tasks.
Dueling-bandit selection is the problem of sequentially identifying optimal or near-optimal items from a finite set based solely on noisy pairwise comparisons, rather than absolute reward signals. This setting is fundamental in interactive learning, online ranking, and preference elicitation, where only relative (comparative) feedback is available. The field covers sample-optimal pure-exploration and regret-minimization algorithms under a variety of winner criteria, including the Borda, Copeland, and Condorcet notions, with specialized methodologies leveraging structure such as sparsity or transitivity in the underlying preference matrix.
1. Problem Formulation and Winner Criteria
Consider $n$ arms indexed by $[n] = \{1, \dots, n\}$. Each action is a pairwise "duel" between distinct arms $i$ and $j$, yielding a random outcome $Z_{i,j} \sim \mathrm{Bernoulli}(P_{i,j})$, where $P_{i,j} = \Pr(i \text{ beats } j)$ and $P_{i,j} + P_{j,i} = 1$. The collection $P = [P_{i,j}]$ is called the preference matrix; it is generally unknown and fixed, but possibly subject to regularity constraints, such as the existence of a unique Borda winner, for identification purposes (Jamieson et al., 2015).
Winner Criteria:
- Borda Winner: The unique arm maximizing the Borda score $s_i = \frac{1}{n-1} \sum_{j \neq i} P_{i,j}$. Arm $i^*$ is the Borda winner if $s_{i^*} > s_j$ for all $j \neq i^*$.
- Copeland Winner: For each arm $i$, define the Copeland score as the fraction of arms $j$ that $i$ beats (i.e., $P_{i,j} > 1/2$), normalized by $n-1$. The Copeland winner(s) maximize this score. Unlike the Condorcet winner, a Copeland winner always exists.
- Condorcet Winner: An arm $i^*$ such that $P_{i^*,j} > 1/2$ for all $j \neq i^*$. A Condorcet winner may not exist, but Borda and Copeland winners exist under mild assumptions (Zoghi et al., 2015).
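As a concrete illustration of these criteria, the following sketch computes all three winners for a hypothetical 4-arm preference matrix (the matrix values are invented for illustration). It also shows that the criteria need not agree: here the Condorcet and Copeland winner is arm 0, while arm 1 has the higher Borda score.

```python
import numpy as np

# Hypothetical preference matrix; P[i, j] = Pr(arm i beats arm j).
# Diagonal is 1/2 by convention, and P[i, j] + P[j, i] = 1.
P = np.array([
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.9, 0.9],
    [0.3, 0.1, 0.5, 0.6],
    [0.2, 0.1, 0.4, 0.5],
])

def borda_scores(P):
    """s_i = (1/(n-1)) * sum_{j != i} P[i, j]."""
    n = P.shape[0]
    return (P.sum(axis=1) - 0.5) / (n - 1)  # subtract the diagonal P[i, i] = 1/2

def copeland_scores(P):
    """Fraction of opponents j with P[i, j] > 1/2."""
    n = P.shape[0]
    return (P > 0.5).sum(axis=1) / (n - 1)  # diagonal 0.5 never counts as a win

def condorcet_winner(P):
    """Return the arm beating every other arm head-to-head, or None."""
    n = P.shape[0]
    for i in range(n):
        if all(P[i, j] > 0.5 for j in range(n) if j != i):
            return i
    return None
```

Arm 0 beats each opponent head-to-head (every off-diagonal entry of its row exceeds $1/2$), yet arm 1's larger margins against arms 2 and 3 give it the higher Borda score, $s_1 = (0.4 + 0.9 + 0.9)/3 \approx 0.733$ versus $s_0 = 0.7$.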
The goal can be pure exploration (best-arm identification with high confidence and minimal samples) or regret minimization (minimize cumulative gap to optimality over time).
2. Sample Complexity and Lower Bounds
Borda Selection: Minimax Rates and Optimality
In the absence of special structure, achieving $\delta$-PAC selection of the Borda winner via the Borda reduction requires on the order of
$O\!\left(\sum_{i=2}^{n} \Delta_i^{-2} \log(n/\delta)\right)$
duels, where $\Delta_i = s_1 - s_i$ and arm 1 is the unique Borda winner (Jamieson et al., 2015). The lower bound matches this up to constant and lower-order logarithmic terms:
$\Omega\!\left(\sum_{i=2}^{n} \Delta_i^{-2} \log(1/\delta)\right).$
Thus, in the generic (non-structured) case, the minimax sample complexity is $\Theta\!\left(\sum_{i=2}^{n} \Delta_i^{-2}\right)$ up to logarithmic factors.
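To see where quadratic-in-$n$ scaling can come from, consider a toy evaluation of the leading $\sum_{i \ge 2} \Delta_i^{-2}$ term for a hypothetical instance in which the runner-up trails the winner through a single discriminative comparison, so its Borda gap is diluted to $c/(n-1)$; the constants $c$ and $d$ below are invented for illustration.

```python
def borda_gap_sum(n, c=0.3, d=0.2):
    """Leading sum_{i>=2} Delta_i^{-2} term for a hypothetical instance:
    the runner-up differs from the winner on one comparison of size c
    (so Delta_2 = c / (n - 1)); all other arms have constant gap d."""
    delta_2 = c / (n - 1)
    return delta_2 ** -2 + (n - 2) * d ** -2

# The single "hard" arm dominates and makes the sum grow quadratically in n.
ratio = borda_gap_sum(1000) / borda_gap_sum(100)  # close to (1000/100)^2 = 100
```

The ratio is close to $100$, i.e., a tenfold increase in $n$ costs roughly a hundredfold increase in duels for the standard Borda reduction on such instances.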
Impact of Sparsity
A notable advance is the exploitation of "sparsity" in the comparison matrix. Let $\delta_{i,j} = P_{1,j} - P_{i,j}$ for each suboptimal arm $i$ (assuming arm 1 is the unique Borda winner). If, for each $i$, the gap vector $(\delta_{i,j})_j$ is $\gamma$-approximately $k$-sparse—i.e., the sum of discriminative gaps outside the $k$ largest entries is at most $\gamma$ times the partial gap over those $k$ indices—then the leading $\Delta_i^{-2}$ term in the sample complexity can be reduced by a factor of roughly $(k/(n-1))^2$ (Jamieson et al., 2015). That is, only on the order of $\bigl(k/((n-1)\Delta_i)\bigr)^2$ duels are needed per hard arm.
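The arithmetic behind the $(k/(n-1))^2$ reduction can be checked directly: restricting attention to the $k$ discriminative comparisons avoids diluting the gap over all $n-1$ opponents. The numbers below are hypothetical, with $k = 1$ for simplicity.

```python
n = 500   # number of arms (hypothetical)
c = 0.3   # size of the single discriminative pairwise gap (hypothetical)
k = 1     # the runner-up differs from the winner on exactly k comparisons

# Full Borda gap: the lone discrepancy is averaged over all n - 1 opponents.
delta_full = c / (n - 1)
# Gap seen by the sparse test on the discriminative comparison: undiluted.
delta_partial = c

full_cost = delta_full ** -2       # ~ duels to resolve the full Borda gap
sparse_cost = delta_partial ** -2  # ~ duels to resolve the partial gap
improvement = full_cost / sparse_cost  # equals ((n - 1) / k) ** 2 for k = 1
```

For $n = 500$ this is an improvement factor of $499^2 \approx 2.5 \times 10^5$ on the hard arm, which is exactly the $(k/(n-1))^2$ reduction of the leading term described above.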
3. Algorithmic Approaches: SECS and Successive Elimination
The SECS Algorithm
Successive Elimination with Comparison Sparsity (SECS) augments classical successive elimination with a sparse test. At each round, the algorithm either:
- Applies standard Borda elimination: if the empirical Borda gap for an arm $i$ exceeds its confidence radius, eliminate $i$.
- Applies a sparse partial-gap test: evaluate empirical pairwise gaps on the $k$ most discriminative comparison arms for $i$. If the partial gap exceeds a threshold, eliminate $i$.
Pseudocode Outline [algorithmic core from (Jamieson et al., 2015)]:
- Input: sparsity level $k$, confidence $\delta$, (optionally) a time-gate $T_0$
- Maintain an active set $A \leftarrow [n]$; at each time step $t$:
  - For each $i \in A$, sample a duel against a uniformly random opponent $j \neq i$.
  - For $i \in A$, eliminate $i$ if there exists $i' \in A$ such that:
    - (a) $t > T_0$ and the partial empirical gap between $i'$ and $i$ over the $k$ maximal-discrepancy arms exceeds its confidence threshold, or
    - (b) the full empirical gap in Borda estimates is large enough: $\hat{s}_{i'} - \hat{s}_i$ exceeds the combined confidence radii.
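The outline above can be turned into a minimal runnable sketch. The confidence radii below are simplified Hoeffding-style bounds, not the exact thresholds of Jamieson et al. (2015); the sparse test is only sound when the sparsity assumption on $P$ holds, and the function and its defaults are illustrative.

```python
import numpy as np

def secs(P, k=1, delta=0.1, max_rounds=100_000, seed=0):
    """Sketch of SECS: successive elimination with a sparse partial-gap test.

    P[i, j] = Pr(arm i beats arm j). Returns the surviving arm index.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    active = list(range(n))
    wins = np.zeros((n, n))    # wins[i, j]: duels i initiated against j and won
    counts = np.zeros((n, n))  # counts[i, j]: duels i initiated against j
    for t in range(1, max_rounds + 1):
        for i in active:       # one duel per active arm vs. a random opponent
            j = int(rng.integers(n - 1))
            j += j >= i        # uniform over opponents j != i
            counts[i, j] += 1
            wins[i, j] += rng.random() < P[i, j]
        phat = wins / np.maximum(counts, 1)
        np.fill_diagonal(phat, 0.5)
        s_hat = (phat.sum(axis=1) - 0.5) / (n - 1)  # empirical Borda scores
        # Simplified per-entry Hoeffding radii (not the paper's constants).
        conf = np.sqrt(np.log(4 * n * n * (t + 1) ** 2 / delta)
                       / (2 * np.maximum(counts, 1)))
        np.fill_diagonal(conf, 0.0)
        borda_rad = conf.sum(axis=1) / (n - 1)
        keep = []
        for i in active:
            out = False
            for ip in active:
                if ip == i:
                    continue
                # (b) full Borda-gap elimination test
                if s_hat[ip] - s_hat[i] > borda_rad[ip] + borda_rad[i]:
                    out = True
                    break
                # (a) sparse partial-gap test over the k largest discrepancies;
                # only valid under the sparsity assumption on P
                others = [j for j in range(n) if j != i and j != ip]
                gap = (phat[ip, others] - phat[i, others]
                       - conf[ip, others] - conf[i, others])
                if np.sort(gap)[-k:].sum() > 0:
                    out = True
                    break
            if not out:
                keep.append(i)
        active = keep
        if len(active) <= 1:
            break
    return active[0] if active else None
```

On a small instance with a dominant arm, either test eventually eliminates the suboptimal arms; on a genuinely sparse preference matrix, test (a) fires much earlier than (b), which is the source of the sample savings.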
Theoretical Guarantees
Under the sparsity condition (with $\gamma$ bounded by a suitable constant), SECS identifies the Borda winner with high probability using, up to constants and lower-order terms, on the order of
$\sum_{i=2}^{n} \left(\frac{k}{(n-1)\Delta_i}\right)^2 \log(n/\delta)$
samples for the arms covered by the sparsity condition (Jamieson et al., 2015). For $k \ll n$ and moderate gaps, this provides roughly an $((n-1)/k)^2$ improvement in sample complexity relative to the non-sparse baseline.
If $k = n - 1$, SECS reduces to standard successive elimination (full Borda reduction), matching known minimax rates (Jamieson et al., 2015).
4. Empirical Evaluation and Practical Impact
Synthetic and Real-World Data
- Synthetic case (P1 matrix): For $n$ arms with a top-2 group distinguished only by comparisons against one other arm ($k = 1$ sparsity), SECS achieves $O(n)$ sample complexity, versus $O(n^2)$ for standard Borda reduction; the empirical log-log slopes are 2 (standard) vs. 1 (SECS).
- Real data (MSLR-WEB10k, MQ2008): In web ranking settings (arms as rankers/features), SECS with a small sparsity parameter $k$ substantially reduced the number of necessary duels compared to standard algorithms, with performance smoothly degrading to the full Borda cost as $k \to n$. In all cases with unique Borda winners, SECS reliably identified the correct arm.
Practical Guidance
- When to use sparsity-based methods: If top candidates are only separated on a small number of discriminative arms/attributes, SECS yields substantial gains.
- Parameter tuning: A small $k$ suffices for many real datasets, and conservative default thresholds are often acceptable; set the confidence level $\delta$ as in any PAC algorithm.
- Otherwise: Use the standard Borda reduction—successive elimination based on Borda scores—if no structure is apparent.
5. Broader Connections and Related Developments
Extensions to Other Winner Criteria and Problem Structures
- Copeland and Condorcet settings: Algorithms such as Copeland Confidence Bound (CCB) and Scalable Copeland Bandits (SCB) generalize to broader winner criteria and retain regret guarantees logarithmic in the horizon $T$ without requiring a Condorcet winner (Zoghi et al., 2015).
- Multi-dueling and Battling Bandits: Recent frameworks allow comparing larger subsets at each timestep and exploit richer feedback; however, the fundamental sample complexity under winner-of-subset feedback is not improved unless richer ranking information is available (Saha et al., 2018). Under pure winner-information feedback, the $\Omega\!\left((n/\epsilon^2)\log(1/\delta)\right)$ lower bound matches the ordinary dueling-bandit setting, regardless of subset size.
- Combinatorial Structures: In combinatorial pure exploration settings (e.g., best matching in a bipartite graph), Borda-type objectives admit reductions to CPE-MAB settings, and Condorcet objectives require min-max combinatorial strategies (Chen et al., 2020).
Algorithmic Summary Table
| Method | Structure Leveraged | Sample Complexity / Regret |
|---|---|---|
| Borda Reduction | None | $O\!\left(\sum_{i \ge 2} \Delta_i^{-2} \log(n/\delta)\right)$ samples |
| SECS (Sparse) | $(\gamma, k)$-approximate sparsity | $O\!\left(\sum_{i \ge 2} (k/((n-1)\Delta_i))^2 \log(n/\delta)\right)$ samples |
| CCB/SCB (Copeland) | None/minimal | $O(\log T)$ regret for CCB/SCB under mild assumptions |
| PAC-Battling Bandits | Plackett-Luce / subset feedback | Winner info: $\Theta\!\left((n/\epsilon^2)\log(1/\delta)\right)$; Top-$m$: $\Theta\!\left((n/(m\epsilon^2))\log(1/\delta)\right)$ |
6. Theoretical and Practical Implications
Sparsity-aware algorithms such as SECS highlight that structural properties of the preference matrix can be exploited for substantial reductions in sample requirements, resulting in near-optimal practical and theoretical performance in large-scale systems with moderate discriminative structure. In the absence of such structure, classical reduction to multi-armed bandit techniques yields the minimax bounds. The study of dueling-bandit selection serves as both a methodological foundation and a unifying paradigm for relative-feedback online decision-making in a wide range of applications.
7. References
- "Sparse Dueling Bandits" (Jamieson et al., 2015)
- "Copeland Dueling Bandits" (Zoghi et al., 2015)
- "PAC Battling Bandits in the Plackett-Luce Model" (Saha et al., 2018)
- "Combinatorial Pure Exploration of Dueling Bandit" (Chen et al., 2020)
These works collectively establish both tight theoretical rates and practical guidelines for sample-efficient selection in preference-based online learning frameworks.