
ContrAstive Prompt Orchestration (CAPO)

Updated 8 February 2026
  • CAPO is a formal learning paradigm that employs contrastive objectives on diverse prompt variants for optimized downstream performance.
  • It utilizes discrete ranking and adaptive aggregation methods, integrating contrastive losses to enhance language model optimization and policy transfer.
  • Empirical results reveal significant gains in prompt quality, safety alignment, and few-shot learning across NLP and embodied AI tasks.

ContrAstive Prompt Orchestration (CAPO) is a formal learning paradigm and algorithmic family for leveraging prompt-level contrast—sometimes with dynamic orchestration mechanisms—to optimize downstream model behavior. Central to CAPO is the use of contrastive objectives over prompt variants, system prompts, or prompt-derived prototypes, and the explicit orchestration of these prompt forms (including dynamic aggregation). The approach is applied in diverse areas including LLM prompt optimization, few-shot learning, safety-aligned generation, unsupervised embedding construction, and visuomotor policy transfer. Key instantiations and theoretical formulations derive from large-scale empirical studies across NLP and embodied AI domains (Lee et al., 2 Sep 2025, Zhang et al., 1 Feb 2026, Zhao et al., 2024, Zeng et al., 2022, Jian et al., 2022).

1. Formal Definition and Framework Variants

CAPO is characterized by the following elements:

  • Contrast Source: Contrasting prompt structures, exemplars, or learned soft prompts with respect to quality, task relevance, or domain specificity.
  • Orchestration Mechanism: Either discrete selection (e.g., ranking and partitioning) or adaptive aggregation (via attention or optimization) over a set of learned prompts.
  • Contrastive Objective: Explicit use of contrastive loss functions (InfoNCE and its variants), supervised or unsupervised, to align model outputs with desired invariances or discriminative boundaries.
  • Application Domain: From discrete prompt optimization for LLMs to learnable prompt pools in vision-language policy learning.
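For reference, the standard InfoNCE objective named in the list above can be written as follows. The notation here (anchor embedding $z$, positive $z^{+}$, negatives $z^{-}_{j}$, similarity function $\mathrm{sim}$, temperature $\tau$) is generic and is not taken from any of the cited papers, each of which uses its own variant.

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\log \frac{\exp\big(\mathrm{sim}(z, z^{+})/\tau\big)}
               {\exp\big(\mathrm{sim}(z, z^{+})/\tau\big)
                + \sum_{j=1}^{N} \exp\big(\mathrm{sim}(z, z^{-}_{j})/\tau\big)}
```

Minimizing this loss raises the similarity of the anchor to its positive relative to the $N$ negatives.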

A general CAPO mapping is:

$$\mathrm{CAPO}: (q, \mathcal{P}) \mapsto p^*,$$

where $q$ is a query, $\mathcal{P}$ is a pool of candidate prompts (structured, learned, or retrieved), and $p^*$ is the orchestrated prompt yielding optimal downstream performance under a designated metric.

Discrete retrieval-augmented CAPO (Lee et al., 2 Sep 2025):

  • For LLM prompt optimization, $\mathcal{R}(q)$ retrieves $k$ scored prompts $p_i$ (e.g., from HelpSteer2), partitioned or ranked by the metrics $M = \{\text{help}, \text{corr}, \text{coh}, \text{comp}, \text{verb}\}$ (helpfulness, correctness, coherence, complexity, verbosity).
  • Contrastive reasoning operators $\Phi_{\mathrm{CR}}$ and $\Psi_{\mathrm{CR}}$ generate reasoning-augmented instructions for the LLM, leading to a synthesized optimized prompt.

Continuous and adaptive CAPO (Zhang et al., 1 Feb 2026):

  • For cross-embodiment visuomotor policy learning, a learnable pool of prompts is established, each disentangling a distinct domain factor (e.g., lighting, FOV).
  • Adaptive orchestration dynamically aggregates these prompts via attention-weighted fusion, conditioning the prompt mixing on the current observation.

2. Retrieval-Augmented and Tiered Contrastive Prompt Optimization

In the automatic prompt optimization setting (Lee et al., 2 Sep 2025), CAPO proceeds via:

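  1. Retrieval: Given a query $q$, $\mathcal{R}(q)$ retrieves $k$ prompts annotated for the metrics in $M$. Retrieval uses BM25 scoring, which in its standard form is:

$$\mathrm{BM25}(q, p) = \sum_{t \in q} \mathrm{IDF}(t) \cdot \frac{f(t, p)\,(k_1 + 1)}{f(t, p) + k_1 \left(1 - b + b \cdot \frac{|p|}{\mathrm{avgdl}}\right)}$$

where $f(t, p)$ is the frequency of term $t$ in prompt $p$, $|p|$ is the length of $p$, $\mathrm{avgdl}$ is the average prompt length in the corpus, and $k_1$ and $b$ are the usual BM25 hyperparameters.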

  2. Prompt Partitioning and Contrast Formation:
    • Prompts are sorted by their average score over the metrics in $M$.
    • Disjoint tiers are formed: top, middle, and bottom.
  3. Contrastive Reasoning Instruction:
    • Inputs to the contrastive reasoning operator use templates that reflect on strengths (from the top tier), weaknesses (from the bottom tier), and stable attributes (from the middle tier).
  4. Objective:
    • Margin-based notional objectives guide the contrastive distance between embeddings of high- and low-quality prompts, although the LLM itself is treated as a black box.
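The retrieval and partitioning steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus schema (a list of `(prompt_text, {metric: score})` pairs), the function names, whitespace tokenization, the smoothed IDF variant, and the split into equal thirds are all hypothetical choices, since the source does not specify tier sizes or tokenization.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with standard Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF (the "+ 1.0" keeps every term weight positive).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores


def retrieve_and_tier(query, corpus, k=10):
    """Retrieve the k best BM25 matches, then split them into three tiers.

    corpus is a list of (prompt_text, {metric_name: score}) pairs; this schema
    is hypothetical and only loosely modeled on an annotated prompt corpus.
    """
    scores = bm25_scores(query, [text for text, _ in corpus])
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    retrieved = [corpus[i] for i in order[:k]]
    # Rank the retrieved prompts by their mean metric score, best first.
    ranked = sorted(
        retrieved,
        key=lambda item: sum(item[1].values()) / len(item[1]),
        reverse=True,
    )
    third = len(ranked) // 3  # equal thirds are an assumption, not from the source
    return {
        "top": ranked[:third],
        "middle": ranked[third:len(ranked) - third],
        "bottom": ranked[len(ranked) - third:],
    }
```

The three returned tiers are disjoint by construction, and with fewer than three retrieved prompts everything falls into the middle tier.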

An alternative "metric-wise" contrast isolates the best exemplar for each metric and instructs the LLM to synthesize a composite prompt that integrates strengths across all dimensions.
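The metric-wise variant can be illustrated with a short sketch. The function names, the `(prompt_text, {metric: score})` input schema, and the instruction template wording are all hypothetical; the source states only that the best exemplar per metric is selected and that the model is asked to combine their strengths.

```python
METRICS = ("help", "corr", "coh", "comp", "verb")


def best_per_metric(retrieved, metrics=METRICS):
    """Map each metric to the text of the prompt that scores highest on it.

    retrieved is a list of (prompt_text, {metric_name: score}) pairs
    (a hypothetical schema used only for this illustration).
    """
    return {m: max(retrieved, key=lambda item: item[1][m])[0] for m in metrics}


def build_metric_wise_instruction(query, exemplars):
    """Assemble a contrastive instruction from per-metric exemplars.

    The template wording is illustrative, not the one used in the paper.
    """
    lines = [f"Query: {query}", "Exemplars, each the strongest on one metric:"]
    for metric, text in exemplars.items():
        lines.append(f"- [{metric}] {text}")
    lines.append("Write one prompt that combines the strengths shown above.")
    return "\n".join(lines)
```

The resulting string would be sent to the optimizer LLM, which is outside the scope of this sketch.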

3. Hybrid Contrastive Prompt Pool Learning and Dynamic Orchestration

For cross-embodiment visuomotor adaptation (Zhang et al., 1 Feb 2026), CAPO incorporates:

  • Pool Construction: A set of learnable continuous prompts, each trained with a hybrid of:
    • Visual InfoNCE: enforcing invariance to lighting/appearance variations
    • Temporal action-based BYOL: aligning across embodiment/trajectory sequences
    • Text-to-vision alignment: semantic grounding via CLIP-based contrastive loss
  • Adaptive Orchestration Mechanism: Given an observation, the embedding produced under each prompt is attention-weighted. The weights draw on a learnable MLP score and on the cosine similarity between the prompted embedding and the unprompted embedding.
  • Fused Representation: The final fused feature is input to a policy optimized by PPO.
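The attention-weighted fusion described above can be sketched as follows. This is an illustration under explicit assumptions: the source names the two ingredients (a learnable MLP score and a cosine similarity to the unprompted embedding) but the way they are combined is not recoverable here, so this sketch simply adds them and applies a softmax. The MLP is replaced by precomputed numbers passed in as `learned_scores`, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(prompted, unprompted, learned_scores):
    """Fuse per-prompt embeddings into one feature vector.

    prompted: one embedding per prompt in the pool.
    unprompted: embedding of the same observation without any prompt.
    learned_scores: stand-in for the output of a learnable MLP scorer.
    ASSUMPTION: logits are the sum of the learned score and the cosine term.
    """
    logits = [s + cosine(z, unprompted) for z, s in zip(prompted, learned_scores)]
    weights = softmax(logits)
    dim = len(unprompted)
    fused = [sum(w * z[d] for w, z in zip(weights, prompted)) for d in range(dim)]
    return fused, weights
```

With equal learned scores, the prompt whose embedding is closest to the unprompted embedding receives the largest weight.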

4. Contrastive Orchestration in Safety Alignment and Decoding

In safe LLM alignment, Adversarial Contrastive Decoding (ACD) (Zhao et al., 2024) instantiates a CAPO framework with:

  • Dual Opposite Prompt Optimization:
    • Learning two soft prompts—Safeguarding Prompt (SP) and Adversarial Prompt (AP)—via prompt-tuning on an anchor set distinguishing “refused” vs. “accepted” outputs in harmful/benign instruction cases.
    • Separate losses are applied to reinforce or discourage harmful completions.
  • Contrastive Decoding:
    • At inference, the logits under SP and AP are combined contrastively, with the AP logits subtracted so that unsafe responses evidenced by AP are suppressed.
    • This orchestration consistently boosts harmlessness (HLR) across models by over 20 percentage points, while maintaining performance on regular tasks.
  • Comparison with Other Methods:
    • ACD requires no second model and auto-learns both prompt legs, outperforming non-learned or single-template contrastive approaches.
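The decoding step can be illustrated with a toy example. The exact combination rule used by ACD is not recoverable from the text above, so this sketch assumes a common contrastive-decoding form in which a scaled copy of the adversarial-prompt logits is subtracted from the safeguarding-prompt logits. The coefficient `alpha`, the function names, and the three-token vocabulary are hypothetical.

```python
def contrastive_logits(sp_logits, ap_logits, alpha=0.5):
    """Combine next-token logits from the two prompted passes.

    ASSUMPTION: combined = SP - alpha * AP, so tokens the adversarial
    prompt favors are penalized. alpha is an illustrative value.
    """
    return [s - alpha * a for s, a in zip(sp_logits, ap_logits)]


def greedy_token(logits):
    """Return the index of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary and logits for one decoding step on a harmful request.
vocab = ["Sure", "Sorry", "The"]
sp = [2.0, 1.8, 0.5]   # safeguarding prompt still slightly prefers "Sure"
ap = [3.0, -1.0, 0.4]  # adversarial prompt strongly prefers "Sure"

plain_choice = vocab[greedy_token(sp)]
contrast_choice = vocab[greedy_token(contrastive_logits(sp, ap))]
```

In this toy step, decoding from the safeguarding prompt alone selects "Sure", while the contrastive combination selects "Sorry", because the token most favored by the adversarial prompt is penalized most.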

5. CAPO in Few-Shot and Unsupervised Representation Learning

Few-shot prompt-based learning (Jian et al., 2022):

  • CAPO generates multiple prompt+demonstration "views" per example, differing in template or context.
  • A supervised contrastive loss clusters same-class prompt views and repels cross-class ones, supplementing the masked-LM loss.
  • Results show gains of 2 to 6 percentage points in accuracy/F1 over strong prompt-only and retrieval-augmented baselines across 15 tasks.
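A supervised contrastive loss of the kind described above can be sketched in plain Python. This follows the widely used SupCon formulation (each anchor is pulled toward all other same-class views in the batch) and is an illustration only; the temperature value, the function names, and the use of raw lists instead of model outputs are assumptions, and the paper's exact weighting against the masked-LM loss is not reproduced.

```python
import math


def normalize(v):
    """L2-normalize a nonzero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of prompt-view embeddings.

    For each anchor i, positives are all other views with the same label;
    the denominator runs over every other view in the batch.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0
```

The loss is small when views of the same class sit close together and large when same-class views are far apart, which is the clustering behavior the bullet above describes.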

Unsupervised sentence embedding (Zeng et al., 2022):

  • ConPVP constructs prompt-derived virtual semantic prototypes, with each instance paired to both positive and negative prompt-based sequences.
  • A prototypical InfoNCE loss pulls the anchor sentence embedding towards its positive prototype and away from its negative prototype and all other prototypes in the batch.
  • Empirical results show consistent improvements on STS tasks (e.g., +2.6 Spearman's $\rho$ over SimCSE) and in text clustering accuracy.
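The prototype-based objective can be sketched as below. This is a hedged illustration rather than the ConPVP implementation: it assumes cosine similarity, an illustrative temperature, and that "all other batch prototypes" means both the positive and negative prototypes of the other instances in the batch. The function names are hypothetical.

```python
import math


def cos_sim(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prototypical_info_nce(anchors, pos_protos, neg_protos, temperature=0.05):
    """InfoNCE with one positive and one negative prototype per anchor.

    ASSUMPTION: the negatives for anchor i are its own negative prototype
    plus the positive and negative prototypes of every other instance.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        pos = cos_sim(anchors[i], pos_protos[i]) / temperature
        logits = [pos, cos_sim(anchors[i], neg_protos[i]) / temperature]
        for j in range(n):
            if j != i:
                logits.append(cos_sim(anchors[i], pos_protos[j]) / temperature)
                logits.append(cos_sim(anchors[i], neg_protos[j]) / temperature)
        # Stable log-sum-exp; the loss is -log softmax of the positive logit.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - pos
    return total / n
```

When each anchor coincides with its positive prototype the loss is near zero, and swapping the positive and negative prototypes makes it large.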

6. Experimental Results and Empirical Findings

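Prompt optimization (Lee et al., 2 Sep 2025):

| Model | Help | Corr | Coh | Comp | Verb | Avg |
|---|---|---|---|---|---|---|
| GPT-4o Direct | 0.366 | 0.435 | 0.767 | 0.405 | 0.664 | 0.527 |
| CAPO-Tiered | 0.525 | 0.607 | 0.882 | 0.447 | 0.717 | 0.636 |
| CAPO-Metric | 0.516 | 0.596 | 0.876 | 0.432 | 0.678 | 0.620 |
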
  • Ablations indicate that omitting contrastive reasoning degrades performance by 8–12%.
  • Retrieving $k = 10$ prompts is optimal; larger $k$ leads to noise dilution.
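
Cross-embodiment visuomotor policy (Zhang et al., 1 Feb 2026):

| Approach | SR↑ | SPL↑ | NE↓ | EL↓ |
|---|---|---|---|---|
| CURL | 52.0±1.9 | 0.32±0.07 | 0.48±0.08 | 32±6 |
| CAPO | 97.9±1.2 | 0.66±0.04 | 0.02±0.01 | 18±3 |
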
  • Ablations confirm the complementary necessity of visual, temporal-action, and text contrastive objectives.
  • CAPO exhibits superior zero-shot generalization across domains and embodiment changes.

Safety alignment (Zhao et al., 2024):

  • HLR (Harmless Rate) improves from 71.4% (Base) to 92.4% (ACD), with negligible cost to general win and truthful rates, and jailbreak attack success rates are halved.

7. Limitations, Open Questions, and Future Directions

Limitations across CAPO studies include:

  • Dependence on annotated prompt corpora (e.g., HelpSteer2) or domain-aligned anchor sets (Lee et al., 2 Sep 2025, Zhao et al., 2024).
  • Generalization to multi-turn dialogue and unannotated domains is largely unexplored.
  • In adaptive orchestration settings, prompt pool size and prompt length involve trade-offs, with performance degrading under excessive redundancy or overfitting (Zhang et al., 1 Feb 2026).
  • Retrieval quality (BM25 vs. neural) and model-agnostic orchestration merit further research.

Suggested future work encompasses:

  • Dynamic, user-driven metric weighting in orchestration.
  • Integration of human-in-the-loop feedback for iterative prompt refinement.
  • Orchestration over chain-of-thought or multi-step reasoning traces.
  • Extension of CAPO-inspired approaches to continuous prompt spaces across broader multimodal and multilingual domains.

References

  • "Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization" (Lee et al., 2 Sep 2025)
  • "Learning Adaptive Cross-Embodiment Visuomotor Policy with Contrastive Prompt Orchestration" (Zhang et al., 1 Feb 2026)
  • "Adversarial Contrastive Decoding: Boosting Safety Alignment of LLMs via Opposite Prompt Optimization" (Zhao et al., 2024)
  • "Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding" (Zeng et al., 2022)
  • "Contrastive Learning for Prompt-Based Few-Shot Language Learners" (Jian et al., 2022)
