ContrAstive Prompt Orchestration (CAPO)

Updated 8 February 2026

CAPO is a formal learning paradigm that employs contrastive objectives on diverse prompt variants for optimized downstream performance.
It utilizes discrete ranking and adaptive aggregation methods, integrating contrastive losses to enhance language model optimization and policy transfer.
Empirical results reveal significant gains in prompt quality, safety alignment, and few-shot learning across NLP and embodied AI tasks.

ContrAstive Prompt Orchestration (CAPO) is a formal learning paradigm and algorithmic family for leveraging prompt-level contrast—sometimes with dynamic orchestration mechanisms—to optimize downstream model behavior. Central to CAPO is the use of contrastive objectives over prompt variants, system prompts, or prompt-derived prototypes, and the explicit orchestration of these prompt forms (including dynamic aggregation). The approach is applied in diverse areas including LLM prompt optimization, few-shot learning, safety-aligned generation, unsupervised embedding construction, and visuomotor policy transfer. Key instantiations and theoretical formulations derive from large-scale empirical studies across NLP and embodied AI domains (Lee et al., 2 Sep 2025, Zhang et al., 1 Feb 2026, Zhao et al., 2024, Zeng et al., 2022, Jian et al., 2022).

1. Formal Definition and Framework Variants

CAPO is characterized by the following elements:

Contrast Source: Contrasting prompt structures, exemplars, or learned soft prompts with respect to quality, task relevance, or domain specificity.
Orchestration Mechanism: Either discrete selection (e.g., ranking and partitioning) or adaptive aggregation (via attention or optimization) over a set of learned prompts.
Contrastive Objective: Explicit use of contrastive loss functions (InfoNCE and its variants), supervised or unsupervised, to align model outputs with desired invariances or discriminative boundaries.
Application Domain: From discrete prompt optimization for LLMs to learnable prompt pools in vision-language policy learning.
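
As a point of reference for the contrastive objective named above, the following is a minimal NumPy sketch of InfoNCE with in-batch negatives. The function name `info_nce`, the temperature value, and the use of cosine similarity are illustrative assumptions, not details taken from any of the cited CAPO papers.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE with in-batch negatives (illustrative sketch).

    anchors, positives: arrays of shape (n, d). Row i of `positives` is the
    positive for row i of `anchors`; every other row acts as a negative.
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index as the target class.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
loss_aligned = info_nce(x, x + 0.01 * rng.normal(size=x.shape))
loss_mismatched = info_nce(x, x[::-1].copy())
```

Aligned pairs should give a much smaller loss than mismatched pairs, which is the behaviour the objective rewards.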

A general CAPO mapping is:

$\mathrm{CAPO}: (q, \mathcal{P}) \mapsto p^*,$

where $q$ is a query, $\mathcal{P}$ a pool of candidate prompts (structured, learned, or retrieved), and $p^*$ the orchestrated prompt yielding optimal downstream performance under a designated metric.

Discrete retrieval-augmented CAPO (Lee et al., 2 Sep 2025):

For LLM prompt optimization, $\mathcal{R}(q)$ retrieves $k$ scored prompts $p_i$ (e.g., from HelpSteer2), partitioned or ranked by the metrics $M = \{\text{help}, \text{corr}, \text{coh}, \text{comp}, \text{verb}\}$ (helpfulness, correctness, coherence, complexity, verbosity).
Contrastive reasoning operators $\Phi_{\mathrm{CR}}$ and $\Psi_{\mathrm{CR}}$ generate reasoning-augmented instructions for the optimizer LLM, leading to a synthesized optimized prompt.

Continuous and adaptive CAPO (Zhang et al., 1 Feb 2026):

For cross-embodiment visuomotor policy learning, a learnable pool $\mathcal{P}$ of prompts is established, each disentangling a distinct domain factor (e.g., lighting, FOV).
Adaptive orchestration dynamically aggregates these via attention-weighted fusion, conditioning prompt mixing on current observations.

2. Retrieval-Augmented and Tiered Contrastive Prompt Optimization

In the automatic prompt optimization setting (Lee et al., 2 Sep 2025), CAPO proceeds via:

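Retrieval: Given a query $q$, $\mathcal{R}(q)$ retrieves $k$ prompts annotated for $M$. Retrieval uses BM25 scoring, which in its standard form is:

$\mathrm{BM25}(q, p) = \sum_{t \in q} \mathrm{IDF}(t) \cdot \frac{f(t, p)\,(k_1 + 1)}{f(t, p) + k_1 \left(1 - b + b\,\frac{|p|}{\mathrm{avgdl}}\right)}$

where $f(t, p)$ is the frequency of term $t$ in prompt $p$, $|p|$ is the prompt length in tokens, $\mathrm{avgdl}$ is the average prompt length in the corpus, and $k_1$ and $b$ are the usual BM25 hyperparameters.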

Prompt Partitioning and Contrast Formation:
- Prompts are sorted by average metric score $\bar{s}(p_i) = \frac{1}{|M|} \sum_{m \in M} s_m(p_i)$, where $s_m(p_i)$ is the annotated score of $p_i$ on metric $m$.
- Disjoint sets are formed: $\mathcal{P}_{\text{high}}$ (top), $\mathcal{P}_{\text{mid}}$ (middle), $\mathcal{P}_{\text{low}}$ (bottom).
Contrastive Reasoning Instruction:
- Inputs to the LLM use templates that reflect on strengths (from $\mathcal{P}_{\text{high}}$), weaknesses (from $\mathcal{P}_{\text{low}}$), and stable attributes (from $\mathcal{P}_{\text{mid}}$).
Objective:
- Notional margin-based objectives guide the contrastive distance between embeddings of high- and low-quality prompts, although the LLM itself is treated as a black box.
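
The retrieval, partitioning, and instruction steps above can be sketched in plain Python. This is a rough illustration under stated assumptions: whitespace tokenization, the non-negative IDF variant, the even three-way split, and the template wording in `contrastive_instruction` are all hypothetical choices, and the function names are not from the source.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Standard BM25 score of each document string against the query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            # Non-negative IDF variant (assumption; other variants exist).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def retrieve_and_tier(query, corpus, k=10):
    """corpus: list of (prompt_text, {metric: score}) pairs.

    Retrieves the top-k prompts by BM25, sorts them by average metric
    score, and returns disjoint (high, mid, low) tiers.
    """
    scores = bm25_scores(query, [text for text, _ in corpus])
    top = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    ranked = sorted((corpus[i] for i in top),
                    key=lambda item: sum(item[1].values()) / len(item[1]),
                    reverse=True)
    lo_cut = len(ranked) // 3
    hi_cut = len(ranked) - lo_cut
    return ranked[:lo_cut], ranked[lo_cut:hi_cut], ranked[hi_cut:]

def contrastive_instruction(query, high, mid, low):
    """Build a reflection prompt from the tiers (hypothetical wording)."""
    def listing(tier):
        return "\n".join(f"- {text}" for text, _ in tier) or "- (none)"

    return (
        f"Task query: {query}\n\n"
        f"High-scoring prompts (what makes them strong?):\n{listing(high)}\n\n"
        f"Mid-scoring prompts (which attributes stay stable?):\n{listing(mid)}\n\n"
        f"Low-scoring prompts (what weakens them?):\n{listing(low)}\n\n"
        "Using these contrasts, write an improved prompt for the task query."
    )

corpus = [
    ("summarize the article in two sentences", {"help": 4, "corr": 4}),
    ("summarize article", {"help": 1, "corr": 2}),
    ("write a poem about cats", {"help": 3, "corr": 3}),
    ("summarize the article for a general reader", {"help": 3, "corr": 3}),
    ("translate the article", {"help": 2, "corr": 2}),
    ("summarize the article briefly", {"help": 0, "corr": 1}),
]
high, mid, low = retrieve_and_tier("summarize the article", corpus, k=6)
instruction = contrastive_instruction("summarize the article", high, mid, low)
```

The resulting `instruction` string would then be sent to the optimizer LLM; that call is omitted here because the source does not specify the model interface.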

An alternative "metric-wise" contrast isolates the best exemplar per metric, $\{p^{(m)}\}_{m \in M}$, and instructs the LLM to synthesize a composite prompt integrating strengths across all dimensions.
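
A small sketch of the metric-wise variant, assuming each retrieved prompt carries a score dictionary keyed by the five metric names. The helper names and the instruction wording are hypothetical, chosen only to illustrate selecting one exemplar per metric.

```python
METRICS = ("help", "corr", "coh", "comp", "verb")

def best_per_metric(retrieved, metrics=METRICS):
    """retrieved: list of (prompt_text, {metric: score}) pairs.

    Returns {metric: prompt_text} holding the top-scoring prompt per metric.
    """
    return {m: max(retrieved, key=lambda item: item[1][m])[0] for m in metrics}

def metric_wise_instruction(query, exemplars):
    """Ask for a composite prompt that merges per-metric strengths."""
    lines = [f"- best on {metric}: {text}" for metric, text in exemplars.items()]
    return (
        f"Task query: {query}\n"
        "Exemplar prompts, each the strongest on one metric:\n"
        + "\n".join(lines)
        + "\nWrite one prompt that combines the strengths of all exemplars."
    )

retrieved = [
    ("Explain step by step.",
     {"help": 0.9, "corr": 0.4, "coh": 0.5, "comp": 0.3, "verb": 0.2}),
    ("Cite sources for each claim.",
     {"help": 0.5, "corr": 0.9, "coh": 0.6, "comp": 0.8, "verb": 0.4}),
    ("Answer in one short paragraph.",
     {"help": 0.4, "corr": 0.5, "coh": 0.9, "comp": 0.2, "verb": 0.9}),
]
exemplars = best_per_metric(retrieved)
instruction = metric_wise_instruction("Why is the sky blue?", exemplars)
```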

3. Hybrid Contrastive Prompt Pool Learning and Dynamic Orchestration

For cross-embodiment visuomotor adaptation (Zhang et al., 1 Feb 2026), CAPO incorporates:

Pool Construction: A pool $\mathcal{P}$ of learnable continuous prompts, each trained with a hybrid of:
- Visual InfoNCE: enforcing invariance to lighting/appearance variations
- Temporal action-based BYOL: aligning across embodiment/trajectory sequences
- Text-to-vision alignment: semantic grounding via CLIP-based contrastive loss
Adaptive Orchestration Mechanism: Given an observation, the embeddings it produces under each prompt in the pool are attention-weighted. The attention weight of each prompt combines two terms: a learnable MLP score and the cosine similarity between the prompted embedding and the unprompted embedding of the same observation.

Fused Representation: The final feature is the attention-weighted fusion of the prompted embeddings, which is input to a policy optimized by PPO.
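
One way such attention-weighted fusion could look is sketched below in NumPy. The additive combination of the two terms, the softmax normalization, and the names `orchestrate` and `mlp_scores` are assumptions made for illustration; the source does not confirm the exact form, and the MLP is replaced here by precomputed scores.

```python
import numpy as np

def orchestrate(prompt_embs, base_emb, mlp_scores):
    """Fuse per-prompt embeddings with attention weights (sketch).

    prompt_embs: (n, d) embeddings of one observation under each prompt.
    base_emb:    (d,) embedding of the same observation without a prompt.
    mlp_scores:  (n,) relevance scores standing in for a learnable MLP.
    """
    # Cosine similarity between each prompted embedding and the base one.
    cos = prompt_embs @ base_emb / (
        np.linalg.norm(prompt_embs, axis=1) * np.linalg.norm(base_emb)
    )
    logits = mlp_scores + cos              # assumed additive combination
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax attention weights
    fused = weights @ prompt_embs          # (d,) weighted sum of embeddings
    return weights, fused

embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
base = np.array([1.0, 0.0])
weights, fused = orchestrate(embs, base, np.zeros(3))
```

With zero MLP scores, the weights follow cosine similarity alone, so the prompt whose embedding is closest to the unprompted one dominates the fused feature.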

4. Contrastive Orchestration in Safety Alignment and Decoding

In safe LLM alignment, Adversarial Contrastive Decoding (ACD) (Zhao et al., 2024) instantiates a CAPO framework with:

Dual Opposite Prompt Optimization:
- Learning two soft prompts—Safeguarding Prompt (SP) and Adversarial Prompt (AP)—via prompt-tuning on an anchor set distinguishing “refused” vs. “accepted” outputs in harmful/benign instruction cases.
- Separate losses are applied to reinforce or discourage harmful completions.
Contrastive Decoding:
- At inference, logits under SP and AP are combined contrastively: the AP logits are subtracted from the SP logits, directly suppressing unsafe responses as evidenced by AP.
- This orchestration consistently boosts the Harmless Rate (HLR) across models by over 20 percentage points, while maintaining performance on regular tasks.
Comparison with Other Methods:
- ACD requires no second model and auto-learns both prompt legs, outperforming non-learned or single-template contrastive approaches.
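
The logit subtraction can be illustrated with a toy three-token vocabulary. The linear form with a scaling coefficient `alpha` is an assumption borrowed from common contrastive-decoding practice, not necessarily the exact ACD formula, and the logit values are made up for the example.

```python
import numpy as np

def contrastive_logits(logits_sp, logits_ap, alpha=0.5):
    """Subtract scaled adversarial-prompt logits from safeguarding-prompt logits.

    alpha is a hypothetical contrast strength; the source does not state it.
    """
    return logits_sp - alpha * logits_ap

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Toy vocabulary: token 0 = unsafe continuation, 1 = refusal, 2 = filler.
logits_sp = np.array([2.0, 1.8, 0.1])   # safeguarding prompt still leans unsafe
logits_ap = np.array([3.0, 0.2, 0.0])   # adversarial prompt strongly favours it

combined = contrastive_logits(logits_sp, logits_ap, alpha=0.5)
next_token_probs = softmax(combined)
chosen = int(np.argmax(combined))
```

In this toy case the unsafe token wins under SP alone, while the combined logits prefer the refusal token, because the token that AP promotes most is penalized most.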

5. CAPO in Few-Shot and Unsupervised Representation Learning

Few-shot prompt-based learning (Jian et al., 2022):

CAPO generates multiple prompt+demonstration "views" per example, differing in template or context.
A supervised contrastive loss $\mathcal{L}_{\mathrm{SupCon}}$ clusters same-class prompt views and repels cross-class ones, supplementing the masked-LM loss.
Results show gains of 2–6 percentage points in accuracy/F1 over strong prompt-only and retrieval-augmented baselines across 15 tasks.
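
For intuition, a supervised contrastive loss over prompt views can be written in a few lines of NumPy. This follows the widely used SupCon formulation; the temperature, the toy features, and the function name are illustrative assumptions, and the sketch assumes at least one class has two or more views in the batch.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of view embeddings (sketch)."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Denominator sums over every other sample in the batch.
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                                # skip anchors with no positive
    per_anchor = -(log_prob * pos).sum(axis=1)[has_pos] / n_pos[has_pos]
    return per_anchor.mean()

# Two views per class; views of the same class point in similar directions.
views = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_clustered = supcon_loss(views, [0, 0, 1, 1])
loss_scrambled = supcon_loss(views, [0, 1, 0, 1])
```

When labels agree with the geometry the loss is small; scrambling the labels makes it large, which is the gradient signal that pulls same-class views together.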

Unsupervised sentence embedding (Zeng et al., 2022):

ConPVP constructs prompt-derived virtual semantic prototypes, with each instance paired to both positive and negative prompt-based sequences.
A prototypical InfoNCE loss pulls each anchor sentence embedding towards its positive prototype and away from its negative prototype plus all other batch prototypes.
Empirical results show consistent improvements in STS tasks (e.g., +2.6 Spearman’s $\rho$ over SimCSE) and text clustering accuracy.
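
A prototypical InfoNCE of this kind might be sketched as follows. The sketch assumes that the negative set for each anchor contains its own negative prototype together with the positive and negative prototypes of all other batch items; the temperature and the function name are likewise illustrative rather than taken from ConPVP.

```python
import numpy as np

def prototype_info_nce(anchors, pos_protos, neg_protos, temperature=0.05):
    """InfoNCE against prompt-derived prototypes (illustrative sketch).

    All inputs have shape (n, d); row i of each array belongs to instance i.
    """
    def normalise(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    h, cp, cn = normalise(anchors), normalise(pos_protos), normalise(neg_protos)
    # Columns 0..n-1 are positive prototypes, columns n..2n-1 negative ones.
    logits = np.concatenate([h @ cp.T, h @ cn.T], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(h))
    return -np.mean(log_prob[idx, idx])    # target: own positive prototype

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
pos = h + 0.01 * rng.normal(size=h.shape)   # prototypes close to the anchors
neg = -h                                    # prototypes opposite to the anchors
loss_correct = prototype_info_nce(h, pos, neg)
loss_swapped = prototype_info_nce(h, neg, pos)
```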

6. Experimental Results and Empirical Findings

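Prompt Optimization (LLMs, Lee et al., 2 Sep 2025)

| Model | Help | Corr | Coh | Comp | Verb | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-4o Direct | 0.366 | 0.435 | 0.767 | 0.405 | 0.664 | 0.527 |
| CAPO-Tiered | 0.525 | 0.607 | 0.882 | 0.447 | 0.717 | 0.636 |
| CAPO-Metric | 0.516 | 0.596 | 0.876 | 0.432 | 0.678 | 0.620 |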

Ablations indicate that omitting contrastive reasoning degrades performance by 8–12%.
Retrieval with $k = 10$ is optimal; larger $k$ leads to noise dilution.
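
As a quick arithmetic check, the Avg column of the prompt-optimization table can be recomputed from the five per-metric scores, assuming it is an unweighted mean. The relative gain computed at the end is a derived figure, not one reported in the source.

```python
# Per-metric scores copied from the table: Help, Corr, Coh, Comp, Verb.
scores = {
    "GPT-4o Direct": [0.366, 0.435, 0.767, 0.405, 0.664],
    "CAPO-Tiered": [0.525, 0.607, 0.882, 0.447, 0.717],
    "CAPO-Metric": [0.516, 0.596, 0.876, 0.432, 0.678],
}

# Unweighted mean over the five metrics, rounded as in the table.
averages = {name: round(sum(v) / len(v), 3) for name, v in scores.items()}

# Relative improvement of CAPO-Tiered over the direct baseline.
baseline = averages["GPT-4o Direct"]
relative_gain = (averages["CAPO-Tiered"] - baseline) / baseline
```

The recomputed means match the reported Avg values, and the tiered variant comes out roughly 21% above the baseline in relative terms.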

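Visuomotor Policy Transfer (Zhang et al., 1 Feb 2026)

| Approach | SR↑ | SPL↑ | NE↓ | EL↓ |
| --- | --- | --- | --- | --- |
| CURL | 52.0±1.9 | 0.32±0.07 | 0.48±0.08 | 32±6 |
| CAPO | 97.9±1.2 | 0.66±0.04 | 0.02±0.01 | 18±3 |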

Ablations confirm the complementary necessity of visual, temporal-action, and text contrastive objectives.
CAPO exhibits superior zero-shot generalization across domains and embodiment changes.

Safety Decoding (Zhao et al., 2024)

The Harmless Rate (HLR) improves from 71.4% (Base) to 92.4% (ACD CAPO), with negligible cost to general win/truthful rates and halved jailbreak attack success rates.

7. Limitations, Open Questions, and Future Directions

Limitations across CAPO studies include:

Dependence on annotated prompt corpora (e.g., HelpSteer2) or domain-aligned anchor sets (Lee et al., 2 Sep 2025, Zhao et al., 2024).
Generalization to multi-turn dialogue and unannotated domains is largely unexplored.
In adaptive orchestration settings, prompt pool size and prompt length involve trade-offs: excessive values lead to redundancy or overfitting and degrade performance (Zhang et al., 1 Feb 2026).
Retrieval quality (BM25 vs. neural) and model-agnostic orchestration merit further research.

Suggested future work encompasses:

Dynamic, user-driven metric weighting in orchestration.
Integration of human-in-the-loop feedback for iterative prompt refinement.
Orchestration over chain-of-thought or multi-step reasoning traces.
Extension of CAPO-inspired approaches to continuous prompt spaces across broader multimodal and multilingual domains.

References

"Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization" (Lee et al., 2 Sep 2025)
"Learning Adaptive Cross-Embodiment Visuomotor Policy with Contrastive Prompt Orchestration" (Zhang et al., 1 Feb 2026)
"Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization" (Zhao et al., 2024)
"Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding" (Zeng et al., 2022)
"Contrastive Learning for Prompt-Based Few-Shot Language Learners" (Jian et al., 2022)
