Modular Prompt Optimization (MPO)
- MPO is a structured approach that decomposes prompt engineering into semantically distinct modules for clearer, adaptable design.
- It achieves state-of-the-art efficiency and robust few-shot generalization by optimizing prompt sections independently.
- MPO enhances interpretability and scalability in both text-based LLMs and vision-language pipelines through modular, cost-aware strategies.
Modular Prompt Optimization (MPO) is a structured, compositional approach to prompt engineering and adaptation that decomposes prompt construction, optimization, and/or representation into coordinated modules, each with explicit technical interfaces. MPO frameworks extend prompt engineering beyond monolithic rewrite or fine-tuning strategies by enabling independent optimization, selection, or learning of distinct prompt components—such as semantic sections, composable sub-prompts, or programmatic segments—within LLMs, vision-LLMs, and modular LM pipelines. By leveraging this modular abstraction, MPO methods achieve state-of-the-art parameter efficiency, robust few-shot generalization, reduced computational overhead, and enhanced interpretability and extensibility across text, vision, and programmatic LLM settings.
1. Modular Prompt Decomposition and Representations
MPO frameworks partition prompts into explicit modules that reflect semantic, structural, or functional divisions appropriate to the model and task regime:
- Schema-based sections: Prompts are decomposed into fixed semantic components—such as System Role, Context, Task Details, Constraints, Output Format—each optimized independently (Sharma et al., 7 Jan 2026).
- Composable prompt embeddings: The target prompt for each task is defined as a combination of $m$ shared source prompts $P_1, \dots, P_m$ and a task-specific private prompt $P_{\text{priv}}$,
$$P_{\text{target}} = (P_1 \oplus \cdots \oplus P_m) \otimes P_{\text{priv}},$$
where $\oplus$ and $\otimes$ specify combination operators such as summation, concatenation, or elementwise multiplication (Pouramini et al., 2024).
- Layer-wise modular soft prompts: For Transformer-based PLMs, deep modular prompt tuning injects $k$ trainable prompt vectors $p_1^{(l)}, \dots, p_k^{(l)}$ at each layer $l$, with a layer-wise router $r^{(l)}$ learning how to activate and compose them for each task:
$$p^{(l)} = \sum_{i=1}^{k} r_i^{(l)} \, p_i^{(l)}.$$
The routers are optimized to select or blend prompts for strong compositional generalization (Sun et al., 2022).
- Symbolic and graph-based prompt programs: Prompt structures are represented as directed acyclic graphs of composable nodes (e.g., SectionNode, DataNode, Concat), facilitating rich structural search, constraint enforcement, and symbolic mutations at compile time (Schnabel et al., 2024).
This decomposition enables flexible adaptation, combinatorial reuse, and fine-grained control over prompt subcomponents.
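As a concrete illustration, the combination operators for composable prompt embeddings can be sketched over toy prompt vectors (a simplified stand-in for trainable soft-prompt tensors; the function names are hypothetical, not from the cited framework):

```python
# Sketch of composable prompt-embedding operators: a task prompt is built
# from shared source prompts plus a task-specific private prompt.
from typing import List

Vector = List[float]  # toy stand-in for a soft-prompt embedding

def combine_sum(shared: List[Vector], private: Vector) -> Vector:
    """Elementwise summation of shared prompts with the private prompt."""
    out = list(private)
    for p in shared:
        out = [a + b for a, b in zip(out, p)]
    return out

def combine_concat(shared: List[Vector], private: Vector) -> Vector:
    """Concatenate shared prompts, then append the private prompt."""
    out: Vector = []
    for p in shared:
        out.extend(p)
    out.extend(private)
    return out

def combine_mul(shared: List[Vector], private: Vector) -> Vector:
    """Elementwise product of shared prompts, gated by the private prompt."""
    out = list(private)
    for p in shared:
        out = [a * b for a, b in zip(out, p)]
    return out
```

In a real system these operators act on trainable embedding matrices and the choice of operator is itself searched over; here they only illustrate the interface.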
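The graph-based prompt-program representation can likewise be illustrated with a toy DAG of composable nodes. The class names follow the node types mentioned in the text, but the sketch is hypothetical; the actual framework's API differs:

```python
# Toy symbolic prompt program: a tree/DAG of composable nodes that renders
# to a prompt string, enabling structural mutations at compile time.
class Node:
    def render(self) -> str:
        raise NotImplementedError

class SectionNode(Node):
    """A named semantic section of the prompt."""
    def __init__(self, title: str, body: str):
        self.title, self.body = title, body
    def render(self) -> str:
        return f"## {self.title}\n{self.body}"

class DataNode(Node):
    """Few-shot examples or data records rendered as a list."""
    def __init__(self, records):
        self.records = records
    def render(self) -> str:
        return "\n".join(f"- {r}" for r in self.records)

class Concat(Node):
    """Composes child nodes in order."""
    def __init__(self, *children: Node):
        self.children = children
    def render(self) -> str:
        return "\n\n".join(c.render() for c in self.children)

prompt = Concat(
    SectionNode("Task", "Classify the sentiment of each example."),
    DataNode(["I loved it.", "Terrible service."]),
    SectionNode("Output Format", "Answer with POSITIVE or NEGATIVE."),
)
```

Because the prompt is a program rather than a string, symbolic mutations (dropping a section, paraphrasing a node, reordering children) become well-typed operations on the tree.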
2. Optimization Algorithms and Cost-Aware Objectives
Several classes of optimization algorithms underpin MPO frameworks, reflecting differing access to model gradients, types of prompt modules, and overall program structure:
- Section-local textual gradients: Each semantic prompt section $s_i$ is refined using black-box “textual gradients”:
$$s_i^{(t+1)} = \mathcal{C}\big(s_i^{(t)}, f(s_i^{(t)})\big),$$
where $\mathcal{C}$ is an LLM-based critic and $f$ returns natural-language feedback; each step updates only section $s_i$ to avoid interference and prompt bloat (Sharma et al., 7 Jan 2026).
- Meta-prompt and demonstration-based search: High-level meta-prompts rephrase entire task specifications, while demonstration-based selectors choose optimal module configurations (e.g., DSPy modules for Chain-of-Thought, ReAct, etc.) via probabilistic performance objectives (Murthy et al., 17 Jul 2025).
- Gradient-based and black-box prompt tuning: Modular schemes support standard Adam gradient descent, Bayesian optimization, and black-box evolutionary strategies (e.g., CMA-ES, EvoPrompt) to optimize module selection/combination parameters and prompt embeddings, often guided by task-specific rewards or validation-set performance (Sun et al., 2022, Zehle et al., 2 Dec 2025).
- Cost-aware scalarized objectives: Optimization targets a composite loss,
$$\mathcal{L} = \lambda_{\text{perf}}\,(1 - \text{score}) + \lambda_{\text{len}}\,\text{length}(p),$$
balancing task quality (accuracy, F1, BertScore, EM) against prompt length; additional terms may capture prompt complexity (unique/total tokens) (Murthy et al., 17 Jul 2025, Schnabel et al., 2024).
- Alternated prompt–weight optimization (BetterTogether): In multi-stage or modular LM systems, prompt parameters and module weights are alternately optimized, each guided by bootstrapped traces from the other’s current solution, yielding superior end-to-end performance compared to optimizing either in isolation (Soylu et al., 2024, Ziems et al., 6 Aug 2025).
This suite of strategies supports discrete, continuous, and hybrid modular optimization, both in frozen and trainable-weights architectures.
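A section-local refinement loop of the kind described above can be sketched as follows. The `critic` and `score` callables are stand-ins for an LLM-based critic call and a task metric; the greedy-accept policy is an illustrative simplification:

```python
# Minimal sketch of section-local "textual gradient" optimization:
# refine one semantic section at a time, keeping a revision only if it
# improves a validation score, so sections do not interfere.
from typing import Callable, Dict

def optimize_sections(
    sections: Dict[str, str],
    critic: Callable[[str, str], str],   # (section_name, text) -> revised text
    score: Callable[[Dict[str, str]], float],
    steps: int = 3,
) -> Dict[str, str]:
    best = dict(sections)
    best_score = score(best)
    for _ in range(steps):
        for name in list(best):           # update each section independently
            candidate = dict(best)
            candidate[name] = critic(name, best[name])
            s = score(candidate)
            if s > best_score:            # greedy accept on improvement
                best, best_score = candidate, s
    return best
```

Because only one section changes per candidate, an accepted revision is directly attributable to that section, which is the interpretability benefit the text emphasizes.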
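The alternated prompt–weight scheme can be reduced to a generic driver that conditions each optimizer on the other's current solution (the two callables abstract away the real prompt-search and weight-update procedures, including bootstrapped traces):

```python
# Sketch of BetterTogether-style alternation: prompt optimization and
# weight fine-tuning take turns, each seeing the other's latest solution.
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # prompt parameters
W = TypeVar("W")  # module weights

def better_together(
    prompts: P,
    weights: W,
    optimize_prompts: Callable[[P, W], P],  # prompt search, weights frozen
    optimize_weights: Callable[[P, W], W],  # weight update, prompts frozen
    rounds: int = 2,
) -> Tuple[P, W]:
    for _ in range(rounds):
        prompts = optimize_prompts(prompts, weights)
        weights = optimize_weights(prompts, weights)
    return prompts, weights
```

The key design point is the ordering: each stage bootstraps from the other's current output rather than from a fixed initialization, which is what the cited work credits for gains over optimizing either side in isolation.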
3. MPO in Language, Vision, and Programmatic Pipelines
MPO designs have been instantiated in a wide array of neural pipelines and settings:
- Text-based LLMs: Modular prompt composition, section-local gradients, meta-prompt rewriting, and DSPy-based compilers enable automatic LLM adaptation to classification, QA, summarization, reasoning, or math, supporting both supervised and few-shot regimes (Murthy et al., 17 Jul 2025, Sharma et al., 7 Jan 2026, Pouramini et al., 2024).
- Vision-LLMs (VLMs): Modular Prompt Learning (MPL) for base CLIP-style models preserves information by carrying over selected prompt vectors across all visual transformer layers, in contrast to naive “deep” visual prompt tuning, which replaces prompts layerwise and loses context (Huang et al., 19 Feb 2025).
- Retrieval and multi-task settings: Modular prompt tuning assigns modules to retrieval task attributes (e.g., Fact-checking, Science), enabling arithmetic on modules (addition, subtraction, scaling) for new tasks and strong zero-shot generalization (Liang et al., 2023).
- Program-like LM pipelines: Symbolic and programmatic prompt optimization treats complex LLM workflows as programs with explicit modules/sections, applying structural transformations (e.g., drop, paraphrase, rearrangement) and enforcing compile-time constraints (Schnabel et al., 2024).
- Multi-module RL fine-tuning and hybrid updates: In multi-step LM pipelines, per-module GRPO-style policy gradients are aligned to each module’s prompt and trajectory, while prompt optimization proceeds jointly or in staged alternation, with statistically significant accuracy gains (Ziems et al., 6 Aug 2025, Soylu et al., 2024).
These applications demonstrate the broad utility and technical flexibility of the modular prompt paradigm.
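The module arithmetic used for retrieval and multi-task transfer can be sketched as scaled sums over attribute modules. The attribute vectors and coefficients here are illustrative toys, not learned values:

```python
# Sketch of prompt-module arithmetic: a new task's prompt is composed by
# adding, subtracting, and scaling existing attribute modules.
from typing import List, Tuple

Vector = List[float]  # toy stand-in for a soft-prompt module

def module_combine(terms: List[Tuple[float, Vector]]) -> Vector:
    """Sum scaled modules: terms is a list of (coefficient, module)."""
    dim = len(terms[0][1])
    return [sum(c * m[i] for c, m in terms) for i in range(dim)]

# Hypothetical attribute modules for two retrieval task attributes.
fact_checking = [1.0, 0.0]
science = [0.0, 1.0]

# A new "science fact-checking" task built without any task-specific training.
new_task = module_combine([(1.0, fact_checking), (0.5, science)])
```

Subtraction (a negative coefficient) removes an attribute's contribution, which is what enables the zero-shot composition of unseen attribute combinations described above.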
4. Empirical Benchmarks and Comparative Performance
MPO methods consistently establish or match state-of-the-art results on a variety of challenging datasets and task suites:
| Setting | Metric | Baseline | MPO Variant | Score/Improvement | Reference |
|---|---|---|---|---|---|
| SQuAD_2 (QA) | BertScore | AdalFlow: 0.922 | Promptomatix: 0.913 | within 0.01 of best | (Murthy et al., 17 Jul 2025) |
| GSM8K (Math) | EM | AdalFlow: 0.767 | Promptomatix: 0.732 | within 0.04 of best | (Murthy et al., 17 Jul 2025) |
| GLUE (few-shot, mean) | Accuracy % | PT: 63.35 | ComPT-MSUM: 76.03 (+12.7 points) | 80–81% at scale | (Pouramini et al., 2024) |
| ARC-Challenge | Accuracy % | Untuned: 75.0 | MPO (sectional): 79.1 | +4.1 pts vs. TextGrad | (Sharma et al., 7 Jan 2026) |
| MMLU | Accuracy % | Untuned: 57.2 | MPO (sectional): 61.5 | +5.1 pts vs. TextGrad | (Sharma et al., 7 Jan 2026) |
| Vision-LM (EuroSAT) | Accuracy % | PromptSRC: 72.05 | MPL: 79.73 (+7.68) | Largest: +10.7% on EuroSAT | (Huang et al., 19 Feb 2025) |
| Modular program (avg) | Accuracy % | Vanilla CoT: 66.3 | BetterTogether: 73.4 (+7.1) | Up to 11% average accuracy boost | (Ziems et al., 6 Aug 2025) |
Notable trends include:
- Modular few-shot prompt optimization (ComPT/MSUM/SSUM) yields gains of 11–13 points over single-task prompt tuning or full model tuning with minimal data (Pouramini et al., 2024).
- Carry-forward prompt memory (VLMs) and section-local refinements (LLMs) consistently outperform their non-modular, monolithic baselines, notably improving in low-data and zero-shot generalization (Huang et al., 19 Feb 2025, Sharma et al., 7 Jan 2026).
- Modular prompt tuning and module-wise RL jointly (BetterTogether) unlock accuracy gains not accessible to prompt- or weight-tuning alone, especially for multi-stage, pipeline LLM programs (Soylu et al., 2024, Ziems et al., 6 Aug 2025).
- Modularization also yields prompt length reductions of up to 43%, substantial computational savings (up to a 90% reduction in LLM calls in quick-search mode), and enhanced interpretability versus non-modular pipelines (Murthy et al., 17 Jul 2025).
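The trade-off behind such prompt-length reductions follows from the cost-aware scalarized loss of Section 2, which can be sketched as a single function (the λ weights are illustrative tuning knobs, not values from the cited papers):

```python
# Minimal sketch of a cost-aware scalarized objective: lower is better,
# trading task quality against prompt length.
def scalarized_loss(accuracy: float, prompt: str,
                    lambda_perf: float = 1.0,
                    lambda_len: float = 0.001) -> float:
    # Quality term: penalize the gap to perfect accuracy.
    # Cost term: penalize prompt length in tokens (whitespace split here;
    # real systems use the model tokenizer and may add unique-token terms).
    return lambda_perf * (1.0 - accuracy) + lambda_len * len(prompt.split())
```

Under such an objective, a shorter prompt at equal accuracy always scores lower, which is what drives the observed length reductions during search.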
5. Extensibility, Interpretability, and Practical Guidance
MPO’s modular abstractions have several practical technical advantages:
- Extensibility: All major frameworks (Promptomatix, promptolution, ComPT, SAMMO) expose clear module interfaces (e.g., prompt section, prompt embedding, program node), enabling plug-and-play composition of new optimization backends, data generation routines, or evaluators without system overhaul (Murthy et al., 17 Jul 2025, Zehle et al., 2 Dec 2025, Pouramini et al., 2024, Schnabel et al., 2024).
- Interpretability: Attention weights or ablation studies on modules reveal which shared/private or attribute prompts drive task gains. Section-local updates link outcome changes to specific functional prompt parts (Sharma et al., 7 Jan 2026, Pouramini et al., 2024, Liang et al., 2023).
- Efficiency and scaling: Modular optimization allows for rapid adaptation, low-parameter transfer, and controlled prompt growth. For example, router-only adaptation in MP² requires learning as few as 8 parameters per task (Sun et al., 2022).
- Guidelines: The number of modules and the combination strategy should reflect task diversity and semantic overlap; the learning rate for attention/combination weights should be set higher than that for private prompt modules; in few-shot settings, constant combination weights can outperform softmax-normalized weights. Modular search over section structure, attribute sets, and combination operators is empirically critical (Pouramini et al., 2024, Murthy et al., 17 Jul 2025).
- Limitations: Some frameworks do not support end-to-end structure search (e.g., adding new prompt sections, dynamic reordering), and performance may depend strongly on critic LLM quality for black-box textual gradients. Joint optimization of LM weights and prompt modules, while most effective, introduces further computational demands (Sharma et al., 7 Jan 2026, Ziems et al., 6 Aug 2025, Soylu et al., 2024).
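The router-only adaptation mentioned above can be sketched as a softmax blend over frozen prompt modules, where only the per-task router logits are trainable (a simplified, single-layer stand-in for the cited per-layer routers):

```python
# Sketch of router-only adaptation: k frozen prompt modules are blended by
# k trainable logits, so a new task learns only len(logits) parameters.
import math
from typing import List

Vector = List[float]  # toy stand-in for a soft-prompt module

def route(logits: List[float], modules: List[Vector]) -> Vector:
    """Softmax-weighted blend of frozen modules; only logits are trained."""
    exps = [math.exp(x) for x in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(modules[0])
    return [sum(w * m[i] for w, m in zip(weights, modules))
            for i in range(dim)]
```

With, say, 8 modules, adapting to a new task means fitting 8 logits while every module stays frozen, which is the source of the extreme parameter efficiency noted above.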
6. Future Directions and Open Challenges
Potential research frontiers opened by MPO include:
- Structure-content co-optimization: Jointly learning not only prompt content but also the modular schema or topology may enable handling of domains with non-standard prompt structures or specialized pipeline workflows (Sharma et al., 7 Jan 2026, Schnabel et al., 2024).
- Automated stopping, adaptive critics: Efficiently selecting when and how to halt modular updates, and dynamically choosing the most effective critic LLM for section-wise gradients, remains an open challenge (Sharma et al., 7 Jan 2026).
- RL-based and continual modular adaptation: Integration of richer credit assignment, joint prompt–weight updates, and continual modular discovery for dynamic and lifelong LM programs (Ziems et al., 6 Aug 2025, Soylu et al., 2024).
- Safety and bias auditing: Leveraging the traceability of modular prompt sections to detect, isolate, and address amplification of biases or unsafe behaviors.
This suggests that modular prompt optimization is foundational for scalable, interpretable, and high-performance adaptation of both static and dynamically structured LLM and VLM systems, particularly in low-data, multi-task, and pipeline settings.