Dynamic P/D Ratio Adjustment in Adaptive Systems

Updated 29 December 2025

Dynamic P/D Ratio Adjustment is a strategy that dynamically balances competing processes in systems like LLM serving, deep metric learning, and adaptive game AI.
It employs real-time feedback control and adaptive thresholding to optimize trade-offs between throughput, stability, and convergence across various applications.
Empirical evidence shows significant performance gains, including up to 60% higher throughput in LLM systems and enhanced metric learning and optimization stability.

Dynamic P/D Ratio Adjustment refers to a family of strategies for real-time adaptation of the proportion or thresholds governing two critical processes or roles—variously denoted as Prefill/Decoding (P/D), Performance/Difficulty (P/D), or Positive/Negative (P/N) selection—across diverse problem domains. These domains include LLM serving systems, deep metric learning, stochastic optimization, and adaptive AI in games. In all cases, the central aim is continual alignment of competing resources or objectives to optimize throughput, stability, robustness, or user engagement.

1. Theoretical Foundations and Definitions

Dynamic P/D Ratio Adjustment is instantiated in several contexts, each with precise operational definitions:

LLM Serving Systems: The P/D ratio is defined as $n_p / n_d$ , where $n_p$ is the number of prefill instances and $n_d$ is the number of decoding instances. This ratio is dynamically tuned to balance throughput and latency constraints, given variable request patterns and prompt complexities (Jin et al., 2024).
Deep Metric Learning: The P/N ratio (editor's term: dynamic P/D analog) describes the proportion of positive to negative sample pairs mined during batch construction. Adaptive thresholding (e.g., AT-ASMS) dynamically adjusts mining criteria to balance pair ratios, improving optimization stability and performance (Jiang et al., 2024).
Adaptive Optimization: In heavy-ball momentum schemes such as Dyna, the ratio of the momentum (P) to damping (D) terms is controlled via the time-varying damping ratio $\zeta_t$ , shifting the optimization regime between fast traversal and stable convergence (Han, 2018).
Adaptive AI in Games: Player performance $P(x, t)$ and opponent difficulty $D$ are dynamically coupled via a feedback loop, where the difference in rate of performance change steers adjustments of discrete difficulty modes to maintain player engagement and balanced challenge (Silva et al., 2017).

2. Control Logic and Algorithms

Dynamic adjustment mechanisms are grounded in feedback control, employing discrete or continuous monitoring of key metrics:

LLM Serving (P/D-Serve): Dynamic ratio adjustment combines offline micro-benchmark-based profiling with online adaptive autoscaling. Profiling seeks the ratio $(n_p, n_d)$ satisfying $n_p b_p (1/T_p) \approx n_d b_d (1/T_d)$ . Online, the system monitors the proportion $\alpha = T_p / (T_p + T_d)$ and E2E latency trends; if prefill (P) dominates, $n_p$ increases. Decisions update the RDMA topology via RoCE maps (Jin et al., 2024).
Deep Metric Learning (DDTAS): The AT-ASMS algorithm calculates the mined pair ratio $R = n_- / n_+$ . Thresholds $(\gamma_+, \gamma_-)$ are dynamically updated as $\hat\gamma_+ = \gamma_+ + \kappa\gamma_+\sigma(R)$ , $\hat\gamma_- = \gamma_- - \kappa\gamma_-\sigma(R)$ . A meta-learning loop further adapts the loss margin (Jiang et al., 2024).
Dyna Optimization: The optimizer parameterizes step size $\alpha_t$ and momentum $\beta_t$ explicitly as functions of the schedule $\zeta_t$ :

$\alpha_t = \frac{h^2}{m + \zeta_t h \sqrt{m k}}, \quad \beta_t = \frac{m - \zeta_t h \sqrt{m k}}{m + \zeta_t h \sqrt{m k}}$

Increasing $\zeta_t$ dynamically increases damping and shifts the P/D balance toward stability (Han, 2018).

Game AI (MOBA DDA): Every $\Delta t = 15$ s, the system computes $\alpha(t) = \Delta P(x, t) - \Delta P(y, t)$ and compares to a threshold $\beta$ . $D_y$ (difficulty mode) increases or decreases by one step if $\alpha$ exceeds $\pm\beta$ ; otherwise it remains unchanged, preserving challenge parity (Silva et al., 2017).

3. Fine-Grained Resource Grouping and Isolation

In high-throughput LLM serving, dynamic P/D ratio control is enabled by grouping compute resources into fine-grained “P/D groups” for scenario-specific optimization. Each group consists of homogeneous request patterns (e.g., similar prompt lengths), maximizing cache reuse and minimizing performance variance:

Domain	Resource Grouping	Purpose
LLM Serving	P/D groups via RoCE	Reduce cross-traffic mismatch; optimal scaling
Deep Metric Learning	Mini-batch pair mining	Adaptive thresholding per batch
Adaptive Optimization	Layer- or param-wise	Per-layer step size adaptivity
Game AI	Player/opponent tracking	Maintain engagement, conceal adaptation

This segmentation ensures each group can independently optimize its P/D or analogous ratio in response to local dynamics without global contention (Jin et al., 2024, Jiang et al., 2024).

4. Experimental Outcomes and Validation

Dynamic P/D ratio adjustment has demonstrated significant empirical gains in diverse settings:

LLM Serving: Dynamic ratio adjustment delivers up to 60% higher throughput and 42% improvement in TTFT SLO adherence compared with fixed-ratio baselines; end-to-end design yields a 6.7× total throughput increase over undifferentiated serving (Jin et al., 2024).
Metric Learning: DDTAS (comprising AT-ASMS plus meta-learned margins) raises Recall@1 by 2–2.7 pp and NMI by 1.7–2.3 pp on CUB200 and Cars196; no grid search or manual thresholding required (Jiang et al., 2024).
Optimization: In Dyna, varying the damping ratio $\zeta_t$ enables a schedule from rapid exploration to robust convergence. Early under-damped phases (P ≫ D) yield swift traversal; later stages damp oscillations and stabilize around critical points (Han, 2018).
Game AI: Adaptive difficulty resulted in balanced win/loss rates (80–90% of matches) and user engagement metrics (none detected AI adjustment) (Silva et al., 2017).

5. Implementation Considerations and Trade-Offs

The practical deployment of dynamic P/D adjustment introduces domain-specific challenges:

Monitoring Overhead: In LLM serving, profiling for optimal ratios is amortized over scenario updates; per-window online metrics incur negligible cost (Jin et al., 2024).
Control Complexity: Queue removal in LLM systems transfers retry/connection logic to the gateway. In DDTAS, extra per-batch computations are bounded and offset by gains in sample efficiency (Jiang et al., 2024).
Resource Fragmentation: Fine-grained isolation introduces memory and metadata overhead (e.g., HBM for RDMA maps in LLM serving, buffer reservation for D2D transfer).
Parameter Tuning: Dyna exposes three physical hyperparameters (mass $m$ , time-step $h$ , damping schedule $\zeta_t$ ); their selection controls the P versus D regime during optimization (Han, 2018).
Adaptation Granularity: In all domains, fine control of adjustment frequency and thresholding is essential to avoid instability or oscillations.

6. Applications and Broader Implications

Dynamic P/D Ratio Adjustment principles generalize to any domain requiring real-time balancing of dual processes with competitive or complementary roles. The canonical applications are:

LLM Inference/Serving: Autoscaling decoder/prefill resources to handle fluctuating demand, diverse prompt shapes, and strict SLOs in distributed tensor-parallel architectures.
Metric Learning: Automating hard-example mining to maintain stable positive/negative class ratios as data distributions evolve during training.
Momentum-Based Optimization: Scheduling the exploration/exploitation trade-off by explicit control over momentum and damping terms.
Adaptive Game AI: Maintaining user engagement by tuning challenge in response to observed player skill trajectories.

The unifying theoretical contribution is the recasting of system adaptation as an online optimization of the P/D operating point, based on streaming feedback of performance, workload, or user response. This approach underlies significant performance, stability, and engagement improvements across algorithmic and interactive systems (Silva et al., 2017, Han, 2018, Jiang et al., 2024, Jin et al., 2024).