Adaptive Planner-Scheduler
- Adaptive planner–schedulers are integrated systems that dynamically combine planning and scheduling to optimize performance in uncertain and dynamic environments.
- They employ online learning, context-aware selection, and multi-agent negotiations to efficiently resolve conflicts and improve resource utilization.
- Applications range from wireless network control and HPC to ML auto-tuning and safety-critical systems, demonstrating significant empirical improvements in throughput, energy savings, and recovery times.
An adaptive planner–scheduler is an integrated decision-making system that combines planning and scheduling processes, typically leveraging online learning, context-aware selection, and resource adaptation to optimize performance across dynamic or uncertain environments. Architectures span domains such as network control, high-performance computing, safety-critical systems, neural program optimization, AI planning, and manufacturing, employing methods including reinforcement learning, graph neural inference, hierarchical control, and multi-agent negotiations. The common thread is the capacity for online or incremental adaptation, enabling conflict mitigation, efficient resource use, and rapid recovery from disruptions without full recomputation or retraining.
1. Fundamental Definitions and Design Principles
Adaptive planner–schedulers extend beyond traditional static or offline planning/scheduling by incorporating online, feedback-driven mechanisms that dynamically select, update, or configure planning modules (often specialized “planners” or xApps) in response to environmental state, observed KPIs, or external events. These systems generally comprise:
- Planning Layer: Modules (e.g., xApps, RL agents, program phase analyzers) which determine candidate actions, configurations, or control policies for the wider system, often trained offline, holding scope over specific resources or tasks.
- Scheduling Layer: A supervisory or meta-agent responsible for activating, deactivating, or sequencing planners in each context window, resolving conflicts, and adapting activation pools according to current system performance and operational goals.
Key definitions in adaptive planner–scheduler architectures include:
- State Space: Encodes both environmental metrics (e.g., user speed, data arrival rates (Cinemre et al., 9 Apr 2025), current hardware configuration (Novaes et al., 2019), software topology (Abduljabbar et al., 2021)) and current planner metrics.
- Action Space: Scheduling decisions, expressed as binary activation vectors for planners/xApps, configuration switches, or resource mapping assignments.
- Reward Functions: Evaluation by system-level KPIs such as normalized throughput (Cinemre et al., 9 Apr 2025), energy-performance product (Novaes et al., 2019), makespan or message-collision minimization (Alshaer et al., 24 Sep 2025), or coverage/solution rate in planning benchmarks (Ma et al., 2018).
- Adaptivity Mechanisms: Online policy updates (e.g., via A2C, Q-learning), dynamic extension/pruning of planner/xApp pools, and conditional switching (e.g., halftime-switch in planning (Ma et al., 2018); multi-agent negotiation cycles (Tan et al., 2022)).
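The definitions above can be sketched as a minimal MDP-style interface. This is an illustrative sketch, not any paper's actual formulation; the field names (`user_speed`, `arrival_rate`, `planner_kpis`) and the reward shape are assumptions chosen to mirror the examples cited above:

```python
from dataclasses import dataclass

@dataclass
class SchedulerState:
    # Environmental metrics plus per-planner KPI history (fields illustrative).
    user_speed: float
    arrival_rate: float
    planner_kpis: list

def scheduler_reward(throughput: float, max_throughput: float,
                     conflict_penalty: float) -> float:
    # System-level reward: normalized throughput minus a penalty for
    # observed planner conflicts.
    return throughput / max_throughput - conflict_penalty

# Action: binary activation vector over the current planner/xApp pool,
# e.g. activate planners 0 and 2 for this scheduling period.
action = [1, 0, 1]
state = SchedulerState(user_speed=3.0, arrival_rate=120.0,
                       planner_kpis=[0.8, 0.5, 0.9])
```

The binary activation vector is what makes dynamic pool extension natural: adding or removing a planner simply changes the vector's length.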
2. Representative Architectures and Algorithms
Multiple advanced architectures exemplify adaptive planner–scheduler paradigms:
O-RAN xApp Scheduler-Based Conflict Mitigation
- Near-real-time RIC hosts multiple xApps (e.g., power allocation, resource block assignment) controlling distinct RAN parameters.
- A2C-based scheduler decides which xApps to activate per scheduling period, with inputs including context variables and observed KPI.
- Adaptive extension: Scheduler’s action space expands/contracts as xApps are added/removed, with policy/output heads dynamically managed; baseline xApps can outperform A2C planners in certain contexts (Cinemre et al., 9 Apr 2025).
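The adaptive-extension idea, where the scheduler's action space grows and shrinks with the xApp pool, can be illustrated with a small pool tracker. This is a minimal sketch under assumed names (`AdaptivePool`, `register`, `retire`); a real implementation would also resize the policy network's output head to match `action_dim()`:

```python
class AdaptivePool:
    # Tracks the active xApp pool; the scheduler's binary action vector
    # has one activation bit per registered xApp.
    def __init__(self):
        self.xapps = []

    def register(self, name: str):
        # Adding an xApp extends the scheduler's action space by one bit.
        self.xapps.append(name)

    def retire(self, name: str):
        # Removing an xApp contracts the action space; the corresponding
        # policy output head would be dropped.
        self.xapps.remove(name)

    def action_dim(self) -> int:
        return len(self.xapps)
```

Including a naive/baseline xApp in the pool (as in the cited O-RAN setup) simply means one more bit the scheduler can toggle when learned planners underperform.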
Resource-Moldable Scheduling in HPC
- ARMS scheduler maintains an online cost model indexed by task type and software-topology-address (STA), selecting resource partitions to optimize locality/bandwidth per task invocation (Abduljabbar et al., 2021).
- Adaptivity achieved via history-based cost updates and moldable task widths, enabling dynamic tradeoff between locality and parallelism.
- “Planning” is performed by per-task cost table training; actual partition choice is finalized at runtime.
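The history-based cost model at the heart of this scheme can be sketched as a table keyed by (task type, STA, width), updated from measured runtimes. The exponential-moving-average update and the default-to-zero treatment of unseen widths (which forces early exploration) are assumptions of this sketch, not details from ARMS itself:

```python
class OnlineCostModel:
    # History-based cost table keyed by (task_type, sta, width).
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha  # smoothing factor (an assumption)
        self.cost = {}

    def update(self, task_type, sta, width, measured: float):
        # Exponential moving average over measured task costs.
        key = (task_type, sta, width)
        old = self.cost.get(key)
        self.cost[key] = (measured if old is None
                          else (1 - self.alpha) * old + self.alpha * measured)

    def best_width(self, task_type, sta, candidates):
        # Pick the moldable width with the lowest predicted cost; unseen
        # widths default to 0.0 so they are tried at least once.
        return min(candidates,
                   key=lambda w: self.cost.get((task_type, sta, w), 0.0))
```

The locality/parallelism trade-off shows up here as the choice among candidate widths: a wider partition gains parallelism but may lose the locality the STA key captures.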
Hierarchical RL for ML Program Optimization
- HARL employs multi-level hierarchical MDPs for task/sketch/modification/parameter selection in neural tensor program auto-tuning (Zhang et al., 2022).
- Upper levels use sliding-window UCB bandits for task/sketch selection; lower levels perform local actor-critic exploration and adaptive-stopping.
- Real-time pruning reallocates “search budget” on-the-fly, discarding low-advantage tracks.
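The sliding-window UCB selection used at the upper levels can be sketched as follows. Window size and the exploration constant are assumptions; the point is that old rewards age out of the window, so the bandit tracks non-stationary task/sketch quality:

```python
import math
from collections import deque

class SlidingWindowUCB:
    # UCB1 over a fixed window of recent rewards per arm.
    def __init__(self, n_arms: int, window: int = 50, c: float = 1.4):
        self.history = [deque(maxlen=window) for _ in range(n_arms)]
        self.c = c
        self.t = 0

    def select(self) -> int:
        self.t += 1
        best, best_score = 0, float("-inf")
        for arm, h in enumerate(self.history):
            if not h:
                return arm  # play each arm once before scoring
            score = (sum(h) / len(h)
                     + self.c * math.sqrt(math.log(self.t) / len(h)))
            if score > best_score:
                best, best_score = arm, score
        return best

    def update(self, arm: int, reward: float):
        self.history[arm].append(reward)
```

Real-time pruning fits naturally on top: arms whose windowed mean falls far below the best arm's lower bound can simply be dropped from the pool.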
AI-Driven Reconstruction Scheduling in Safety-Critical TTS
- Offline-trained GNN or feedforward models infer temporal/spatial priorities for task execution and allocation.
- Runtime reconstructors (temporal-recovery, failure-recovery, full-reconstruction) transform priorities and system context into valid, collision-free and precedence-respecting schedules (Alshaer et al., 24 Sep 2025).
- Fast recovery/reset routines and stateful “recovery_vars” enable rapid adaptation and rollbacks.
- Constraint and objective formalism supports makespan, workload-balance, and energy-profile optimization.
Compiler-Assisted Adaptive Scheduling for Heterogeneous Architectures
- Astro uses compiler analysis to partition code into syntactically-characterized program phases, logging execution at boundaries.
- Q-learning model (table or NN) maps phase+hardware states to optimal configuration actions.
- Online adaptation involves runtime policy querying/system reconfiguration per interval and measured reward updates (Novaes et al., 2019).
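A tabular version of this Q-learning loop is easy to sketch. States pair a compiler-identified phase with the current hardware configuration, and actions are candidate configurations; all names and hyperparameters here are illustrative, not Astro's:

```python
import random

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    # One tabular Q-learning step; state = (phase_id, hw_config),
    # action = configuration to switch to at the phase boundary.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def choose_config(Q, state, actions, epsilon=0.1):
    # Epsilon-greedy configuration choice, queried at each phase boundary.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```

The measured reward would typically combine performance and energy (e.g., the energy-performance product cited above), logged between consecutive phase boundaries.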
Multi-Agent Production Plan Adaptation in Supply Chain Disruptions
- Decentralized scheduling agents (material, capacity) locally optimize assignment subproblems and negotiate change proposals across BOM graph (Tan et al., 2022).
- Pull/push protocol for agent-level rescheduling proceeds by iterated local MIP solves and asynchronous proposal handling, stabilizing in minutes.
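The iterated local-solve/negotiation loop can be sketched abstractly. Here each agent is assumed to expose `local_solve()` (returning change proposals from its local MIP) and `accept(proposal)`; both names are assumptions of this sketch, and stabilization is declared when a full round produces no accepted changes:

```python
def negotiate(agents, max_rounds=10):
    # Iterated rounds of local re-solving plus proposal exchange;
    # returns the round at which the schedule stabilized.
    for round_no in range(1, max_rounds + 1):
        changed = False
        for agent in agents:
            for proposal in agent.local_solve():
                # Push the proposed change to the affected agent, which
                # accepts or rejects it against its own local constraints.
                if proposal["to"].accept(proposal):
                    changed = True
        if not changed:
            return round_no  # no accepted proposals: stabilized
    return max_rounds
```

The reported 5–10 iteration convergence corresponds to this outer loop terminating early once no agent can improve its local subproblem further.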
3. Conflict Mitigation and Adaptivity Mechanisms
Adaptive planner–schedulers systematically address resource or policy conflicts, system disruptions, and context-dependent variability via:
- Conflict Modeling: Indirect conflicts arise when planners optimize the same KPM with incompatible actions (e.g., power and resource block control yielding resource mismatches (Cinemre et al., 9 Apr 2025)); realized as resource assignment vectors with unintended zero/positive mismatches.
- Scheduler Decision Logic: Schedulers regularly observe context metrics and KPIs and select the planner subset whose joint historical actions maximize throughput and minimize conflicts (e.g., avoiding joint activation under conflicting conditions).
- Pool Modeling: Dynamic pool adaptation (addition/removal of planners/xApps) expands the scheduler’s activation choice space, sometimes including naive or baseline planners for context flexibility (Cinemre et al., 9 Apr 2025).
- Switching Policies: In planning, halftime-switch meta-algorithms re-evaluate planner choice based on real-time predictive failure probabilities (Ma et al., 2018).
- AI-Driven Recovery: Reconstruction in TTS leverages AI-inferred priorities to rapidly reorganize schedules across multiple safety and performance profiles, with rollback and context-aware recovery (Alshaer et al., 24 Sep 2025).
- Multi-Agent Negotiation: Distributed, incremental MIP subproblem solution and negotiation manage disruptions locally and efficiently while preserving global feasibility (Tan et al., 2022).
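The indirect-conflict notion above can be made concrete with a small check: when two planners' joint actions imply different allocations of the same resource vector, the element-wise mismatch is nonzero. This is a deliberately simplified sketch of the O-RAN-style mismatch test, not the cited paper's formulation:

```python
def indirect_conflict(assumed_alloc, realized_alloc):
    # Compare the resource vector one planner's action assumes against the
    # allocation actually realized after all active planners act.
    # A nonzero entry means some resource was over- or under-provisioned.
    mismatch = [a - r for a, r in zip(assumed_alloc, realized_alloc)]
    return any(m != 0 for m in mismatch), mismatch
```

A scheduler can log these mismatch vectors per planner pair and learn to avoid activating combinations whose historical mismatches degraded the shared KPM.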
4. Domain-Specific Applications and Empirical Performance
Adaptive planner–schedulers have been operationalized and empirically evaluated across diverse domains:
Wireless Network Control (O-RAN)
- 4 O-RUs, 16 UEs; pools of pre-trained xApps (A2C/baseline).
- Scheduler-based conflict mitigation recovers up to 16% throughput loss under high-load/high-mobility versus independent xApps; expanded pool further surpasses single-xApp performance by up to 4% (Cinemre et al., 9 Apr 2025).
HPC Task Graph Scheduling
- ARMS yields 1.5–2x speedup for stencil kernels, 3.5x for low-concurrency MatMul, via dynamic moldability and online cost partitioning (Abduljabbar et al., 2021).
ML Auto-Scheduling
- HARL improves tensor operator performance by 22% and search speed by 4.3x over Ansor; end-to-end DNN inference speedup: ~8–9% (Zhang et al., 2022).
Safety-Critical Real-Time Systems
- Parallel reconstruction across three scheduling profiles (makespan, workload, energy) scales sublinearly in runtime (<40 ms for 50 tasks on 13 threads); recovery time <16 ms (Alshaer et al., 24 Sep 2025).
big.LITTLE Systems
- Astro yields up to 13% speedup, 11% energy savings on Parsec/Rodinia benchmarks; runtime overhead <3% (Novaes et al., 2019).
Supply Chain and Manufacturing
- Multi-agent adaptations stabilize schedules of 50+ agents within 5–10 iterations (<10 minutes); order fulfillment ≥92%, maximum delays bounded by disruption duration (Tan et al., 2022).
5. Scalability, Extensibility, and Future Directions
Adaptive planner–scheduler frameworks exhibit scalability and extensibility through:
- Dynamic Pool Management: xApp/planner pool extension enables absorption or retirement of control modules without system downtime (Cinemre et al., 9 Apr 2025).
- Hierarchical Control: Multi-level scheduling and adaptive allocation of search or activation budget allow targeting of both coarse and fine resolution planning tasks (Zhang et al., 2022).
- Resource Moldability: Distributed and decentralized resource management using moldable task widths or multi-agent negotiation supports linear scaling with problem instance/bill-of-material size (Abduljabbar et al., 2021, Tan et al., 2022).
- Domain Generalization: Transition from history-based cost models to machine learning predictors (regression trees, hardware counters, neural networks) enhances applicability to heterogeneous systems and contexts.
- Online/Real-Time Recovery: Robustness to context events—failures, mode changes, dynamic load—via stateful, AI-guided reconstruction or local negotiation enables deployment in production, safety-critical, and resource-constrained environments (Alshaer et al., 24 Sep 2025).
- Integration Possibilities: Frameworks can be embedded in existing platforms (O-RAN A1 interface, Jadex agent layers next to ERP, LLVM compiler instrumentation), enabling seamless adoption in industrial and network operations (Cinemre et al., 9 Apr 2025, Tan et al., 2022, Novaes et al., 2019).
6. Technical Limitations and Research Challenges
Current limitations across adaptive planner–scheduler implementations include:
- Stability assumptions in history-based models may not hold in highly non-stationary or irregular task graphs, requiring advanced smoothing or regression (Abduljabbar et al., 2021).
- Pool adaptation and multi-level scheduling entail increased model complexity and tuning overhead, especially under rapid or frequent context changes.
- Recovery/reconstruction models in safety-critical systems scale sublinearly for typical task counts but could require optimizations for very large or deeply-buffered environments (Alshaer et al., 24 Sep 2025).
- Multi-agent models sacrifice global optimality for speed, introducing quality/recovery trade-offs (Tan et al., 2022).
- Hierarchical and GNN-based scheduling faces computational overheads for large graphs; ongoing work explores sampling and more efficient embeddings (Ma et al., 2018).
- Integrating fairness and multi-objective optimization is a promising but nontrivial extension, requiring redesigned reward structures and constraint handling (Abduljabbar et al., 2021).
A plausible implication is that continued advances in RL, neural program synthesis, and distributed optimization will further refine the responsiveness and domain-spanning generalization of adaptive planner–scheduler systems.