Random Capacity-Feasible Dynamic Policy
- Random Capacity-Feasible Dynamic Policy is a dynamic strategy that optimizes system rewards while rigorously enforcing instantaneous and cumulative capacity constraints.
- It leverages methodologies such as threshold rules, policy iteration, and action masking to manage uncertainties in demand and random capacities.
- The approach underpins applications ranging from cloud resource allocation to adaptive scheduling and treatment allocation in complex stochastic environments.
A random capacity-feasible dynamic policy is a non-anticipative strategy designed to optimize some reward (or minimize cost) in a stochastic, time-evolving system while guaranteeing that all instantaneous and/or cumulative capacity constraints—potentially random themselves—are never violated. Such policies are central in stochastic control, dynamic discrete optimization, constrained MDPs, online algorithms, and applied stochastic systems where demand, arrivals, or capacities are random. Capacity feasibility is always enforced, either by construction of the policy class or through real-time projection or masking. This concept underpins provably optimal dynamic control protocols in settings as diverse as resource allocation under Brownian uncertainty, stochastic programs with operational constraints, treatment allocation under budget caps, and online greedy selection. The following sections detail formal models, structural properties, algorithmic construction, randomized policy classes, and theoretical guarantees.
1. Fundamental Principles and Formal Models
Random capacity-feasible dynamic policies arise across several canonical frameworks. Consider the following representative formalizations:
(a) Stochastic Control with Instantaneous Constraints
In bounded-velocity stochastic control for continuous-time resource allocation, one adjusts resource capacity to track a stochastic demand process modeled as Brownian motion. The control is restricted to a bounded rate of adjustment for physical actuation feasibility; the system guarantees that, at every instant, the sum of capacities (primary plus overflow) can meet realized demand. Feasibility thus holds by construction: no capacity overrun is possible, and all demand is always served by some mix of resources, primary or secondary (Gao et al., 2018).
(b) Constrained Markov Decision Processes (CMDPs)
For discrete CMDPs, capacity takes the form of an expected discounted cost threshold: the feasible policy set comprises all stationary (or randomized) policies whose expected discounted resource cost satisfies the constraint uniformly over all initial states. Dynamic programming recursions and policy-iteration methods are designed to avoid any violation of this constraint at any reachable state, yielding policies whose discounted resource use never exceeds the threshold, regardless of realized trajectory (Chang, 2023).
(c) Online, Adaptive, and Greedy Policies
In online resource allocation or stochastic knapsack models (e.g., submodular maximization or sequential acceptance under unknown or random capacities), a policy adaptively accepts or rejects requests, always querying a feasibility oracle or precomputing thresholds that ensure no action can exceed current capacity—even when capacity is uncertain or revealed only at overflows (Kawase et al., 2018, Jiang et al., 2020).
The governing principle in all cases is strict enforcement, via state-dependent rules or randomized action masking, of compliance with all capacity constraints across all random system realizations.
2. Policy Construction Techniques
(a) Threshold and Bang–Bang Rules
Continuous and discrete-time stochastic control problems frequently admit threshold-type or bang–bang optimal policies:
- For Brownian-driven resource allocation, there exist explicit thresholds (say, a lower level ℓ and an upper level u on the demand-capacity gap) such that:
  - If the gap exceeds u, maximally increase capacity;
  - If the gap lies between ℓ and u, hold capacity;
  - If the gap falls below ℓ, maximally decrease capacity.
- Feasibility is guaranteed by velocity bounds; overflow or holding costs are balanced against adjustment rates (Gao et al., 2018).
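As a concrete illustration, the three-region rule can be sketched in a few lines of Python. The gap variable, the thresholds `lo`/`hi`, and the velocity bound `max_rate` are illustrative placeholders, not the calibrated quantities of Gao et al. (2018):

```python
import random

def bang_bang_rate(gap, lo, hi, max_rate):
    """Three-region threshold rule on the demand-capacity gap.

    gap = demand - capacity; lo < hi are the two thresholds. Returns a
    capacity adjustment rate that never exceeds the velocity bound.
    """
    if gap > hi:
        return max_rate    # demand far above capacity: ramp up maximally
    if gap < lo:
        return -max_rate   # capacity far above demand: ramp down maximally
    return 0.0             # inside the band: hold capacity

# Capacity tracks a random-walk demand (a discrete Brownian-motion proxy).
random.seed(0)
demand, capacity, dt = 0.0, 0.0, 0.1
for _ in range(1000):
    demand += random.gauss(0.0, dt ** 0.5)  # Brownian increment over dt
    capacity += dt * bang_bang_rate(demand - capacity,
                                    lo=-1.0, hi=1.0, max_rate=2.0)
```

The velocity bound is what makes every trajectory of this policy feasible: the returned rate is clipped to the actuation limit regardless of how large the gap becomes.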
(b) Policy Iteration with Uniform Feasibility
CMDPs employ policy-iteration algorithms in which actions are restricted at every state to those whose expected cumulative cost does not breach the threshold. Instead of unconstrained improvement, each greedy step maximizes the reward value over only those actions whose expected cost-to-go respects the threshold. Randomization within feasible actions allows for convexification and optimal exploitation of the capacity envelope (Chang, 2023).
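A minimal sketch of one such constrained improvement step, assuming precomputed per-action reward values `q_reward` and cost-to-go estimates `q_cost` (both names hypothetical):

```python
def feasible_greedy_action(q_reward, q_cost, threshold):
    """One constrained policy-improvement step at a single state.

    q_reward[a] is the estimated value of action a; q_cost[a] is its
    expected discounted resource cost-to-go. Only actions whose cost
    estimate respects the capacity threshold are improvement candidates.
    """
    feasible = [a for a in range(len(q_reward)) if q_cost[a] <= threshold]
    if not feasible:
        raise ValueError("no capacity-feasible action at this state")
    return max(feasible, key=lambda a: q_reward[a])
```

A randomized variant would place a probability distribution over `feasible` rather than a point mass, which is what enables the convexification mentioned above.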
(c) Stochastic Program Decomposition
In multistage stochastic programming with resource and reservoir bounds, policy construction (e.g., by stochastic dual dynamic programming, SDDP) generates piecewise-linear value approximations and policy mappings that satisfy linear capacity and balance constraints almost surely, for every realization of the underlying randomness (Hole et al., 2023).
(d) Action Masking and Soft Constraints
In RL-based environments (e.g., dynamic job-shop scheduling with random arrivals and breakdown-induced effective capacity), masking approaches restrict action probabilities at each state to feasible transitions—either by overwriting logits (hard mask) or penalizing (via negative gradients) policies that place mass on infeasible actions (soft mask). This guarantees that the learned policy is capacity-feasible in training and inference (Lassoued et al., 14 Jan 2026).
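A hard mask of this kind is commonly implemented by overwriting infeasible logits with negative infinity before the softmax; a minimal stdlib-only sketch (function name and signature are illustrative):

```python
import math

def masked_softmax(logits, feasible):
    """Hard action mask: infeasible logits are driven to -inf before the
    softmax, so the resulting policy places zero probability mass on
    capacity-violating actions (assumes at least one feasible action)."""
    masked = [l if ok else float("-inf") for l, ok in zip(logits, feasible)]
    m = max(masked)                           # shift for numerical stability
    exps = [math.exp(l - m) for l in masked]  # exp(-inf) evaluates to 0.0
    z = sum(exps)
    return [e / z for e in exps]
```

A soft mask would instead subtract a finite penalty from infeasible logits, leaving a gradient signal that discourages, rather than forbids, infeasible mass.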
(e) Adaptive/Randomized Online Policies
For submodular and classic knapsack problems with unknown capacity, adaptive or universal randomized policies construct packing or selection sequences in which, at every step, trial actions are tested for oracle feasibility or thresholded—immediately canceling or discarding any violating action. Robustness ratios quantify approximation to offline optimal reward while strictly ensuring no overpack (Kawase et al., 2018).
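The accept/discard loop can be sketched as follows, with `fits` standing in for the feasibility oracle (both names hypothetical, and the additive-size example below is a simplification: in the models of Kawase et al. (2018), capacity may only be revealed when an overflow occurs):

```python
def adaptive_pack(items, fits):
    """Greedy adaptive selection against an unknown capacity.

    items: candidates in priority order; fits(selection, item) is a
    feasibility oracle reporting whether adding `item` would overflow.
    Violating trials are discarded immediately, so the returned selection
    is capacity-feasible for every realization of the hidden capacity.
    """
    chosen = []
    for item in items:
        if fits(chosen, item):
            chosen.append(item)  # oracle confirmed feasibility: accept
        # otherwise: cancel this trial and move on
    return chosen

# Example oracle with a hidden additive capacity of 10 units.
hidden_capacity = 10
fits = lambda selection, item: sum(selection) + item <= hidden_capacity
```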
3. Randomization and Convexification
Randomized dynamic policies serve several roles:
- Softmax/logistic policy classes provide tunable selectivity in constrained treatment allocation (soft randomized selection of "treat" vs "do not treat" given capacity/remaining budget) (Adusumilli et al., 2019).
- In index-based policies for multi-resource allocation, randomization resolves index ties or selects among allocation patterns to better exploit convex combinations, leading to asymptotic optimality and smooth adjustment near capacity boundaries (Fu et al., 2018).
- Randomized universal or adaptive policies in submodular/probabilistic packing formally maximize expected value over capacity uncertainty. Under certain models, no deterministic policy can provide nontrivial robustness for all capacities—randomization is necessary for positive approximation guarantees (Kawase et al., 2018).
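As an illustration of a soft, budget-aware randomized rule of the first kind, the following sketch uses a logistic function of an individual's score, throttled by the remaining-budget fraction; the functional form and the parameter `beta` are assumptions for illustration, not the estimator of Adusumilli et al. (2019):

```python
import math
import random

def treat_probability(score, remaining_budget, total_budget, beta=1.0):
    """Hypothetical soft treatment rule: logistic in the individual's
    score, throttled by the fraction of budget remaining so that the
    randomized policy places no mass on "treat" once the cap binds."""
    if remaining_budget <= 0:
        return 0.0  # hard budget cap: probability of treating is zero
    base = 1.0 / (1.0 + math.exp(-beta * score))      # logistic selectivity
    return base * (remaining_budget / total_budget)   # capacity throttle

# Sequentially allocate treatment under a hard budget of 5 units.
random.seed(1)
total, treated = 5, 0
for score in [2.0, -1.0, 0.5, 3.0, -0.2, 1.2]:
    if random.random() < treat_probability(score, total - treated, total):
        treated += 1  # budget consumed; future probabilities shrink
```

The throttle makes the policy increasingly selective as the budget depletes, while the hard cap at zero remaining budget guarantees feasibility on every sample path.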
4. Feasibility Assurance and Theoretical Guarantees
All approaches share the property of strict, pathwise feasibility: at every decision epoch, no capacity violation can occur. This is enforced by:
- Direct feasibility checks or oracles (adaptive knapsack/policy masking)
- Hard constraints in Bellman/DP recursions (CMDPs, stochastic programming)
- Construction of the feasible action set at each state (policy-iteration, RL with masking)
- Adaptive thresholds or stochastic indices (online resource allocation, index policies)
Theoretical properties include:
- Uniform and local cost or capacity feasibility for all reachable states (Chang, 2023)
- Contraction mappings or smooth-fit equations yielding globally or locally optimal policies within the feasible set (Gao et al., 2018, Chang, 2023)
- Performance bounds: regret guarantees for plug-in RL treatment allocation (Adusumilli et al., 2019) and for online threshold policies (Jiang et al., 2020), and constant-factor robustness in stochastic knapsack (Kawase et al., 2018)
- Asymptotic optimality for randomized index policies in large-scale, multi-resource systems (Fu et al., 2018)
5. Applications and Practical Contexts
Random capacity-feasible dynamic policies have seen implementation and analysis in:
- Real-time dynamic allocation of primary and secondary resources under Brownian demand fluctuations (e.g., datacenter bandwidth, cloud resource allocation) (Gao et al., 2018)
- Renewable energy systems combining multistage operational and investment decision-making under random inflows and storage (Hole et al., 2023)
- Job-shop and manufacturing scheduling under nonstationary, bursty arrivals and stochastic machine failures; action masking enables RL agents to respect operational and breakdown-induced capacities (Lassoued et al., 14 Jan 2026)
- Sequential economic experiments and treatment roll-out (randomized clinical trial personalization) with hard budget or cohort constraints (Adusumilli et al., 2019)
- Networked or parallel server systems where random, server-dependent capacities and batch arrivals interact (e.g., Persistent-Idle policies) (Atar et al., 2021)
- Resource-constrained online decision-making in technological infrastructure, digital marketing, and cloud platforms (Jiang et al., 2020)
- Online and adaptive submodular instant-packing with unknown constraints (Kawase et al., 2018)
In each context, the essential property is that the policy’s structural or algorithmic design guarantees real-time feasibility for every realized capacity path, even in the face of nontrivial system stochasticity.
6. Structural and Algorithmic Table
| Method | Stochastic Inputs | Feasibility Mechanism |
|---|---|---|
| Bounded-Velocity Stochastic Ctrl | Brownian demand | Velocity bounds, secondary resource |
| CMDP DP/PI | Transition, reward/cost | Restricted action sets, policy iter. |
| SDDP in Stochastic Prog. | Inflows, storage | Linear capacity constraints |
| RL with Action Masking | Arrivals, failures | Mask at each state |
| Adaptive Threshold (Online) | Reward/size draws | Per-period truncation, oracle test |
| Restless Bandit/Index Policies | Arrival/service rates | Indexing, relaxation, guardbands |
7. Theoretical and Computational Advantages
Capacity-feasible policies combining randomization and dynamic state feedback achieve the following:
- Provable optimality or near-optimality (under specified scaling or regularity assumptions)
- Uniform or localized capacity guarantee at all times and states
- Scalability to high-dimensional and/or nonstationary, nonconvex stochastic control problems via structural decomposition, policy classes, or modern RL/PPO frameworks with action mask logic
- Practical implementability in both model-based (DP, LP, SDDP) and data-driven (RL, bandit) environments
- Measurable robustness and regret bounds, with quantifiable performance gap to the offline or unconstrained optimum (Gao et al., 2018, Chang, 2023, Jiang et al., 2020, Adusumilli et al., 2019, Kawase et al., 2018, Fu et al., 2018, Atar et al., 2021, Lassoued et al., 14 Jan 2026, Hole et al., 2023)
In summary, the defining feature of a random capacity-feasible dynamic policy is strict, real-time adherence to capacity constraints in the presence of randomness—achieved via a diverse array of theory-backed algorithmic strategies and validated by uniform feasibility proofs, performance bounds, and broad empirical deployment in stochastic optimization domains.