
Modified Dynamic Programming Algorithm

Updated 25 December 2025
  • Modified dynamic programming algorithms are adaptations of classical DP, designed for high-dimensional, interdependent recovery and optimization challenges.
  • They integrate approximate dynamic programming with simulated annealing to efficiently simulate rollout decisions in complex disaster recovery scenarios.
  • Empirical studies show these methods can boost recovery performance by 15–25% compared to baseline heuristics in infrastructure restoration.

A modified dynamic programming algorithm is any substantial adaptation of classical dynamic programming (DP), typically intended to address scalability, combinatorial complexity, or domain-specific requirements in stochastic control and resource allocation. Such adaptations enable near-optimal solutions to high-dimensional, highly interconnected, or uncertain recovery and optimization problems. Recent developments, such as Nozhati et al.'s community-level post-disaster restoration framework, integrate approximate dynamic programming (ADP) with metaheuristics like simulated annealing (SA) for computationally feasible, near-optimal recovery planning (Nozhati et al., 2018).

1. Formulation of the Modified DP Problem

The problem addressed by modified dynamic programming in (Nozhati et al., 2018) is the restoration scheduling of interdependent infrastructure networks (e.g., power, water, transportation, food retailers) following large-scale disasters. Time is discretized into epochs t = 1, 2, \ldots, T. At each epoch, N repair crews can be assigned to components in the current damaged set D_t, selecting an action X_t \in P_N(D_t), the set of all N-element subsets of D_t. The process terminates when D_{T+1} = \emptyset.

The restoration objective is to maximize the cumulative number of benefited people per unit time—a function of utility restoration and access to food retailers. Let h_t be the population benefited immediately after action X_t, and k_t the cumulative time to reach epoch t. The cumulative reward is

F(X) = \sum_{t=1}^{T} \frac{h_t}{k_t}

The optimal restoration policy is the sequence X^* = \arg\max_{X} F(X).
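As a concrete illustration, F(X) can be computed directly from per-epoch data. The numbers below are hypothetical, not from the case study:

```python
def cumulative_reward(h, k):
    """Benefited people per unit time, summed over epochs: F = sum_t h_t / k_t."""
    return sum(h_t / k_t for h_t, k_t in zip(h, k))

h = [1200, 3400, 5100]   # people benefited after each epoch (assumed values)
k = [2.0, 5.0, 9.0]      # cumulative days elapsed at each epoch (assumed values)
print(cumulative_reward(h, k))
```

Because each epoch's benefit is divided by the cumulative time k_t, schedules that deliver large benefits early dominate the objective.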

The Bellman-form cost-to-go, in conventional notation, is

J_{k-1}(x_1, \ldots, x_{k-1}) = \min_{x_k} J_k(x_1, \ldots, x_{k-1}, x_k)

which is recursively defined over the sequence of component repair decisions.
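For intuition, the exact recursion can be written as an exhaustive search over repair sequences. The component data and toy reward model below are assumptions for illustration only; the per-epoch reward follows the h_t/k_t form above:

```python
from itertools import combinations

# Hypothetical component data (assumed for illustration only).
benefit  = {"p1": 500, "p2": 300, "w1": 200}   # people served once repaired
duration = {"p1": 2.0, "p2": 1.0, "w1": 3.0}   # repair time in days
N = 1                                          # one repair crew

def best_value(damaged, elapsed=0.0):
    """Exact cost-to-go by exhaustive search: best over all N-subsets at each epoch."""
    if not damaged:
        return 0.0
    best = float("-inf")
    for action in combinations(sorted(damaged), N):
        t = elapsed + max(duration[c] for c in action)  # epoch ends when the slowest repair finishes
        h = sum(benefit[c] for c in action)
        best = max(best, h / t + best_value(damaged - set(action), t))
    return best

print(best_value(frozenset(benefit)))
```

Even in this toy setting, the number of repair orderings grows factorially with the number of damaged components, which is why the approximations of the following sections are needed.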

2. Rollout Value Function Approximation

For large damaged networks, explicit cost-to-go functions are intractable due to combinatorial explosion (|P_N(D_t)| grows rapidly). The algorithm uses a one-step "rollout" approximation: for each candidate action x at stage k, simulate the downstream effect by "rolling out" a base heuristic H for subsequent decisions. In this context, H is a random (uniform) assignment of repair crews to damaged components. For each trial x, H_k computes the simulated cumulative reward F under the induced trajectory. The selected action is

x_k \in \arg\min_{x} H_k(x_1, \ldots, x_{k-1}, x)

This approach ensures that the policy performs at least as well as the base heuristic.
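A minimal sketch of one-step rollout with a random base heuristic, under an assumed toy reward model (component names, benefits, and durations are hypothetical):

```python
import random
from itertools import combinations

# Hypothetical component data (assumed for illustration only).
benefit  = {"p1": 500, "p2": 300, "w1": 200, "w2": 250}
duration = {"p1": 2.0, "p2": 1.0, "w1": 3.0, "w2": 2.0}
N = 1  # one repair crew

def rollout(schedule):
    """Cumulative reward F for a complete repair schedule (list of actions)."""
    t, total = 0.0, 0.0
    for action in schedule:
        t += max(duration[c] for c in action)
        total += sum(benefit[c] for c in action) / t
    return total

def random_completion(damaged):
    """Base heuristic H: assign crews to uniformly random N-subsets until done."""
    damaged, schedule = set(damaged), []
    while damaged:
        action = tuple(random.sample(sorted(damaged), min(N, len(damaged))))
        schedule.append(action)
        damaged -= set(action)
    return schedule

def rollout_choice(damaged, trials=200):
    """One-step lookahead: pick the action with the best simulated reward
    (maximizing F, i.e., minimizing the cost H_k = -F)."""
    best_action, best_val = None, float("-inf")
    for action in combinations(sorted(damaged), N):
        remaining = set(damaged) - set(action)
        est = sum(rollout([action] + random_completion(remaining))
                  for _ in range(trials)) / trials
        if est > best_val:
            best_action, best_val = action, est
    return best_action

print(rollout_choice(set(benefit)))
```

Averaging `trials` random completions reduces the variance of each estimate; the full algorithm replaces the exhaustive loop over candidate actions with an SA-guided search over a constrained subset.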

3. Simulated Annealing over Candidate Actions

Exhaustive search over P_N(D_t) is intractable for realistic |D_t|, so a constrained candidate subset \bar{U}_t \subset P_N(D_t) is defined. At each decision epoch, the candidate set is initialized at random and iteratively refined using simulated annealing (SA). Each SA iteration proceeds as follows:

  • At each step, propose a neighbor S' (swap out/replace a component of the current candidate S^{n-1}).
  • Evaluate the candidate S' by a rollout simulation to obtain f(S') (the objective in minimization form, e.g., the negative cumulative reward).
  • Accept S' over the prior candidate S probabilistically:

P_{\text{accept}} = \begin{cases} \exp\left(-\dfrac{\Delta f}{k_B T}\right), & \Delta f = f(S') - f(S) > 0 \\ 1, & \Delta f \leq 0 \end{cases}

where T is the current annealing "temperature" and k_B plays the role of Boltzmann's constant, scaling acceptance.

  • The temperature T is reduced via a cooling schedule (e.g., T_{n+1} = \alpha T_n or T_n = T_0 / \log(1+n)). The only requirement is "sufficiently slow" cooling to enable broad search and later convergence.

The SA is run for K iterations per decision epoch; the repaired components X_t are set to the final candidate.

4. Algorithm Workflow and Pseudocode

The high-level pseudocode for the ADP+SA algorithm is as follows:

t = 1
while D_t != empty:
    S = random_initial_candidate(P_N(D_t))
    f_S = rollout_simulation(H, S)
    T = T_0
    for n in range(K):
        S_prime = random_swap_neighbor(S, D_t)
        f_S_prime = rollout_simulation(H, S_prime)
        delta_f = f_S_prime - f_S
        if delta_f <= 0 or rand() < exp(-delta_f / (k_B * T)):
            S, f_S = S_prime, f_S_prime
        # else: keep S (and f_S) unchanged
        T = alpha * T  # or another sufficiently slow cooling schedule
    X_t = S
    apply_repairs(X_t)
    update D_{t+1}
    t += 1
return (X_1, ..., X_{t-1})

This workflow bounds the number of rollout evaluations at K per decision epoch, improving tractability. The structure can be adapted to other domains where exhaustive enumeration is infeasible and high stochasticity precludes exact DP.

5. Parameterization, Complexity, and Convergence Analysis

Key parameters are:

Parameter Interpretation
N Number of available repair crews per epoch
K Simulated annealing iterations per epoch
T_0 Initial annealing temperature
alpha Cooling rate (if exponential schedule used)
k_B Boltzmann constant (scales acceptance in SA)
H Rollout base heuristic (random in this case)

The per-epoch cost is O(K \cdot \text{cost}_{\text{rollout}}), as each candidate requires a full rollout to simulate its downstream reward. Compared to exhaustive enumeration, which is O(|P_N(D_t)| \cdot \text{cost}_{\text{rollout}}), this is a substantial reduction.
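A back-of-envelope comparison with assumed sizes shows the scale of the reduction:

```python
# Exhaustive enumeration needs C(|D_t|, N) rollout evaluations per epoch,
# while the SA search needs only K. The sizes below are assumptions.
from math import comb

D_t_size, N, K = 30, 5, 500     # 30 damaged components, 5 crews, 500 SA iterations
print(comb(D_t_size, N))        # 142506 rollouts for exhaustive enumeration
print(K)                        # 500 rollouts for the SA-constrained search
```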

Regarding convergence, the rollout approximation guarantees that the derived policy's expected performance is at least as good as that of the base heuristic. Simulated annealing, if run with an infinitely slow cooling schedule, converges to a global optimum over the candidate set; in practice, finite K yields near-optimal solutions with manageable computation.

6. Case Study and Empirical Performance

In the tested case of a simulated magnitude 6.9 earthquake in Gilroy, CA, interdependent restoration of power, water, bridges, and food retailers was optimized using the modified DP algorithm. The key metrics were the number of "food-secure" people over time and the cumulative reward F(X) (area under the benefit-time curve).

Results demonstrated:

  • The ADP+SA policy achieves faster recovery (steeper benefit-time curve) compared to the baseline random policy.
  • The normalized area under the ADP+SA curve exceeded the base by approximately 15–25% across multiple damage scenarios.
  • Reward histograms showed that all rollout+SA runs outperformed the base policy in final outcomes.

This indicates the hybrid ADP+SA produces high-quality, computationally feasible restoration schedules, with significantly improved solutions compared to simple heuristics, without the exponential computational burden of exact DP (Nozhati et al., 2018).


References

Nozhati, S., et al. (2018). "A Modified Approximate Dynamic Programming Algorithm for Community-level Food Security Following Disasters."
