
Timed-LDGBA for Timed Control Synthesis

Updated 7 January 2026
  • Timed-LDGBA is a formal automata model that combines limit-deterministic structure with real-valued clocks to enforce explicit time bounds in temporal logic specifications.
  • These automata synchronize with MDP and POMDP frameworks, enabling reinforcement learning under strict time constraints and probabilistic environments.
  • MITL formulas are systematically translated into Timed-LDGBA, ensuring that all designated accepting sets are visited infinitely often to satisfy temporal obligations.

A Timed Limit-Deterministic Generalized Büchi Automaton (Timed-LDGBA) is a formal automaton model designed to represent time-bounded temporal logic specifications for control synthesis over stochastic environments. The construction combines the structural restrictions of limit-deterministic Büchi automata (LDBA) with real-valued clocks, which makes it expressive enough to specify and monitor sequences of events constrained by explicit time intervals. Timed-LDGBA are used to synchronize temporal logic specifications with Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), which supports reinforcement learning in environments with strict time-bounded requirements (Wang et al., 31 Dec 2025).

1. Formal Definition and Structure

A Timed-LDGBA is a tuple $A = (Q, \Sigma, C, E, \text{Inv}, q_0, \mathcal{F})$ comprising:

  • $Q$: finite set of locations partitioned as $Q = Q_N \cup Q_D$, where
    • $Q_N$ (nondeterministic part): admits only $\epsilon$-transitions and contains no accepting locations.
    • $Q_D$ (deterministic part): contains all accepting locations and deterministic transitions only.
  • $\Sigma$: finite alphabet ($2^\Pi$ for atomic propositions $\Pi$).
  • $C$: finite set of real-valued clocks.
  • $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times \mathbb{B}(C) \times 2^C \times Q$: set of edges. Each edge is $(q, a, g, r, q')$ where:
    • $a \in \Sigma \cup \{\epsilon\}$: transition label.
    • $g \in \mathbb{B}(C)$: clock guard, a conjunction of constraints $x \preceq c$ or $x \succeq c$.
    • $r \subseteq C$: clocks to reset upon transition.
  • $\text{Inv}: Q \to \mathbb{B}(C)$: invariant conditions at each location (conjunctions of constraints).
  • $q_0 \in Q_N$: initial location.
  • $\mathcal{F} = \{F_1, \ldots, F_k\}$: accepting sets, each $F_i \subseteq Q_D$.

Limit-determinism is enforced by confining all nondeterministic branching ($\epsilon$-moves) to $Q_N$, which is acyclic and is never revisited once a run has entered $Q_D$. All acceptance monitoring in $Q_D$ is strictly deterministic.
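
The tuple and the structural restrictions above can be written down as a small data structure. The following is only a sketch under stated assumptions: the class names `Edge` and `TimedLDGBA`, the encoding of guards as tuples `(clock, op, bound)`, and the use of `None` for an $\epsilon$-label are hypothetical choices made for illustration and do not come from the source. The check covers the partition, the placement of accepting sets, and the direction of edges; it does not verify that guards in $Q_D$ are mutually exclusive.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumed encoding: a guard is a conjunction of (clock, op, bound) with op in {"<=", ">="}.
Constraint = Tuple[str, str, int]
Guard = Tuple[Constraint, ...]


@dataclass(frozen=True)
class Edge:
    src: str
    label: Optional[FrozenSet[str]]  # None stands for an epsilon-move (assumption)
    guard: Guard
    resets: FrozenSet[str]
    dst: str


@dataclass
class TimedLDGBA:
    q_n: FrozenSet[str]               # nondeterministic part Q_N
    q_d: FrozenSet[str]               # deterministic part Q_D
    clocks: FrozenSet[str]
    edges: List[Edge]
    inv: Dict[str, Guard]             # location -> invariant
    q0: str
    accepting: List[FrozenSet[str]]   # F_1, ..., F_k

    def check_structure(self) -> List[str]:
        """Return the violated structural conditions (empty list if none)."""
        problems = []
        if self.q_n & self.q_d:
            problems.append("Q_N and Q_D overlap")
        if self.q0 not in self.q_n:
            problems.append("q0 is not in Q_N")
        for i, f in enumerate(self.accepting, 1):
            if not f <= self.q_d:
                problems.append(f"F_{i} is not contained in Q_D")
        for e in self.edges:
            if e.src in self.q_n and e.label is not None:
                problems.append(f"non-epsilon edge leaves Q_N location {e.src}")
            if e.src in self.q_d and e.label is None:
                problems.append(f"epsilon edge leaves Q_D location {e.src}")
            if e.src in self.q_d and e.dst in self.q_n:
                problems.append(f"edge returns from Q_D to Q_N at {e.src}")
        return problems
```

An automaton whose only $\epsilon$-edge leads from $Q_N$ into $Q_D$, and whose accepting sets lie in $Q_D$, yields an empty problem list.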

2. Generalized Büchi Acceptance Condition

Timed-LDGBA use a generalized Büchi acceptance mechanism to formalize satisfaction of temporal goals. For an infinite run $\sigma = (q_0, v_0) \xrightarrow{t_0, a_0} (q_1, v_1) \xrightarrow{t_1, a_1} (q_2, v_2) \ldots$ reading a timed word $(a_i, t_i)$, acceptance requires:

$$\text{Run is accepted} \iff \forall i \in \{1,\dots,k\},\ \{ n \mid q_n \in F_i \} \text{ is infinite}$$

This ensures every generalized Büchi set $F_i$ in $\mathcal{F}$ is visited infinitely often by the corresponding sequence of automaton states, encoding persistent timed obligations tied to the original temporal logic specification.
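
An infinite run cannot be inspected directly, but the condition is easy to check for an ultimately periodic ("lasso") run, that is, a finite prefix followed by a cycle repeated forever. The sketch below assumes that representation (the function name `lasso_accepts` is hypothetical): the locations visited infinitely often are exactly those on the cycle, so the run is accepted iff the cycle intersects every $F_i$.

```python
from typing import FrozenSet, List, Sequence


def lasso_accepts(prefix: Sequence[str], cycle: Sequence[str],
                  accepting: List[FrozenSet[str]]) -> bool:
    """Generalized Buchi acceptance for the run prefix . cycle^omega.

    The prefix is visited only finitely often, so it plays no role in the check.
    """
    if not cycle:
        raise ValueError("cycle must contain at least one location")
    recurring = set(cycle)  # locations visited infinitely often
    return all(recurring & f for f in accepting)
```

For example, a cycle through `q1` and `q2` satisfies the accepting sets `{q1}` and `{q2}`, whereas a run that visits `q1` only in its prefix does not satisfy `{q1}`.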

3. Clocks, Guards, Invariants, and Resets

Clocks provide the quantitative dimension necessary for time-bounded semantics:

  • Clock set $C$: $C = \{x_1, x_2, \ldots, x_m\}$, each $x_j$ real-valued.
  • Valuation $v: C \to \mathbb{R}_0^+$: tracks elapsed time since last reset for each clock.
  • Guards $g$: conjunctions of atomic constraints, $x \preceq c$ or $x \succeq c$ ($c \in \mathbb{N}$).
  • Invariants $\text{Inv}(q)$: conjunctions $x \preceq c$ constraining the allowable time in location $q$ as time elapses.
  • Resets $r \subseteq C$: upon taking an edge, clocks in $r$ are set to zero; the valuation updates as $v' = v[r := 0]$.
  • Time elapse: at any location $q$, duration $d \geq 0$ is permissible as long as $\text{Inv}(q)$ is true at each intermediate valuation $v + \delta$, $\delta \in [0, d]$.
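
The clock operations listed above can be sketched in a few functions. This is a minimal illustration, assuming valuations are dictionaries from clock names to floats and constraints are triples `(clock, op, bound)` with `op` either `"<="` or `">="`; these encodings and the function names are hypothetical. Because a conjunction of bounds defines a convex set, checking the invariant at both ends of a delay is enough to cover every intermediate valuation.

```python
from typing import Dict, Iterable, Set, Tuple

Constraint = Tuple[str, str, float]  # (clock, op, bound), op in {"<=", ">="}
Valuation = Dict[str, float]

_OPS = {"<=": lambda a, b: a <= b, ">=": lambda a, b: a >= b}


def satisfies(v: Valuation, constraints: Iterable[Constraint]) -> bool:
    """True iff the valuation satisfies every constraint of the conjunction."""
    return all(_OPS[op](v[x], c) for x, op, c in constraints)


def elapse(v: Valuation, d: float) -> Valuation:
    """Advance all clocks by the same duration d >= 0."""
    if d < 0:
        raise ValueError("duration must be non-negative")
    return {x: t + d for x, t in v.items()}


def reset(v: Valuation, r: Set[str]) -> Valuation:
    """Compute v[r := 0]: clocks in r are set to zero, others are unchanged."""
    return {x: 0.0 if x in r else t for x, t in v.items()}


def can_delay(v: Valuation, d: float, invariant: Iterable[Constraint]) -> bool:
    """True iff the invariant holds at v + delta for all delta in [0, d]."""
    invariant = list(invariant)
    return satisfies(v, invariant) and satisfies(elapse(v, d), invariant)
```

With the invariant `x <= 3` and `x` starting at zero, a delay of 3 is allowed and a delay of 3.5 is not.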

4. Translation from MITL to Timed-LDGBA

Metric Interval Temporal Logic (MITL) formulas $\varphi$ are systematically compiled into Timed-LDGBAs:

  1. Negation Normal Form & Interval Normalization: MITL formulas are normalized for transition monitoring.
  2. Monitor Construction: For each subformula $\psi$, a "timed monitor" automaton $A_\psi$ is built, typically with a single clock:
    • Example for $F_{[a,b]} \pi$: The automaton contains:
      • Initial state $q_0$ with invariant $x \leq b$.
      • On letter $\pi$ and $a \leq x \leq b$, reset $x$ and transition to $q_{accept}$.
      • Sink state $q_{sink}$ if $x > b$ before $\pi$.
      • Accepting set $F = \{q_{accept}\}$.
  3. Synchronous Product: Monitors $A_\psi$ are composed in product, tracking all clocks simultaneously.
  3. Synchronous Product: Monitors AψA_\psi are composed in product, tracking all clocks simultaneously.
  4. Limit-Determinization: All nondeterminism is grouped into initial states $Q_N$, then collapsed into deterministic $Q_D$ with acceptance sets corresponding to fulfilled obligations.
  5. Pruning Unreachable States: Ensures model compactness.

Construction Example:

For $\varphi = F_{[1,3]} a$:

  • $C = \{x\}$.
  • $Q = \{q_0, q_{accept}, q_{sink}\}$.
  • $\text{Inv}(q_0) = x \leq 3$; $\text{Inv}(q_{accept}) = \text{true}$; $\text{Inv}(q_{sink}) = \text{true}$.
  • Edges include:
    • $q_0 \xrightarrow{a,\ 1 \leq x \leq 3,\ \{x\}} q_{accept}$
    • $q_0 \xrightarrow{\text{any},\ x > 3,\ \varnothing} q_{sink}$
  • $q_0 \in Q_N$, $q_{accept}, q_{sink} \in Q_D$.
  • $\mathcal{F} = \{\{q_{accept}\}\}$.
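
The behaviour of this monitor on a finite timed word can be simulated directly. The sketch below makes several assumptions that the source does not state: timestamps are absolute and non-decreasing, letters that do not trigger either listed edge leave the monitor in $q_0$, exceeding the bound $x \leq 3$ in $q_0$ is handled by moving to $q_{sink}$, and both $q_{accept}$ and $q_{sink}$ are absorbing. The function name `run_eventually_monitor` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def run_eventually_monitor(timed_word: Iterable[Tuple[Set[str], float]],
                           lo: float = 1.0, hi: float = 3.0,
                           prop: str = "a") -> str:
    """Simulate the monitor for F_[lo,hi] prop and return the final location.

    timed_word is a sequence of (letter, absolute timestamp) pairs, where a
    letter is the set of atomic propositions that hold at that instant.
    """
    q, x, last_t = "q0", 0.0, 0.0
    for letter, t in timed_word:
        if t < last_t:
            raise ValueError("timestamps must be non-decreasing")
        x += t - last_t          # time elapse since the previous event
        last_t = t
        if q != "q0":
            continue             # q_accept and q_sink are absorbing (assumption)
        if x > hi:
            q = "q_sink"         # deadline missed before prop was observed
        elif prop in letter and x >= lo:
            q, x = "q_accept", 0.0   # guard lo <= x <= hi holds; reset x
    return q
```

Observing `a` at time 2.0 ends in `q_accept`; observing it only at time 0.5 leaves the monitor in `q0`; observing it at 0.5 and then at 3.5 ends in `q_sink`.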

5. Synchronization with MDPs and POMDPs

Timed-LDGBA are synchronized with stochastic environment models to facilitate policy synthesis:

  • MDP: $M = (S, A, T, s_0, R, \Pi, L)$.
  • POMDP: $M = (S, A, T, s_0, R, O, \Omega, \Pi, L)$.

A product timed model $M \times A_\varphi$ is constructed:

  • States: $S^\times = S \times Q \times V$, where $V$ is the (discretized) space of clock valuations.
  • Actions: $A^\times = A \cup \{\epsilon\}$.
  • Transitions:
    • $(s, q, v) \xrightarrow{a} (s', q', v')$ where $s \xrightarrow{a} s'$ via $T$, and edge $(q, L(s'), g, r, q')$ is enabled by $v+1$.
    • If no edge is enabled, transition to a global sink state.
    • For $\epsilon$ in $Q_N$: $s' = s$, $a = \epsilon$, and $T^\times = 1$ for the chosen $\epsilon$-move.
  • Reward: $R^\times(s, q, v) = R(s)$ only upon entering an accepting $q' \in F_i$; otherwise zero.
  • Observations (POMDP): $\Omega^\times((s, q, v), a, o) = \Omega(s', a, o)$ if the automaton edge passes; else $0$.
  • Acceptance: A path is accepting iff its $Q$-component visits each $F_i$ infinitely often.

Crucially, the automaton state $q$ and clock valuation $v$ are perfectly tracked and can augment the input to Q-learning or belief trackers; in POMDPs, these quantities remain fully observable, while the base state $s$ is inferred through the belief $b_t$.
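
The automaton side of one product transition can be sketched as follows, assuming the MDP successor $s'$ has already been sampled from $T$ and its label $L(s')$ is available. The sketch assumes integer clocks that advance by one unit per MDP step (matching the $v+1$ in the transition rule), edges encoded as tuples `(src, label, guard, resets, dst)`, and a sink named `"sink"`; all names here are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Constraint = Tuple[str, str, int]  # (clock, op, bound), op in {"<=", ">="}
Edge = Tuple[str, FrozenSet[str], Tuple[Constraint, ...], FrozenSet[str], str]
SINK = "sink"  # global sink location (name is an assumption)

_OPS = {"<=": lambda a, b: a <= b, ">=": lambda a, b: a >= b}


def guard_holds(v: Dict[str, int], guard: Tuple[Constraint, ...]) -> bool:
    return all(_OPS[op](v[x], c) for x, op, c in guard)


def automaton_step(q: str, v: Dict[str, int], label: FrozenSet[str],
                   edges: List[Edge]) -> Tuple[str, Dict[str, int]]:
    """Advance clocks by one unit, then take the first enabled edge on `label`."""
    aged = {x: t + 1 for x, t in v.items()}
    for src, lab, guard, resets, dst in edges:
        if src == q and lab == label and guard_holds(aged, guard):
            return dst, {x: 0 if x in resets else t for x, t in aged.items()}
    return SINK, aged  # no edge enabled: move to the global sink


def product_reward(base_reward: float, q_next: str,
                   accepting: List[FrozenSet[str]]) -> float:
    """Pay the base reward only when the step enters an accepting location."""
    return base_reward if any(q_next in f for f in accepting) else 0.0
```

A learner would call `automaton_step` after every environment step and feed the resulting pair $(q', v')$, together with the state or belief, into its value function.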

6. Application in Reinforcement Learning under Timed Constraints

MITL specifications are offline-compiled into Timed-LDGBA and synchronized online with MDP/POMDP models:

  • The reward structure enforces temporal correctness via positive reward on accepting set entry, optionally combined with performance objectives.
  • Standard RL algorithms (Q-learning, DQN) operate on the product model, learning policies that satisfy all time-bounded constraints or, under stochasticity, maximize the acceptance probability.
  • Evaluations in grid-world and robotics scenarios demonstrate scalability, robustness to partial observability, and faithful satisfaction of MITL constraints in learned policies (Wang et al., 31 Dec 2025).
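
As an illustration of how a standard algorithm operates on the product model, the sketch below shows a single tabular Q-learning update keyed by product states $(s, q, v)$. It is the textbook update rule, not the training procedure of the cited work; the state encoding, the function name `q_update`, and the default hyperparameters are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

# Assumed product-state encoding: (MDP state, automaton location, clock values).
ProductState = Tuple[Hashable, str, Tuple[int, ...]]
QTable = DefaultDict[Tuple[ProductState, Hashable], float]


def q_update(Q: QTable, state: ProductState, action: Hashable, reward: float,
             next_state: ProductState, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.99) -> float:
    """Apply one Q-learning update and return the new value of Q(state, action)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Usage: start from an all-zero table.
Q: QTable = defaultdict(float)
```

Because the automaton location and clock values are part of the key, the same environment state can carry different values depending on which timed obligations are still open.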

This framework enables reliable policy synthesis in dynamic, uncertain environments where temporal obligations are explicit and time-critical.
