- The paper introduces a novel method that uses an output-only autoregressive model with matrix zonotopes to over-approximate output reachable sets in LTI systems.
- It integrates a Transformer surrogate to bypass conservative zonotope multiplication and tightens predictions via supervised label contraction.
- Experimental results show that the Transformer achieves mean hull widths only 1.5–1.7× the model-based lower bound while attaining 100% empirical coverage.
This work addresses output reachability analysis for discrete-time LTI systems with completely unknown dynamics, requiring only noisy input-output trajectories and knowledge of the system order. Specifically, neither the state-space matrices $(A,B,C)$ nor the observation map is accessible. The central technical challenge is computing outer approximations of the output reachable set $Y_k$ under both structural and disturbance uncertainty.
The approach leverages the Cayley-Hamilton theorem to derive an output-only autoregressive representation, effectively eliminating the latent state. This results in a lifted input-output model where all parameter uncertainty is encapsulated in a matrix zonotope, constructed solely from data given a bound on the aggregated input-output residuals. Reachable sets are then formally over-approximated by set-valued matrix zonotope propagation. However, in contrast to classical state-space data-driven reachability, the required regressor dimension and parametric set size grow dramatically, inducing severe structural conservatism that cannot be removed by order reduction or improved set representations.
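To make the state elimination concrete, here is one standard Cayley-Hamilton derivation for the noise-free system $x_{k+1} = A x_k + B u_k$, $y_k = C x_k$ of order $n$. It is a sketch of the idea only; the paper's exact lifted regressor and residual treatment may differ. Write the characteristic polynomial of $A$ as $p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \dots + a_0$ and set $a_n := 1$:

$$
\begin{aligned}
y_{k+j} &= C A^{j} x_k + \sum_{i=0}^{j-1} C A^{j-1-i} B\, u_{k+i}, \qquad j = 0, \dots, n,\\
\sum_{j=0}^{n} a_j\, y_{k+j} &= C\, p(A)\, x_k + \sum_{i=0}^{n-1} b_i\, u_{k+i}, \qquad b_i := \sum_{j=i+1}^{n} a_j\, C A^{j-1-i} B,\\
y_{k+n} &= -\sum_{j=0}^{n-1} a_j\, y_{k+j} + \sum_{i=0}^{n-1} b_i\, u_{k+i}, \qquad \text{since } p(A) = 0 .
\end{aligned}
$$

The state $x_k$ no longer appears, and only the order $n$ is needed. For $n_y$ outputs and $n_u$ inputs the regressor stacks $n$ past outputs and $n$ past inputs, i.e. $n(n_y + n_u)$ entries, which is one way to see the dimension growth noted above. With noisy measurements the same identity holds up to a residual that aggregates the noise terms, and it is this residual that the data-based bound must cover.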
Data-Driven Output Reachability via Matrix Zonotopes
The core data-driven reachability algorithm proceeds by (1) constructing the matrix zonotope parameter set using input-output liftings and bounding aggregated residuals, and (2) propagating reachable sets via affine image and Minkowski sum operations in the zonotope domain. The essential guarantee is:
- Deterministic containment: For any horizon $k \geq n_o$, the true output set is contained in the propagated zonotopic over-approximation, i.e., $Y_k \subseteq \hat{Y}_k$, for all admissible initial conditions, inputs, and noise.
This guarantee requires no model knowledge but comes with inherent conservatism: for nontrivial system order and output dimension, the matrix zonotope multiplication step makes the uncertainty grow rapidly, as observed in all examined sensor configurations.
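The two propagation steps and the source of conservatism can be sketched in plain Python. This is a minimal illustration, not the paper's implementation. It uses the common over-approximation of a matrix zonotope times a zonotope, in which every bilinear term between a matrix generator and a set generator becomes a new independent generator. The class names, the 2-D example matrices and the disturbance set are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]  # row-major matrix


def matvec(A: Mat, x: Vec) -> Vec:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


@dataclass
class Zonotope:
    """Z = {c + sum_j alpha_j g_j : alpha in [-1, 1]^p}; generators stored as vectors."""
    center: Vec
    generators: List[Vec]

    def affine(self, A: Mat, b: Vec) -> "Zonotope":
        # Exact affine image {A z + b : z in Z}.
        c = [ci + bi for ci, bi in zip(matvec(A, self.center), b)]
        return Zonotope(c, [matvec(A, g) for g in self.generators])

    def minkowski_sum(self, other: "Zonotope") -> "Zonotope":
        # Exact Minkowski sum: add centers, concatenate generator lists.
        c = [a + b for a, b in zip(self.center, other.center)]
        return Zonotope(c, self.generators + other.generators)

    def interval_hull(self) -> List[Tuple[float, float]]:
        # Tightest axis-aligned box: radius in dimension d is sum_j |g_j[d]|.
        out = []
        for d, c in enumerate(self.center):
            r = sum(abs(g[d]) for g in self.generators)
            out.append((c - r, c + r))
        return out


@dataclass
class MatrixZonotope:
    """M = {A0 + sum_i beta_i A_i : beta in [-1, 1]^N}."""
    center: Mat
    generators: List[Mat]

    def times(self, Z: Zonotope) -> Zonotope:
        # Over-approximation of {M z : M in self, z in Z}. The products
        # beta_i * alpha_j are treated as independent factors in [-1, 1],
        # so each A_i g_j becomes a new generator: this is where both the
        # conservatism and the generator count grow.
        c = matvec(self.center, Z.center)
        gens = [matvec(self.center, g) for g in Z.generators]
        gens += [matvec(Ai, Z.center) for Ai in self.generators]
        gens += [matvec(Ai, g) for Ai in self.generators for g in Z.generators]
        return Zonotope(c, gens)


# Hypothetical example: nominal dynamics 0.9*I with +/-0.05 uncertainty on the diagonal.
M = MatrixZonotope([[0.9, 0.0], [0.0, 0.9]],
                   [[[0.05, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.05]]])
W = Zonotope([0.0, 0.0], [[0.01, 0.0], [0.0, 0.01]])  # bounded disturbance set
Z = Zonotope([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])     # initial set

counts = []
for _ in range(3):
    Z = M.times(Z).minkowski_sum(W)  # one propagation step
    counts.append(len(Z.generators))
print(counts)  # [10, 34, 106]
```

The generator count roughly triples per step here (it follows $p \mapsto p(1+N) + N + q$ for $N$ matrix generators and $q$ disturbance generators), which is why order reduction is normally applied in practice. The sketch omits it to keep the growth visible.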


Figure 1: Data-driven output reachable sets under the cross-block sensor configuration ($C_a$), highlighting the conservative outer approximation produced by the data-driven approach.
To address the irreducible conservatism intrinsic to matrix zonotope-based propagation, the paper introduces a supervised learning-based surrogate:
- Tightened training labels: During offline training, external non-reachability certificates (e.g., derived empirically from observed output bounds) directionally contract the formal zonotopic envelope, yielding considerably tighter, but still valid, training labels.
- Decoder-only Transformer predictor: Rather than recursively propagating highly conservative reachable sets, a Transformer is trained to directly map fixed-length context windows of past outputs (tokenized zonotopes) to predictions of future output reachable sets. This bypasses the structural over-approximation of matrix zonotope multiplication.
- Calibration with split conformal prediction: Coverage guarantees are restored by conformal prediction, which empirically calibrates the minimal set inflation required to ensure, at test time, at least $1-\delta$ coverage of realized output trajectories, regardless of the underlying data distribution, provided calibration and test trajectories are exchangeable.
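The label-tightening bullet above can be illustrated with a hypothetical certificate format: per-output bounds that the true output is known never to exceed. The sketch below is an assumption, not the paper's contraction rule. It computes the interval hull of a formal zonotope via its support function and intersects it with those bounds, so it contracts only along the coordinate axes.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Box = List[Tuple[float, float]]  # per-output-dimension interval [lo, hi]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def support(center: Vec, generators: List[Vec], d: Vec) -> float:
    """Support function of a zonotope: max over z in Z of d.z = d.c + sum_j |d.g_j|."""
    return dot(d, center) + sum(abs(dot(d, g)) for g in generators)


def tighten_label(center: Vec, generators: List[Vec],
                  cert_lo: Vec, cert_hi: Vec) -> Box:
    """Intersect the zonotope's interval hull with certified per-output bounds.

    Hypothetical certificate: in dimension d the true output never leaves
    [cert_lo[d], cert_hi[d]]. Since both the formal set and the certificate
    contain the true reachable set, so does their intersection.
    """
    n = len(center)
    box = []
    for d in range(n):
        e = [1.0 if i == d else 0.0 for i in range(n)]  # unit direction e_d
        hi = support(center, generators, e)
        lo = -support(center, generators, [-x for x in e])
        lo, hi = max(lo, cert_lo[d]), min(hi, cert_hi[d])
        if lo > hi:
            raise ValueError("certificate contradicts the formal over-approximation")
        box.append((lo, hi))
    return box


# Hypothetical formal set and certificate: both dimensions get contracted.
print(tighten_label([0.0, 0.0], [[1.0, 0.5], [0.0, 1.0]],
                    cert_lo=[-0.8, -2.0], cert_hi=[0.6, 1.0]))
# [(-0.8, 0.6), (-1.5, 1.0)]
```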
These ingredients combine to produce a system for sequential output reachable set prediction that retains the theoretical safety of set-valued methods while achieving substantial reductions in over-approximation.
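The split conformal step can be sketched as follows. This assumes, hypothetically, that predicted sets are represented by per-step boxes and that the inflation is an additive margin; the paper's score function and set representation may differ. Each calibration trajectory gets a score equal to the smallest margin that places every realized output inside its predicted box. The $\lceil (n+1)(1-\delta) \rceil$-th smallest of the $n$ scores then inflates the test-time predictions, which gives at least $1-\delta$ marginal coverage under exchangeability.

```python
import math
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # per-output-dimension interval [lo, hi]


def point_score(box: Box, y: Sequence[float]) -> float:
    """Smallest margin r >= 0 such that y lies in the box widened by r on every side."""
    return max(max(lo - yi, yi - hi, 0.0) for (lo, hi), yi in zip(box, y))


def trajectory_score(boxes: Sequence[Box], ys: Sequence[Sequence[float]]) -> float:
    """Joint score: the margin needed to cover every step of one trajectory."""
    return max(point_score(b, y) for b, y in zip(boxes, ys))


def conformal_margin(scores: Sequence[float], delta: float) -> float:
    """Split-conformal quantile: the ceil((n+1)(1-delta))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - delta))
    if k > n:
        return math.inf  # too few calibration trajectories for this delta
    return sorted(scores)[k - 1]


def inflate(box: Box, r: float) -> Box:
    """Widen every interval of a predicted box by the calibrated margin r."""
    return [(lo - r, hi + r) for lo, hi in box]


# Hypothetical scores from five held-out calibration trajectories.
cal_scores = [0.5, 0.1, 0.3, 0.2, 0.4]
r = conformal_margin(cal_scores, delta=0.5)
print(r)  # 0.3
```

With only five calibration trajectories, any $\delta < 1/6$ returns an infinite margin, which illustrates why the calibration set size bounds the attainable coverage level.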


Figure 2: Transformer-enhanced output reachable sets for the cross-block configuration ($C_a$), showing significantly reduced conservatism and calibrated high-probability containment of actual trajectories.
Numerical Results
Experiments are conducted on a five-dimensional LTI system subject to various nontrivial sensor/output configurations. Key aspects of the evaluation include:
- Uncertainty quantification: The data-driven approach is shown to yield overly conservative sets, often an order of magnitude larger than the model-based reference, even with perfect residual bounds.
- Transformer performance: The Transformer, trained with tightened labels and calibrated via conformal quantiles, produces reachable sets whose mean hull width is only $1.5$–$1.7\times$ the model-based lower bound, compared to at least $6\times$ for the classical data-driven sets.
- Coverage: The conformally calibrated Transformer achieves 100% empirical coverage on all held-out test trajectories, consistent with the prescribed statistical guarantee.
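The two reported metrics can be computed as sketched below. The definitions are assumptions for illustration only. Mean hull width is taken as the average interval-hull width over output dimensions and time steps. Coverage is counted per trajectory, jointly over all steps; a pointwise variant would count individual output points instead.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # per-output-dimension interval [lo, hi]


def mean_hull_width(boxes: Sequence[Box]) -> float:
    """Average interval width over all time steps and output dimensions."""
    widths = [hi - lo for box in boxes for lo, hi in box]
    return sum(widths) / len(widths)


def width_ratio(pred: Sequence[Box], reference: Sequence[Box]) -> float:
    """Mean hull width of predicted sets relative to a reference (e.g. model-based) sequence."""
    return mean_hull_width(pred) / mean_hull_width(reference)


def empirical_coverage(pred_seqs: Sequence[Sequence[Box]],
                       trajectories: Sequence[Sequence[Sequence[float]]]) -> float:
    """Fraction of trajectories whose outputs stay inside the predicted boxes at every step."""
    def covered(boxes: Sequence[Box], ys: Sequence[Sequence[float]]) -> bool:
        return all(lo <= yi <= hi
                   for box, y in zip(boxes, ys)
                   for (lo, hi), yi in zip(box, y))
    hits = sum(covered(b, ys) for b, ys in zip(pred_seqs, trajectories))
    return hits / len(trajectories)


# Hypothetical two-step, one-output sequences.
ref = [[(0.0, 1.0)], [(0.0, 2.0)]]
pred = [[(0.0, 1.5)], [(0.0, 3.0)]]
print(width_ratio(pred, ref))  # 1.5
```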
These results are consistent across all tested output configurations. Bypassing conformal calibration or label tightening (e.g., by directly fitting the original data-driven sets or omitting the certificate) is observed to forfeit either tightness or the formal guarantees.
Theoretical Implications and Extensions
Strong claims and analytical limits:
- Coverage guarantee: The main theoretical claim is the restoration of $1-\delta$ trajectory-point or joint coverage for realized outputs, even without information on the true reachable set's shape. Purely neural or non-statistical set propagation approaches do not offer this property.
- Structural conservatism is intrinsic: The analysis shows that all standard remedies (zonotope order reduction, more expressive set representations, increased data) are fundamentally unable to mitigate the parameter growth and over-approximation introduced by the lifted autoregressive model.
Practical and theoretical implications:
- Model-agnostic safety certificates: The framework enables certified reachability analysis for systems where no state estimate, system identification, or parametric model is available. These are notably weaker assumptions than in previous data-driven set-based reachability.
- Transformer as a set-predictor: The architecture demonstrates that modern sequence models can learn to predict conservative yet accurate set-valued quantities when trained on certificate-tightened labels, with implications for neural surrogate design in robust control and verification.
Future directions include:
- Extensions to nonlinear dynamical systems and data-driven nonlinear autoregressive representations
- Integration of adaptive or context-sensitive conformal quantiles for time-varying or non-stationary environments
- Exploitation of alternative geometric representations for uncertain outputs within the Transformer decoding stack
Conclusion
This work advances output-only reachability analysis for LTI systems with total structural uncertainty by (1) providing deterministic over-approximation guarantees via output-lifted matrix zonotopes, (2) leveraging trained Transformers for autoregressive set-sequence prediction, and (3) restoring distribution-free statistical coverage via conformal calibration. The combination substantially reduces conservatism while retaining formal, distribution-free coverage guarantees. It establishes a new technical baseline for model-agnostic data-driven reachability with quantifiable safety properties, and provides a blueprint for extending data-driven certification to more complex, partially observed, and nonlinear dynamical settings.
Reference: "Transformer-Enhanced Data-Driven Output Reachability with Conformal Coverage Guarantees" (2604.02173)