
DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers

Published 29 Aug 2025 in eess.SY, cs.AI, cs.CR, cs.LG, and stat.AP | (2508.21797v1)

Abstract: Industry 4.0's highly networked Machine Tool Controllers (MTCs) are prime targets for replay attacks that use outdated sensor data to manipulate actuators. Dynamic watermarking can reveal such tampering, but current schemes assume linear-Gaussian dynamics and use constant watermark statistics, making them vulnerable to the time-varying, partly proprietary behavior of MTCs. We close this gap with DynaMark, a reinforcement learning framework that models dynamic watermarking as a Markov decision process (MDP). It learns an adaptive policy online that dynamically adapts the covariance of a zero-mean Gaussian watermark using available measurements and detector feedback, without needing system knowledge. DynaMark maximizes a unique reward function balancing control performance, energy consumption, and detection confidence dynamically. We develop a Bayesian belief updating mechanism for real-time detection confidence in linear systems. This approach, independent of specific system assumptions, underpins the MDP for systems with linear dynamics. On a Siemens Sinumerik 828D controller digital twin, DynaMark achieves a reduction in watermark energy by 70% while preserving the nominal trajectory, compared to constant variance baselines. It also maintains an average detection delay equivalent to one sampling interval. A physical stepper-motor testbed validates these findings, rapidly triggering alarms with less control performance decline and exceeding existing benchmarks.

Summary

  • The paper introduces an RL-based dynamic watermarking method that overcomes the limitations of static, LTI-based schemes.
  • It formulates watermarking as an MDP and employs a DDPG agent to balance control performance, energy consumption, and detection confidence.
  • Experimental evaluations on digital twins and a physical testbed demonstrate improved detection speed and reduced energy overhead.

DynaMark: Reinforcement Learning for Dynamic Watermarking in Industrial Machine Tool Controllers

Introduction and Motivation

The proliferation of networked Machine Tool Controllers (MTCs) in Industry 4.0 environments has exposed manufacturing systems to sophisticated cyber-physical threats, notably replay attacks that exploit outdated sensor data to manipulate actuators. Traditional watermarking-based detection schemes, which superimpose constant-variance Gaussian signals onto control inputs, are fundamentally limited by their reliance on linear time-invariant (LTI) and Gaussian assumptions. These static approaches are ill-suited for the time-varying, proprietary, and often nonlinear dynamics of modern MTCs, resulting in suboptimal trade-offs between detection accuracy and control performance.

DynaMark addresses these limitations by formulating dynamic watermarking as a Markov Decision Process (MDP) and leveraging reinforcement learning (RL) to adaptively select watermark covariance in real time. This framework enables the system to balance control performance, energy consumption, and detection confidence, without requiring explicit system identification or prior knowledge of plant dynamics.

Figure 1: Flowchart of the interaction between machine tools, sensors, controllers, and the detector for real-time monitoring and control.

Problem Formulation and Theoretical Foundations

System and Attack Models

The MTC is modeled as a stochastic linear dynamic system:

$$y_{t+1} = A y_t + B u_t + w_t$$

where $y_t$ is the sensor measurement, $u_t$ is the control input, and $w_t$ is i.i.d. Gaussian noise. Watermarking is implemented by injecting a zero-mean Gaussian signal $\phi_t$ with covariance $U_t$ into the control input:

$$u'_t = u_t + \phi_t$$

Replay attacks are modeled by replacing true sensor measurements with previously recorded data, while flip and injection attacks manipulate control signals and sensor readings, respectively. The residuals $r_t = y_t - \widehat{y}_t$ are monitored by a $\chi^2$ detector, which triggers alarms based on statistical thresholds.
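The model and detector above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the matrices `A`, `B`, the noise covariance `W`, the watermark covariance `U`, the one-step predictor $\widehat{y}_{t+1} = A y_t + B u'_t$, and the helper names are all hypothetical and are not taken from the paper. The threshold uses the closed-form quantile of a $\chi^2$ distribution with two degrees of freedom, so it applies only to the two-dimensional measurement used here.

```python
import numpy as np

def plant_step(A, B, y, u, U, W, rng):
    """Advance y_{t+1} = A y_t + B (u_t + phi_t) + w_t by one step."""
    phi = rng.multivariate_normal(np.zeros(B.shape[1]), U)  # watermark phi_t ~ N(0, U)
    w = rng.multivariate_normal(np.zeros(A.shape[0]), W)    # process noise w_t ~ N(0, W)
    return A @ y + B @ (u + phi) + w, phi

def chi2_statistic(y_next, y_pred, Sigma):
    """Quadratic form r^T Sigma^{-1} r for the residual r = y_next - y_pred."""
    r = y_next - y_pred
    return float(r @ np.linalg.solve(Sigma, r))

def chi2_threshold_2dof(alpha):
    """Exact (1 - alpha) quantile of a chi-square with 2 degrees of freedom."""
    return -2.0 * np.log(alpha)

# Hypothetical 2-D plant, chosen only for illustration.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(2)
W = 0.01 * np.eye(2)
U = 0.05 * np.eye(2)
y, u = np.zeros(2), np.zeros(2)
threshold = chi2_threshold_2dof(0.01)

# Normal operation: the detector knows the injected watermark, so the
# residual reduces to the process noise w_t.
y_next, phi = plant_step(A, B, y, u, U, W, rng)
y_pred = A @ y + B @ (u + phi)
g_normal = chi2_statistic(y_next, y_pred, W)

# Replay: the detector is fed a stale measurement produced with a different
# watermark draw, so the residual also carries the watermark mismatch.
y_replayed, _ = plant_step(A, B, y, u, U, W, rng)
g_replay = chi2_statistic(y_replayed, y_pred, W)

print(g_normal > threshold, g_replay > threshold)
```

The replayed measurement does not contain the current watermark, which is why a larger $U_t$ inflates the residual under attack and makes the alarm more likely.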

Residual Analysis and Detection Power

The paper provides rigorous analysis of residual distributions under normal operation and various attack scenarios. Under replay attacks, the test statistic $g_{t|\tau}$ follows a generalized $\chi^2$ distribution, whose parameters depend on the watermark covariance and system matrices. The detection power is characterized by the Type-II error $\beta_t$, which is computed using the cumulative distribution function of the generalized $\chi^2$ statistic.
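To make the Type-II error concrete, the sketch below estimates $\beta$ by Monte Carlo instead of evaluating the generalized $\chi^2$ CDF in closed form. It assumes a zero-mean Gaussian residual with covariance `Sigma_attack` under attack and a detector normalized by the nominal covariance `Sigma0`. The particular choice `Sigma_attack = W + 2 B U B^T` in the example is an illustrative assumption, not the paper's derived expression.

```python
import numpy as np

def type2_error_mc(Sigma0, Sigma_attack, threshold, n_samples=20000, seed=0):
    """Estimate beta = P(g <= threshold | attack) with g = r^T Sigma0^{-1} r.

    When r ~ N(0, Sigma_attack), g is a weighted sum of chi-square variables,
    i.e. a generalized chi-square statistic.
    """
    rng = np.random.default_rng(seed)
    dim = Sigma0.shape[0]
    r = rng.multivariate_normal(np.zeros(dim), Sigma_attack, size=n_samples)
    g = np.einsum("ni,ij,nj->n", r, np.linalg.inv(Sigma0), r)
    return float(np.mean(g <= threshold))

# Hypothetical covariances for a 2-D residual.
B = np.eye(2)
W = 0.01 * np.eye(2)
U = 0.05 * np.eye(2)
threshold = -2.0 * np.log(0.05)      # 95% quantile of chi-square, 2 dof

beta_no_watermark = type2_error_mc(W, W, threshold)
beta_watermarked = type2_error_mc(W, W + 2.0 * B @ U @ B.T, threshold)
print(beta_no_watermark, beta_watermarked)
```

With no watermark the attacked residual is statistically indistinguishable from the nominal one, so the miss rate stays close to $1 - \alpha$. Raising the watermark covariance lowers it, at the cost of extra actuation energy.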

DynaMark Framework and RL-Based Policy Optimization

MDP Formulation

DynaMark models the watermarking problem as an MDP with state $s_t = (y_t, d_t)$, where $d_t$ is the detector's Bayesian belief that an attack is present. The action space consists of positive semidefinite matrices $U_t$ representing the watermark covariance. The reward function penalizes energy consumption and control deviation while rewarding high detection confidence:

$$r_t(s,a) = -\omega_1 \|\phi_t\|_1 - \omega_2 \|y_{t+1}^{wom} - y_{t+1}\|_2 + \omega_3 |0.5 - d_{t+1}|$$

The last term is largest when the belief is close to 0 or 1, so it rewards a decisive detector rather than a high belief as such.
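A minimal sketch of the two quantities that drive the MDP, the belief $d_t$ and the reward, is given below. The reward follows the formula above term by term, reading $y^{wom}$ as the no-watermark trajectory. The belief update is a generic binary Bayes filter driven by the alarm bit, with an assumed false-alarm rate `alpha` and miss rate `beta`. It stands in for the paper's Bayesian mechanism, whose exact form is not given in this summary, and all function names and default weights are hypothetical.

```python
import numpy as np

def belief_update(d, alarm, alpha, beta, eps=1e-6):
    """Generic Bayes update of the attack belief d from a binary alarm.

    Assumes P(alarm | normal) = alpha and P(alarm | attack) = 1 - beta.
    """
    like_attack = (1.0 - beta) if alarm else beta
    like_normal = alpha if alarm else (1.0 - alpha)
    posterior = d * like_attack / (d * like_attack + (1.0 - d) * like_normal)
    # Clip so the belief never locks at exactly 0 or 1.
    return float(np.clip(posterior, eps, 1.0 - eps))

def reward(phi, y_wom_next, y_next, d_next, w1=1.0, w2=1.0, w3=1.0):
    """r = -w1 * ||phi||_1 - w2 * ||y_wom - y||_2 + w3 * |0.5 - d|."""
    energy = np.sum(np.abs(phi))                    # watermark energy term
    deviation = np.linalg.norm(y_wom_next - y_next) # control deviation term
    confidence = abs(0.5 - d_next)                  # detection confidence term
    return float(-w1 * energy - w2 * deviation + w3 * confidence)

d1 = belief_update(0.5, alarm=True, alpha=0.05, beta=0.2)
r1 = reward(np.array([1.0, -2.0]), np.zeros(2), np.array([3.0, 4.0]), d_next=1.0)
print(d1, r1)
```

The weights `w1`, `w2`, `w3` play the role of $\omega_1$, $\omega_2$, $\omega_3$; their values here are placeholders.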

RL Algorithm

A Deep Deterministic Policy Gradient (DDPG) agent is trained to optimize the watermarking policy. The actor network outputs watermark covariance, while the critic estimates the Q-value. The RL agent adapts $U_t$ online based on observed system state and detector feedback, enabling dynamic trade-off management.

Figure 2: DynaMark framework.
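Because the action must be a positive semidefinite matrix, an actor's unconstrained output vector has to be mapped onto a valid covariance. One common way to do this, shown here purely as an assumption since the summary does not say how DynaMark parameterizes $U_t$, is to fill a lower-triangular factor $L$ from the raw output and set $U_t = L L^\top$. The same factor can then be used to draw the watermark sample.

```python
import numpy as np

def covariance_from_action(action, dim):
    """Map a raw vector of length dim*(dim+1)/2 to a PSD matrix U = L L^T."""
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = action   # fill the lower triangle row by row
    return L @ L.T, L

def sample_watermark(L, rng):
    """Draw phi ~ N(0, L L^T) by coloring a standard normal vector."""
    return L @ rng.standard_normal(L.shape[0])

raw_action = np.array([1.0, 2.0, 3.0])          # hypothetical actor output
U, L = covariance_from_action(raw_action, dim=2)
phi = sample_watermark(L, np.random.default_rng(0))
print(U)
```

Any real-valued actor output yields a valid covariance under this mapping, which avoids projecting onto the PSD cone after each policy step.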

Experimental Evaluation

Digital Twin of Siemens Sinumerik 828D

A high-fidelity digital twin (DT) of the Siemens Sinumerik 828D controller is used to evaluate DynaMark. The DT replicates two-axis motion control and supports replay attack scenarios. Under normal operation, DynaMark maintains low watermark energy and nominal trajectory tracking. Upon attack onset, the detector's belief $d_t$ rapidly saturates, and watermark variance $U_t$ is adaptively increased to maximize detection power.

Figure 3: (a) Siemens Sinumerik 828D controller, (b) Optomec LENS® MTS 500 hybrid machine tool.

Figure 4: DynaMark under normal operation. (a) Detector belief $d_t$ oscillates early and then falls to 0. (b) Watermark variance $U_t$ rises while uncertainty is high, then levels off. (c) Resulting trajectory $y_t$ tracks the no-watermark baseline.

Figure 5: DynaMark under a replay attack starting at $\tau=200$. (a) Belief $d_t$ jumps to 1 almost immediately after the attack onset. (b) $U_t$ is boosted by two orders of magnitude and held high. (c) The physical trajectory departs sharply from the baseline once the attack starts.

Benchmarking Against Constant-Variance Watermarks

DynaMark is compared to fixed-variance baselines. Under normal conditions, DynaMark achieves 70% lower watermark energy than high-variance schemes, with negligible control degradation. During replay attacks, DynaMark matches the fastest detection delay ($\mathrm{ARL}_1 = 1$ sample) while maintaining a superior energy-performance trade-off.

Figure 6: Benchmarking DynaMark against two constant-variance watermarks: (a) energy consumption and control performance under normal operation, (c) detection delay ($\mathrm{ARL}_1$) and (d) detector belief $d_t$ for one representative trial under a replay attack. Results indicate DynaMark's favorable security-performance trade-off.
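The two headline metrics in this comparison can be computed from logged trials as sketched below. The counting convention is an assumption: an alarm raised on the first attacked sample counts as a delay of one sample, and $\mathrm{ARL}_1$ is taken as the mean delay over trials. Watermark energy is taken as the time-averaged $\ell_1$ norm of $\phi_t$, matching the energy term of the reward. The function names and the toy logs are hypothetical.

```python
import numpy as np

def detection_delay(alarms, tau):
    """Samples from attack onset tau to the first alarm at or after tau."""
    hits = np.flatnonzero(np.asarray(alarms, dtype=bool)[tau:])
    return int(hits[0]) + 1 if hits.size else None   # None: attack never detected

def arl1(trials, tau):
    """Mean detection delay over the trials in which an alarm was raised."""
    delays = [detection_delay(a, tau) for a in trials]
    delays = [d for d in delays if d is not None]
    return float(np.mean(delays)) if delays else None

def watermark_energy(phis):
    """Time-averaged l1 norm of the watermark samples (rows of phis)."""
    return float(np.mean(np.sum(np.abs(np.asarray(phis)), axis=1)))

def energy_reduction(e_dynamic, e_baseline):
    """Relative energy saving of a dynamic watermark over a baseline."""
    return 1.0 - e_dynamic / e_baseline

# Toy alarm logs with attack onset at sample 3.
trials = [[0, 0, 0, 1, 1], [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
print(arl1(trials, tau=3))
```

Trials with no alarm are excluded from the mean here; reporting the fraction of such trials separately avoids hiding missed detections.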

Figure 7: Trade-off between detection belief and control performance degradation as functions of constant watermark variance $U_t$. Stars mark DynaMark.

Physical Stepper-Motor Testbed

A closed-loop stepper-motor testbed is implemented to validate DynaMark on real hardware. The RL policy is transferred to the physical system via ONNX Runtime. Under replay attacks, the detector's belief $d_t$ rises to 1 within five samples, and DynaMark dynamically adjusts $U_t$ to maintain detection power while minimizing energy overhead.

Figure 8: Smart stepper-motor physical implementation.

Figure 9: The stepper-motor position under normal conditions: (a) continuous, no watermark, (b) discretized and under DynaMark's DWM, and (c) on its DT and under DynaMark's DWM. Panels (b) and (c) show that control performance is maintained across the entire motion profile.

Figure 10: The stepper-motor's response on DT to a replay attack, showing divergence of true position and rapid rise in detector belief.

Figure 11: The stepper-motor response to a replay attack with onset at decision epoch 7, showing rapid alarm and adaptive watermarking.

Comparative Analysis with Optimization-Based Baselines

Constant-variance watermarks derived from LTI approximations and LQG optimization are compared to DynaMark. On the time-varying stepper-motor plant, DynaMark achieves lower median energy and control degradation, with tighter inter-alarm intervals, demonstrating the inadequacy of static designs for non-LTI systems.

Figure 12: Comparison results between DynaMark and five constant watermarks obtained by LTI approximation and solving the optimization problem at different LQG-cost budgets.

Implementation Considerations

  • Computational Requirements: DynaMark's RL policy inference is decoupled from real-time control via ONNX runtime, enabling deployment on resource-constrained hardware.
  • Scalability: The framework is agnostic to system order and can be extended to multi-input multi-output (MIMO) plants.
  • Adaptability: DynaMark does not require explicit system identification, making it suitable for proprietary or closed-architecture MTCs.
  • Limitations: The current implementation assumes zero-mean independent Gaussian watermarks; future work should consider state- and frequency-shaped distributions for enhanced stealth and efficiency.

Figure 13: Multi-strobe Online Decision-making Pipeline for DynaMark.

Implications and Future Directions

DynaMark demonstrates that RL-based dynamic watermarking can robustly detect replay attacks in industrial controllers, outperforming static and optimization-based schemes, especially in non-LTI and time-varying environments. The framework's adaptability and model-free operation are critical for deployment in proprietary industrial systems. Future research should explore safe RL constraints, online watermark recovery for autonomous system restart, and advanced watermark shaping to counter adaptive adversaries.

Conclusion

DynaMark provides a principled, RL-driven approach to dynamic watermarking for industrial MTCs, achieving efficient replay attack detection with minimal control performance degradation and energy overhead. Its model-free, adaptive design overcomes the limitations of static and LTI-dependent schemes, offering a practical solution for securing cyber-physical manufacturing systems against evolving threats.
