CD-TWINSAFE: V2I Digital Twin for Autonomous Safety
- CD-TWINSAFE is a vehicle-to-infrastructure digital twin architecture that mirrors on-board sensor data and high-fidelity UE5 simulations to assess scene safety.
- It couples dual perception pipelines—using YOLOv8-n, RAFT-Stereo, and ROI-segmentation—with continuous safety metric computations like TTC and THW at 20 Hz.
- The platform leverages ROS2 and UDP-based V2I networking to achieve sub-40 ms latency, enabling synchronized risk classification and operator-triggered safety alerts.
CD-TWINSAFE is a vehicle-to-infrastructure (V2I) enabled digital twin architecture designed to support scene understanding and real-time safety assessment for autonomous vehicles. The system tightly couples on-board sensor-driven perception and localization modules with a high-fidelity Unreal Engine 5 (UE5) simulation, interconnected via ROS 2 and UDP-based links over 4G. The infrastructure-side digital twin operates in synchrony with the vehicle, enabling live mirroring, risk classification, and operator-injected safety alerts within a collaborative ROS environment (Khaled et al., 18 Jan 2026).
1. Architectural Components
CD-TWINSAFE consists of two fully concurrent processing stacks:
On-Board Driving Stack:
- Localization Module: Uses vehicle throttle and gear state to estimate longitudinal velocity at each 20 Hz frame. Its outputs are the ego-vehicle velocity and pose, used to transform perception outputs into the world reference frame.
- Perception Module: Implements 20 fps inference (stereo camera ZED 2, 672×376 px @10 Hz neural-depth, 119.89 mm baseline), divided into multiple stages:
- Object Detection & Classification: Employs YOLOv8-n for detection and ByteTrack for multi-object tracking, producing 2D bounding boxes and class labels ("Car", "Pedestrian").
- Depth Estimation: RAFT-Stereo computes the disparity $d$, from which depth follows via the pinhole stereo relation $Z = \frac{f\,B}{d}$, with focal length $f$ and baseline $B$ (119.89 mm).
- Kinematic Estimation: Object velocities are obtained by finite-differencing per-object positions across consecutive 20 Hz frames. Yaw is estimated from vertical and horizontal depth differences across the bounding box and discretized into a fixed set of quantized angles.
- Safety Features Extraction: Calculation of time-to-collision (TTC) and time-headway (THW) metrics.
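The depth and kinematics steps above can be sketched as follows, assuming a pinhole stereo model; only the 119.89 mm baseline and the 20 Hz loop rate come from the text, while the focal-length value is illustrative:

```python
import numpy as np

FOCAL_PX = 350.0       # assumed focal length in pixels (illustrative)
BASELINE_M = 0.11989   # ZED 2 stereo baseline (119.89 mm, from the text)
FRAME_DT = 1.0 / 20.0  # 20 Hz perception loop

def disparity_to_depth(disparity_px: np.ndarray) -> np.ndarray:
    """Pinhole stereo depth Z = f * B / d; invalid disparities map to NaN."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        z = FOCAL_PX * BASELINE_M / d
    z[d <= 0] = np.nan
    return z

def longitudinal_velocity(depth_now_m: float, depth_prev_m: float) -> float:
    """Finite-difference relative velocity between consecutive 20 Hz frames."""
    return (depth_now_m - depth_prev_m) / FRAME_DT
```

A negative `longitudinal_velocity` then corresponds to a closing object, which is the case that matters for the TTC computation downstream.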
Digital-Twin Stack:
- Unreal Engine 5-based live 3D replication.
- Spawner Actors: Dynamically add/update UE actors from incoming ROS2 messages (ego vehicle modeled as Pawn for operator POV, UI for alerts).
- Coordinate Transformation: Applies a Z–X–Y rotation matrix and translation offset to map ROS-frame coordinates into the UE world frame, $p_{\mathrm{UE}} = R_{ZXY}\,p_{\mathrm{ROS}} + t$.
- Visualization Pipeline: Parses UDP-received byte-encoded ROS2 messages, updating the UE scene at ~20 Hz.
- Digital-Twin UI: Displays detected object meshes and ego state; supports operator-driven alerts back to vehicle.
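The coordinate-transformation step can be sketched as below; the intrinsic rotation order (Z, then X, then Y) and the angle naming are assumptions, since the paper's exact convention is not reproduced here:

```python
import numpy as np

def rot_zxy(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Z-X-Y rotation: rotate about Z (yaw), then X (pitch), then Y (roll)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rz @ Rx @ Ry

def ros_to_ue(p_ros: np.ndarray, yaw: float, pitch: float, roll: float,
              offset: np.ndarray) -> np.ndarray:
    """Rotate a ROS-frame point into the UE world frame, then apply the offset."""
    return rot_zxy(yaw, pitch, roll) @ p_ros + offset
```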
2. ROS2 Messaging and V2I Networking
The CD-TWINSAFE platform utilizes custom ROS 2 message definitions for inter-stack communication over UDP sockets:
- EgoPose.msg: Contains position (lat, lon, alt), orientation (roll, pitch, yaw), and speed of the ego vehicle.
- DetectedObj.msg: Encodes object track ID (ByteTrack), class label, relative coordinates, yaw, relative/absolute speed, distance, TTC, and THW.
- SafetyAlert.msg: Transports operator-typed alerts with relevant ID, text, and risk level (safe, hazardous, dangerous).
All message fields follow semantic conventions:
- Time synchronization via the ROS header.stamp field
- Relative positions in meters (x_rel, y_rel, z_rel)
- Safety metrics in seconds (ttc, thw)
- alert_level values driving visual UI cues
V2I communication leverages outbound UDP over a 4G modem (behind CG-NAT), with message serialization/deserialization handled by ROS2 DDS and standard BSD sockets. Quality of service is optimized for best-effort delivery with message losses below 3%, mirroring real-world latency tolerances (mean V2I RTT 38 ms).
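A best-effort UDP sender in this spirit might look like the sketch below; the fixed field layout is a stand-in for illustration, not the actual DDS wire format used by the platform:

```python
import socket
import struct

# Hypothetical fixed layout for an EgoPose-style payload:
# lat, lon, alt as doubles; roll, pitch, yaw, speed as 32-bit floats.
EGO_FMT = "<dddffff"

def pack_ego_pose(lat, lon, alt, roll, pitch, yaw, speed) -> bytes:
    return struct.pack(EGO_FMT, lat, lon, alt, roll, pitch, yaw, speed)

def unpack_ego_pose(payload: bytes) -> tuple:
    return struct.unpack(EGO_FMT, payload)

def send_pose(addr: tuple, payload: bytes) -> None:
    """Fire-and-forget datagram send, matching the best-effort QoS policy."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

The fire-and-forget send is deliberate: with losses below 3% tolerated, there is no retransmission layer, and stale poses are simply superseded by the next 20 Hz frame.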
3. Perception Pipelines and Safety Metric Computation
Two parallel perception pipelines yield identical safety metrics for detected objects:
- Pipeline 1: (RAFT-Stereo + YOLOv8-n + ByteTrack)
- Stereo disparity and depth calculation:
- Kinematic estimation: and quantized yaw angles via depth differentials
- Pipeline 2: (ROI-Segmentation + EMA tracking)
- Cropped YOLO boxes are segmented via DeepLabV3-ResNet50, extracting mean depth within object masks (mm→m conversion).
- Maintains rolling-window smoothing for depth and velocity, applying an exponential moving average $\hat{x}_t = \alpha x_t + (1-\alpha)\hat{x}_{t-1}$.
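The exponential moving average used here is standard; a minimal version, with an illustrative smoothing factor:

```python
def ema_update(prev, sample: float, alpha: float = 0.3) -> float:
    """x_hat_t = alpha * x_t + (1 - alpha) * x_hat_{t-1}; seeded on first sample.

    The alpha value is illustrative; the paper does not state the one used.
    """
    return sample if prev is None else alpha * sample + (1.0 - alpha) * prev
```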
Safety Metrics:
- Raw TTC: $\mathrm{TTC} = \frac{d_{\mathrm{rel}}}{v_{\mathrm{rel}}}$ for closing relative speed $v_{\mathrm{rel}} > 0$
- THW: $\mathrm{THW} = \frac{d_{\mathrm{rel}}}{v_{\mathrm{ego}}}$
- Both metrics are EMA-smoothed (in Pipeline 2) and transmitted as formatted strings in DetectedObj.msg.
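Under the usual definitions (relative distance over closing speed for TTC, over ego speed for THW), the raw metrics reduce to:

```python
def raw_ttc(rel_distance_m: float, closing_speed_mps: float) -> float:
    """TTC = d_rel / v_rel; infinite when the gap is opening or constant."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return rel_distance_m / closing_speed_mps

def raw_thw(rel_distance_m: float, ego_speed_mps: float) -> float:
    """THW = d_rel / v_ego; infinite when the ego vehicle is stationary."""
    if ego_speed_mps <= 0.0:
        return float("inf")
    return rel_distance_m / ego_speed_mps
```

Returning infinity for the non-closing and stationary cases keeps downstream threshold checks simple: an infinite TTC or THW always classifies as safe.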
4. Digital Twin Mirroring, Risk Assessment, and Alert Logic
Message flow is strictly event-driven at 20 Hz:
- The on-board loop builds EgoPose.msg and DetectedObj.msg for all objects, publishing via UDP.
- The digital-twin listener receives and deserializes messages, updating UE actors for the ego vehicle and detected objects; objects not updated are flagged for removal.
- The operator UI supports text-based SafetyAlert.msg transmission, which is routed to the on-board UI for display.
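The flagging of non-updated objects can be sketched as a last-seen table keyed by track ID; the timeout value is an assumption, chosen as a few 20 Hz frame periods:

```python
STALE_AFTER_S = 0.25  # assumed timeout: a few 20 Hz frames without an update

def stale_track_ids(last_seen: dict, now_s: float) -> list:
    """Return track IDs whose last update is older than the staleness window."""
    return [tid for tid, t in last_seen.items() if now_s - t > STALE_AFTER_S]
```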
Real-time mirroring is achieved with:
- Synchronized frame rates (up to 20 Hz)
- Framewise updates of transforms and risk overlays (color-coded: green/safe, yellow/hazard, red/danger)
- On-board UI shifts in background and text per alert_level and SafetyAlert.msg contents
Automatic hazard classification:
- safe: $\mathrm{TTC} > 5$ s
- hazardous: $2$ s $< \mathrm{TTC} \le 5$ s
- dangerous: $\mathrm{TTC} \le 2$ s
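Applied as a function of TTC, the 5 s and 2 s boundaries reported in the evaluation give a direct threshold classifier:

```python
def classify_risk(ttc_s: float) -> str:
    """Map TTC (seconds) to the alert levels used by the twin UI.

    The 5 s / 2 s boundaries come from the evaluation; whether THW enters the
    on-board rule with the same bands is not stated and is not modeled here.
    """
    if ttc_s > 5.0:
        return "safe"
    if ttc_s > 2.0:
        return "hazardous"
    return "dangerous"
```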
5. Experimental Scenario Evaluation
Driving scenarios include:
- Pedestrian Test: Controlled low-speed environment
- Vehicle Following: Ego vehicle trails a lead car, inducing transitions across safety zones (safe/hazardous/dangerous)
- Extended Urban: Implied by results, but not explicitly benchmarked
Key experimental findings:
- Perception Performance: 20 fps end-to-end update rate
- Detection Benchmarks (100-frame means):
| Model | Latency [ms] | #boxes | Avg. conf. |
|---|---|---|---|
| YOLOv8-n | 18.4 ± 3.5 | 2.2 ± 0.4 | 0.70 ± 0.05 |
| Faster R-CNN | 127.8 ± 3.7 | 33.1 ± 6.7 | 0.22 ± 0.03 |
| SSDLite320 | 101.2 ± 12.1 | 300 | 0.07 ± 0.01 |
- V2I Latency (4G→WSL2/fiber): min = 21.95 ms, max = 186.63 ms, mean = 38.21 ms, σ = 28.18 ms, loss = 2.795%
- Safety Alert Validity: In vehicle-following tests, UI reliably shifted background (green→yellow→red) as TTC passed $5$ s and $2$ s thresholds; digital twin mirrored all risk states and operator messages.
6. Identified Limitations and Prospects for Future Work
Limitations in CD-TWINSAFE include:
- GPS is available only at 1 Hz; ZED-IMU drift yields pose lag and accumulated error
- Stereo perception is susceptible to low light, glare, and shadow artifacts, with degraded depth and false negatives at longer ranges
Future enhancements under consideration:
- Sensor fusion: LiDAR/radar integration for robust perception
- Localization: Upgrades to RTK-GPS, visual-inertial odometry
- Teleoperation: Enabling direct remote control via digital twin interface
In summary, CD-TWINSAFE demonstrates an integrated, scalable framework that couples real-time perception, safety metric computation, and digital twin visualization using ROS2 and V2I networking. Achieving 20 Hz updates, sub-40 ms communication latency, and reproducible hazard classification, it provides a reference implementation for next-generation V2I safety and scene understanding pipelines (Khaled et al., 18 Jan 2026).