CD-TWINSAFE: V2I Digital Twin for Autonomous Safety
- CD-TWINSAFE is a vehicle-to-infrastructure digital twin architecture that mirrors on-board sensor data and high-fidelity UE5 simulations to assess scene safety.
- It couples dual perception pipelines—using YOLOv8-n, RAFT-Stereo, and ROI-segmentation—with continuous safety metric computations like TTC and THW at 20 Hz.
- The platform leverages ROS2 and UDP-based V2I networking to achieve sub-40 ms latency, enabling synchronized risk classification and operator-triggered safety alerts.
CD-TWINSAFE is a vehicle-to-infrastructure (V2I) enabled digital twin architecture designed to support scene understanding and real-time safety assessment for autonomous vehicles. The system tightly couples on-board sensor-driven perception and localization modules with a high-fidelity Unreal Engine 5 (UE5) simulation, interconnected via ROS 2 and UDP-based links over 4G. The infrastructure-side digital twin operates in synchrony with the vehicle, enabling live mirroring, risk classification, and operator-injected safety alerts within a collaborative ROS environment (Khaled et al., 18 Jan 2026).
1. Architectural Components
CD-TWINSAFE consists of two fully concurrent processing stacks:
On-Board Driving Stack:
- Localization Module: Uses vehicle throttle and gear state to estimate longitudinal velocity at each 20 Hz frame. Its outputs are the ego-vehicle velocity and pose, used to transform perception outputs into the world reference frame.
- Perception Module: Implements 20 fps inference (stereo camera ZED 2, 672×376 px @10 Hz neural-depth, 119.89 mm baseline), divided into multiple stages:
- Object Detection & Classification: Employs YOLOv8-n for detection and ByteTrack for multi-object tracking, producing 2D bounding boxes and class labels ("Car", "Pedestrian").
- Depth Estimation: RAFT-Stereo computes the disparity $d$, from which depth follows via the pinhole stereo relation $Z = \frac{f\,B}{d}$, with focal length $f$ and baseline $B$ (119.89 mm).
- Kinematic Estimation: Object velocities are obtained by finite-differencing per-object positions across consecutive 20 Hz frames. Yaw is estimated from vertical and horizontal depth differences across the bounding box and discretized into a fixed set of quantized angles.
- Safety Features Extraction: Calculation of time-to-collision (TTC) and time-headway (THW) metrics.
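The depth and kinematics steps above can be sketched as follows, assuming a pinhole stereo model; only the 119.89 mm baseline and the 20 Hz loop rate come from the text, while the focal-length value is illustrative:

```python
import numpy as np

FOCAL_PX = 350.0       # assumed focal length in pixels (illustrative)
BASELINE_M = 0.11989   # ZED 2 stereo baseline (119.89 mm, from the text)
FRAME_DT = 1.0 / 20.0  # 20 Hz perception loop

def disparity_to_depth(disparity_px: np.ndarray) -> np.ndarray:
    """Pinhole stereo depth Z = f * B / d; invalid disparities map to NaN."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        z = FOCAL_PX * BASELINE_M / d
    z[d <= 0] = np.nan
    return z

def longitudinal_velocity(depth_now_m: float, depth_prev_m: float) -> float:
    """Finite-difference relative velocity between consecutive 20 Hz frames."""
    return (depth_now_m - depth_prev_m) / FRAME_DT
```

A negative `longitudinal_velocity` then corresponds to a closing object, which is the case that matters for the TTC computation downstream.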
Digital-Twin Stack:
- Unreal Engine 5-based live 3D replication.
- Spawner Actors: Dynamically add/update UE actors from incoming ROS2 messages (ego vehicle modeled as Pawn for operator POV, UI for alerts).
- Coordinate Transformation: Applies a Z–X–Y rotation matrix and translation offset to map ROS-frame coordinates into the UE world frame, $p_{\mathrm{UE}} = R_{ZXY}\,p_{\mathrm{ROS}} + t$.
- Visualization Pipeline: Parses UDP-received byte-encoded ROS2 messages, updating the UE scene at ~20 Hz.
- Digital-Twin UI: Displays detected object meshes and ego state; supports operator-driven alerts back to vehicle.
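The coordinate-transformation step can be sketched as below; the intrinsic rotation order (Z, then X, then Y) and the angle naming are assumptions, since the paper's exact convention is not reproduced here:

```python
import numpy as np

def rot_zxy(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Z-X-Y rotation: rotate about Z (yaw), then X (pitch), then Y (roll)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rz @ Rx @ Ry

def ros_to_ue(p_ros: np.ndarray, yaw: float, pitch: float, roll: float,
              offset: np.ndarray) -> np.ndarray:
    """Rotate a ROS-frame point into the UE world frame, then apply the offset."""
    return rot_zxy(yaw, pitch, roll) @ p_ros + offset
```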
2. ROS2 Messaging and V2I Networking
The CD-TWINSAFE platform utilizes custom ROS 2 message definitions for inter-stack communication over UDP sockets:
- EgoPose.msg: Contains position (lat, lon, alt), orientation (roll, pitch, yaw), and speed of the ego vehicle.
- DetectedObj.msg: Encodes object track ID (ByteTrack), class label, relative coordinates, yaw, relative/absolute speed, distance, TTC, and THW.
- SafetyAlert.msg: Transports operator-typed alerts with relevant ID, text, and risk level (safe, hazardous, dangerous).
All message fields follow semantic conventions:
- Time synchronization via the ROS header.stamp field
- Relative positions in meters (x_rel, y_rel, z_rel)
- Safety metrics in seconds (ttc, thw)
- alert_level values driving visual UI cues
V2I communication leverages outbound UDP over a 4G modem (behind CG-NAT), with message serialization/deserialization handled by ROS2 DDS and standard BSD sockets. Quality of service is optimized for best-effort delivery with message losses below 3%, mirroring real-world latency tolerances (mean V2I RTT 38 ms).
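A best-effort UDP sender in this spirit might look like the sketch below; the fixed field layout is a stand-in for illustration, not the actual DDS wire format used by the platform:

```python
import socket
import struct

# Hypothetical fixed layout for an EgoPose-style payload:
# lat, lon, alt as doubles; roll, pitch, yaw, speed as 32-bit floats.
EGO_FMT = "<dddffff"

def pack_ego_pose(lat, lon, alt, roll, pitch, yaw, speed) -> bytes:
    return struct.pack(EGO_FMT, lat, lon, alt, roll, pitch, yaw, speed)

def unpack_ego_pose(payload: bytes) -> tuple:
    return struct.unpack(EGO_FMT, payload)

def send_pose(addr: tuple, payload: bytes) -> None:
    """Fire-and-forget datagram send, matching the best-effort QoS policy."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

The fire-and-forget send is deliberate: with losses below 3% tolerated, there is no retransmission layer, and stale poses are simply superseded by the next 20 Hz frame.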
3. Perception Pipelines and Safety Metric Computation
Two parallel perception pipelines yield identical safety metrics for detected objects:
- Pipeline 1: (RAFT-Stereo + YOLOv8-n + ByteTrack)
- Stereo disparity and depth calculation:
- Kinematic estimation: and quantized yaw angles via depth differentials
- Pipeline 2: (ROI-Segmentation + EMA tracking)
- Cropped YOLO boxes are segmented via DeepLabV3-ResNet50, extracting mean depth within object masks (mm→m conversion).
- Maintains rolling-window smoothing for depth and velocity, applying an exponential moving average $\hat{x}_t = \alpha x_t + (1-\alpha)\hat{x}_{t-1}$.
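The exponential moving average used here is standard; a minimal version, with an illustrative smoothing factor:

```python
def ema_update(prev, sample: float, alpha: float = 0.3) -> float:
    """x_hat_t = alpha * x_t + (1 - alpha) * x_hat_{t-1}; seeded on first sample.

    The alpha value is illustrative; the paper does not state the one used.
    """
    return sample if prev is None else alpha * sample + (1.0 - alpha) * prev
```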
Safety Metrics:
- Raw TTC: $\mathrm{TTC} = \frac{d_{\mathrm{rel}}}{v_{\mathrm{rel}}}$ for closing relative speed $v_{\mathrm{rel}} > 0$
- THW: $\mathrm{THW} = \frac{d_{\mathrm{rel}}}{v_{\mathrm{ego}}}$
- Both metrics are EMA-smoothed (in Pipeline 2) and transmitted as formatted strings in DetectedObj.msg.
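Under the usual definitions (relative distance over closing speed for TTC, over ego speed for THW), the raw metrics reduce to:

```python
def raw_ttc(rel_distance_m: float, closing_speed_mps: float) -> float:
    """TTC = d_rel / v_rel; infinite when the gap is opening or constant."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return rel_distance_m / closing_speed_mps

def raw_thw(rel_distance_m: float, ego_speed_mps: float) -> float:
    """THW = d_rel / v_ego; infinite when the ego vehicle is stationary."""
    if ego_speed_mps <= 0.0:
        return float("inf")
    return rel_distance_m / ego_speed_mps
```

Returning infinity for the non-closing and stationary cases keeps downstream threshold checks simple: an infinite TTC or THW always classifies as safe.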
4. Digital Twin Mirroring, Risk Assessment, and Alert Logic
Message flow is strictly event-driven at 20 Hz:
- The on-board loop builds EgoPose.msg and DetectedObj.msg for all objects, publishing via UDP.
- The digital-twin listener receives and deserializes messages, updating UE actors for the ego vehicle and detected objects; objects not updated are flagged for removal.
- The operator UI supports text-based SafetyAlert.msg transmission, which is routed to the on-board UI for display.
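The flagging of non-updated objects can be sketched as a last-seen table keyed by track ID; the timeout value is an assumption, chosen as a few 20 Hz frame periods:

```python
STALE_AFTER_S = 0.25  # assumed timeout: a few 20 Hz frames without an update

def stale_track_ids(last_seen: dict, now_s: float) -> list:
    """Return track IDs whose last update is older than the staleness window."""
    return [tid for tid, t in last_seen.items() if now_s - t > STALE_AFTER_S]
```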
Real-time mirroring is achieved with:
- Synchronized frame rates (up to 20 Hz)
- Framewise updates of transforms and risk overlays (color-coded: green/safe, yellow/hazard, red/danger)
- On-board UI shifts in background and text per alert_level and SafetyAlert.msg contents
Automatic hazard classification:
- safe: $\mathrm{TTC} > 5$ s
- hazardous: $2$ s $< \mathrm{TTC} \le 5$ s
- dangerous: $\mathrm{TTC} \le 2$ s
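Applied as a function of TTC, the 5 s and 2 s boundaries reported in the evaluation give a direct threshold classifier:

```python
def classify_risk(ttc_s: float) -> str:
    """Map TTC (seconds) to the alert levels used by the twin UI.

    The 5 s / 2 s boundaries come from the evaluation; whether THW enters the
    on-board rule with the same bands is not stated and is not modeled here.
    """
    if ttc_s > 5.0:
        return "safe"
    if ttc_s > 2.0:
        return "hazardous"
    return "dangerous"
```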
5. Experimental Scenario Evaluation
Driving scenarios include:
- Pedestrian Test: Controlled low-speed environment
- Vehicle Following: Ego vehicle trails a lead car, inducing transitions across safety zones (safe/hazardous/dangerous)
- Extended Urban: Implied by results, but not explicitly benchmarked
Key experimental findings:
- Perception Performance: 20 fps end-to-end update rate
- Detection Benchmarks (100-frame means):
| Model | Latency [ms] | #boxes | Avg. conf. |
|---|---|---|---|
| YOLOv8-n | 18.4 ± 3.5 | 2.2 ± 0.4 | 0.70 ± 0.05 |
| Faster R-CNN | 127.8 ± 3.7 | 33.1 ± 6.7 | 0.22 ± 0.03 |
| SSDLite320 | 101.2 ± 12.1 | 300 | 0.07 ± 0.01 |
- V2I Latency (4G→WSL2/fiber): min = 21.95 ms, max = 186.63 ms, mean = 38.21 ms, σ = 28.18 ms, loss = 2.795%
- Safety Alert Validity: In vehicle-following tests, UI reliably shifted background (green→yellow→red) as TTC passed $5$ s and $2$ s thresholds; digital twin mirrored all risk states and operator messages.
6. Identified Limitations and Prospects for Future Work
Limitations in CD-TWINSAFE include:
- GPS is available only at 1 Hz; ZED-IMU drift yields pose lag and accumulated error
- Stereo perception is susceptible to low light, glare, and shadow artifacts, with degraded depth and false negatives at longer ranges
Future enhancements under consideration:
- Sensor fusion: LiDAR/radar integration for robust perception
- Localization: Upgrades to RTK-GPS, visual-inertial odometry
- Teleoperation: Enabling direct remote control via digital twin interface
In summary, CD-TWINSAFE demonstrates an integrated, scalable framework that couples real-time perception, safety metric computation, and digital twin visualization using ROS2 and V2I networking. Achieving 20 Hz updates, sub-40 ms communication latency, and reproducible hazard classification, it provides a reference implementation for next-generation V2I safety and scene understanding pipelines (Khaled et al., 18 Jan 2026).