Safe-NEureka: High-Reliability Extreme Systems

Updated 5 February 2026

Safe-NEureka is a comprehensive engineering framework that integrates quantitative risk assessment, modular redundancy, and ALARA-compliant safety practices for both high-energy accelerators and satellite AI systems.
The methodology employs hybrid modular redundancy, SEC-DED ECC memory protection, and TMR controllers to ensure fault tolerance and reliable performance under extreme thermal, mechanical, and radiation stresses.
The framework also extends to chemical and environmental safety by using high-flash-point solvents and robust shielding, achieving significant reductions in faults, exposure risks, and operational overhead.

Safe-NEureka refers both to a class of engineering practices for high-safety, high-reliability systems operating under extreme environments (notably multi-MW neutrino beamlines and large-volume particle detectors), and to a radiation-tolerant DNN accelerator architecture for on-board satellite AI. The term embodies a design philosophy and implementation framework that mandates rigorous risk quantification, modular redundancy, robust hardware/software co-design, and environmental hazard mitigation, with formal compliance to defined safety metrics across operational domains (Baussan et al., 2011, Bonhomme et al., 2022, Tedeschi et al., 4 Feb 2026).

1. General Principles and Safety Objectives

Safe-NEureka architectures are governed by a principle of “no single point of failure” and the provision of quantitative engineering margins against all identified risk factors. Key safety and reliability goals, as established in multi-MW accelerator and satellite AI contexts, include:

Partitioning high-risk elements (e.g., 4 MW proton beam into four 1 MW targets) to reduce exposure per module (Baussan et al., 2011).
Maintaining thermal, mechanical, and electrical stresses below the endurance limits for multi-year, high-cycling operation (e.g., $10^{9}$ beam pulses, $S_f=20$ MPa for Al 6061-T6) (Baussan et al., 2011).
Ensuring all failure modes, from radiation-induced upsets to hardware wear-out, are mitigated either by redundancy (modular, DMR/TMR), robust materials design, or remote/intervention-free recovery protocols (Tedeschi et al., 4 Feb 2026).
Applying ALARA-driven shielding, containment, and environmental protection for personnel and habitat, combining passive material barriers with active air/ventilation control (Baussan et al., 2011, Bonhomme et al., 2022).

2. Redundancy, Fault Tolerance, and Recovery Architectures

Central to Safe-NEureka’s approach is modular redundancy at both system and sub-system levels:

Hybrid Modular Redundancy (HMR): Safe-NEureka accelerator IP splits a 4×4 processing-element (PE) array into two 4×2 sub-arrays. At run-time, this hardware can be switched between:
- Dual Modular Redundancy (DMR) mode for safety-critical workloads, with online output comparison and hardware rollback on mismatch;
- Performance mode, operating both sub-arrays independently for throughput maximization (Tedeschi et al., 4 Feb 2026).
Memory Protection (SEC-DED ECC): All tightly coupled data memory and metadata paths are guarded by Hsiao SEC-DED codes ($R_{ECC}\approx0.82$), providing on-the-fly single-bit error correction and double-bit error detection (Tedeschi et al., 4 Feb 2026).
TMR Controller: The critical controller FSM and $\mu$loop microcode are triplicated with majority voting; this is a $240\%$ area overhead for the controller itself but only $2\%$ of total accelerator area (Tedeschi et al., 4 Feb 2026).
Recovery FSM: On a DMR mismatch, online detection triggers a microcode pointer revert and recomputation of the affected tile (latency bounded to $O(10^2{-}10^3)$ cycles, e.g., $90$–$330$ cycles for typical CNN tiles), with no global system reboot required.
Multi-level Redundancy in Neutrino Facilities: Parallel target/horn systems, cooling circuits, and power supplies. Critical activated components are handled remotely to avoid personnel exposure. ALARA principles further drive redundant barriers and fail-safe environmental controls (Baussan et al., 2011).
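
The DMR compare-and-rollback and TMR voting mechanisms listed above can be illustrated with a small behavioral model. This is a minimal sketch, not the accelerator's actual RTL or microcode: the function names (`tmr_vote`, `run_tile_dmr`, `make_faulty_replica`), the retry limit, and the doubling operation used as a stand-in for the PE computation are all assumptions introduced for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three replicated words."""
    return (a & b) | (a & c) | (b & c)


def run_tile_dmr(
    compute: Callable[[Sequence[int], int], List[int]],
    tile: Sequence[int],
    max_retries: int = 3,  # hypothetical bound, not from the source
) -> Tuple[List[int], int]:
    """Run one tile on two replicas and recompute it on mismatch.

    Returns the agreed outputs and the number of rollbacks performed.
    """
    for rollbacks in range(max_retries + 1):
        out_a = compute(tile, 0)  # replica 0 (first sub-array)
        out_b = compute(tile, 1)  # replica 1 (second sub-array)
        if out_a == out_b:
            return out_a, rollbacks
        # Mismatch: discard both results and recompute only this tile.
    raise RuntimeError("persistent DMR mismatch after retries")


def make_faulty_replica(fail_on_call: int) -> Callable[[Sequence[int], int], List[int]]:
    """Build a toy compute function that flips one bit on a chosen call."""
    calls = {"n": 0}

    def compute(tile: Sequence[int], replica: int) -> List[int]:
        calls["n"] += 1
        result = [2 * x for x in tile]  # stand-in for the real PE workload
        if calls["n"] == fail_on_call:
            result[0] ^= 1 << 3  # injected single-bit upset
        return result

    return compute
```

Calling `run_tile_dmr(make_faulty_replica(fail_on_call=2), [1, 2, 3])` corrupts the first run of replica 1, so the model detects the mismatch, recomputes the tile once, and returns the clean result. The point of the sketch is that recovery is local to one tile rather than a restart of the whole job.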

The quantified impact is a $96\%$ reduction in faulty executions in DMR mode, at a manageable $15\%$ area overhead. In redundancy mode, Safe-NEureka incurs a […]–[…] latency increase and up to […] lower efficiency (TOPS/W); in performance mode, throughput and efficiency reductions are limited to […]–[…] (Tedeschi et al., 4 Feb 2026).
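
As a back-of-the-envelope check on the SEC-DED figures in this section, the code rate $R_{ECC}\approx0.82$ can be related to word width. The sketch below uses the standard Hamming bound for single-error correction plus one extra check bit for double-error detection. The 32-bit data word is an assumption: it is the common width that reproduces the quoted rate, but the text does not state the word size.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for a SEC-DED code protecting `data_bits` bits."""
    r = 1
    # Hamming bound for single-error correction: 2^r >= k + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional check bit provides double-error detection.
    return r + 1


def code_rate(data_bits: int) -> float:
    """Ratio of data bits to total stored bits (data plus check bits)."""
    return data_bits / (data_bits + secded_check_bits(data_bits))


# Assumed 32-bit data words: 7 check bits, a (39, 32) code.
rate_32 = code_rate(32)
# For comparison, 64-bit words need 8 check bits, a (72, 64) code.
rate_64 = code_rate(64)
```

Under this assumption `rate_32` is 32/39, which rounds to 0.82 and corresponds to about 22% extra storage per word. Wider words would give a better rate at the cost of coarser protection granularity.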

3. Thermo-Mechanical and Radiation Safety Analysis

Safe-NEureka frameworks employ multi-physics FEA and probabilistic fault models to establish robust operation:

Horn and Target Modules: Finite element analysis integrates pulsed electromagnetic (J×B) stresses (magnetic pressure […]) with steady-state and transient thermal profiles ([…]). Fatigue S–N curves for relevant alloys (e.g., Al 6061-T6) establish allowable stress amplitudes ($S_f=20$ MPa for $10^{9}$ cycles) (Baussan et al., 2011).
Beam Window and Target: For a 0.25 mm Be window under a 1 MW beam, water or He cooling holds […] at […] °C / […] °C, well below beryllium strength limits ([…] MPa). Ti6Al4V packed-bed sphere targets with He cooling mitigate thermal shock and facilitate remote handling (Baussan et al., 2011).
Radiation Shielding: Facility walls use […] m of concrete; FLUKA models confirm negligible rock activation after 200 operational days. All highly activated equipment is accessed only by remote manipulators to minimize dose (Baussan et al., 2011).
Environmental Controls for Scintillators: High-flash-point, non-toxic solvents (e.g., polysiloxane, LAB) are selected, with vapor pressures and flash points documented against GHS/EU safety standards (e.g., TPTMTS: flash point […] °C, vapor pressure […] mbar, no H-statements) (Bonhomme et al., 2022).
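
For reference, the magnetic pressure that enters the horn stress analysis follows from standard magnetostatics. The relations below assume an idealized coaxial horn carrying a peak current $I$, with the field evaluated at radius $r$ from the axis. The symbols $I$, $r$, $B_\phi$, and $P_{\mathrm{mag}}$ are introduced here for illustration and are not taken from the cited design.

```latex
B_\phi(r) = \frac{\mu_0 I}{2\pi r},
\qquad
P_{\mathrm{mag}}(r) = \frac{B_\phi^2(r)}{2\mu_0} = \frac{\mu_0 I^2}{8\pi^2 r^2}
```

Because the pressure scales as $1/r^2$, the pulsed load in this idealization is largest on the inner conductor at its smallest radius, which is where a fatigue check against the allowable stress amplitude matters most.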

4. Safe-NEureka in Chemical Systems and Environmental Safety

Safe-NEureka principles are extended to liquid scintillator systems via the adoption of advanced solvents and materials:

Scintillator Design: Classical solvents (toluene, xylene, pseudocumene) offer high light yield (LY) but have low flash points ([…] °C) and high toxicity. Safe-NEureka-compliant solutions emphasize:
- Linear alkylbenzene (LAB): flash point […] °C, vapor pressure […] mbar, GHS non-hazardous, attenuation length […] m, LY […] anthracene (Bonhomme et al., 2022).
- Polysiloxane (TPTMTS): flash point […] °C, non-volatile, non-toxic, attenuation length 5–10 m (unpurified), LY […] anthracene.
Operational Recommendations: For kiloton-scale, transparency-dominated detectors, LAB with PPO and bis-MSB is favored. For safety-dominated contexts (reactor proximity, strict VOC caps), TPTMTS solutions are prioritized despite […] lower LY and increased viscosity (Bonhomme et al., 2022).
Deployment Practices: Purification (e.g., an Al$_2$O$_3$ column) extends the attenuation length, temperature is controlled to […]–[…] °C, and all operations are designed to minimize environmental and occupational hazards through remote and automated protocols.

5. Compliance, Lifetime, and Mixed-Criticality Use Cases

Safe-NEureka compliance spans both operational and lifecycle safety:

Fault Coverage: Fault-injection analyses demonstrate safe accelerator modes reduce undetected error rates by nearly two orders of magnitude. Controller-originated errors are nearly eliminated via TMR, with remaining failures traceable to non-triplicated logic eligible for further hardening (Tedeschi et al., 4 Feb 2026).
Mode Switching and Overhead: The architecture allows dynamic mode switches (DMR ↔ Performance) via a memory-mapped register, facilitating rapid adaptation to changing mission phases without job restarts or reconfiguration overheads […] 400 cycles (Tedeschi et al., 4 Feb 2026).
Facility and Environmental Lifetime: Beamline modules and scintillator containment are engineered for […] operation cycles and multi-year lifetimes, with full remote handling and repair scenarios validated, so that ALARA, fail-safe, and environmental targets are met (Baussan et al., 2011, Bonhomme et al., 2022).
Mixed-Criticality Operation: Satellites process GNC kernels in redundancy mode (latency trade-off for correctness), while payload filters utilize performance mode (minimal throughput loss) (Tedeschi et al., 4 Feb 2026).
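
The mode switching and mixed-criticality scheduling described above can be sketched as a toy dispatcher that writes a mode register before each job. This is an illustrative model only: the register offset, the `Mode` encoding, the class and function names, and the job names are hypothetical and do not reflect the accelerator's actual register map or driver API.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class Mode(IntEnum):
    PERFORMANCE = 0  # both sub-arrays run independent work
    DMR = 1          # sub-arrays duplicate work and compare outputs


class AcceleratorRegs:
    """Toy memory-mapped register file (hypothetical layout)."""

    MODE_OFFSET = 0x00  # hypothetical offset of the mode register

    def __init__(self) -> None:
        self._mem = {self.MODE_OFFSET: int(Mode.PERFORMANCE)}
        self.mode_switches = 0

    def write(self, offset: int, value: int) -> None:
        # Count only writes that actually change the operating mode.
        if offset == self.MODE_OFFSET and self._mem[offset] != value:
            self.mode_switches += 1
        self._mem[offset] = value

    def read(self, offset: int) -> int:
        return self._mem[offset]


def dispatch(
    regs: AcceleratorRegs, jobs: Iterable[Tuple[str, bool]]
) -> List[Tuple[str, Mode]]:
    """Select the mode per job; each job is (name, safety_critical)."""
    schedule = []
    for name, safety_critical in jobs:
        wanted = Mode.DMR if safety_critical else Mode.PERFORMANCE
        if regs.read(regs.MODE_OFFSET) != wanted:
            regs.write(regs.MODE_OFFSET, int(wanted))
        schedule.append((name, Mode(regs.read(regs.MODE_OFFSET))))
    return schedule
```

For a job list such as `[("gnc_kernel", True), ("payload_filter_a", False), ("payload_filter_b", False)]`, the dispatcher runs the GNC kernel in DMR mode, switches once to performance mode, and leaves the register untouched for the second payload job. Writing the register only on a change keeps the number of mode transitions, and hence any switching overhead, to a minimum.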

6. Summary of Quantitative Risk Mitigation

The following table summarizes salient risk mitigation parameters and their engineered responses:

Risk Domain	Mitigation Strategy	Quantitative Outcome
Radiation-induced faults (DNN)	DMR, TMR, SEC-DED ECC	96% reduction in faults, area overhead 15%
Proton beam/horn overstress	FEA-verified limits, cooling, fatigue	[…] MPa vs. […] MPa
Chemical fire/exposure, LS detectors	High FP, non-toxic solvents (LAB, TPTMTS)	LAB FP […] °C, TPTMTS FP […] °C
Environmental/occupational exposure	ALARA shielding, remote handling	Rock activation ≃ 0, minimal VOC/groundwater risk
Controller/hardware logic errors	TMR-voted FSM, hardware rollback	Nearly all controller errors recovered

7. Outlook and Future Prospects

Safe-NEureka frameworks, both in hardware and facility-scale contexts, provide a model for engineering rigorous, multi-modal safety in systems exposed to extreme environments and mixed-criticality computational workloads. Potential extensions include dynamic adaptation of redundancy and recovery strategies, integration with on-line system health monitoring, and further formalization of ALARA-compliant environmental controls in increasingly large-scale or autonomous systems (Baussan et al., 2011, Bonhomme et al., 2022, Tedeschi et al., 4 Feb 2026).