Safe-NEureka: High-Reliability Extreme Systems

Updated 5 February 2026
  • Safe-NEureka is a comprehensive engineering framework that integrates quantitative risk assessment, modular redundancy, and ALARA-compliant safety practices for both high-energy accelerators and satellite AI systems.
  • The methodology employs hybrid modular redundancy, SEC-DED ECC memory protection, and TMR controllers to ensure fault tolerance and reliable performance under extreme thermal, mechanical, and radiation stresses.
  • The framework also extends to chemical and environmental safety by using high-flash-point solvents and robust shielding, achieving significant reductions in faults, exposure risks, and operational overhead.

Safe-NEureka refers both to a class of engineering practices for high-safety, high-reliability systems operating under extreme environments (notably multi-MW neutrino beamlines and large-volume particle detectors), and to a radiation-tolerant DNN accelerator architecture for on-board satellite AI. The term embodies a design philosophy and implementation framework that mandates rigorous risk quantification, modular redundancy, robust hardware/software co-design, and environmental hazard mitigation, with formal compliance to defined safety metrics across operational domains (Baussan et al., 2011, Bonhomme et al., 2022, Tedeschi et al., 4 Feb 2026).

1. General Principles and Safety Objectives

Safe-NEureka architectures are governed by a principle of “no single point of failure” and the provision of quantitative engineering margins against all identified risk factors. Key safety and reliability goals, as established in multi-MW accelerator and satellite AI contexts, include:

  • Partitioning high-risk elements (e.g., 4 MW proton beam into four 1 MW targets) to reduce exposure per module (Baussan et al., 2011).
  • Maintaining thermal, mechanical, and electrical stresses below the endurance limits for multi-year, high-cycling operation (e.g., $10^9$ beam pulses, $S_f = 20$ MPa for Al 6061-T6) (Baussan et al., 2011).
  • Ensuring all failure modes, from radiation-induced upsets to hardware wear-out, are mitigated either by redundancy (modular, DMR/TMR), robust materials design, or remote/intervention-free recovery protocols (Tedeschi et al., 4 Feb 2026).
  • ALARA-driven shielding, containment, and environmental protection for personnel and habitat, combining passive materials barriers and active air/ventilation control (Baussan et al., 2011, Bonhomme et al., 2022).

2. Redundancy, Fault Tolerance, and Recovery Architectures

Central to Safe-NEureka’s approach is modular redundancy at both system and sub-system levels:

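  • Hybrid Modular Redundancy (HMR): The Safe-NEureka accelerator IP splits a 4×4 processing-element (PE) array into two 4×2 sub-arrays. At run-time, this hardware can be switched between a redundancy (DMR) mode, in which both sub-arrays compute the same tile so their outputs can be compared, and a performance mode, in which the full 4×4 array is used for throughput (Tedeschi et al., 4 Feb 2026).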
  • Memory Protection (SEC-DED ECC): All tightly coupled data memory and metadata paths are guarded by Hsiao SEC-DED codes ($R_{ECC} \approx 0.82$), providing on-the-fly single-bit error correction and double-bit error detection (Tedeschi et al., 4 Feb 2026).
  • TMR Controller: The critical controller FSM and μloop microcode are triplicated with majority voting. This costs a 240% area overhead for the controller but only 2% of total accelerator area (Tedeschi et al., 4 Feb 2026).
  • Recovery FSM: When DMR detects a mismatch on-line, the recovery FSM reverts the microcode pointer and recomputes the tile. Latency is bounded to $O(10^2\text{–}10^3)$ cycles (e.g., 90–330 cycles for typical CNN tiles), and no global system reboot is required.
  • Multi-level Redundancy in Neutrino Facilities: Parallel target/horn systems, cooling circuits, and power supplies. Critical activated components are handled remotely to avoid personnel exposure. ALARA principles further drive redundant barriers and fail-safe environmental controls (Baussan et al., 2011).
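The detection, rollback, and voting mechanisms listed above can be illustrated in software. The following is a minimal Python sketch, not the Safe-NEureka RTL. `compute(replica, tile)` is a hypothetical stand-in for one 4×2 sub-array processing a tile. The two replicas run sequentially here, although the hardware runs them concurrently. `max_retries` and `make_transient_fault` are illustrative names only.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical stand-in for one 4x2 sub-array computing a tile: (replica, tile) -> outputs.
TileFn = Callable[[int, int], List[int]]


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three copies of a controller word."""
    return (a & b) | (a & c) | (b & c)


def run_tile_dmr(compute: TileFn, tile: int, max_retries: int = 3) -> Tuple[List[int], int]:
    """Compute a tile on two replicas and recompute it on mismatch.

    DMR only detects a disagreement; it cannot tell which copy is wrong, so
    recovery rolls back to the start of the tile and recomputes it instead of
    rebooting the whole system. Returns (output, number of retries used).
    """
    for attempt in range(max_retries + 1):
        out_a = compute(0, tile)
        out_b = compute(1, tile)
        if out_a == out_b:
            return out_a, attempt
    raise RuntimeError(f"tile {tile}: replicas still disagree after {max_retries} retries")


def make_transient_fault(faults: Set[Tuple[int, int]]) -> TileFn:
    """Hypothetical fault injector: flips bit 0 of the first output once for each
    (replica, tile) pair in `faults`, modelling a single-event upset."""
    pending = set(faults)

    def compute(replica: int, tile: int) -> List[int]:
        result = [tile * k for k in range(4)]  # placeholder for the real tile math
        if (replica, tile) in pending:
            pending.discard((replica, tile))  # transient: only the first run is hit
            result[0] ^= 1
        return result

    return compute


if __name__ == "__main__":
    compute = make_transient_fault({(1, 7)})
    print(run_tile_dmr(compute, 7))          # ([0, 7, 14, 21], 1)
    print(tmr_vote(0b1010, 0b1010, 0b0110))  # 10: the single corrupted copy is outvoted
```

The sketch shows why the two schemes are placed differently. TMR masks one faulty copy immediately, without recomputation, which suits the small controller. DMR with recomputation only needs a second copy, which is the cheaper choice for the large PE array.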

The quantified impact is a 96% reduction in faulty executions for DMR, at a manageable 15% area overhead. In redundancy mode, Safe-NEureka shows a 70–90% latency increase and up to 53% lower efficiency (TOPS/W). In performance mode, throughput and efficiency reductions are limited to 5–11% (Tedeschi et al., 4 Feb 2026).
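A simple Amdahl-style model offers one way to read the 70–90% latency increase. This model is our assumption, not a claim from the paper. Suppose redundancy mode doubles only the PE-array compute share f of the performance-mode runtime, while memory and control time stay fixed. The latency ratio is then (1 − f) + 2f = 1 + f, so the reported range would correspond to f ≈ 0.7–0.9. A sketch:

```python
def redundancy_latency_ratio(compute_fraction: float, replicas: int = 2) -> float:
    """Redundancy-mode latency relative to performance mode.

    Assumed model: only the PE-array compute share scales with the number of
    replicas sharing the array; the remaining runtime is unchanged.
    """
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must lie in [0, 1]")
    return (1.0 - compute_fraction) + replicas * compute_fraction


def implied_compute_fraction(latency_increase: float, replicas: int = 2) -> float:
    """Invert the model: the compute share implied by an observed latency
    increase, e.g. 0.7 for a 70% increase with two replicas."""
    if replicas < 2:
        raise ValueError("redundancy needs at least two replicas")
    return latency_increase / (replicas - 1)


if __name__ == "__main__":
    for increase in (0.70, 0.90):
        f = implied_compute_fraction(increase)
        print(f"{increase:.0%} increase -> compute share f = {f:.2f}")
```

The model ignores comparator logic and any extra power draw. On its own, it therefore says nothing about the reported TOPS/W loss.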

3. Thermo-Mechanical and Radiation Safety Analysis

Safe-NEureka frameworks employ multi-physics FEA and probabilistic fault models to establish robust operation:

  • Horn and Target Modules: Finite element analysis combines pulsed electromagnetic (J×B) stresses with steady-state and transient thermal profiles ($Q = mc\Delta T$). The pulsed stresses are driven by the magnetic pressure $p_{magnetic} = B^2/(2\mu_0) = \mu_0 I^2/(8\pi^2 r^2)$, where $B = \mu_0 I/(2\pi r)$. Fatigue S–N curves for the relevant alloys (e.g., Al 6061-T6) set the allowable stress amplitude ($S_f = 20$ MPa for $N = 10^9$) (Baussan et al., 2011).
  • Beam Window and Target: For a 0.25 mm Be window under a 1 MW beam, water or He cooling holds $T_{max}$ at 180 °C / 109 °C, well below beryllium strength limits ($\sigma_{VM,max}$ = 50 / 39 MPa). Ti6Al4V packed-bed sphere targets with He cooling mitigate thermal shock and facilitate remote handling (Baussan et al., 2011).
  • Radiation Shielding: Facility walls use 5.5 m of concrete, and FLUKA simulations confirm negligible rock activation after 200 operational days. All highly activated equipment is accessed only by remote manipulators to minimize dose (Baussan et al., 2011).
  • Environmental Controls for Scintillators: High-flash-point (e.g., polysiloxane, LAB), non-toxic solvents are selected. Their vapor pressures and flash points are documented in strict compliance with GHS/EU safety standards (e.g., TPTMTS: flash point 230 °C, vapor pressure $5 \times 10^{-6}$ mbar, no H-statements) (Bonhomme et al., 2022).
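The field, pressure, heating, and fatigue relations used in the horn and target analysis can be evaluated directly. This sketch uses the standard magnetic pressure p = B²/(2μ0), with B = μ0 I/(2πr) for a straight conductor. The current, radius, heat, and mass in the usage lines are hypothetical illustration values, not EUROnu design parameters. The value 896 J/(kg·K) is an approximate room-temperature specific heat for Al 6061-T6.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in T*m/A (classical value)
C_AL = 896.0           # approximate specific heat of Al 6061-T6, J/(kg*K)


def field_at_radius(current_a: float, radius_m: float) -> float:
    """Azimuthal field outside a straight conductor: B = mu0 * I / (2 * pi * r)."""
    return MU_0 * current_a / (2.0 * math.pi * radius_m)


def magnetic_pressure(current_a: float, radius_m: float) -> float:
    """Magnetic pressure p = B^2 / (2 * mu0) in Pa. This is a surface load on the
    conductor, not the resulting stress, which still requires FEA."""
    b = field_at_radius(current_a, radius_m)
    return b * b / (2.0 * MU_0)


def temperature_rise(heat_j: float, mass_kg: float, specific_heat: float = C_AL) -> float:
    """Adiabatic temperature rise from Q = m * c * dT."""
    return heat_j / (mass_kg * specific_heat)


def within_endurance_limit(stress_amplitude_pa: float, endurance_limit_pa: float = 20e6) -> bool:
    """Fatigue check against S_f (20 MPa at N = 1e9 for Al 6061-T6, as quoted above)."""
    return stress_amplitude_pa <= endurance_limit_pa


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    current, radius = 300e3, 0.03  # 300 kA pulse at r = 3 cm
    print(f"B = {field_at_radius(current, radius):.2f} T")               # B = 2.00 T
    print(f"p = {magnetic_pressure(current, radius) / 1e6:.2f} MPa")     # p = 1.59 MPa
    print(f"dT = {temperature_rise(1000.0, 0.5):.2f} K")                 # 1 kJ into 0.5 kg
    print(within_endurance_limit(18e6), within_endurance_limit(25e6))   # True False
```

The pressure falls off as 1/r², so the inner conductor of a horn carries the largest electromagnetic load. That is one reason the fatigue check is applied with a long-life (N = 10⁹) endurance limit.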

4. Safe-NEureka in Chemical Systems and Environmental Safety

Safe-NEureka principles are extended to liquid scintillator systems via the adoption of advanced solvents and materials:

  • Scintillator Design: Classical solvents (toluene, xylene, pseudocumene) offer high light yield (LY) but have low flash points (< 50 °C) and high toxicity. Safe-NEureka-compliant solutions emphasize:
    • Linear alkylbenzene (LAB): flash point ~140 °C, vapor pressure 0.013 mbar, GHS non-hazardous, attenuation length > 20 m, LY 60.2% of anthracene (Bonhomme et al., 2022).
    • Polysiloxane (TPTMTS): flash point 230 °C, non-volatile, non-toxic, attenuation length ~5–10 m (unpurified), LY 58.4% of anthracene.
  • Operational Recommendations: For kiloton-scale, transparency-dominated detectors, LAB with PPO and bis-MSB is favored. For safety-dominated contexts (reactor proximity, strict VOC caps), TPTMTS solutions are prioritized despite ~20% lower LY and higher viscosity (Bonhomme et al., 2022).
  • Deployment Practices: Purification (e.g., with an Al₂O₃ column) extends the attenuation length, and temperature is held at 15–25 °C. All operations are designed to minimize environmental and occupational hazards through remote and automated protocols.
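The recommendation above amounts to a constrained choice. This minimal sketch assumes hypothetical thresholds for flash point, vapor pressure (standing in for a VOC cap), and required attenuation length. It filters the two solvents by those constraints and keeps the one with the highest light yield. The property values are those quoted above, using the lower bound of each attenuation range.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Solvent:
    name: str
    flash_point_c: float
    vapor_pressure_mbar: float
    attenuation_m: float               # conservative lower bound
    light_yield_pct_anthracene: float


# Values as quoted in the text (Bonhomme et al., 2022); attenuation uses the lower bound.
LAB = Solvent("LAB", 140.0, 0.013, 20.0, 60.2)
TPTMTS = Solvent("TPTMTS", 230.0, 5e-6, 5.0, 58.4)


def pick_solvent(candidates: Sequence[Solvent], min_flash_point_c: float,
                 max_vapor_pressure_mbar: float, min_attenuation_m: float) -> Optional[Solvent]:
    """Return the highest-LY solvent meeting every (hypothetical) constraint, or None."""
    feasible = [
        s for s in candidates
        if s.flash_point_c >= min_flash_point_c
        and s.vapor_pressure_mbar <= max_vapor_pressure_mbar
        and s.attenuation_m >= min_attenuation_m
    ]
    return max(feasible, key=lambda s: s.light_yield_pct_anthracene, default=None)


if __name__ == "__main__":
    kiloton = pick_solvent([LAB, TPTMTS], min_flash_point_c=100.0,
                           max_vapor_pressure_mbar=0.1, min_attenuation_m=20.0)
    reactor_site = pick_solvent([LAB, TPTMTS], min_flash_point_c=200.0,
                                max_vapor_pressure_mbar=1e-3, min_attenuation_m=5.0)
    print(kiloton.name if kiloton else None, reactor_site.name if reactor_site else None)  # LAB TPTMTS
```

Purification could be represented in this model by raising TPTMTS's attenuation bound. That would let it meet stricter transparency requirements without relaxing the safety constraints.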

5. Compliance, Lifetime, and Mixed-Criticality Use Cases

Safe-NEureka compliance spans both operational and lifecycle safety:

  • Fault Coverage: Fault-injection analyses demonstrate safe accelerator modes reduce undetected error rates by nearly two orders of magnitude. Controller-originated errors are nearly eliminated via TMR, with remaining failures traceable to non-triplicated logic eligible for further hardening (Tedeschi et al., 4 Feb 2026).
  • Mode Switching and Overhead: The architecture supports dynamic mode switches (DMR ↔ Performance) through a memory-mapped register. This allows rapid adaptation to changing mission phases without job restarts, with a reconfiguration overhead below 400 cycles (Tedeschi et al., 4 Feb 2026).
  • Facility and Environmental Lifetime: Beamline modules and scintillator containment are engineered for $10^9$ operating cycles and multi-year lifetimes. Full remote-handling and repair scenarios have been validated, ensuring that ALARA, fail-safe, and environmental targets are met (Baussan et al., 2011, Bonhomme et al., 2022).
  • Mixed-Criticality Operation: Satellites process GNC kernels in redundancy mode (latency trade-off for correctness), while payload filters utilize performance mode (minimal throughput loss) (Tedeschi et al., 4 Feb 2026).
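The mixed-criticality pattern can be sketched as a scheduler that sets the accelerator mode before each task. Everything here is hypothetical. `ModeRegister` stands in for the memory-mapped register and implies no real address or bit layout. The task cycle counts are invented. The 400-cycle switch cost uses the upper bound quoted above, and the 1.9× redundancy slowdown takes the top of the reported 70–90% latency increase.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class Mode(IntEnum):
    PERFORMANCE = 0
    REDUNDANCY = 1  # DMR


@dataclass(frozen=True)
class Task:
    name: str
    safety_critical: bool  # e.g. GNC kernels
    cycles: int            # hypothetical runtime in performance mode


class ModeRegister:
    """Hypothetical stand-in for the accelerator's memory-mapped mode register."""

    def __init__(self) -> None:
        self.value = Mode.PERFORMANCE
        self.writes = 0

    def write(self, mode: Mode) -> None:
        self.value = mode
        self.writes += 1


def run_mission(tasks: List[Task], reg: ModeRegister, switch_cycles: int = 400,
                dmr_slowdown: float = 1.9) -> Tuple[int, List[Tuple[str, Mode]]]:
    """Run tasks in order, switching mode only when criticality changes.
    Returns the total cycle count and the mode each task ran in."""
    total = 0
    trace: List[Tuple[str, Mode]] = []
    for task in tasks:
        wanted = Mode.REDUNDANCY if task.safety_critical else Mode.PERFORMANCE
        if reg.value != wanted:
            reg.write(wanted)
            total += switch_cycles  # switch cost only; no job restart
        factor = dmr_slowdown if wanted == Mode.REDUNDANCY else 1.0
        total += round(task.cycles * factor)
        trace.append((task.name, wanted))
    return total, trace


if __name__ == "__main__":
    tasks = [Task("gnc_attitude", True, 1000),
             Task("payload_filter", False, 5000),
             Task("gnc_orbit", True, 1000)]
    reg = ModeRegister()
    total, trace = run_mission(tasks, reg)
    print(total, reg.writes)  # 10000 3
```

Running safety-critical kernels back to back cuts the number of register writes (two instead of three for the same tasks). Whether such batching is acceptable depends on mission timing constraints that the source does not describe.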

6. Summary of Quantitative Risk Mitigation

The following table summarizes salient risk mitigation parameters and their engineered responses:

| Risk Domain | Mitigation Strategy | Quantitative Outcome |
|---|---|---|
| Radiation-induced faults (DNN) | DMR, TMR, SEC-DED ECC | 96% reduction in faults; 15% area overhead |
| Proton beam/horn overstress | FEA-verified limits, cooling, fatigue analysis | $\sigma_{combined} \leq 20$ MPa vs. $S_f = 20$ MPa |
| Chemical fire/exposure (LS detectors) | High-flash-point, non-toxic solvents (LAB, TPTMTS) | LAB flash point ~140 °C; TPTMTS flash point 230 °C |
| Environmental/occupational exposure | ALARA shielding, remote handling | Rock activation ≈ 0; minimal VOC/groundwater risk |
| Controller/hardware logic errors | TMR-voted FSM, hardware rollback | Nearly all controller errors recovered |

7. Outlook and Future Prospects

Safe-NEureka frameworks, both in hardware and facility-scale contexts, provide a model for engineering rigorous, multi-modal safety in systems exposed to extreme environments and mixed-criticality computational workloads. Potential extensions include dynamic adaptation of redundancy and recovery strategies, integration with on-line system health monitoring, and further formalization of ALARA-compliant environmental controls in increasingly large-scale or autonomous systems (Baussan et al., 2011, Bonhomme et al., 2022, Tedeschi et al., 4 Feb 2026).