Muon-NSR: Nuclear and Deep Learning Insights
- Muon-NSR is a dual-focused topic encompassing nuclear physics measurements of muon-induced neutron spallation and innovative deep learning optimization techniques.
- In the nuclear domain, it involves precise quantification of neutron yields, radionuclide activation, and detailed experimental calibration using layered detection systems.
- In deep learning, the Muon-NSR optimizer adapts momentum normalization through noise-to-signal ratio modulation, achieving faster convergence and reduced validation loss.
Muon-NSR refers to two disjoint technical domains that share a name: (1) nuclear measurements and databases for muon-induced reactions, including the neutron spallation rate (NSR) and its application in underground and accelerator-based experiments; (2) a recent family of optimization algorithms in deep learning, specifically “Muon-NSR,” which uses the noise-to-signal ratio (NSR) to modulate momentum normalization during LLM pretraining. This entry describes both, as each is an active research topic under the “Muon-NSR” designation.
1. Definition and Contexts of Muon-NSR
In nuclear and particle physics, NSR (Neutron Spallation Rate) quantifies the production of neutrons by cosmic or beam muons interacting with various materials, a central concern in low-background experiments, neutrino detectors, and radiation shielding. Muon-NSR is also the name of an optimizer variant for large-scale neural network pretraining that modulates orthogonal momentum updates using a variance-adaptive normalization scheme based on the local noise-to-signal ratio (Li et al., 21 Jan 2026).
2. Nuclear Muon-NSR: Quantitative Measurement and Yield Definition
Muon-induced neutron spallation is measured by the neutron yield, defined in underground and accelerator-based experiments as the number of muon-induced neutrons produced per muon per unit areal mass thickness of the target. Conventionally, the neutron yield is given by:
$$Y_n = \frac{N_n}{N_\mu \, \rho \, L_\mu}$$
where $N_n$ is the total number of muon-induced neutrons, $N_\mu$ is the number of muons traversing the target, $\rho$ is the target density (g/cm$^3$), and $L_\mu$ is the muon track length through the target (cm), taken as the mean per muon, i.e. the cumulative track length divided by $N_\mu$ (Collaboration et al., 2011).
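For above-ground configurations, such as the ISMRAN detector, the neutron yield is determined similarly:
$$Y_n = \frac{N_n}{N_\mu \, \langle X \rangle}$$
where $N_n$ and $N_\mu$ are the numbers of muon-induced neutrons and of muons, and $\langle X \rangle$ is the mean areal mass traversed by muons (Dey et al., 23 Mar 2025).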
Recent measurements from the ISMRAN collaboration report the neutron yield at sea level for composite shielding (10 cm Pb + 10 cm borated polyethylene) (Dey et al., 23 Mar 2025).
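As a rough illustration of the yield definition in this section, the following Python sketch evaluates neutrons per muon per unit areal mass. The function names `neutron_yield` and `areal_mass` and all numerical inputs are hypothetical and are not taken from the cited measurements; the sketch also assumes that the track length entering the formula is the mean path per muon.

```python
def areal_mass(density_g_cm3, mean_track_cm):
    """Areal mass thickness rho * L (g/cm^2) for a homogeneous target."""
    return density_g_cm3 * mean_track_cm


def neutron_yield(n_neutrons, n_muons, areal_mass_g_cm2):
    """Neutrons per muon per (g/cm^2) of target traversed."""
    if n_muons <= 0 or areal_mass_g_cm2 <= 0:
        raise ValueError("muon count and areal mass must be positive")
    return n_neutrons / (n_muons * areal_mass_g_cm2)


# Hypothetical inputs, chosen only to exercise the formula.
x = areal_mass(density_g_cm3=0.88, mean_track_cm=500.0)
y = neutron_yield(n_neutrons=1.32e5, n_muons=1.0e6, areal_mass_g_cm2=x)
print(f"areal mass = {x:.1f} g/cm^2, yield = {y:.3e} n/(mu g/cm^2)")
```

When the mean areal mass is quoted directly, as in the above-ground case, it can be passed straight to `neutron_yield` without going through `areal_mass`.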
3. Detection and Tagging of Muon-Induced Neutrons
Precision experiments such as Borexino employ layered detection systems optimized for muon and cosmogenic neutron identification. Their muon tagging system consists of an inner liquid-scintillator detector surrounded by a water-Cherenkov outer detector (Collaboration et al., 2011). Tagging efficiency is evaluated via hardware and software triggers, pulse-shape discrimination, and position/time clustering:
- Combined veto efficiency: 99.992%
- Neutron gate: 1.6 ms DAQ window after each tagged muon.
- Neutron capture efficiency: 99%, with accidental backgrounds at the 1% level; the neutron capture time is obtained from a fit (Collaboration et al., 2011).
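The gate-based tagging listed above can be sketched in a few lines of Python. This is a minimal illustration, not the experiment's actual trigger logic: the function name `tag_neutron_candidates`, the timestamp lists, and the rule of pairing each event with the most recent preceding muon are assumptions. Only the 1.6 ms gate length comes from the text.

```python
from bisect import bisect_right

GATE_S = 1.6e-3  # neutron gate after each tagged muon, in seconds (from the text)


def tag_neutron_candidates(muon_times, event_times, gate=GATE_S):
    """Return (event_time, parent_muon_time) for events inside a muon gate.

    Both inputs are timestamps in seconds. Each event is paired with the most
    recent preceding muon, provided that muon is at most `gate` seconds old.
    """
    muons = sorted(muon_times)
    tagged = []
    for t in sorted(event_times):
        i = bisect_right(muons, t) - 1       # index of last muon at or before t
        if i >= 0 and 0.0 < t - muons[i] <= gate:
            tagged.append((t, muons[i]))
    return tagged


# Hypothetical timestamps in seconds.
muons = [0.000, 0.010, 0.050]
events = [0.0003, 0.0015, 0.0020, 0.0104, 0.0490, 0.0517]
pairs = tag_neutron_candidates(muons, events)
for t, parent in pairs:
    print(f"event at {t:.4f} s tagged to muon at {parent:.3f} s")
```

Sorting both lists and using a binary search keeps the pairing at O(n log n), which matters when the event stream is much longer than the muon stream.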
Track reconstruction employs time/charge clustering for entry and exit points (via both the outer detector, OD, and the inner detector, ID), with global 3D linear fits that determine the track direction and achieve lateral resolutions of 35–50 cm (Collaboration et al., 2011).
4. Muon-Induced Spallation and Radioactivity: Activation, Cross-Sections, and Data Structures
Muon-induced neutron and radionuclide production are quantified via direct counting and Monte Carlo supported analyses. Activation yields depend on:
- Muon flux, energy spectrum, and path length
- Target composition and areal mass
- Energy-dependent spallation cross-section
In the NuMI experiment, radionuclide yields in copper and aluminum targets, dominated by photo-nuclear and spallation processes, are compared to MARS simulations. The production rate combines the factors listed above, and measured yields are quoted in radionuclides per muon for typical exposures (Boehnlein, 2012).
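To make the dependence on flux, cross-section, and target composition concrete, the sketch below uses a common thin-target activation estimate, $R = N_T \int \phi_\mu(E)\,\sigma(E)\,dE$. This form is an assumption adopted for illustration and is not necessarily the expression used in the cited analysis. The tabulated flux and cross-section values are invented, and `production_rate` and `trapezoid` are hypothetical helper names.

```python
def trapezoid(xs, ys):
    """Trapezoidal integral of tabulated ys over xs."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))


def production_rate(n_target_atoms, energies_gev, flux, sigma_cm2):
    """R = N_T * integral(flux(E) * sigma(E) dE), in nuclides per second.

    flux is a differential muon flux in muons / (cm^2 s GeV) and sigma_cm2 a
    production cross-section in cm^2, both tabulated on energies_gev.
    """
    integrand = [f * s for f, s in zip(flux, sigma_cm2)]
    return n_target_atoms * trapezoid(energies_gev, integrand)


AVOGADRO = 6.02214076e23  # atoms per mole

# Hypothetical 1 kg copper sample (molar mass 63.546 g/mol) and made-up tables.
n_cu = 1000.0 / 63.546 * AVOGADRO
energies = [1.0, 10.0, 100.0]           # GeV
flux = [1.0e2, 1.0e1, 1.0e-1]           # muons / (cm^2 s GeV), illustrative only
sigma = [1.0e-30, 2.0e-30, 4.0e-30]     # cm^2, illustrative only
rate = production_rate(n_cu, energies, flux, sigma)
print(f"production rate ~ {rate:.3e} nuclides/s")
```

A thick target would additionally need the attenuation and secondary cascade that a transport code such as MARS models; the sketch ignores both.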
Efforts in nuclear data curation are formalized in the Muon-NSR database proposal, which includes:
- Muonic X-ray energies and intensities
- Muonic atom lifetimes (vacuum and capture components)
- Branching ratios of residual nuclei for various capture/spallation channels
- Emission probabilities for neutrons, protons, alphas, and $\gamma$-rays
- Emission spectra (parameters for Maxwellian and power-law components)
Table: Muonic Atom Data Schema (proposed in Niikura et al., 2024)
| Table | Key Fields |
|---|---|
| tbl_Xrays | id, Z, A, nuclide, transition, energy [keV], intensity |
| tbl_Lifetimes | id, Z, A, nuclide, lifetime [s], model, reference |
| tbl_Branching | id, Z, A, nuclide, channel, residual, BR, method, reference |
| tbl_Emissions | id, Z, A, nuclide, particle, multiplicity, probability, reference |
| tbl_Spectra | id, Z, A, nuclide, particle, distribution_type, parameters, reference |
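One way to picture the proposed schema is as a small relational database. The sketch below builds it in SQLite through Python's standard `sqlite3` module. The column types, the column names `energy_keV`, `intensity`, and `lifetime_s`, and the inserted rows are assumptions made for illustration; the numerical values are placeholders, not measured data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tbl_Xrays (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    transition TEXT, energy_keV REAL, intensity REAL);
CREATE TABLE tbl_Lifetimes (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    lifetime_s REAL, model TEXT, reference TEXT);
CREATE TABLE tbl_Branching (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    channel TEXT, residual TEXT, BR REAL, method TEXT, reference TEXT);
CREATE TABLE tbl_Emissions (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    particle TEXT, multiplicity INTEGER, probability REAL, reference TEXT);
CREATE TABLE tbl_Spectra (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    particle TEXT, distribution_type TEXT, parameters TEXT, reference TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows: the numbers are NOT real data, only schema exercisers.
conn.execute(
    "INSERT INTO tbl_Xrays (Z, A, nuclide, transition, energy_keV, intensity) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (26, 56, "Fe-56", "2p-1s", 1234.5, 1.0),
)
conn.execute(
    "INSERT INTO tbl_Lifetimes (Z, A, nuclide, lifetime_s, model, reference) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (26, 56, "Fe-56", 2.0e-7, "placeholder", "placeholder"),
)

# Join X-ray and lifetime records for one nuclide on (Z, A).
row = conn.execute(
    "SELECT x.transition, x.energy_keV, l.lifetime_s "
    "FROM tbl_Xrays AS x JOIN tbl_Lifetimes AS l "
    "ON x.Z = l.Z AND x.A = l.A WHERE x.nuclide = ?",
    ("Fe-56",),
).fetchone()
print(row)
```

Keying every table on `(Z, A, nuclide)` keeps cross-table joins simple, at the cost of repeating the nuclide identity in each row; a normalized variant would move it into a separate nuclide table.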
5. Muon-NSR in Deep Learning Optimization
The Muon-NSR optimizer is a matrix-based modification of the Muon method. It leverages the noise-to-signal ratio to downregulate momentum updates that exhibit high variance, thus accelerating convergence and reducing validation loss in LLM pretraining (Li et al., 21 Jan 2026).
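For a weight matrix $W$, Muon-NSR maintains a momentum $M_t$ and a variance surrogate $\Gamma_t$ as exponential moving averages. The per-coordinate NSR is
$$\mathrm{NSR}_t[i,j] = \frac{\Gamma_t[i,j]}{M_t[i,j]^2}$$
and its influence on the update is scaled by a sensitivity hyperparameter $\gamma$. The normalized momentum is
$$\widetilde M_t[i,j] = \frac{M_t[i,j]}{\sqrt{M_t[i,j]^2 + \gamma\,\Gamma_t[i,j]} + \epsilon}$$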
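The entrywise normalization above can be tried out with a small, dependency-free Python sketch. It assumes the momentum and variance-surrogate buffers have already been computed, since their exact moving-average updates are not spelled out here. The names `nsr_normalize`, `var`, and `gamma` are illustrative, and nested lists stand in for the tensors a real optimizer would use.

```python
import math


def nsr_normalize(m, var, gamma, eps=1e-8):
    """Entrywise m / (sqrt(m^2 + gamma * var) + eps) for nested lists.

    m     : momentum buffer (list of rows)
    var   : variance surrogate with the same shape (assumed non-negative)
    gamma : sensitivity hyperparameter scaling the noise term
    """
    return [[m_ij / (math.sqrt(m_ij * m_ij + gamma * v_ij) + eps)
             for m_ij, v_ij in zip(m_row, v_row)]
            for m_row, v_row in zip(m, var)]


# Two coordinates with equal |momentum| but different noise levels.
momentum = [[0.5, -0.5]]
variance = [[0.0, 0.75]]
normalized = nsr_normalize(momentum, variance, gamma=1.0)
print(normalized)
```

With zero variance the output is close to the sign of the momentum, while the noisier coordinate is shrunk toward zero, which is the intended downweighting of high-variance directions.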
After normalization, Muon-NSR applies orthogonalization using Newton–Schulz iterations to approximate the matrix sign function (polar factor) of $\widetilde M_t$, with the iteration initialized from the normalized momentum.
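As an illustration of the orthogonalization step, the sketch below runs the classical cubic Newton–Schulz iteration $X_{k+1} = \tfrac{3}{2} X_k - \tfrac{1}{2} X_k X_k^\top X_k$, started from the input divided by its Frobenius norm. This is a deliberate simplification: Muon-style optimizers typically use a tuned higher-order polynomial and only a handful of steps, and the exact iteration and initialization used by Muon-NSR are not reproduced here. The helper names are hypothetical, and plain nested lists are used to avoid external dependencies.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz_polar(m, steps=20):
    """Approximate the orthogonal polar factor of a full-rank matrix m."""
    fro = math.sqrt(sum(v * v for row in m for v in row))
    if fro == 0.0:
        raise ValueError("cannot orthogonalize the zero matrix")
    # Frobenius scaling puts every singular value in (0, 1], inside the
    # convergence region of the cubic iteration.
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(x_row, y_row)]
             for x_row, y_row in zip(x, xxt_x)]
    return x


m = [[2.0, 1.0], [0.0, 1.0]]
q = newton_schulz_polar(m)
gram = matmul(q, transpose(q))   # close to the identity if q is orthogonal
print(q)
print(gram)
```

Each step pushes every singular value toward 1 while leaving the singular vectors unchanged, so the result keeps the directions of the update and discards its scale.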
Empirically, Muon-NSR reduces the number of iterations needed to reach a target validation loss by a factor of 1.36–1.5 compared to baselines on GPT-2 and LLaMA pretraining (Li et al., 21 Jan 2026). Performance is unimodal in the sensitivity hyperparameter, and only one extra buffer is needed relative to Muon.
Table: Validation Losses for LLaMA Pretraining (Suite A; Li et al., 21 Jan 2026)
| Model | AdamW | Muon | Muon-NSR | Muon-VS |
|---|---|---|---|---|
| Llama-210M | 3.0458 | 3.0418 | 3.0322 | 3.0330 |
| Llama-720M | 2.7879 | 2.7858 | 2.7806 | 2.7798 |
6. Systematics, Uncertainties, and Shielding Implications
Neutron yield and activation measurements are systematics-limited, primarily due to:
- Energy threshold determination
- Capture-time window selection
- Muon trigger and flux normalization
- Modeling of cascades in neutron capture (e.g., DICEBOX in ISMRAN)
Adding the individual uncertainty contributions in quadrature gives a total systematic error of 6.4% on the neutron yield for ISMRAN (Dey et al., 23 Mar 2025). Shielding and facility design must also consider muon-induced activation: per-muon radionuclide yields of order $10^{-13}$ to $10^{-12}$, multiplied by the annual number of muon interactions, can generate a significant inventory of radioactive atoms per kg per year, necessitating targeted shielding, the use of low-activation materials, and explicit limits on beam loss (Boehnlein, 2012).
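The quadrature combination of independent systematic contributions is simple to reproduce. In the Python sketch below the individual percentages are hypothetical, chosen to give a round total; they are not the ISMRAN uncertainty budget, whose breakdown is not listed here.

```python
import math


def quadrature_sum(contributions_percent):
    """Combine independent relative uncertainties (in %) in quadrature."""
    return math.sqrt(sum(c * c for c in contributions_percent))


# Hypothetical contributions in percent, labeled after the list above.
systematics = {
    "energy threshold": 1.0,
    "capture-time window": 2.0,
    "muon trigger and flux normalization": 2.0,
    "capture-cascade modeling": 4.0,
}
total = quadrature_sum(systematics.values())
print(f"total systematic uncertainty = {total:.1f}%")
```

Quadrature addition assumes the contributions are uncorrelated; correlated terms would need their covariance included in the sum.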
7. Future Developments and Data Integration
Dedicated, open nuclear databases for muon-induced reactions are under active development. The Muon-NSR framework for nuclear data envisions robust, multi-table infrastructures incorporating precise energies, intensities, branching data, emission spectra, and full metadata, closely analogous to evaluated nuclear data files (e.g., ENSDF, TENDL). In deep learning, Muon-NSR and related NSR-modulated optimizers suggest that variance-adaptive strategies coupled with matrix-structured updates provide a methodology for LLM-scale training that is both theoretically justified and empirically effective (Li et al., 21 Jan 2026).