Muon-NSR: Nuclear and Deep Learning Insights

Updated 28 January 2026

Muon-NSR is a dual-focused topic encompassing nuclear physics measurements of muon-induced neutron spallation and innovative deep learning optimization techniques.
In the nuclear domain, it involves precise quantification of neutron yields, radionuclide activation, and detailed experimental calibration using layered detection systems.
In deep learning, the Muon-NSR optimizer adapts momentum normalization through noise-to-signal ratio modulation, achieving faster convergence and reduced validation loss.

Muon-NSR refers to two disjoint technical domains that share a name: (1) nuclear measurements and databases for muon-induced reactions, including the neutron spallation rate (NSR) and its application in underground and accelerator-based experiments; (2) a recent family of optimization algorithms in deep learning, specifically “Muon-NSR,” which uses the noise-to-signal ratio (NSR) to modulate momentum normalization during LLM pretraining. This entry describes both aspects, as each is an active research topic under the “Muon-NSR” designation.

1. Definition and Contexts of Muon-NSR

In nuclear and particle physics, NSR (Neutron Spallation Rate) quantifies the production of neutrons by cosmic or beam muons interacting with various materials, a central concern in low-background experiments, neutrino detectors, and radiation shielding. Muon-NSR is also the name of an optimizer variant for large-scale neural network pretraining that modulates orthogonal momentum updates using a variance-adaptive normalization scheme based on the local noise-to-signal ratio (Li et al., 21 Jan 2026).

2. Nuclear Muon-NSR: Quantitative Measurement and Yield Definition

Muon-induced neutron spallation is measured by the neutron yield, defined in underground and accelerator-based experiments as the number of muon-induced neutrons produced per muon per areal mass thickness of the target. Conventionally, the neutron yield $Y_n$ is given by:

$Y_n = \frac{N_n}{N_\mu \cdot \rho \cdot L}$

where $N_n$ is the total number of muon-induced neutrons, $N_\mu$ is the number of muons traversing the target, $\rho$ is the target density ($\mathrm{g/cm^3}$), and $L$ is the cumulative muon track length through the target ($\mathrm{cm}$) (Collaboration et al., 2011).
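The yield definition above can be sketched as a small helper. The function name and all input values are hypothetical and chosen only to exercise the formula; they are not measurements from any experiment.

```python
def neutron_yield(n_neutrons, n_muons, density_g_cm3, track_length_cm):
    """Return Y_n = N_n / (N_mu * rho * L) in neutrons per muon per (g/cm^2)."""
    areal_mass = density_g_cm3 * track_length_cm  # g/cm^2
    if n_muons <= 0 or areal_mass <= 0:
        raise ValueError("n_muons and the areal mass must be positive")
    return n_neutrons / (n_muons * areal_mass)


# Hypothetical inputs (not experimental data).
y_n = neutron_yield(
    n_neutrons=5.0e4,
    n_muons=1.0e6,
    density_g_cm3=0.88,
    track_length_cm=500.0,
)
print(f"Y_n = {y_n:.3e} n/mu/(g/cm^2)")
```

Passing the mean density and mean track length gives the areal-mass form $Y = N_n / (N_\mu X)$ with $X = \rho_{\text{avg}} L_{\text{avg}}$.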

For above-ground configurations, such as the ISMRAN detector, the neutron yield is determined similarly:

$Y = \frac{N_{n}}{N_\mu \times X}$

where $X = \rho_{\text{avg}} L_{\text{avg}}$ is the mean areal mass traversed by muons (Dey et al., 23 Mar 2025).

Recent measurements from the ISMRAN collaboration provide an explicit value:

$Y = (2.81 \pm 0.14_{\mathrm{stat}} \pm 0.18_{\mathrm{sys}}) \times 10^{-5} \;\mathrm{n/\mu/(g/cm^{2})}$

at sea level for composite shielding (10 cm Pb + 10 cm borated polyethylene) (Dey et al., 23 Mar 2025).
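As a rough illustration of how the quoted uncertainties combine, the sketch below adds the statistical and systematic terms in quadrature. Treating the two terms as uncorrelated is an assumption of this sketch, not a statement from the ISMRAN analysis.

```python
import math

# Central value and uncertainties in units of 1e-5 n/mu/(g/cm^2), as quoted above.
central, stat, syst = 2.81, 0.14, 0.18

# Quadrature sum, assuming the two contributions are uncorrelated.
total = math.hypot(stat, syst)

rel_syst = syst / central    # relative systematic uncertainty
rel_total = total / central  # relative total uncertainty

print(f"total uncertainty: {total:.3f} x 1e-5")
print(f"relative systematic: {100 * rel_syst:.1f}%")
print(f"relative total: {100 * rel_total:.1f}%")
```

The relative systematic term computed this way is about 6.4%.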

3. Detection and Tagging of Muon-Induced Neutrons

Precision experiments such as Borexino employ layered detection systems optimized for muon and cosmogenic neutron identification. Their muon tagging system consists of an inner liquid-scintillator detector surrounded by a water-Cherenkov outer detector (Collaboration et al., 2011). Tagging efficiency is evaluated via hardware and software triggers, pulse-shape discrimination, and position/time clustering:

Combined veto efficiency: $\geq$ 99.992%
Neutron gate: $\sim$ 1.6 ms DAQ window after each tagged muon.
Neutron capture efficiency: $\gtrsim$ 99%, with accidental backgrounds $\lesssim$ 1% (with fit $\tau_{\rm cap} = 254.5 \pm 1.8\ \mu\mathrm{s}$) (Collaboration et al., 2011).
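A simple consistency check relates the gate length to the fitted capture time. The sketch below assumes purely exponential capture times and a gate that opens at the muon time; it ignores dead time, thresholds, and other detector effects, so it is only an upper-level estimate and not the experiment's efficiency calculation.

```python
import math


def capture_fraction_in_gate(gate_us, tau_us, start_us=0.0):
    """Fraction of exponentially distributed captures falling in [start, gate]."""
    return math.exp(-start_us / tau_us) - math.exp(-gate_us / tau_us)


# Gate of 1.6 ms and fitted capture time of 254.5 microseconds, as quoted above.
frac = capture_fraction_in_gate(gate_us=1600.0, tau_us=254.5)
print(f"fraction of captures inside the gate: {frac:.4f}")
```

Under these assumptions the gate spans more than six capture times, which is consistent with a capture efficiency above 99%.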

Track reconstruction employs time/charge clustering for entry and exit points (via both OD and ID), with global 3D linear fits achieving angular resolutions of $3^\circ$ – $5^\circ$ and lateral resolutions of $35$–$50$ cm (Collaboration et al., 2011).

4. Muon-Induced Spallation and Radioactivity: Activation, Cross-Sections, and Data Structures

Muon-induced neutron and radionuclide production are quantified via direct counting and Monte Carlo supported analyses. Activation yields depend on:

Muon flux, energy spectrum, and path length
Target composition and areal mass
Energy-dependent spallation cross-section $\sigma_i(E)$

In the NuMI experiment, radionuclide yields in copper and aluminum targets—dominated by photo-nuclear and spallation processes—are compared to MARS simulations. The production rate is given by:

$R_i = \int_0^\infty \Phi_\mu(E) N_T \sigma_i(E) dE$

with measured yields in the range of $10^{-13}$ to $10^{-12}$ radionuclides per muon for typical exposures (Boehnlein, 2012).
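The production-rate integral can be approximated numerically on an energy grid. The sketch below uses a trapezoidal rule with a toy power-law flux, a toy energy-independent cross-section, and a finite energy range; all of these are assumptions for illustration and do not represent NuMI or MARS inputs.

```python
def production_rate(energies, flux, sigma, n_target):
    """Trapezoidal estimate of R_i = N_T * integral of Phi(E) * sigma_i(E) dE."""
    total = 0.0
    for e_lo, e_hi in zip(energies[:-1], energies[1:]):
        f_lo = flux(e_lo) * sigma(e_lo)
        f_hi = flux(e_hi) * sigma(e_hi)
        total += 0.5 * (f_lo + f_hi) * (e_hi - e_lo)
    return n_target * total


def toy_flux(e_gev):
    """Hypothetical differential flux, muons / (cm^2 s GeV)."""
    return 1.0e-2 * e_gev ** -2.0


def toy_sigma(e_gev):
    """Hypothetical energy-independent cross-section in cm^2."""
    return 1.0e-30


# Log-spaced grid from 1 to 100 GeV; the true integral runs from 0 to infinity,
# so truncating the range is part of the toy setup.
E_MIN, E_MAX, N_POINTS = 1.0, 100.0, 2000
grid = [E_MIN * (E_MAX / E_MIN) ** (k / (N_POINTS - 1)) for k in range(N_POINTS)]

rate = production_rate(grid, toy_flux, toy_sigma, n_target=1.0e24)
print(f"R_i = {rate:.3e} atoms/s (toy inputs)")
```

With these toy inputs the integral has the closed form $0.99\,\Phi_0\,\sigma\,N_T$, which makes the numerical result easy to check.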

Efforts in nuclear data curation are formalized in the Muon-NSR database proposal, which includes:

Muonic X-ray energies and intensities
Muonic atom lifetimes (vacuum and capture components)
Branching ratios of residual nuclei for various capture/spallation channels (e.g., $(\mu, n)$, $(\mu, 2n)$, $(\mu, p)$)
Emission probabilities for neutrons, protons, alphas, and $\gamma$-rays
Emission spectra (parameters for Maxwellian and power-law components)

Table: Muonic Atom Data Schema (proposed in Niikura et al., 2024)

Table	Key Fields
tbl_Xrays	id, Z, A, nuclide, transition, $E_\gamma$ [keV], $I_{\rm rel}$
tbl_Lifetimes	id, Z, A, nuclide, $\tau_{\rm tot}$ [s], model, reference
tbl_Branching	id, Z, A, nuclide, channel, residual, BR, method, reference
tbl_Emissions	id, Z, A, nuclide, particle, multiplicity, probability, reference
tbl_Spectra	id, Z, A, nuclide, particle, distribution_type, parameters, reference
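One way such a schema could be realized is as relational tables. The sketch below builds two of the listed tables in an in-memory SQLite database; the column types, the column name `tau_tot_s`, and the inserted row are assumptions for illustration. The row is a placeholder, not evaluated nuclear data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_Lifetimes (
        id INTEGER PRIMARY KEY,
        Z INTEGER NOT NULL,
        A INTEGER NOT NULL,
        nuclide TEXT NOT NULL,
        tau_tot_s REAL,          -- total lifetime in seconds
        model TEXT,
        reference TEXT
    );
    CREATE TABLE tbl_Branching (
        id INTEGER PRIMARY KEY,
        Z INTEGER NOT NULL,
        A INTEGER NOT NULL,
        nuclide TEXT NOT NULL,
        channel TEXT,
        residual TEXT,
        BR REAL,                 -- branching ratio
        method TEXT,
        reference TEXT
    );
    """
)

# Placeholder row: the branching ratio is NOT a measured or evaluated value.
conn.execute(
    "INSERT INTO tbl_Branching "
    "(Z, A, nuclide, channel, residual, BR, method, reference) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (13, 27, "27Al", "(mu,n)", "26Mg", 0.5, "placeholder", "placeholder"),
)

rows = conn.execute(
    "SELECT channel, residual, BR FROM tbl_Branching WHERE nuclide = ?",
    ("27Al",),
).fetchall()
print(rows)
conn.close()
```

Keeping Z, A, and nuclide in every table, as the proposed key fields do, lets each table be queried on its own without joins.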

5. Muon-NSR in Deep Learning Optimization

The Muon-NSR optimizer is a matrix-based modification of the Muon method. It leverages the noise-to-signal ratio to downregulate momentum updates that exhibit high variance, thus accelerating convergence and reducing validation loss in LLM pretraining (Li et al., 21 Jan 2026).

For a weight matrix $W \in \mathbb{R}^{m \times n}$, Muon-NSR maintains momentum $M_t$ and a variance surrogate $I_t$ as exponential moving averages. The per-coordinate NSR is

$\text{NSR}_t[i,j] = \frac{\sqrt{y \cdot I_t[i,j]}}{|M_t[i,j]|}$

where $y \geq 0$ is a sensitivity hyperparameter. The normalized momentum is

$\widetilde M_t[i,j] = \frac{M_t[i,j]}{\sqrt{M_t[i,j]^2 + y\, I_t[i,j]} + \epsilon}$
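The per-coordinate NSR and the normalized momentum can be sketched with NumPy as follows. The function name, the default values of `y` and `eps`, and the small guard against division by zero in the NSR are assumptions of this sketch. How $M_t$ and $I_t$ are accumulated is not specified here, so both are taken as given inputs.

```python
import numpy as np


def nsr_normalize(M, I, y=1.0, eps=1e-8):
    """Return (NSR, M_tilde) for momentum M and variance surrogate I."""
    tiny = 1e-30  # guard for exactly-zero momentum entries (sketch assumption)
    nsr = np.sqrt(y * I) / np.maximum(np.abs(M), tiny)
    m_tilde = M / (np.sqrt(M**2 + y * I) + eps)
    return nsr, m_tilde


# Hypothetical momentum and variance-surrogate values.
M = np.array([[3.0, -4.0], [1.0, 2.0]])
I = np.array([[16.0, 9.0], [0.0, 12.0]])

nsr, m_tilde = nsr_normalize(M, I, y=1.0, eps=0.0)

# With eps = 0 the normalization equals sign(M) / sqrt(1 + NSR^2).
identity_form = np.sign(M) / np.sqrt(1.0 + nsr**2)
print(nsr)
print(m_tilde)
```

The identity makes the damping explicit: entries with NSR near zero keep magnitude close to 1, while entries with high NSR shrink toward zero.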

After normalization, Muon-NSR applies orthogonalization using $K$ Newton–Schulz iterations to approximate the matrix sign function (polar factor):

$Y_{k+1} = \frac{1}{2} Y_k (3I - Z_k Y_k), \quad Z_{k+1} = \frac{1}{2} (3I - Z_k Y_k) Z_k$

with initialization $Y_0 = \widetilde M_t / \| \widetilde M_t \|_F$, $Z_0 = I$.
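For illustration, the sketch below uses the classical single-variable cubic Newton–Schulz iteration, $X_{k+1} = \tfrac{1}{2} X_k (3I - X_k^\top X_k)$, which is well defined for rectangular matrices and converges to the polar factor of a full-rank input. It coincides with the coupled recurrence above when $Z_k = Y_k^\top$. This is a swapped-in textbook variant, not the paper's implementation, and the iteration count is chosen for numerical accuracy in the sketch rather than taken from the source.

```python
import numpy as np


def newton_schulz_polar(M, num_iters):
    """Approximate the polar factor of M via cubic Newton-Schulz iterations."""
    X = M / np.linalg.norm(M)  # Frobenius norm; bounds singular values by 1
    eye = np.eye(M.shape[1])
    for _ in range(num_iters):
        X = 0.5 * X @ (3.0 * eye - X.T @ X)
    return X


# Hypothetical full-column-rank 4 x 3 input.
A = np.array(
    [[2.0, 0.0, 1.0],
     [0.0, 1.0, 0.0],
     [1.0, 0.0, 3.0],
     [0.0, 2.0, 1.0]]
)
Q = newton_schulz_polar(A, num_iters=30)

# Reference polar factor from the SVD: A = U S V^T  ->  U V^T.
U, _, Vt = np.linalg.svd(A, full_matrices=False)
print(np.max(np.abs(Q - U @ Vt)))
```

Each iteration maps a singular value $\sigma$ to $\tfrac{3}{2}\sigma - \tfrac{1}{2}\sigma^3$, so small singular values grow by roughly 1.5× per step until they approach 1.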

Empirically, Muon-NSR reaches the target validation loss in 1.36×–1.5× fewer iterations than baselines on GPT-2 and LLaMA pretraining (Li et al., 21 Jan 2026). Performance is unimodal in the sensitivity $y$, and only one extra buffer is needed relative to Muon.

Table: Validation Losses for LLaMA Pretraining (Suite A; Li et al., 21 Jan 2026)

Model	AdamW	Muon	Muon-NSR	Muon-VS
Llama-210M	3.0458	3.0418	3.0322	3.0330
Llama-720M	2.7879	2.7858	2.7806	2.7798

6. Systematics, Uncertainties, and Shielding Implications

Neutron yield and activation measurements are systematics-limited, primarily due to:

Energy threshold determination
Capture-time window selection
Muon trigger and flux normalization
Modeling of $\gamma$ cascades in neutron capture (e.g., DICEBOX in ISMRAN)

Adding the individual uncertainty contributions in quadrature yields a total systematic error of $\sim 6.4\%$ in the neutron yield for ISMRAN (Dey et al., 23 Mar 2025). Shielding and facility design must consider muon-induced activation; per-muon radionuclide yields of $\sim 10^{-12}$–$10^{-13}$, when extrapolated to future muon facilities (with $\sim 10^{19}$ muon interactions per year), can generate $\mathcal{O}(10^{7})$ radioactive atoms per kg per year, necessitating targeted shielding, the use of low-activation materials, and explicit limits on beam loss (Boehnlein, 2012).
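The order of magnitude of the extrapolation can be reproduced with a simple product of per-muon yield and annual muon count. The sketch below ignores target-mass normalization, radioactive decay, and saturation, so it only brackets the number of atoms produced per year and is not a shielding calculation.

```python
def atoms_per_year(yield_per_muon, muons_per_year):
    """Radionuclide atoms produced per year, ignoring decay and saturation."""
    return yield_per_muon * muons_per_year


MUONS_PER_YEAR = 1.0e19  # order of magnitude quoted above

low = atoms_per_year(1.0e-13, MUONS_PER_YEAR)
high = atoms_per_year(1.0e-12, MUONS_PER_YEAR)
print(f"produced atoms per year: {low:.1e} to {high:.1e}")
```

The upper end of this bracket matches the $\mathcal{O}(10^{7})$ scale quoted above.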

7. Future Developments and Data Integration

Dedicated, open nuclear databases for muon-induced reactions are under active development. The Muon-NSR framework for nuclear data envisions robust, multi-table infrastructures incorporating precise energies, intensities, branching data, emission spectra, and full metadata, closely analogous to evaluated nuclear data files (e.g., ENSDF, TENDL). In deep learning, Muon-NSR and related NSR-modulated optimizers suggest that variance-adaptive strategies coupled with matrix-structured updates provide a methodology for LLM-scale training that is both theoretically justified and empirically effective (Li et al., 21 Jan 2026).