
Muon-NSR: Nuclear and Deep Learning Insights

Updated 28 January 2026
  • Muon-NSR is a dual-focused topic encompassing nuclear physics measurements of muon-induced neutron spallation and innovative deep learning optimization techniques.
  • In the nuclear domain, it involves precise quantification of neutron yields, radionuclide activation, and detailed experimental calibration using layered detection systems.
  • In deep learning, the Muon-NSR optimizer adapts momentum normalization through noise-to-signal ratio modulation, achieving faster convergence and reduced validation loss.

Muon-NSR refers to two unrelated technical domains linked only by the shared “Muon” and “NSR” labels: (1) nuclear measurements and databases for muon-induced reactions, including the neutron spallation rate (NSR) and its application in underground and accelerator-based experiments; (2) a recent family of optimization algorithms in deep learning, specifically “Muon-NSR,” which uses the noise-to-signal ratio (also abbreviated NSR) to modulate momentum normalization during LLM pretraining. This entry describes both, since each is an active research topic under the “Muon-NSR” designation.

1. Definition and Contexts of Muon-NSR

In nuclear and particle physics, NSR (Neutron Spallation Rate) quantifies the production of neutrons by cosmic or beam muons interacting with various materials, a central concern in low-background experiments, neutrino detectors, and radiation shielding. Muon-NSR is also the name of an optimizer variant for large-scale neural network pretraining that modulates orthogonal momentum updates using a variance-adaptive normalization scheme based on the local noise-to-signal ratio (Li et al., 21 Jan 2026).

2. Nuclear Muon-NSR: Quantitative Measurement and Yield Definition

Muon-induced neutron spallation is measured by the neutron yield, defined in underground and accelerator-based experiments as the number of muon-induced neutrons produced per muon per areal mass thickness of the target. Conventionally, the neutron yield $Y_n$ is given by:

$$Y_n = \frac{N_n}{N_\mu \cdot \rho \cdot L}$$

where $N_n$ is the total number of muon-induced neutrons, $N_\mu$ is the number of muons traversing the target, $\rho$ is the target density ($\mathrm{g/cm^3}$), and $L$ is the cumulative muon track length through the target ($\mathrm{cm}$) (Collaboration et al., 2011).

For above-ground configurations, such as the ISMRAN detector, the neutron yield is determined similarly:

$$Y = \frac{N_n}{N_\mu \times X}$$

where $X = \rho_{\text{avg}} L_{\text{avg}}$ is the mean areal mass traversed by muons (Dey et al., 23 Mar 2025).
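As a minimal sketch of the yield definition above, the following Python function computes $Y = N_n / (N_\mu X)$ with $X = \rho L$. The function name and the example counts are hypothetical and serve only to illustrate the arithmetic; they are not measured values.

```python
def neutron_yield(n_neutrons, n_muons, density_g_cm3, track_length_cm):
    """Return the neutron yield in n / muon / (g/cm^2)."""
    areal_mass = density_g_cm3 * track_length_cm  # X = rho * L, in g/cm^2
    if n_muons <= 0 or areal_mass <= 0:
        raise ValueError("muon count and areal mass must be positive")
    return n_neutrons / (n_muons * areal_mass)


# Hypothetical counts for a 10 cm lead target (rho = 11.34 g/cm^3).
example_yield = neutron_yield(
    n_neutrons=250, n_muons=1_000_000,
    density_g_cm3=11.34, track_length_cm=10.0,
)
print(f"Y = {example_yield:.3e} n/mu/(g/cm^2)")
```

For a composite shield, the same function applies if the average density and average track length are supplied, which is how $X$ is defined above.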

Recent measurements from the ISMRAN collaboration provide an explicit value:

$$Y = (2.81 \pm 0.14_{\mathrm{stat}} \pm 0.18_{\mathrm{sys}}) \times 10^{-5} \;\mathrm{n/\mu/(g/cm^{2})}$$

at sea level for composite shielding (10 cm Pb + 10 cm borated polyethylene) (Dey et al., 23 Mar 2025).

3. Detection and Tagging of Muon-Induced Neutrons

Precision experiments such as Borexino employ layered detection systems optimized for muon and cosmogenic neutron identification. Their muon tagging system consists of an inner liquid-scintillator detector surrounded by a water-Cherenkov outer detector (Collaboration et al., 2011). Tagging efficiency is evaluated via hardware and software triggers, pulse-shape discrimination, and position/time clustering:

  • Combined veto efficiency: $\geq 99.992\%$
  • Neutron gate: $\sim 1.6$ ms DAQ window after each tagged muon.
  • Neutron capture efficiency: $\gtrsim 99\%$, with accidental backgrounds $\lesssim 1\%$ (with fit $\tau_{\rm cap} = 254.5 \pm 1.8\,\mu\mathrm{s}$) (Collaboration et al., 2011).
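The gate length and capture time quoted above can be related by a short calculation. The sketch below assumes capture times follow a single exponential with mean $\tau_{\rm cap}$ and that the gate opens at the muon time with no dead time; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def capture_fraction_in_gate(gate_us, tau_us, start_us=0.0):
    """Fraction of captures occurring between start_us and gate_us,
    assuming exponentially distributed capture times with mean tau_us."""
    return math.exp(-start_us / tau_us) - math.exp(-gate_us / tau_us)


# 1.6 ms gate and the fitted capture time of 254.5 microseconds.
fraction = capture_fraction_in_gate(gate_us=1600.0, tau_us=254.5)
print(f"fraction of captures inside the gate: {fraction:.4f}")
```

Under these assumptions the gate spans a little over six capture lifetimes, so the geometric loss from the finite window is small; a nonzero `start_us` can be passed to model a delayed gate opening.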

Track reconstruction employs time/charge clustering for entry and exit points (via both OD and ID), with global 3D linear fits achieving angular resolutions of $3^\circ$–$5^\circ$ and lateral resolutions of $35$–$50$ cm (Collaboration et al., 2011).

4. Muon-Induced Spallation and Radioactivity: Activation, Cross-Sections, and Data Structures

Muon-induced neutron and radionuclide production are quantified via direct counting and Monte Carlo supported analyses. Activation yields depend on:

  • Muon flux, energy spectrum, and path length
  • Target composition and areal mass
  • Energy-dependent spallation cross-section $\sigma_i(E)$

In the NuMI experiment, radionuclide yields in copper and aluminum targets—dominated by photo-nuclear and spallation processes—are compared to MARS simulations. The production rate is given by:

$$R_i = \int_0^\infty \Phi_\mu(E)\, N_T\, \sigma_i(E)\, dE$$

with measured yields in the $10^{-12}$ to $10^{-13}$ radionuclides per muon range for typical exposures (Boehnlein, 2012).
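The production-rate integral can be evaluated numerically once a flux and a cross-section are specified. The following is a minimal sketch using the trapezoidal rule over a finite energy range; truncating the upper limit at `e_max` is an assumption, and the power-law flux and constant cross-section are toy functions in arbitrary units, not data from any experiment.

```python
def production_rate(flux, cross_section, n_targets, e_min, e_max, steps=10_000):
    """Trapezoidal estimate of R_i = N_T * integral of flux(E) * sigma_i(E) dE
    over the finite range [e_min, e_max]."""
    h = (e_max - e_min) / steps
    total = 0.0
    for k in range(steps + 1):
        energy = e_min + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # half weight at endpoints
        total += weight * flux(energy) * cross_section(energy)
    return n_targets * total * h


def toy_flux(energy):
    """Toy power-law spectrum in arbitrary units (illustrative only)."""
    return energy ** -2.7


def toy_sigma(energy):
    """Toy energy-independent cross-section in arbitrary units."""
    return 1.0


toy_rate = production_rate(toy_flux, toy_sigma, n_targets=1.0,
                           e_min=1.0, e_max=100.0)
print(f"toy production rate: {toy_rate:.4f} (arbitrary units)")
```

In a real analysis the flux and cross-section would come from measurement or simulation, and the integration grid would be chosen to resolve any thresholds or resonances in $\sigma_i(E)$.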

Efforts in nuclear data curation are formalized in the Muon-NSR database proposal, which includes:

  • Muonic X-ray energies and intensities
  • Muonic atom lifetimes (vacuum and capture components)
  • Branching ratios of residual nuclei for various capture/spallation channels (e.g., $(\mu, n)$, $(\mu, 2n)$, $(\mu, p)$)
  • Emission probabilities for neutrons, protons, alphas, and $\gamma$-rays
  • Emission spectra (parameters for Maxwellian and power-law components)

Table: Muonic Atom Data Schema (proposed in Niikura et al., 2024)

| Table | Key Fields |
|---|---|
| tbl_Xrays | id, Z, A, nuclide, transition, $E_\gamma$ [keV], $I_{\rm rel}$ |
| tbl_Lifetimes | id, Z, A, nuclide, $\tau_{\rm tot}$ [s], model, reference |
| tbl_Branching | id, Z, A, nuclide, channel, residual, BR, method, reference |
| tbl_Emissions | id, Z, A, nuclide, particle, multiplicity, probability, reference |
| tbl_Spectra | id, Z, A, nuclide, particle, distribution_type, parameters, reference |
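One way to prototype the schema above is an in-memory SQLite database. The sketch below is illustrative only: the table names and fields follow the listing, but the column types, the ASCII column names (`E_gamma_keV`, `I_rel`, `tau_tot_s`), and the choice of SQLite are assumptions, not part of the proposal.

```python
import sqlite3

# Column types and ASCII column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE tbl_Xrays (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    transition TEXT, E_gamma_keV REAL, I_rel REAL
);
CREATE TABLE tbl_Lifetimes (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    tau_tot_s REAL, model TEXT, reference TEXT
);
CREATE TABLE tbl_Branching (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    channel TEXT, residual TEXT, BR REAL, method TEXT, reference TEXT
);
CREATE TABLE tbl_Emissions (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    particle TEXT, multiplicity INTEGER, probability REAL, reference TEXT
);
CREATE TABLE tbl_Spectra (
    id INTEGER PRIMARY KEY, Z INTEGER, A INTEGER, nuclide TEXT,
    particle TEXT, distribution_type TEXT, parameters TEXT, reference TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the created tables and their columns.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
for name in table_names:
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]
    print(name, columns)
```

Storing `parameters` as text (for example, a JSON string) keeps the spectra table flexible across Maxwellian and power-law fits; a production design might instead normalize these into separate columns or tables.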

5. Muon-NSR in Deep Learning Optimization

The Muon-NSR optimizer is a matrix-based modification of the Muon method. It leverages the noise-to-signal ratio to downregulate momentum updates that exhibit high variance, thus accelerating convergence and reducing validation loss in LLM pretraining (Li et al., 21 Jan 2026).

For a weight matrix $W \in \mathbb{R}^{m \times n}$, Muon-NSR maintains momentum $M_t$ and a variance surrogate $I_t$ as exponential moving averages. The per-coordinate NSR is

$$\text{NSR}_t[i,j] = \frac{\sqrt{y \cdot I_t[i,j]}}{|M_t[i,j]|}$$

where $y \geq 0$ is a sensitivity hyperparameter. The normalized momentum is

$$\widetilde M_t[i,j] = \frac{M_t[i,j]}{\sqrt{M_t[i,j]^2 + y\,I_t[i,j]} + \epsilon}$$
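The two per-coordinate formulas above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the matrices are nested lists, and how $M_t$ and $I_t$ are accumulated is left out because the text does not specify it. With $\epsilon = 0$ the normalized momentum equals $\operatorname{sign}(M_t)/\sqrt{1 + \text{NSR}_t^2}$, so coordinates with a high noise-to-signal ratio are shrunk toward zero.

```python
import math


def nsr(m, i_var, y):
    """Per-coordinate noise-to-signal ratio sqrt(y * I) / |M|."""
    if m == 0.0:
        return math.inf
    return math.sqrt(y * i_var) / abs(m)


def normalized_momentum(m, i_var, y, eps=1e-8):
    """Per-coordinate normalized momentum M / (sqrt(M^2 + y * I) + eps)."""
    return m / (math.sqrt(m * m + y * i_var) + eps)


def normalize_matrix(momentum, variance, y, eps=1e-8):
    """Apply the normalization elementwise to nested-list matrices."""
    return [
        [normalized_momentum(m, v, y, eps) for m, v in zip(m_row, v_row)]
        for m_row, v_row in zip(momentum, variance)
    ]


# Illustrative values only.
M = [[0.5, -0.1], [0.0, 2.0]]
I = [[0.01, 0.04], [0.09, 0.0]]
M_tilde = normalize_matrix(M, I, y=1.0)
for row in M_tilde:
    print([round(v, 4) for v in row])
```

Every output entry has magnitude at most 1, and setting `y=0` reduces the normalization to an elementwise sign (up to `eps`).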

After normalization, Muon-NSR applies orthogonalization using $K$ Newton–Schulz iterations to approximate the matrix sign function (polar factor):

$$Y_{k+1} = \frac{1}{2} Y_k (3I - Z_k Y_k), \quad Z_{k+1} = \frac{1}{2} (3I - Z_k Y_k) Z_k$$

with initialization $Y_0 = \widetilde M_t / \| \widetilde M_t \|_F$, $Z_0 = I$.
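To illustrate Newton–Schulz orthogonalization in general, the sketch below uses the single-sequence cubic iteration $X_{k+1} = \tfrac{1}{2} X_k (3I - X_k^\top X_k)$ with $X_0 = \widetilde M_t / \|\widetilde M_t\|_F$. This is a substitution on the editor's part rather than the coupled form quoted above: it is a standard polar-factor iteration that is well defined for rectangular matrices, and it may differ from what the optimizer actually uses. Helper names are hypothetical and matrices are nested lists.

```python
import math


def matmul(a, b):
    """Product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*a)]


def newton_schulz_polar(matrix, iterations=12):
    """Approximate the polar factor of `matrix` with the cubic
    Newton-Schulz iteration X <- 0.5 * X * (3I - X^T X)."""
    fro = math.sqrt(sum(v * v for row in matrix for v in row))
    if fro == 0.0:
        raise ValueError("cannot orthogonalize the zero matrix")
    # Frobenius normalization keeps all singular values in (0, 1].
    x = [[v / fro for v in row] for row in matrix]
    n = len(x[0])
    for _ in range(iterations):
        gram = matmul(transpose(x), x)  # n x n Gram matrix X^T X
        correction = [[(3.0 if i == j else 0.0) - gram[i][j]
                       for j in range(n)] for i in range(n)]
        x = [[0.5 * v for v in row] for row in matmul(x, correction)]
    return x


# Illustrative 2 x 3 input; the result should have orthonormal rows.
q_demo = newton_schulz_polar([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
print(matmul(q_demo, transpose(q_demo)))
```

Each iteration maps every singular value $s$ to $\tfrac{1}{2} s (3 - s^2)$, which drives values in $(0, \sqrt{3})$ toward 1; small singular values converge slowly at first, so the iteration count needed depends on the conditioning of the input.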

Empirically, Muon-NSR yields 1.36×–1.5× reductions in iteration count to reach target validation loss compared to baselines on GPT-2 and LLaMA pretraining (Li et al., 21 Jan 2026). Performance is unimodal in the sensitivity $y$, and only one extra buffer is needed relative to Muon.

Table: Validation Losses for LLaMA Pretraining (Suite A; Li et al., 21 Jan 2026)

| Model | AdamW | Muon | Muon-NSR | Muon-VS |
|---|---|---|---|---|
| Llama-210M | 3.0458 | 3.0418 | 3.0322 | 3.0330 |
| Llama-720M | 2.7879 | 2.7858 | 2.7806 | 2.7798 |

6. Systematics, Uncertainties, and Shielding Implications

Neutron yield and activation measurements are systematics-limited, primarily due to:

  • Energy threshold determination
  • Capture-time window selection
  • Muon trigger and flux normalization
  • Modeling of $\gamma$ cascades in neutron capture (e.g., DICEBOX in ISMRAN)

Quadrature of individual uncertainty contributions yields total systematic errors of $\sim 6.4\%$ in neutron yield for ISMRAN (Dey et al., 23 Mar 2025). Shielding and facility design must consider muon-induced activation; per-muon radionuclide yields of $\sim 10^{-12}$–$10^{-13}$, when extrapolated to future muon facilities (with $\sim 10^{19}$ muon interactions per year), can generate $\mathcal{O}(10^{7})$ radioactive atoms per kg per year, necessitating targeted shielding, the use of low-activation materials, and explicit limits on beam loss (Boehnlein, 2012).
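The figures quoted in this entry can be cross-checked with a few lines of arithmetic. The sketch below recomputes the relative systematic uncertainty from the ISMRAN yield, combines the statistical and systematic terms in quadrature (treating them as independent, which is an assumption), and scales the per-muon radionuclide yields by the annual muon count. It does not model the per-kg normalization, which depends on target mass and geometry not given here.

```python
import math


def quadrature_sum(components):
    """Combine independent uncertainty components in quadrature."""
    return math.sqrt(sum(c * c for c in components))


# ISMRAN yield and uncertainties, in n/mu/(g/cm^2).
y_central = 2.81e-5
sigma_stat = 0.14e-5
sigma_sys = 0.18e-5

relative_sys = sigma_sys / y_central
sigma_total = quadrature_sum([sigma_stat, sigma_sys])
print(f"relative systematic uncertainty: {100 * relative_sys:.1f}%")
print(f"combined uncertainty: {sigma_total:.2e}")

# Order-of-magnitude activation estimate: yield per muon times muons per year.
muons_per_year = 1e19
atoms_low = 1e-13 * muons_per_year
atoms_high = 1e-12 * muons_per_year
print(f"radionuclides per year: {atoms_low:.0e} to {atoms_high:.0e}")
```

The upper end of the per-muon yield range reproduces the $\mathcal{O}(10^{7})$ scale quoted above.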

7. Future Developments and Data Integration

Dedicated, open nuclear databases for muon-induced reactions are under active development. The Muon-NSR framework for nuclear data envisions robust, multi-table infrastructures incorporating precise energies, intensities, branching data, emission spectra, and full metadata, closely analogous to evaluated nuclear data files (e.g., ENSDF, TENDL). In deep learning, Muon-NSR and related NSR-modulated optimizers suggest that variance-adaptive strategies coupled with matrix-structured updates provide a methodology for LLM-scale training that is both theoretically justified and empirically effective (Li et al., 21 Jan 2026).
