
MSE-R: Robust Statistics, Online Algorithms, and Regression

Updated 2 February 2026
  • In robust statistics, MSE-R quantifies the maximal mean squared error of M-estimators under shrinking contamination, revealing nuanced bias-variance interactions.
  • In online algorithms, MSE-R employs multiscale entropic regularization on hierarchical DAG flows to attain competitive movement and service costs.
  • In regression, MSE-R defines a composite metric that balances squared error and Pearson correlation, ensuring precise predictions with strong population agreement.

The acronym "MSE-R" denotes distinct concepts in different research contexts. In robust statistics, it refers to the maximal Mean Squared Error of M-estimators on shrinking contamination neighborhoods. In online algorithms for metrical task systems, it indicates Multiscale Entropic Regularization, a convex regularizer for flows over hierarchical DAGs encoding the metric. More recently, in regression and metric evaluation, "MSE–R" describes a composite metric balancing squared error (MSE) against linear correlation (Pearson-R). Each usage addresses, respectively, robustness under contamination, hierarchical structure, or the joint goals of error minimization and agreement.

1. Robust MSE in Shrinking Neighborhoods for M-Estimators

The “MSE-R” expansion in robust statistics formalizes the maximal MSE of a one-dimensional location M-estimator $S_n$ under convex contamination balls $\mathcal{Q}_n(r)$ with radius shrinking at rate $r/\sqrt{n}$ around the ideal distribution $F$ (typically $\mathcal{N}(0,1)$, or an $L_2$-differentiable location family). For $S_n$ with monotone, bounded influence curve $\psi$, the expansion is

$$R_n(S_n, r) = r^2 b^2 + v_0^2 + \frac{r}{\sqrt{n}} A_1 + \frac{1}{n} A_2 + o(n^{-1}),$$

where $b = \sup_x |\psi(x)|$ is the maximum IC value, $v_0^2 = \mathbb{E}_F[\psi^2]$ is the ideal variance, and $A_1$, $A_2$ are explicit polynomials in $b$, $r$, and derivatives of $L(t)$ and $V(t)^2$ at $t = 0$ ($L$ and $V^2$ are the shifted mean and variance of $\psi$).

This result holds over contamination neighborhoods where sample-wise thinning excludes events with more than $n/2$ contaminated points, an exponentially negligible adjustment. The coefficients $A_1$, $A_2$ depend on higher-order Taylor expansions and moment-type quantities:

  • $l_j$, $\tilde{v}_j$: derivatives of $L(t)$, $V(t)$
  • $\rho_0$, $\rho_1$: cubic skewness-type ratios
  • $\kappa_0$: excess kurtosis

Explicit formulas and interpretations demonstrate how bias and variance interact under contamination, how an optimally chosen $\psi$ influences all higher-order corrections, and how the supremum is attained by concentrating contamination on extremal values of $\psi$. Key technical tools include Edgeworth expansions for triangular arrays, saddle-point analysis, and breakdown-driven sample thinning (Ruckdeschel, 2010).
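
As a concrete sketch, the leading terms $r^2 b^2 + v_0^2$ of the expansion can be evaluated numerically for a Huber-type $\psi$ at the ideal model $\mathcal{N}(0,1)$. The function name and clipping constant below are illustrative, and the $A_1$, $A_2$ corrections are omitted since their explicit formulas are not reproduced here.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def huber_expansion_leading_terms(c, r):
    """Leading terms r^2 b^2 + v_0^2 of the maximal-MSE expansion for a
    Huber M-estimator with clipping constant c at the ideal model N(0,1).
    The r/sqrt(n) and 1/n corrections A_1, A_2 are omitted."""
    b = c  # sup |psi| for the Huber influence curve psi(x) = clip(x, -c, c)
    # closed form for v_0^2 = E_F[psi^2] under N(0,1)
    v0_sq = (2 * Phi(c) - 1) - 2 * c * phi(c) + 2 * c ** 2 * (1 - Phi(c))
    return r ** 2 * b ** 2 + v0_sq

# e.g. clipping constant c = 1.345, contamination radius r = 0.5
approx = huber_expansion_leading_terms(1.345, 0.5)
```

As $c$ grows the estimator approaches the sample mean: $v_0^2$ tends to 1 while the worst-case bias term $r^2 c^2$ blows up, making the bias-variance trade-off under contamination explicit.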

2. Multiscale Entropic Regularization in Online Algorithms

In metrical task systems (MTS), “MSE-R” refers to Multiscale Entropic Regularization, an entropic regularizer imposed on flows in a directed acyclic graph (DAG) constructed from the metric space $(X, d)$. The DAG encodes a hierarchy via arcs $uv$ with length $\omega_{uv}$ and probability $\theta_{uv}$, generating multiscale entropy terms:

$$R(F) = \frac{1}{\kappa} \sum_{uv \in A} \omega_{uv}\, \eta_{uv}\, (F_{uv} + F_u\,\delta_{uv}) \ln \left( \frac{F_{uv}/F_u + \delta_{uv}}{\delta_{uv}} \right)$$

with $\eta_{uv} = 1 + \ln(1/\theta_{uv})$, $\delta_{uv} = \theta_{uv}/\eta_{uv}$, and $F$ a root-normalized flow vector. Locally, entropy is decomposed at each internal DAG node.
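A minimal sketch of evaluating $R(F)$ on a toy two-arc DAG, assuming the arc data ($\omega$, $\theta$) and a root-normalized flow are given; all names are illustrative:

```python
import math

def multiscale_entropy(F_node, F_arcs, omega, theta, kappa=1.0):
    """Multiscale entropic regularizer R(F) over DAG arcs, following the
    formula in the text. F_node maps each node u to its flow F_u; F_arcs,
    omega, theta map each arc (u, v) to its flow, length, and probability."""
    total = 0.0
    for arc, F_uv in F_arcs.items():
        u, _v = arc
        eta = 1 + math.log(1 / theta[arc])   # eta_uv
        delta = theta[arc] / eta             # delta_uv
        # omega * eta * (F_uv + F_u * delta) * ln((F_uv/F_u + delta)/delta)
        total += omega[arc] * eta * (F_uv + F_node[u] * delta) * \
                 math.log((F_uv / F_node[u] + delta) / delta)
    return total / kappa

# toy DAG: a root splitting its unit flow between two leaves
F_node = {"root": 1.0}
F_arcs = {("root", "a"): 0.6, ("root", "b"): 0.4}
omega = {arc: 1.0 for arc in F_arcs}
theta = {arc: 0.5 for arc in F_arcs}
R = multiscale_entropy(F_node, F_arcs, omega, theta)
```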

This regularization permits a mirror-descent algorithm that directly exploits the natural hierarchy of $(X, d)$, bypassing the random ultrametric embeddings used in prior work. The resulting method attains $O((\log n)^2)$-competitive movement cost and 1-competitiveness on service cost, matching the best previously known ultrametric-based approaches. The analysis leverages Bregman divergences, expanding and Lipschitz DAG properties, and telescopes service cost against divergence reductions (Ebrahimnejad et al., 2021).

3. Composite Metrics Balancing Error and Correlation: MSE–R in Regression

An alternative “MSE–R” metric has been proposed for regression evaluation to simultaneously penalize prediction error (via MSE) and reward linear agreement (via Pearson-R):

$$\mathrm{MSE\text{-}R}(p, y) = \mathrm{MSE}(p, y)\,[1 - R(p, y)]$$

where $\mathrm{MSE}(p, y) = N^{-1} \sum_i (p_i - y_i)^2$ and $R(p, y)$ is the Pearson correlation coefficient computed over predictions $p$ and gold standards $y$.

This metric ensures that both a low MSE and a high correlation are required for an optimal score. If correlation is poor ($R \ll 1$) or prediction error is high, the product remains large. The metric can be generalized as $\mathrm{MSE}(1 - R^\gamma)$ for weight $\gamma > 0$, or as $\mathrm{MSE} - \alpha R$ for tunable $\alpha$. The criterion fuses the objectives of minimizing absolute error and maximizing population-level agreement, which are not strictly aligned for arbitrary regression targets (Pandit et al., 2019).
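The composite metric is straightforward to compute from its definition; a minimal sketch with the default $\gamma = 1$, assuming non-constant inputs so the correlation is well defined:

```python
import math

def mse_r(p, y, gamma=1.0):
    """MSE-R(p, y) = MSE(p, y) * (1 - R(p, y)**gamma), with R the Pearson
    correlation; inputs are assumed non-constant so R is well defined."""
    n = len(p)
    mse = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / n
    mp, my = sum(p) / n, sum(y) / n
    cov = sum((pi - mp) * (yi - my) for pi, yi in zip(p, y)) / n
    sd_p = math.sqrt(sum((pi - mp) ** 2 for pi in p) / n)
    sd_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    r = cov / (sd_p * sd_y)
    return mse * (1 - r ** gamma)
```

A perfect prediction scores 0 (zero MSE), while anti-correlated predictions are penalized twice over: a large MSE multiplied by the factor $1 - R = 2$.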

4. Exact Mapping between MSE and Concordance/Linear Correlation

The algebraic link between MSE and the concordance correlation coefficient ($\rho_c$) is precise but non-monotonic:

$$\rho_c = \frac{2 \sigma_{XY}}{\mathrm{MSE} + 2 \sigma_{XY}} = \left( 1 + \frac{\mathrm{MSE}}{2 \sigma_{XY}} \right)^{-1}$$

where $\sigma_{XY}$ is the covariance between prediction and ground truth. This mapping implies that minimizing MSE need not maximally increase $\rho_c$; for any fixed MSE there is an interval $[\psi(\delta), \Psi(\delta)]$ of possible $\rho_c$ values, depending on the error directionality relative to the ground-truth variance.

Explicit bounds:

$$\rho_{c,\max}(\mathrm{MSE}) = \frac{2(1+\delta)}{1 + (1+\delta)^2}, \quad \rho_{c,\min}(\mathrm{MSE}) = \frac{2(1-\delta)}{1 + (1-\delta)^2}, \quad \delta = \sqrt{\mathrm{MSE}/\sigma^2_G}$$

Only errors distributed with or against the gold standard mean reach these extremes. This multi-to-multi mapping generates counterintuitive cases: $\mathrm{MSE}_1 < \mathrm{MSE}_2$ does not guarantee $\rho_{c,1} > \rho_{c,2}$. A plausible implication is that optimizing only for MSE may be insufficient for maximizing concordance or correlation (Pandit et al., 2019).
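These bounds can be sketched directly; `ccc_bounds` is an illustrative name, taking the MSE and the gold-standard variance $\sigma_G^2$:

```python
import math

def ccc_bounds(mse, var_gold):
    """Extremal concordance values attainable at a fixed MSE, using
    delta = sqrt(MSE / sigma_G^2); returns (rho_c_min, rho_c_max)."""
    delta = math.sqrt(mse / var_gold)
    rho_max = 2 * (1 + delta) / (1 + (1 + delta) ** 2)
    rho_min = 2 * (1 - delta) / (1 + (1 - delta) ** 2)
    return rho_min, rho_max
```

With unit gold variance, a model at $\mathrm{MSE} = 0.5$ can score as low as $\rho_c \approx 0.54$, while a worse model at $\mathrm{MSE} = 1.0$ can still reach $\rho_c = 0.8$, which illustrates the non-monotonicity described above.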

5. Loss Function Extensions and Practical Implications

Loss functions combining MSE and joint statistics (covariance, dot-product, or correlation) have been proposed to optimize both prediction accuracy and agreement. Examples include:

  • $f_1 = \frac{\mathrm{MSE}}{\sigma_{XY}}$
  • $f_3 = \mathrm{MSE} - \alpha \sigma_{XY}$
  • $f_5 = \left|\sum_i \varepsilon_i (g_i - p_i)^2 - \sum_i \alpha_i (g_i p_i)^{2\beta_i+1}\right|^\gamma$

These forms explicitly penalize error magnitude and reward agreement, reflecting the underlying structure of metrics like MSE–R. Empirically, such objectives drive models towards solutions with both low average error and high linear alignment, as necessary for tasks demanding population-level reproducibility and fairness (Pandit et al., 2019).
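
The simpler forms $f_1$ and $f_3$ can be sketched directly from the population MSE and covariance; $f_5$ is omitted because its per-sample weights $\varepsilon_i$, $\alpha_i$, $\beta_i$ are not specified here, and the helper names are illustrative:

```python
def _mse(p, g):
    """Population mean squared error."""
    return sum((pi - gi) ** 2 for pi, gi in zip(p, g)) / len(p)

def _cov(p, g):
    """Population covariance sigma_XY."""
    n = len(p)
    mp, mg = sum(p) / n, sum(g) / n
    return sum((pi - mp) * (gi - mg) for pi, gi in zip(p, g)) / n

def f1(p, g):
    """f_1 = MSE / sigma_XY: minimized by low error and high covariance."""
    return _mse(p, g) / _cov(p, g)

def f3(p, g, alpha=1.0):
    """f_3 = MSE - alpha * sigma_XY: linear trade-off between the goals."""
    return _mse(p, g) - alpha * _cov(p, g)
```

Both losses are smallest when predictions track the gold standard exactly: $f_1$ then vanishes, while $f_3$ turns negative, rewarding covariance beyond mere error minimization.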

6. Interpretations and Extensions

The term MSE-R, as encountered in recent literature, thus refers to advanced approaches for robust estimation under contamination (maximal MSE expansions for M-estimators), hierarchical regularization in online algorithms (multiscale entropy for task systems), and composite metrics for regression quality (balancing error and correlation). In each instance, key innovations address either higher-order bias-variance expansions, efficient hierarchical task allocation, or combined goals in regression and metric learning.

A plausible implication is that future work may refine these methodologies for tighter competitive ratios (as conjectured, $O(\log n)$ for MTS with entropic regularization), for broader applicability (e.g., $k$-server problems), or for more nuanced metric objectives in regression modeling. These approaches underscore the necessity of multidimensional evaluation and robustness in both statistical estimation and machine learning.
