Robustness Rate (RR) in Deep Learning

Updated 5 February 2026
  • Robustness Rate (RR) is a metric that quantifies deep learning classifier stability by assessing changes in output probabilities under bounded input perturbations.
  • RR distinguishes itself from robust accuracy by evaluating both label consistency and the sensitivity of predicted confidence, offering a finer-grained measure of model robustness.
  • Its application in safety-critical domains reveals that even minor shifts in confidence can indicate vulnerability, emphasizing the need for precise calibration in model deployment.

Robustness Rate (RR), also known as Robustness Ratio, is a metric designed to quantify the stability of a deep learning classifier’s output probabilities under bounded input perturbations. Unlike traditional robust accuracy (RA), RR provides a granular measure that captures both the invariance of the predicted label and the sensitivity of predicted class probabilities to adversarial attacks. This distinction is especially relevant in safety-critical contexts, where the reliability of model confidence is as significant as the correctness of its predictions (Lyu et al., 2024).

1. Formal Definition and Notation

Let $X \subset \mathbb{R}^d$ denote the input space and $\mathcal{Y} = \{1, \dots, K\}$ the set of class labels. Consider a classifier $f_{\mathbf{w}} : X \to [0,1]^K$ with parameters $\mathbf{w}$, producing class-conditional probabilities

$$f_{\mathbf{w}}(x) = (p_1(x), \ldots, p_K(x)), \qquad \sum_{k=1}^K p_k(x) = 1.$$

Let $\hat y(x) = \arg\max_k p_k(x)$ be the predicted label and $\{(x_i, y_i)\}_{i=1}^N$ a test set. Fix a norm $\|\cdot\|$, a perturbation budget $\epsilon$ (so that $\|\delta\| \leq \epsilon$), and a threshold $b \geq 0$ on the allowable change in output probability.

The local robustness predicate for sample $(x_i, y_i)$ is

$$\phi_{\mathrm{rob}}^i(\epsilon, b) = \begin{cases} 1, & \forall\, \delta \in \mathbb{R}^d \text{ with } \|\delta\| \leq \epsilon:\ |p_{\hat y(x_i)}(x_i + \delta) - p_{\hat y(x_i)}(x_i)| \leq b, \\ 0, & \text{otherwise.} \end{cases}$$

The robust ratio is then defined as

$$\mathrm{RR}(\epsilon, b) = \frac{1}{N} \sum_{i=1}^N \phi_{\mathrm{rob}}^i(\epsilon, b).$$
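As a concrete illustration, the predicate can be checked exhaustively for a one-dimensional toy model, where the perturbation ball is just an interval. The logistic classifier, grid resolution, and thresholds below are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def predict_proba(x, w=4.0):
    """Toy binary classifier: p(class 1 | x) = sigmoid(w * x)."""
    p1 = 1.0 / (1.0 + np.exp(-w * x))
    return np.array([1.0 - p1, p1])

def phi_rob(x, eps, b, n_grid=1001):
    """Brute-force check of the local robustness predicate in 1-D:
    returns 1 iff |p_yhat(x + delta) - p_yhat(x)| <= b for all |delta| <= eps."""
    probs0 = predict_proba(x)
    y_hat = int(np.argmax(probs0))
    p0 = probs0[y_hat]
    for delta in np.linspace(-eps, eps, n_grid):
        if abs(predict_proba(x + delta)[y_hat] - p0) > b:
            return 0
    return 1

# Far from the decision boundary the confidence is stable; near x = 0
# the same budget moves the probability by more than b.
print(phi_rob(x=2.0, eps=0.1, b=0.05))  # -> 1
print(phi_rob(x=0.1, eps=0.1, b=0.05))  # -> 0
```

In high dimensions an exhaustive grid is infeasible, which motivates the sampling- and attack-based estimators discussed below.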

2. Distinction from Robust Accuracy (RA)

Robust Accuracy (RA) measures only the invariance of the predicted label under perturbation:

$$\mathrm{RA}(\epsilon) = \frac{1}{N} \sum_{i=1}^N \mathbb{1}\big(\hat y(x_i + \delta_i) = y_i\big),$$

where $\delta_i$ is typically the worst-case adversarial perturbation with $\|\delta_i\| \leq \epsilon$. RA is agnostic to confidence changes as long as the predicted label is unchanged.

Robustness Rate imposes a stricter requirement: it counts an example as robust only if the probability assigned to the predicted label $\hat y(x_i)$ is stable, deviating by no more than $b$ under any adversarial perturbation of size at most $\epsilon$. Thus, RR constrains not only top-1 label consistency but also the maximum fluctuation in model confidence.

For a fixed $\epsilon$, it is possible for $\mathrm{RA}(\epsilon) \approx 1$ (label invariance) while $\mathrm{RR}(\epsilon, b) \ll 1$ (confidence instability under perturbations). Conversely, when $\mathrm{RR}(\epsilon, b) \approx \mathrm{RA}(\epsilon)$, the model exhibits both label and confidence robustness.
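The divergence between the two metrics can be reproduced on a toy example. In the sketch below (an illustrative 1-D logistic model, not from the cited work), every test point keeps its label under all perturbations in the budget, so worst-case RA is 1.0, yet the points near the decision boundary fail the confidence-stability test and RR is only 0.5:

```python
import numpy as np

def predict_proba(x, w=4.0):
    """Toy binary classifier: p(class 1 | x) = sigmoid(w * x)."""
    p1 = 1.0 / (1.0 + np.exp(-w * x))
    return np.array([1.0 - p1, p1])

X = np.array([0.2, 0.3, 1.5, 2.0])    # all true-labeled class 1 (x > 0)
eps, b = 0.15, 0.05
deltas = np.linspace(-eps, eps, 601)  # exhaustive 1-D perturbation grid

ra_hits = rr_hits = 0
for x in X:
    y_hat = int(np.argmax(predict_proba(x)))
    p0 = predict_proba(x)[y_hat]
    probs = np.array([predict_proba(x + d)[y_hat] for d in deltas])
    ra_hits += all(np.argmax(predict_proba(x + d)) == 1 for d in deltas)
    rr_hits += np.max(np.abs(probs - p0)) <= b

ra, rr = ra_hits / len(X), rr_hits / len(X)
print(ra, rr)  # -> 1.0 0.5
```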

3. Estimation Methods and Pseudocode

Exact evaluation of RR is computationally intractable due to the universal quantification over all $\delta$ with $\|\delta\| \leq \epsilon$. In practice, RR is estimated by random sampling within the allowed perturbation norm or by adversarial optimization (e.g., PGD or CW attacks) that approximates the worst-case probability change. The following outlines a sampling-based procedure to estimate $\mathrm{RR}(\epsilon, b)$:

```python
import numpy as np

def sample_in_ball(eps, shape, rng):
    """Draw a perturbation uniformly from the L2 ball of radius eps."""
    v = rng.standard_normal(shape)
    v /= np.linalg.norm(v)
    return eps * rng.uniform() ** (1.0 / v.size) * v

def estimate_rr(model, X, eps, b, M=100, seed=0):
    """Sampling-based estimate of RR(eps, b) over test inputs X."""
    rng = np.random.default_rng(seed)
    count = 0
    for x in X:
        probs0 = model.predict_proba(x)
        y_hat = int(np.argmax(probs0))
        p0 = probs0[y_hat]
        stable = True
        for _ in range(M):
            delta = sample_in_ball(eps, x.shape, rng)
            p_pert = model.predict_proba(x + delta)[y_hat]
            if abs(p_pert - p0) > b:  # confidence moved by more than b
                stable = False
                break
        if stable:
            count += 1
    return count / len(X)
```

Sweeping over multiple $(\epsilon, b)$ values enables comprehensive robustness profiling (Lyu et al., 2024).
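Such a sweep can be sketched end to end on a toy model. Everything below (the 1-D logistic classifier, the grids, the sample count) is an illustrative assumption; the point is only the shape of the resulting profile, where RR falls as $\epsilon$ grows and rises as $b$ is relaxed:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_proba(x, w=4.0):
    """Toy binary classifier: p(class 1 | x) = sigmoid(w * x)."""
    p1 = 1.0 / (1.0 + np.exp(-w * x))
    return np.array([1.0 - p1, p1])

def estimate_rr(X, eps, b, n_samples=200):
    """Sampling-based RR estimate over a 1-D ball of radius eps."""
    stable = 0
    for x in X:
        y_hat = int(np.argmax(predict_proba(x)))
        p0 = predict_proba(x)[y_hat]
        deltas = rng.uniform(-eps, eps, size=n_samples)
        probs = np.array([predict_proba(x + d)[y_hat] for d in deltas])
        stable += np.max(np.abs(probs - p0)) <= b
    return stable / len(X)

X = np.array([0.2, 0.5, 1.0, 1.5, 2.0])
for eps in (0.05, 0.1, 0.2):
    for b in (0.02, 0.05, 0.1):
        print(f"eps={eps:.2f}  b={b:.2f}  RR={estimate_rr(X, eps, b):.2f}")
```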

4. Experimental Observations and Comparative Analysis

Empirical studies using deepfake detection benchmarks demonstrate divergent RR behavior across models, even when RA remains similar. For example, three architectures (Meso4, Meso4Inception, ResNet34) exposed to FGSM, PGD, and CW attacks show that RA curves can align closely while RR curves diverge substantially: some models maintain higher stability in predicted class probability than others for the same attack strength.

Key findings include:

  • As $\epsilon$ increases (from 0 up to 0.2), RR decays from 1.0 to near 0, with the rate and curve shape varying by architecture and data modality.
  • On video data, the RR curve declines nearly linearly with $\epsilon$, indicating gradual confidence loss.
  • On still-image data, the RR curve is typically nonlinear, with significant drops past a threshold in $\epsilon$.

This suggests RR can expose instability that RA alone cannot, illuminating vulnerability not evident from label-only metrics (Lyu et al., 2024).

5. Interpretation, Granularity, and Application of Robustness Rate

RR’s strengths derive from its granularity and interpretability:

  • It quantifies stability in the model’s confidence, capturing even small—but systematic—shifts under adversarial input.
  • A high RR asserts that no perturbation within the budget causes more than $b$ deviation in the output probability for the predicted class, offering interpretable guarantees.
  • In domains where changes in confidence scores are as critical as label invariance (e.g., healthcare, finance), RR provides a more application-aligned robustness assessment.

However, RR’s value depends on a well-chosen $b$. Stricter (smaller) thresholds correspond to tighter confidence demands and yield lower RR, necessitating task-specific calibration.

6. Limitations and Practical Considerations

Several factors constrain the use and interpretation of RR:

  • The necessity to select $b$ in alignment with domain-specific confidence tolerances.
  • The computational intractability of exploring the full perturbation ball, leading to reliance on either random sampling or strong adversarial solvers (such as PGD/CW), which only approximate the supremum.
  • The empirical independence of RR and RA: high RA does not guarantee high RR. Models may maintain the correct label under perturbation but demonstrate large probability swings, undermining trust in model output.

For practical deployment:

  • Joint evaluation over grids of $(\epsilon, b)$ yields a more nuanced robustness profile.
  • Reporting both RA and RR is essential for transparency, especially when deploying classifiers in high-stakes environments.
  • Utilizing adversarial solvers to approximate the worst-case $\delta$ gives a tighter lower bound on RR violations than random sampling.
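As a sketch of the attack-based alternative, a PGD-style ascent can directly maximize the probability deviation when the model's gradient is available. The two-dimensional logistic model, step size, and iteration count below are illustrative assumptions; for a real network the gradient would come from autodiff and the attack from an adversarial-robustness library:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, -2.0])  # toy model: p1(x) = sigmoid(w . x)

def p1(x):
    return 1.0 / (1.0 + np.exp(-w @ x))

def pgd_max_prob_change(x0, eps, steps=50, lr=0.05):
    """Approximate sup over ||delta||_inf <= eps of |p1(x0 + delta) - p1(x0)|
    by signed-gradient ascent with projection onto the L_inf ball."""
    p0 = p1(x0)
    delta = rng.uniform(-eps, eps, size=x0.shape)  # random start avoids a zero gradient
    for _ in range(steps):
        p = p1(x0 + delta)
        grad = np.sign(p - p0) * p * (1.0 - p) * w   # analytic gradient of |p - p0|
        delta = np.clip(delta + lr * np.sign(grad), -eps, eps)
    return abs(p1(x0 + delta) - p0)

x0 = np.array([0.1, 0.1])
worst = pgd_max_prob_change(x0, eps=0.2)
print(worst)  # the sample violates RR at (eps, b) whenever this exceeds b
```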

7. Implications for Robust Deep Learning Research

Robustness Rate enhances the toolkit for evaluation of trustworthy deep learning systems, particularly under adversarial threat models. It enables differentiation between models that may appear comparable under label-based metrics but display markedly different behaviors in terms of output probability robustness. By integrating RR into evaluation protocols, researchers can better align robustness analysis with deployment requirements and failure-mode characterization, ultimately supporting safer and more reliable automated decision-making (Lyu et al., 2024).
