
Adaptive and Aggressive Rejection (AAR)

Updated 3 December 2025
  • AAR is a dynamic rejection framework that filters anomalous or adversarial contributions using robust statistical thresholds and a Gaussian mixture model for soft rejection.
  • It integrates a warm-up phase with hard rejection and a main phase with ternary weighting to optimize data retention and improve performance metrics like AUROC.
  • In adaptive control, AAR employs disturbance observers and finite-time controllers to aggressively cancel perturbations, ensuring rapid convergence under uncertainty.

Adaptive and Aggressive Rejection (AAR) encompasses a family of algorithmic mechanisms for robustly filtering out undesirable or adversarial contributions during inference or learning. In anomaly detection, AAR refers to a dynamic, data-driven rejection framework that adaptively identifies and excludes contaminated samples by jointly leveraging robust statistical thresholds and probabilistic modeling. In nonlinear control, AAR describes the coordinated use of adaptive, experience-accelerated disturbance estimators and finite-time controllers to aggressively cancel exogenous perturbations. Across both contexts, AAR is characterized by its principled, multi-phase rejection logic and its capacity to dynamically optimize the trade-off between retention and exclusion under uncertainty.

1. Mathematical Foundations of AAR for Anomaly Detection

AAR for anomaly detection operates on a contaminated dataset $\mathcal{D} = \{x_i\}_{i=1}^N$, using anomaly scores $s_i = s(x_i)$. For reconstruction-based models, $s_i = \|x_i - f(x_i)\|_2^2$, where $f$ is typically an autoencoder. The framework dynamically rejects anomalies in each mini-batch via a tiered thresholding procedure:

  • Modified z-score (hard rejection): For batch size $B$, compute

\hat s = \mathrm{median}\{s_i\}, \quad \mathrm{MAD} = \mathrm{median}|s_i - \hat s|, \quad m_i = \frac{0.6745 (s_i - \hat s)}{\mathrm{MAD}}.

Samples with $m_i > 3.5$ (i.e., $s_i > \tau_N$) are hard-rejected with threshold

\tau_N = \hat s + \frac{3.5}{0.6745}\,\mathrm{MAD}.
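The hard cutoff above can be sketched in a few lines of NumPy (a minimal sketch; the function name is illustrative, not from the paper):

```python
import numpy as np

def hard_threshold(scores):
    """Modified z-score cutoff: tau_N = median + (3.5 / 0.6745) * MAD."""
    s_hat = np.median(scores)
    mad = np.median(np.abs(scores - s_hat))
    return s_hat + (3.5 / 0.6745) * mad
```

The constant 0.6745 rescales the MAD so it is consistent with the standard deviation under a Gaussian model, making the 3.5 cutoff behave like a robust 3.5-sigma rule.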

  • Gaussian mixture (soft rejection): Fit a two-component univariate GMM to the batch scores,

p(s) = \pi_1 \mathcal{N}(s\,|\,\mu_1,\sigma_1^2) + \pi_2 \mathcal{N}(s\,|\,\mu_2,\sigma_2^2), \quad \mu_1 < \mu_2.

The intersection threshold $\tau_G$ solves

\pi_1 \mathcal{N}(\tau_G\,|\,\mu_1,\sigma_1^2) = \pi_2 \mathcal{N}(\tau_G\,|\,\mu_2,\sigma_2^2).

Explicitly, $\tau_G$ is the root of

a\,\tau_G^2 + b\,\tau_G + c = 0

lying between $\mu_1$ and $\mu_2$, with

a = \sigma_1^2 - \sigma_2^2, \quad b = 2(\sigma_2^2\mu_1 - \sigma_1^2\mu_2), \quad c = \sigma_1^2\mu_2^2 - \sigma_2^2\mu_1^2 - 2\sigma_1^2\sigma_2^2 \ln\frac{\pi_2\sigma_1}{\pi_1\sigma_2}.

  • $\mu + k\sigma$ cutoff (stability guard): Compute

\tau_S = \mu_1 + k\,\sigma_1

with $\mu_1, \sigma_1$ the mean/std of the “normal” GMM component and $k$ a tunable multiplier.

The final soft rejection threshold is $\tau_A = \max(\tau_G, \tau_S)$.
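The intersection and guard computations above reduce to root-finding on a quadratic; a minimal NumPy sketch (function names and the midpoint fallback are illustrative):

```python
import numpy as np

def gmm_intersection(pi1, mu1, sig1, pi2, mu2, sig2):
    """Score where the two weighted Gaussian densities cross (mu1 < mu2)."""
    a = sig1**2 - sig2**2
    b = 2.0 * (sig2**2 * mu1 - sig1**2 * mu2)
    c = (sig1**2 * mu2**2 - sig2**2 * mu1**2
         - 2.0 * sig1**2 * sig2**2 * np.log((pi2 * sig1) / (pi1 * sig2)))
    if abs(a) < 1e-12:               # equal variances: quadratic degenerates
        return -c / b
    roots = np.roots([a, b, c])
    roots = roots[np.isreal(roots)].real
    between = roots[(roots > mu1) & (roots < mu2)]
    return float(between[0]) if between.size else 0.5 * (mu1 + mu2)

def soft_threshold(tau_G, mu1, sig1, k=2.0):
    """Stability guard: never let the soft cutoff drop below mu1 + k*sig1."""
    return max(tau_G, mu1 + k * sig1)
```

For symmetric components (equal weights and variances) the crossing point is the midpoint of the two means, as expected.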

2. Integrated Hard and Soft Rejection Strategies

AAR integrates these thresholds into a phased rejection weighting scheme:

  • Warm-up (first $E_w$ epochs): Only the hard cutoff $\tau_N$ is active; $w_i = 0$ for $s_i > \tau_N$, $w_i = 1$ otherwise.
  • Main phase (epochs $> E_w$): Use weights

w_i = \begin{cases} 0, & s_i > \tau_N, \\ \alpha, & \tau_A < s_i \le \tau_N, \\ 1, & s_i \le \tau_A, \end{cases}

with soft rejection weight $\alpha \in (0, 1)$.

This approach transforms sample selection from binary (keep/discard) into a ternary regime $w_i \in \{0, \alpha, 1\}$, allowing ambiguous samples to influence training with attenuated impact. This aggressive rejection (removing somewhat more than the nominal contamination rate) empirically yields heightened robustness and improved AUROC, particularly when normal and anomaly score distributions overlap (Lee et al., 26 Nov 2025).
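The phased weighting can be written compactly; a minimal sketch (the `warmup` flag and function name are illustrative):

```python
import numpy as np

def aar_weights(scores, tau_N, tau_A, alpha=0.5, warmup=False):
    """Ternary AAR weights: 1 = keep, alpha = soft reject, 0 = hard reject.

    During the warm-up phase only the hard cutoff tau_N is applied.
    """
    scores = np.asarray(scores, dtype=float)
    w = np.ones_like(scores)
    if not warmup:
        w[scores > tau_A] = alpha    # ambiguous band: attenuated influence
    w[scores > tau_N] = 0.0          # hard rejection always wins
    return w
```

Applying the hard cutoff last guarantees that a sample above $\tau_N$ is excluded regardless of the soft band.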

3. AAR Algorithm and Computational Complexity

The AAR training protocol proceeds as follows for each mini-batch in every epoch:

  1. Compute anomaly scores $s_i$ for the batch.
  2. Determine the hard threshold $\tau_N$ (all epochs).
  3. If the warm-up phase is over (epoch $> E_w$):
    • Fit the GMM and derive $\tau_G$, $\tau_S$, $\tau_A$.
  4. Assign weights $w_i$ according to the current phase.
  5. Compute the weighted loss,

\mathcal{L} = \frac{1}{B} \sum_{i=1}^{B} w_i\, \ell(x_i),

where $\ell(x_i)$ is the per-sample training loss (e.g., reconstruction error), and update the model.

Computationally, for mini-batch size $B$ and feature dimensionality $d$, the per-step cost is dominated by the forward/backward pass $O(Bd)$; thresholding and EM for the univariate GMM add only $O(B)$ overhead per iteration, rendering AAR scalable for large $N$ and $d$.
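The steps above can be exercised end-to-end on simulated scores. The sketch below is a simplification, not the published protocol: the data, the short EM loop, and the guard-only soft threshold (omitting the GMM intersection) are all chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated batch of anomaly scores: 180 normal, 20 contaminated samples.
scores = np.concatenate([rng.normal(1.0, 0.3, 180), rng.normal(4.0, 0.5, 20)])

# Step 2: modified z-score hard threshold.
med = np.median(scores)
mad = np.median(np.abs(scores - med))
tau_N = med + (3.5 / 0.6745) * mad

# Step 3: fit a two-component univariate GMM with a short EM loop.
mu = np.array([scores.min(), scores.max()])
sig = np.array([scores.std(), scores.std()])
pi = np.array([0.5, 0.5])
for _ in range(50):
    dens = pi / (sig * np.sqrt(2 * np.pi)) * np.exp(
        -0.5 * ((scores[:, None] - mu) / sig) ** 2)          # E-step
    resp = dens / dens.sum(axis=1, keepdims=True)
    nk = resp.sum(axis=0)                                    # M-step
    pi = nk / len(scores)
    mu = (resp * scores[:, None]).sum(axis=0) / nk
    sig = np.sqrt((resp * (scores[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9

# Soft threshold: for brevity, only the mu_1 + k*sigma_1 guard is used here.
tau_A = mu[0] + 2.0 * sig[0]

# Steps 4-5: ternary weights and the weighted batch loss.
alpha = 0.5
w = np.where(scores > tau_N, 0.0, np.where(scores > tau_A, alpha, 1.0))
loss = float(np.mean(w * scores))   # scores stand in for per-sample losses
```

On this batch the contaminated samples land almost entirely above $\tau_N$ and receive zero weight, while the bulk of the normal samples keep full weight.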

4. AAR in Adaptive Control: Disturbance Rejection

In adaptive nonlinear control, AAR is exemplified by architectures that combine online disturbance identification with aggressive, finite-time error suppression (Li et al., 2020). Consider a nonlinear plant,

\dot{x} = f(x) + g(x)\,u + d(t),

subject to an exosystem-generated disturbance $d(t)$, e.g., $d(t) = \theta^\top \phi(t)$ for an unknown parameter vector $\theta$ and a known regressor $\phi$. The core components are:

  • Adaptive disturbance observer: State-derivative-free estimation using a filtered regressor, with the parameter estimate $\hat\theta$ updated by a Lyapunov-stable adaptation law of the form

\dot{\hat\theta} = \Gamma\Big(\phi(t)\,e(t) + k_{\mathrm{ER}} \sum_{j=1}^{M} \phi(t_j)\,e(t_j)\Big),

where the experience-replay term (reuse of $M$ recorded regressor/error pairs) accelerates convergence.

  • Aggressive (finite-time) controller: Integral-type terminal sliding mode with adaptive gain, of the form

u = -\hat d(t) - k_1\, s - k_2\, |s|^{\rho} \operatorname{sgn}(s), \quad 0 < \rho < 1,

with $s$ the sliding variable, enforcing $s \to 0$ (and hence tracking-error convergence) in finite time provided certain rank/richness conditions are met.

The “adaptive” aspect derives from online parameter learning, while “aggressive rejection” is realized through high-bandwidth feed-forward cancellation and non-asymptotic convergence guarantees.
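As a toy illustration of the observer side only (the sinusoidal regressor, the assumption that disturbance samples are directly available, and all gains below are invented for the sketch, not taken from the cited architecture), gradient adaptation with a small experience-replay buffer can be simulated directly:

```python
import numpy as np

# Unknown disturbance d(t) = theta^T phi(t) with a known sinusoidal basis.
theta_true = np.array([1.5, -0.7])
phi = lambda t: np.array([np.sin(t), np.cos(t)])

dt, gamma, k_er = 1e-3, 5.0, 1.0
theta_hat = np.zeros(2)
replay = []                            # stored (regressor, disturbance) pairs

for step in range(20000):              # simulate 20 s
    t = step * dt
    p = phi(t)
    d = theta_true @ p                 # disturbance sample (assumed measurable)
    e = d - theta_hat @ p              # instantaneous estimation error
    # Experience replay: recorded pairs keep driving the update even when
    # the current regressor is momentarily uninformative.
    er = sum(pj * (dj - theta_hat @ pj) for pj, dj in replay)
    theta_hat = theta_hat + dt * gamma * (p * e + k_er * er)
    if step % 2000 == 0:
        replay.append((p, d))

print(theta_hat)                       # approaches theta_true
```

The replay stack replaces a persistence-of-excitation condition on the live signal with a rank condition on the stored regressors, which is the usual mechanism by which experience replay accelerates such observers.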

5. Empirical Evaluation and Performance

In anomaly detection benchmarks (Lee et al., 26 Nov 2025), AAR demonstrates:

  • On MNIST/Fashion-MNIST under synthetic contamination, AAR achieves higher average AUROC than the prior best method, latent outlier exposure (LOE), on both datasets.
  • On UCI-type tabular datasets with contaminated training sets, AAR lifts average AUROC across AE/MemAE/DSVDD backbones relative to robust statistics (MZ).
  • Overall, AAR attains the highest average AUROC among all compared prior methods.

Ablations confirm that slightly over-estimating the contamination rate enhances robustness; increasing $k$ in the $\mu + k\sigma$ cutoff improves stability with negligible loss; and soft rejection with an intermediate weight $\alpha$ optimizes the bias–variance trade-off.

In adaptive control (Li et al., 2020), experience replay substantially shortens disturbance estimation time and ensures finite-time tracking in nonlinear benchmarks, in contrast with the much slower convergence of experience-free observers.

6. Practical Tuning, Limitations, and Outlook

Tuning recommendations for anomaly detection concern the warm-up length $E_w$, the stability-guard multiplier $k$, and the soft rejection weight $\alpha$. For adaptive control, filter and adaptation gains are selected to balance estimation speed against sensitivity to noise, while the replay window is sized against memory and numerical-stability constraints.

Notable limitations:

  • The univariate GMM used in AAR assumes a bimodal, near-Gaussian score distribution, which can be violated in highly skewed or multi-modal cases.
  • Hyperparameters $E_w$, $k$, and $\alpha$ still require domain-specific tuning.

Open research directions involve meta-learning for automatic parameter adaptation, integrating limited anomaly labels (semi-supervised AAR), extending to high-dimensional or non-Gaussian score spaces, and adapting AAR for real-time data streams in cyber-physical systems and IoT scenarios (Lee et al., 26 Nov 2025, Li et al., 2020).
