Dynamic Suppression of Subject Representations
- The paper demonstrates how dynamic suppression leverages architectural division and conditional projection to significantly reduce subject-specific encoding.
- It details methods like dual-branch masking, low-rank adapters, and contrastive losses to ensure disentanglement and boost cross-subject performance.
- Empirical findings show measurable gains in tasks such as gaze estimation and EEG decoding, validating the approach’s generalizability.
Dynamic suppression of subject-specific representations refers to architectural and algorithmic mechanisms that force shared model components to attenuate, remove, or otherwise avoid encoding features idiosyncratic to individual subjects (biological or otherwise), steering the network toward invariant, generalizable, or transferable representations. This strategy is widely adopted across domains such as gaze estimation, physiological signal decoding, multimodal brain response modeling, and knowledge unlearning in large language models (LLMs). The following sections survey the principal mechanisms, their mathematical formulations, application scenarios, and empirical findings, anchored to recent research.
1. Architectural Patterns for Dynamic Suppression
Broadly, dynamic suppression is implemented by structurally dividing the network into (i) shared, subject-invariant modules and (ii) subject-specific conditional pathways, with auxiliary losses and regularization to ensure disentanglement and selective gating.
Subject-Conditional Projection and Modulation
In unsupervised gaze estimation, "ConGaze" introduces a subject-conditional projection module appended to a shared feature extractor. The projection module takes as input both the general feature and a one-hot subject ID, producing a subject-specific embedding via a two-layer MLP. Because the projection module is responsible for all subject-specific cues, the upstream extractor is incentivized (via the contrastive loss, applied only in the subject-conditional space) to encode strictly subject-invariant (i.e., gaze-relevant) features. This can be interpreted as a simple instance of re-injecting identity information downstream only at the projection stage, analogous to FiLM or gating layers (Du et al., 2023).
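A minimal NumPy sketch of such a subject-conditional projection head (class name, layer sizes, and initialization are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class SubjectConditionalProjection:
    """Two-layer MLP mapping (shared feature, one-hot subject ID) to a
    subject-specific embedding. All sizes are illustrative."""
    def __init__(self, feat_dim, n_subjects, hidden=64, out_dim=32):
        self.n_subjects = n_subjects
        in_dim = feat_dim + n_subjects
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, feat, subject_id):
        # identity information enters only here, not in the shared extractor
        one_hot = np.eye(self.n_subjects)[subject_id]
        h = relu(np.concatenate([feat, one_hot]) @ self.W1 + self.b1)
        return h @ self.W2 + self.b2

proj = SubjectConditionalProjection(feat_dim=128, n_subjects=4)
z = proj(rng.normal(size=128), subject_id=2)
```

Because the contrastive objective is computed on `z`, identity-dependent variance can be absorbed by this head while the shared extractor stays identity-agnostic.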
Dual-Branch Masking
"PTSM" for cross-subject EEG decoding designs separate dual attention masks for each trial: a personalized mask , and a shared mask . Each is factorized over space (channels) and time, then fused. The shared mask suppresses subject-unique and noise components, while the personalized mask can highlight idiosyncratic signals. The masking is dynamic, as each sample’s masks are adaptively generated by small neural subnetworks, and suppression is enforced by orthogonal disentanglement constraints on the resulting embeddings (Jing et al., 15 Aug 2025).
Low-Rank Subject-Specific Adapters
In large-scale EEG decoding, the Subject-Conditioned Layer (SCL) replaces the standard linear kernel with a decomposed weight W = W_shared + ΔW_s, where ΔW_s is a low-rank, subject-specific correction. During inference, ΔW_s is enabled only for the test subject; for unseen subjects it is set to zero, dynamically suppressing all learned subject-specific adaptations and reverting to a shared, invariant model (Klein et al., 9 Oct 2025).
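A minimal sketch of this decomposition (dimensions, names, and the LoRA-style factorization ΔW_s = A_s B_s are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank, n_subjects = 16, 8, 2, 3
W_shared = rng.normal(size=(d_in, d_out))
# per-subject low-rank corrections: Delta_s = A[s] @ B[s]
A = rng.normal(0.0, 0.1, size=(n_subjects, d_in, rank))
B = rng.normal(0.0, 0.1, size=(n_subjects, rank, d_out))

def scl_forward(x, subject=None):
    """Shared kernel plus a low-rank, subject-specific correction;
    subject=None (unseen subject) suppresses the correction entirely."""
    W = W_shared.copy()
    if subject is not None:
        W = W + A[subject] @ B[subject]
    return x @ W

x = rng.normal(size=d_in)
y_seen   = scl_forward(x, subject=1)    # shared + subject-1 adapter
y_unseen = scl_forward(x, subject=None) # pure shared, invariant model
```

Setting the correction to zero at test time is what makes the suppression "dynamic": no retraining is needed to fall back to the invariant pathway.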
Feature-Wise Affine Modulation
In cross-subject mind decoding from fMRI, the Subject Bias Modulation Module (SBMM) is a per-subject block interleaved into encoder and decoder networks, normalizing features and applying a learned feature-wise scale and shift (as in conditional batch normalization): y = γ_s · x̂ + β_s, where x̂ is the normalized feature and (γ_s, β_s) are subject-specific parameters. This dynamically transforms or "washes out" subject biases during encoding and decoding (Xu et al., 25 Jul 2025).
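A minimal sketch of this conditional normalize-then-modulate step (function name and the simple per-vector normalization are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def subject_modulate(x, gamma, beta, eps=1e-5):
    """Normalize the feature vector, then apply a learned per-subject
    scale and shift: y = gamma * x_hat + beta (conditional-BN style)."""
    x_hat = (x - x.mean()) / (x.std() + eps)
    return gamma * x_hat + beta

# feature vector carrying a subject-specific offset and scale
x = rng.normal(3.0, 2.0, size=64)
gamma_s = rng.normal(1.0, 0.1, size=64)   # learned, one pair per subject
beta_s  = rng.normal(0.0, 0.1, size=64)
y = subject_modulate(x, gamma_s, beta_s)
```

The normalization removes the subject's first- and second-order statistics; the learned affine parameters then re-inject only as much subject information as the objective requires.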
MoE Sparse Routing and Load-Balancing
In multimodal brain encoding, "MIND" applies a dynamic mixture-of-experts decoder where a subject-aware gating mechanism produces sparse expert activations for each token. A subject-specific bias matrix and load-balancing regularization prevent collapse to purely idiosyncratic expert usage, enforcing that only a subspace of experts ever dominates per subject, and that overall predictions retain significant genericity (Yin et al., 6 Oct 2025).
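The routing step can be sketched as a generic top-k gate with a per-subject bias and a Switch-style load-balancing term (all names and the exact penalty form are illustrative assumptions, not MIND's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def route(tokens, W_gate, subject_bias, k=2):
    """Subject-aware top-k gating: gate logits receive a per-subject bias;
    only the top-k experts per token get nonzero (renormalized) weight."""
    logits = tokens @ W_gate + subject_bias          # (n_tokens, n_experts)
    probs = softmax(logits)
    topk = np.argsort(probs, axis=-1)[:, -k:]        # indices of top-k experts
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk, 1.0, axis=-1)
    gates = probs * mask
    gates /= gates.sum(axis=-1, keepdims=True)
    # load-balancing penalty: fraction routed x mean gate prob, per expert
    load = mask.mean(axis=0)
    importance = probs.mean(axis=0)
    lb_loss = (load * importance).sum() * probs.shape[1]
    return gates, lb_loss

n_tokens, d, n_experts = 5, 8, 4
gates, lb = route(rng.normal(size=(n_tokens, d)),
                  rng.normal(size=(d, n_experts)),
                  subject_bias=rng.normal(0.0, 0.1, n_experts))
```

The penalty is minimized when traffic spreads evenly over experts, which is what prevents any one subject's bias from monopolizing a private expert set.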
2. Mathematical Objectives and Disentanglement Losses
Dynamic suppression fundamentally relies on explicit constraints in the training objective that decompose or orthogonalize subject-specific and subject-invariant factors.
Contrastive and InfoNCE Losses
For contrastive representation learning, the InfoNCE loss is applied in per-subject embedding spaces, pulling together gaze-consistent pairs and pushing apart gaze-contrastive pairs, with projection heads that receive subject-conditioning (Du et al., 2023). This identity-specific contrastive learning ensures that the general encoder need not retain identity to satisfy the objective.
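The InfoNCE term itself can be sketched as follows (cosine similarity and the single-positive form are standard; applying it in the subject-conditioned embedding space is what makes it identity-specific here):

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(sim(a,p)/tau) / sum over {positive} U negatives )."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(anchor, positive)] +
                      [sim(anchor, n) for n in negatives]) / tau
    # numerically this is -logit[0] + logsumexp(logits)
    return -logits[0] + np.log(np.exp(logits - logits.max()).sum()) + logits.max()

a = rng.normal(size=16)
negs = [rng.normal(size=16) for _ in range(8)]
loss_aligned = info_nce(a, a, negs)   # perfectly aligned positive
```

The loss is strictly decreasing in the anchor-positive similarity, so pulling gaze-consistent pairs together directly minimizes it.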
Orthogonality and Covariance Constraints
Strong disentanglement in PTSM and DS-DDPM is achieved by regularizing task-specific and subject-specific embedding subspaces to be orthogonal, both at the single-example level (driving the dot product z_task · z_subj toward zero) and at the batch level (driving the cross-covariance Cov(Z_task, Z_subj) toward zero) (Jing et al., 15 Aug 2025, Duan et al., 2023). Similar constraints appear as cross-covariance and mutual-information penalties.
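Both levels of constraint can be written as simple squared penalties (a generic sketch; the exact weighting and normalization used by the papers are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonality_penalties(Z_task, Z_subj):
    """Example-level squared dot products and batch-level squared
    cross-covariance entries, both driven toward zero during training."""
    per_example = np.mean(np.sum(Z_task * Z_subj, axis=1) ** 2)
    Zt = Z_task - Z_task.mean(axis=0)
    Zs = Z_subj - Z_subj.mean(axis=0)
    cross_cov = Zt.T @ Zs / (len(Zt) - 1)        # (d_task, d_subj)
    batch = np.sum(cross_cov ** 2)
    return per_example, batch

Z = rng.normal(size=(32, 4))
pe_same, b_same = orthogonality_penalties(Z, Z)  # fully entangled: large
```

Identical embeddings maximize both penalties, while embeddings with no shared directions drive both to zero, which is the training pressure that separates the subspaces.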
Adversarial and GRL-Based Suppression
Task-agnostic suppression can be implemented by attaching a gradient reversal layer (GRL) to the feature encoder, with a suppression head tasked to predict random labels in a K-class space. The GRL reverses gradients, so features become uninformative for any K-way partition, i.e., for all possible adversarial classifiers. The resulting network becomes dynamically blind to unknown partitions, not just those seen during training (Panwar et al., 2020).
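The GRL itself is a tiny component; a framework-agnostic sketch of its forward/backward contract (real implementations hook into autograd, e.g. via a custom `torch.autograd.Function`):

```python
import numpy as np

class GradientReversal:
    """Identity on the forward pass; flips (and scales) gradients on the
    backward pass, so the upstream encoder is pushed to *maximize* the
    suppression head's loss on its random-label task."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_out):
        return -self.lam * grad_out

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                  # activations pass through unchanged
g = grl.backward(np.ones_like(x))   # gradient is reversed and scaled
```

Because the labels are random, there is no stable partition the head can exploit, and the reversed gradient strips the encoder of discriminative structure for any labeling.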
3. Empirical Evidence, Ablations, and Benchmarks
Empirical verification of dynamic suppression employs a combination of ablation studies, clustering/visualization, and generalization metrics.
Cross-Subject Transfer Gains
ConGaze reduces average gaze estimation angular error by 1.3°–1.7° (15–25%) compared to a supervised baseline in cross-dataset evaluation. Removal of the conditional projection module increases error by as much as 7.0° (57%), demonstrating the suppression’s centrality (Du et al., 2023).
PTSM achieves consistent 2.6–4.4% higher pre-adaptation accuracy on open cross-subject datasets, with ablation of the masking branches or orthogonality losses leading to 2–4.1% drops in accuracy (Jing et al., 15 Aug 2025).
SCL-equipped EEG decoders outperform both pooled-parameter and subject-specific LoRA-only models by 5–10% accuracy points on BCI Competition IV tasks. Qualitative t-SNE embeddings confirm that subject-agnostic and subject-residual components are well separated (Klein et al., 9 Oct 2025).
Visualization and Interpretability
t-SNE projections in SimCLR vs. ConGaze exhibit strong subspace clustering by subject ID without suppression, but highly intermixed identity-agnostic feature clusters with suppression. Grad-CAM maps demonstrate attention shifting from broad facial features to the periocular region under dynamic suppression (Du et al., 2023).
In DS-DDPM, denoised domain-variance samples from the domain stream are distinctly aligned with subject identity, while content stream samples are mixed, as per t-SNE and correlation matrices (Duan et al., 2023).
Trust and Overlearning Metrics
The Trust Score, constructed as a weighted matrix norm distance from an ideal identity matrix, quantifies the ability to suppress leakage into unintended tasks. Suppressing unknown tasks via GRL and random labels increases Trust Score from ~0.75–0.81 (vanilla) to ~0.88–0.90, as shown for Inception-v1, MobileNet, and VGG variants on synthetic and face datasets (Panwar et al., 2020).
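One plausible instantiation of such a score (the paper's exact weighting and normalization are not reproduced; `trust_score` and its uniform-leakage reference point are assumptions for illustration):

```python
import numpy as np

def trust_score(leakage, weights=None):
    """Sketch: 1 minus the weighted Frobenius distance between the
    row-normalized task-leakage matrix and the identity, normalized by
    the distance of a fully uniform (maximally leaky) matrix."""
    n = leakage.shape[0]
    P = leakage / leakage.sum(axis=1, keepdims=True)
    if weights is None:
        weights = np.ones_like(P)
    d = np.linalg.norm(weights * (P - np.eye(n)))
    d_max = np.linalg.norm(weights * (np.ones((n, n)) / n - np.eye(n)))
    return 1.0 - d / d_max

# perfect task isolation -> 1.0; uniform leakage into all tasks -> 0.0
```

Under this construction a model whose features serve only their intended task scores 1, and one whose features leak equally into every unintended task scores 0, matching the direction of improvement reported above.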
4. Extensions, Online Adaptation, and Scalability
Dynamic suppression schemes are adaptable to new tasks, domains, or unforeseen identities.
Online and Fast Adaptation
SBMM modules in BAI and SCL adapters are lightweight, subject-specific heads. For a new subject, only the adapter is learned (typically from a handful of samples), leaving shared weights fixed. Empirically, 500–1,500 samples suffice to adapt cross-subject mind decoding performance, with gains of 20–30% in PixCorr (Xu et al., 25 Jul 2025).
For unseen subjects in SCL or MoE frameworks, one simply disables the subject-specific heads/adapters, reverting to generic representations (Klein et al., 9 Oct 2025, Yin et al., 6 Oct 2025).
Application to Knowledge Unlearning
In LLMs, selective erasure of subject-specific ("forbidden") knowledge is operationalized through targeted activation signature extraction and dynamic suppression via capsules, as in the Knowledge Immunization Framework (KIF). Low-rank LoRA adapters then permanently enforce the suppression, allowing efficient deployment and permanent knowledge erasure (Mahmood et al., 15 Jan 2026). Feature-selective misdirection (SRMU) further allows dynamic, axis-aligned perturbations limited to critical subspaces, efficiently unlearning even when knowledge and benign examples are entangled (Chen et al., 18 Dec 2025).
Domain-Generalization and Multimodality
Domain-conditional projection, dual-branch masking, and sparse MoE routing generalize to other sources of domain or session variability—electrode montage, session ID, speaker identity, or even multi-modal fusion contexts. The principle remains to preserve general features and inject (or suppress) domain/subject information at precisely controlled stages (Du et al., 2023, Yin et al., 6 Oct 2025, Duan et al., 2023).
5. Limitations and Future Prospects
Despite clear improvements in cross-domain robustness, dynamic suppression approaches have limitations.
Calibration and Computation
Some architectures, such as DS-DDPM, require subject labels and calibration with per-subject training data, which may not always be available (Duan et al., 2023). Iterative generative processes (e.g., diffusion) are computationally costly compared to classical filtering, although recent advances in consistency-based and accelerated diffusion samplers may help.
Entanglement and Residual Leakage
Even with strong disentanglement penalties, suppression is never absolute when features are highly entangled or when latent leakage pathways exist. Careful loss balancing is required to prevent collapse of utility (catastrophic forgetting). Some methods, such as SRMU, specifically address this through product-based importance maps distinguishing benign and entangled dimensions (Chen et al., 18 Dec 2025).
Interpretability and Mechanistic Guarantees
While interpretability is improved via explicit masks, adapters, or subspaces, mechanistic guarantees of true erasure (as opposed to superficial output suppression) remain challenging, as illustrated in the distinction between obfuscation and activation signature erasure in KIF (Mahmood et al., 15 Jan 2026).
6. Representative Methodologies Across Domains
The following table summarizes representative methods, domains, and dynamic suppression mechanisms:
| Paper (arXiv ID) | Domain/Task | Dynamic Suppression Principle |
|---|---|---|
| (Du et al., 2023) (ConGaze) | Gaze estimation | Subject-conditional projection |
| (Jing et al., 15 Aug 2025) (PTSM) | Cross-subject EEG | Dual-branch spatio-temporal masks |
| (Duan et al., 2023) (DS-DDPM) | Brain dynamics, EEG | Orthogonal generative denoising |
| (Yin et al., 6 Oct 2025) (MIND) | Multimodal fMRI | MoE sparse routing/load-balancing |
| (Klein et al., 9 Oct 2025) (SCL) | EEG decoding | Per-subject low-rank adapters |
| (Panwar et al., 2020) | Vision (privacy) | GRL+random task suppression |
| (Xu et al., 25 Jul 2025) (BAI) | fMRI decoding | Conditional feature-wise modulation |
| (Mahmood et al., 15 Jan 2026) (KIF) | LLM unlearning | Activation-signature capsule gating |
| (Chen et al., 18 Dec 2025) (SRMU) | LLM unlearning | Feature-selective perturbation |
| (Liu et al., 5 Jan 2025) | EEG→language mapping | LLM denoiser over multi-subject code |
These approaches exemplify both the architectural flexibility and the domain transferability of dynamic subject-specific representation suppression.
In summary, dynamic suppression of subject-specific representations is realized by a combination of conditional architectural modules, information-theoretic and orthogonality constraints, selective regularization, and sample-adaptive mechanisms designed to prevent leakage of idiosyncratic subject (or task/domain) information into the core, shared feature backbone. Empirical results across gaze estimation, brain decoding, and LLM knowledge unlearning consistently demonstrate the utility and generalizability benefits of these approaches, while ablation and visualization studies confirm their mechanistic suppression of unwanted subject-specific structure (Du et al., 2023, Jing et al., 15 Aug 2025, Duan et al., 2023, Yin et al., 6 Oct 2025, Klein et al., 9 Oct 2025, Panwar et al., 2020, Xu et al., 25 Jul 2025, Mahmood et al., 15 Jan 2026, Chen et al., 18 Dec 2025, Liu et al., 5 Jan 2025).