Counterfactual Samples Synthesizing (CSS)
- Counterfactual Samples Synthesizing (CSS) is a class of methods that creates minimally perturbed data points to flip, clarify, or stress-test machine learning model predictions.
- CSS systematically modifies critical input features—via masking, replacement, or gradient-based selection—to target causal components across vision, language, tabular, and graph data.
- By focusing on model sensitivities, CSS improves robustness, interpretability, fairness, and generalization while reducing reliance on spurious correlations.
Counterfactual Samples Synthesizing (CSS) is a class of methods designed to generate artificial data points—counterfactual samples—that minimally perturb input features or components so as to flip, clarify, or stress-test the predictions of machine learning models. Unlike standard data augmentation, which increases diversity by random or label-preserving transformations, CSS targets the model's causal or decision logic, producing precisely controlled perturbations that reveal or enforce sensitivity to critical features. CSS has been applied across modalities—vision, language, tabular data, and recommendation systems—and is increasingly used both for diagnostic analysis (e.g., explainability, robustness, fairness) and to improve generalization in training regimes where spurious correlations or shallow priors dominate.
1. Foundational Concepts of Counterfactual Samples Synthesizing
CSS operates by generating paired or grouped data samples that differ minimally in specific, model-relevant aspects, such that these aspects have maximal impact on the model’s output. Given an input–output pair (x, y), CSS constructs a related input x*—the counterfactual—such that the pair (x*, y*) invalidates, reverses, or highlights the model’s prediction, while preserving as much irrelevant structure of x as possible. The counterfactual y* can be defined as the negation, complement, or plausible alternative of y, depending on task constraints.
The typical CSS workflow involves:
- Identification of "critical" or "causal" input components (e.g., objects in images, tokens in text, graph substructures).
- Precise modification (masking, perturbing, replacing, etc.) of these components to create x*.
- Assignment of a new label or output y*, often via pseudo-labeling, dynamic model-based reweighting, or semantic knowledge.
- Use of (x*, y*) in tandem with (x, y) for targeted model training, regularization, or post-hoc analysis.
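The workflow above can be sketched concretely. The following toy example (the logistic model, function names, and masking-by-zeroing are illustrative assumptions, not drawn from any cited work) identifies the highest-contribution feature via gradient attribution, masks it to form x*, and pseudo-labels the result with the model's own prediction:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def synthesize_counterfactual(x, w, b):
    """Toy CSS step for a logistic model p(y=1|x) = sigmoid(w.x + b).

    1) Identify the critical feature via the gradient of the score.
    2) Mask (zero out) that feature to form the counterfactual x_star.
    3) Pseudo-label x_star with the model's prediction on it.
    """
    p = sigmoid(w @ x + b)
    grad = p * (1 - p) * w            # d p / d x for the logistic model
    contrib = np.abs(grad * x)        # per-feature gradient-times-input score
    critical = int(np.argmax(contrib))
    x_star = x.copy()
    x_star[critical] = 0.0            # mask the critical feature
    y_star = float(sigmoid(w @ x_star + b) >= 0.5)  # model-based pseudo-label
    return x_star, y_star, critical
```

The original pair (x, y) and the synthesized pair (x_star, y_star) would then be trained on jointly, as described above.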
CSS is fundamentally motivated by the need to break spurious shortcuts and enforce "right for the right reasons" model behavior (Chen et al., 2021, Liu et al., 2024).
2. CSS in Vision and Vision-Language Models
In visual reasoning tasks such as Visual Question Answering (VQA), CSS is operationalized as follows (Chen et al., 2021, Chen et al., 2020):
- Critical Object Masking (V-CSS): For a given image-question-answer triple (I, Q, a), a set of candidate objects 𝒪 is obtained by matching image region labels with question nouns. Each object's local contribution to the model's answer is quantified by the gradient of P_vqa(a|I,Q) with respect to the object feature. The minimal subset whose contributions sum to at least a threshold η (e.g., η = 0.65) is deemed “critical.” These regions are masked in I to form a counterfactual image I⁻.
- Critical Word Masking (Q-CSS): Analogously, key question words (excluding type preambles) are identified via the same gradient attribution and masked in Q to form Q⁻.
- Pseudo-Label Assignment: The counterfactual answer a⁻ is dynamically assigned: the model’s prediction on a "complementary" input (I⁺, Q) or (I, Q⁺) is used to invert the original label, e.g., a⁻ = 1 − P_vqa(a|I⁺,Q). This yields soft, multi-label supervision for (I⁻,Q,a⁻) or (I,Q⁻,a⁻).
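The V-CSS selection rule can be illustrated with a short sketch. This is an assumed simplification: the per-object attribution scores are taken as given (in the cited work they come from gradients of P_vqa with respect to region features), and "masking" is modeled as zeroing region feature vectors:

```python
import numpy as np

def select_critical_objects(contributions, eta=0.65):
    """Pick the minimal set of objects whose normalized attribution
    scores sum to at least eta, mirroring the V-CSS selection rule.

    `contributions` holds one non-negative score per candidate object
    (e.g., a gradient-based attribution for each image region).
    """
    c = np.asarray(contributions, dtype=float)
    share = c / c.sum()                  # normalize scores to a distribution
    order = np.argsort(share)[::-1]      # largest contributions first
    total, critical = 0.0, []
    for i in order:
        critical.append(int(i))
        total += share[i]
        if total >= eta:                 # minimal subset reaching threshold
            break
    return critical

def mask_objects(features, critical):
    """Form the counterfactual image I- by zeroing critical region features."""
    masked = features.copy()
    masked[critical] = 0.0
    return masked
```

Q-CSS follows the same pattern with question tokens in place of image regions.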
These counterfactual samples are incorporated into training via joint cross-entropy losses and supervised contrastive objectives, which push the model to both succeed on the original example and fail (or change) on the counterfactual, sharpening both visual explainability and question sensitivity. The augmentation induces substantial improvements in out-of-distribution robustness, explainability (e.g., Grad-CAM alignment), and sensitivity to paraphrases or word dropouts (Chen et al., 2021, Chen et al., 2020).
3. CSS in Language and Commonsense Reasoning
For LLMs and plausibility estimation, CSS identifies critical words by gradient-based saliency and synthesizes counterfactuals via targeted replacement or dropout (Liu et al., 2024):
- Critical Token Identification: Each content word’s contribution to the model’s plausibility score is given by s(α, e_i) = (∇_{e_i} P_PE(α))ᵀ 𝟙, i.e., the sum of the entries of the gradient of the plausibility score with respect to the token embedding e_i. Top-ranked tokens are selected for intervention.
- Negative Sampling: Each critical word is replaced with m nearest neighbors in semantic space (e.g., via GloVe VKB), yielding minimally altered but semantically reversed counterfactual sentences.
- Positive Sampling: Token dropout is randomly applied to the sentence to enforce invariance to non-critical segments.
- Training Losses: Binary cross-entropy over original/counterfactual pairs is augmented with a sentence-level supervised contrastive loss, grouping all same-label variants as positives and label-flipped as negatives. The full loss, L = α L_bin + β L_cot, ensures that the embedding space reflects both plausibility and language-explainability.
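A minimal sketch of the combined objective, under stated assumptions: the binary term is ordinary cross-entropy over plausibility scores, and the contrastive term is a standard supervised contrastive loss over sentence embeddings (the exact form of the contrastive term in the cited work may differ; function names and the temperature value are illustrative):

```python
import numpy as np

def binary_ce(p, y):
    """Mean binary cross-entropy over plausibility scores p and labels y."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def sup_con_loss(emb, labels, tau=0.1):
    """Sentence-level supervised contrastive loss: same-label variants
    attract, label-flipped counterfactuals repel."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit embeddings
    sim = z @ z.T / tau
    n = len(labels)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        denom = np.sum([np.exp(sim[i, j]) for j in range(n) if j != i])
        loss -= np.mean([sim[i, j] - np.log(denom) for j in pos])
    return float(loss / n)

def css_loss(p, y, emb, labels, alpha=1.0, beta=0.5):
    """Full objective L = alpha * L_bin + beta * L_cot from the text."""
    return alpha * binary_ce(p, y) + beta * sup_con_loss(emb, labels)
```

Here α and β weight the two terms exactly as in the formula above.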
On diverse commonsense and reasoning benchmarks, these methods yield large gains in accuracy and marked reductions in bias rates, as well as improved gradient-based attention to causal words (Liu et al., 2024).
4. CSS in Structured, Tabular, and Graph Data
CSS is also foundational in structured data scenarios (tabular, graph, recommendation):
- Tabular/Structured Models: CSS may employ conditional GANs with explicit constraints for proximity and outcome-flipping (Yang et al., 2021), or construct counterfactuals for recourse and policy learning as cost-minimal action sequences via reinforcement learning and program synthesis (Toni et al., 2022). Umbrella sampling may be used to cover rare feature-value combinations, and causal-structural module architectures enforce plausible data and outcome relationships during synthesis (Yang et al., 2021).
- Imbalanced Classification: Counterfactual-based minority oversampling explicitly perturbs majority samples nearest the decision boundary to achieve label inversion with minimal feature displacement, using truncated normal perturbations and normalized distances (Luo et al., 2020).
- Graph Learning: CSS can synthesize hard negative graph samples by structurally or feature-wise minimally perturbing graphs to cross the model’s decision boundary, maximizing KL divergence from the original prediction while minimizing Frobenius or ℓ₁ norm distance (Yang et al., 2022). These are employed as hard negatives in contrastive objectives, yielding superior performance on multiclass graph representation learning tasks.
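The counterfactual minority-oversampling idea can be sketched for a linear decision boundary. This is an assumed simplification of the cited method: majority points nearest the boundary are pushed just across it along the boundary normal (label inversion with minimal displacement), then jittered with a rejection-sampled truncated normal; the step size and jitter bounds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_normal(scale, low, high, size):
    """Rejection-sample a truncated normal on [low, high] (toy helper)."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        draw = rng.normal(0.0, scale, size)
        keep = draw[(draw >= low) & (draw <= high)]
        take = min(len(keep), size - filled)
        out[filled:filled + take] = keep[:take]
        filled += take
    return out

def oversample_minority(X_maj, w, b, step=0.5, n=5):
    """Counterfactual oversampling for a linear boundary w.x + b = 0:
    take the n majority points nearest the boundary, push each just
    across it along the normal, then add bounded truncated-normal jitter."""
    dist = (X_maj @ w + b) / np.linalg.norm(w)   # signed distance to boundary
    nearest = np.argsort(np.abs(dist))[:n]       # boundary-adjacent samples
    unit = w / np.linalg.norm(w)
    synth = []
    for i in nearest:
        # move just past the boundary: minimal displacement, label inversion
        x_new = X_maj[i] - (dist[i] + step) * unit
        x_new = x_new + truncated_normal(0.1, -0.2, 0.2, len(w))
        synth.append(x_new)
    return np.array(synth)
```

With the jitter bounded well below the step size, every synthetic point lands on the minority side of the boundary.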
5. CSS in Generative and Causal Inference Frameworks
In causal modeling and generative counterfactual inference, CSS entails explicit interventions in learned or specified structural causal models:
- Causally Guided High-Dimensional Generation: Observed data are encoded into causal factors and interventions (do-operators) are simulated in latent space. Diffusion models condition the reverse-generation process on intervened causal codes, using guidance gradients derived from differentiable neural causal encoders. This maintains both high sample realism and strict compliance with the specified intervention (Zhu et al., 2024).
- Unsupervised Domain Adaptation: CSS can produce structural counterfactuals across source and target domains by partitioning exogenous variables into shared and domain-specific components, learning joint neural causal mechanisms, and transferring inferred (effect-intrinsic) latent causes between domains (Kher et al., 17 Feb 2025).
- Conformal Counterfactual Inference: Synthetic counterfactual labels, generated from pretrained models, are incorporated into conformal calibration sets and debiased via prediction-powered inference (PPI), yielding more efficient (narrower) prediction intervals for individual potential outcomes (Farzaneh et al., 4 Sep 2025).
- Time-Varying Treatments: Generative counterfactual models under sequential interventions employ conditional VAE or diffusion architectures trained with inverse-probability reweighting, matching the target counterfactual law for all treatment histories without direct density modeling of each configuration (Wu et al., 2023).
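At the core of these generative approaches is the standard abduction-action-prediction recipe for counterfactual inference in a structural causal model. A minimal worked example on a toy linear SCM (this is the textbook three-step procedure, not the diffusion or VAE machinery of the cited works):

```python
def counterfactual_linear_scm(x_obs, y_obs, a, b, x_do):
    """Abduction-action-prediction in a toy linear SCM:
        X := U_x
        Y := a * X + b + U_y

    Step 1 (abduction): infer the exogenous noise from the observation.
    Step 2 (action): replace X's mechanism with do(X = x_do).
    Step 3 (prediction): recompute Y under the intervention, keeping
    the abducted noise fixed.
    """
    u_y = y_obs - (a * x_obs + b)   # abduction: recover U_y
    x_cf = x_do                     # action: do(X = x_do)
    y_cf = a * x_cf + b + u_y       # prediction with retained noise
    return y_cf
```

For example, with Y := 2X + 1 + U_y and an observed (x, y) = (3, 8), the abducted noise is u_y = 1, so the counterfactual outcome under do(X = 5) is y_cf = 12. The diffusion- and VAE-based methods above perform the same three steps in a learned latent space.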
6. CSS for Model Interpretability, Algorithmic Fairness, and Recourse
In algorithmic recourse and explanations, CSS underpins both actionable interventions and explanation policies:
- Model Explanations and Region-Targeted Counterfactuals: Using semantic segmentation and latent region styles, models can synthesize minimal, plausible edits to individual regions of an image that flip a classifier’s prediction while preserving global structure; users can target specific regions to interpret model focus (Jacob et al., 2021).
- Algorithmic Recourse Programs: Interventional recourse is formulated as program synthesis, generating action sequences (with costs and preconditions) that shift an individual across a classifier’s boundary, distilled into interpretable automata with per-action Boolean rules for transparency (Toni et al., 2022).
- Causal Fairness and Realizability: The fundamental realizability of CSS is analyzed under the Pearl causal hierarchy, distinguishing observational, interventional, and counterfactual sampling; formal algorithms enumerate whether, and how, CSS can be physically realized given a causal graph, under constraints such as the Fundamental Constraint of Experimentation (Raghavan et al., 14 Mar 2025).
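The recourse-as-program-synthesis idea can be sketched as exhaustive search over short action sequences against a linear classifier. The action set, costs, and pruning here are hypothetical placeholders; the cited work uses reinforcement learning and distills the result into automata, whereas this toy version simply enumerates sequences and returns the cheapest one that flips the prediction:

```python
import itertools
import numpy as np

# Hypothetical action set: (name, feature index, delta, cost)
ACTIONS = [
    ("increase_income", 0, 1.0, 2.0),
    ("reduce_debt",     1, -1.0, 1.0),
    ("add_credit_line", 2, 1.0, 3.0),
]

def classify(x, w, b):
    """Linear classifier; True means the favorable outcome."""
    return float(x @ w + b) > 0.0

def cheapest_recourse(x, w, b, max_len=3):
    """Enumerate action sequences up to max_len and return the cheapest
    one that moves x across the classifier's decision boundary."""
    best, best_cost = None, np.inf
    for length in range(1, max_len + 1):
        for seq in itertools.product(ACTIONS, repeat=length):
            cost = sum(a[3] for a in seq)
            if cost >= best_cost:       # prune: cannot beat current best
                continue
            x_new = x.copy()
            for _, idx, delta, _ in seq:
                x_new[idx] += delta     # apply the action's feature change
            if classify(x_new, w, b):
                best, best_cost = [a[0] for a in seq], cost
    return best, best_cost
```

Real recourse systems additionally enforce action preconditions and plausibility constraints, which this sketch omits.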
7. Applications, Limitations, and Impact
CSS contributes to model robustness, domain adaptation, interpretability, and fairness. Key results include dramatic out-of-distribution generalization gains in VQA (Chen et al., 2021), substantial accuracy boosts in commonsense reasoning (Liu et al., 2024), measurable reductions in data bias, and improved model monotonicity in large-scale recommenders (Xu et al., 3 Sep 2025). Empirical studies consistently report that CSS, especially when combined with contrastive or supervised pairing losses, both improves performance and elucidates model reliance on critical features.
However, CSS has limitations:
- In early training, pseudo-label construction may be noisy as it depends on the model's current state (Chen et al., 2021).
- Domain-specific or task-specific tuning (e.g., choice of η, critical set thresholds, or perturbation magnitudes) is generally required for maximal benefit (Luo et al., 2020).
- Some CSS variants require access to pretrained models for accurate label assignment or generator construction (Farzaneh et al., 4 Sep 2025).
- In structured settings, CSS often assumes identifiability of causal components and availability of accurate causal graphs or mechanisms (Kher et al., 17 Feb 2025, Zhu et al., 2024).
- Realizability is a fundamental constraint; some counterfactuals cannot be physically sampled, even with arbitrary experimental interventions (Raghavan et al., 14 Mar 2025).
The ongoing development of CSS methods at the intersection of causal modeling, generative modeling, and machine learning promises increasingly powerful tools for enforcing and interpreting model sensitivities, addressing spurious correlations, and operationalizing the full machinery of counterfactual inference in real-world systems.