
Semi-factual Explanations

Updated 21 January 2026
  • Semi-factual explanations are post-hoc interpretability methods that describe how feature modifications retain a model’s original output.
  • They utilize diverse algorithms—from instance-based to generative models—to balance sparsity, plausibility, and diversity in decision preservation.
  • Empirical benchmarks and complexity analyses highlight trade-offs in method design, underscoring the need for heuristic approaches in practical applications.

Semi-factual explanations, also known as "even-if explanations," constitute a class of post-hoc interpretability methods in Explainable AI (XAI) that articulate how a model’s outcome would remain unchanged under certain feature modifications. Distinct from counterfactuals, which identify minimal changes necessary to reverse a model decision, semi-factuals specify maximal or otherwise interesting changes that leave the output invariant, thereby characterizing the robustness of predictions. These explanations have been developed for a wide spectrum of domains, encompassing tabular, image, and sequential (e.g., RL) settings, as well as formal argumentation systems. Methods range from resource-efficient instance-based algorithms to deep generative approaches, yet fundamental computational hardness is pervasive, driving the need for heuristic and approximate algorithms. Theoretical desiderata, diverse algorithmic strategies, benchmarking protocols, key complexity results, and empirical findings all inform current best practices and future research for semi-factual XAI.

1. Formal Definitions and Conceptual Foundations

A semi-factual explanation (SF) for an input instance $x$ with label $y$ under predictor $f$ is any $x'$ satisfying

(i) $x' \neq x$, and (ii) $f(x') = y$.

This is in contrast to a counterfactual explanation, which requires $f(x') \neq y$ and typically seeks minimal $\|x'-x\|$ (e.g., under $\ell_0$ or $\ell_2$). Semi-factuals ask: "What (possibly large or targeted) feature changes could be made, even if the decision would still be $y$?"
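The definition above can be made concrete with a toy sketch: a maximal-change semi-factual walks one feature toward the decision boundary and returns the last point whose prediction is unchanged. The classifier, step size, and search routine below are all illustrative stand-ins, not any published method.

```python
# Minimal sketch of a maximal-change semi-factual on a toy linear classifier.
# `predict` and `semifactual_along` are invented for illustration.

def predict(x):
    """Toy classifier: class 1 iff 2*x0 + x1 > 5."""
    return 1 if 2 * x[0] + x[1] > 5 else 0

def semifactual_along(x, feature, direction, step=0.05, max_steps=1000):
    """Walk one feature toward the boundary; return the last point whose
    prediction still matches f(x) -- a (1-D) maximal-change semi-factual."""
    y = predict(x)
    best = None
    xp = list(x)
    for _ in range(max_steps):
        xp[feature] += direction * step
        if predict(xp) != y:
            break                    # boundary crossed: stop just before it
        best = list(xp)
    return best

x = [4.0, 2.0]                       # predicted class 1 (2*4 + 2 = 10 > 5)
sf = semifactual_along(x, feature=0, direction=-1.0)
# sf keeps class 1 even though x0 has moved close to the boundary at x0 = 1.5
```

Even this large edit leaves the prediction invariant, which is exactly the "even-if" reading of condition (ii).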

In argumentation frameworks, given a labelling $L$ for framework $\Lambda$ and goal argument $g$, a semifactual is a labelling $L'$ maximizing the Hamming distance $\delta(L, L')$ subject to $L'(g) = L(g)$, i.e., "maximally changing other labels while preserving $g$'s status" (Alfano et al., 2024).

For reject options, a semifactual for a rejected $x$ is an $x'$ such that the system would still reject ($r_h(x') < \tau$), often with $r_h(x') \geq r_h(x)$, reinforcing that "no matter these changes, the prediction is unchanged" (Artelt et al., 2022).
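The reject-option definition can be sketched directly: search for the farthest edit whose confidence score is at least the original's yet still below the rejection threshold. The confidence function `r_h` and the 1-D grid search are invented stand-ins, not the algorithm of Artelt et al.

```python
# Toy reject-option semifactual: x' is still rejected (r_h(x') < TAU)
# even though its confidence is no lower than x's.

TAU = 0.7

def r_h(x):
    """Invented confidence score in [0, 1]: a distance-to-boundary proxy."""
    return min(1.0, abs(x[0] - x[1]) / 4.0)

def reject_semifactual(x, step=0.1, n=40):
    """Scan a 1-D grid of edits to x[0]; keep the farthest x' satisfying
    r_h(x) <= r_h(x') < TAU, i.e. 'still rejected despite these changes'."""
    base = r_h(x)
    assert base < TAU, "x must be rejected to start with"
    best = None
    for k in range(1, n + 1):
        xp = [x[0] + k * step, x[1]]
        if base <= r_h(xp) < TAU:
            best = xp                # farthest edit so far that stays rejected
    return best

x = [1.0, 1.0]                       # r_h(x) = 0.0 < TAU -> rejected
sf = reject_semifactual(x)
```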

2. Desiderata for Semi-Factual Explanations

Synthesizing cognitive-science and AI literatures, key desiderata for semi-factual explanations are (Aryal et al., 2023, Gajcin et al., 2024):

  1. Outcome invariance & factual countering: $x' \neq x,\ f(x') = y$;
  2. Sparsity: $\|x'-x\|_0$ small (ideally a single changed feature per explanation);
  3. Plausibility / Actionability: $x'$ lies on/near the data manifold (measured via density estimation, e.g., $P_{\text{model}}(x') > \tau$);
  4. Convincingness / Surprise: $x'$ differs from $x$ in a maximally effective but believable way;
  5. Causal-role Update: exposure to $x'$ alters a user's mental model of $f$;
  6. Fairness & Robustness: feature changes avoid spurious proxies and maintain ceteris-paribus (local stability).
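Two of these desiderata reduce to simple, checkable metrics: sparsity as an $\ell_0$ count of changed features, and plausibility as distance to the nearest training point (a common data-manifold proxy). The helper names and toy data below are illustrative.

```python
# Sparsity and a plausibility proxy, as plain functions. Thresholds and
# the toy dataset are invented for illustration.

import math

def l0_sparsity(x, xp, eps=1e-9):
    """Number of features changed between x and x' (desideratum 2)."""
    return sum(abs(a - b) > eps for a, b in zip(x, xp))

def nn_distance(xp, data):
    """Euclidean distance to the nearest training point (desideratum 3):
    small values suggest x' lies near the data manifold."""
    return min(math.dist(xp, d) for d in data)

data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
xp = [1.1, 1.0]
# l0_sparsity([1.0, 1.0], xp) -> 1;  nn_distance(xp, data) is about 0.1
```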

In reinforcement learning (RL), five adapted desiderata guide search: validity, temporal distance, stochastic uncertainty, policy fidelity, and exceptionality—quantified as temporal path length, environment-induced variance, and alignment to RL policy/transition (Gajcin et al., 2024).

No currently published method satisfies all desiderata simultaneously. Instead, trade-offs define method behaviors in empirical evaluation (Aryal et al., 2023).

3. Algorithmic Approaches Across Domains

A spectrum of algorithms has been proposed for generating semi-factual explanations. A selection, organized by method family, is presented below.

| Method Family | Representative Techniques | Key Principle(s) |
|---|---|---|
| Instance-based (tabular) | Local-Region, KLEOR, MDN | Selects boundary-near or distant same-class exemplars (Aryal et al., 2023) |
| Counterfactual-guided | PIECE, C2C-VAE, DiCE | Moves toward a counterfactual, pulls back just before boundary crossing (Aryal et al., 2024, Kenny et al., 2020) |
| Optimization-based | DSER, IRDs | Multi-term losses balance feasibility, sparsity, difference, diversity (Artelt et al., 2022, Dandl et al., 2023) |
| Generative models (vision, time-series) | PIECE (GAN), C2C-VAE, StyleGAN2 | Operate in latent space, transform "exceptional" features (Kenny et al., 2020) |
| RL-specific | SGRL-Rewind, SGRL-Advance | Multi-objective sequence optimization preserving policy outcome (Gajcin et al., 2024) |
| Argumentation | ASP-based search over labellings | Maximizes label change subject to goal preservation (Alfano et al., 2024) |

Many current methods are "counterfactual-free," i.e., they do not depend on the initial construction of a boundary-flipping counterfactual, but seek directly in the decision-invariant region (Aryal et al., 2024). Optimization-based approaches are model-agnostic and support sparsity/diversity constraints, while generative approaches better maintain data manifold plausibility in high-dimensional spaces (Kenny et al., 2020).
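The counterfactual-guided strategy from the table can be sketched in a few lines: interpolate from the query toward a known counterfactual and stop just before the class flips. The classifier and points below are toy stand-ins, not PIECE or C2C-VAE themselves.

```python
# Counterfactual-guided semi-factual: pull back just before the boundary.
# `predict` and the endpoints are invented for illustration.

def predict(x):
    return 1 if x[0] + x[1] > 1.0 else 0

def cf_guided_semifactual(x, x_cf, n_steps=100):
    """Return the point closest to x_cf on the segment [x, x_cf] that still
    receives the same prediction as x."""
    y = predict(x)
    best = None
    for k in range(1, n_steps + 1):
        t = k / n_steps
        xp = [a + t * (b - a) for a, b in zip(x, x_cf)]
        if predict(xp) != y:
            break                    # crossed the boundary: stop
        best = xp
    return best

x = [2.0, 2.0]                       # class 1
x_cf = [0.0, 0.0]                    # class 0: a counterfactual for x
sf = cf_guided_semifactual(x, x_cf)
```

Counterfactual-free methods skip the construction of `x_cf` and search the decision-invariant region directly.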

4. Complexity and Computational Hardness

For finite discrete feature spaces, key complexity results reveal that even basic semi-factual search can be intractable (Artelt et al., 14 Jan 2026, Alfano et al., 2024):

  • Determining if a minimum sufficient-reason (MSR) SFE of size $k$ exists is
    • PTIME for monotonic classifiers, linear thresholds, decision trees, extended linear rules, and free BDDs,
    • NP-hard or higher for $k$-NN, decision lists, ensembles, random forests, and ReLU neural networks ($\Sigma_2^p$-complete).
  • Maximum-change (MCA) SFE is PTIME only for perceptrons and free BDDs; NP-complete for ReLU nets.
  • In argumentation, semifactual existence and verification range from NP-complete (coherent/stable semantics) to $\Sigma_2^p$-/$\Pi_2^p$-complete (preferred/semi-stable) (Alfano et al., 2024).
  • No general FPT (parameterized by $k$) or approximation schemes are currently available; enumeration for diversity is likewise hard except in trivial cases (Artelt et al., 14 Jan 2026).

A plausible implication is that, for most real-world rich model classes, heuristic or approximate algorithms are unavoidable for practical generation and enumeration of semi-factual explanations.
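The source of this hardness is visible even in a brute-force sketch: certifying that a size-$k$ feature subset is a sufficient reason requires checking every assignment of the remaining features, and finding a minimum one enumerates all subsets. The toy majority-vote classifier below is invented; the exponential structure is the point.

```python
# Brute-force minimum sufficient reason over binary features --
# exponential by design, illustrating why MSR search is hard in general.

from itertools import combinations, product

def f(x):
    """Toy classifier over 3 binary features: majority vote."""
    return 1 if sum(x) >= 2 else 0

def is_sufficient(x, fixed):
    """True iff fixing the features in `fixed` to x's values forces f's output,
    whatever the free features take. Costs 2^|free| classifier calls."""
    y = f(x)
    free = [i for i in range(len(x)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        xp = list(x)
        for i, v in zip(free, values):
            xp[i] = v
        if f(xp) != y:
            return False
    return True

def min_sufficient_reason(x):
    """Smallest sufficient subset, by exhaustive subset enumeration."""
    for k in range(len(x) + 1):
        for fixed in combinations(range(len(x)), k):
            if is_sufficient(x, set(fixed)):
                return set(fixed)

# For x = [1, 1, 0], fixing features {0, 1} to 1 forces the majority,
# so f's output stays 1 no matter what feature 2 does.
```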

5. Empirical Benchmarking and Comparative Findings

Benchmarking studies on tabular datasets (e.g., Adult Income, Default Credit, HELOC, etc.) evaluate methods primarily on (Aryal et al., 2023, Aryal et al., 2024, Artelt et al., 2022):

  • Distance ($\|\cdot\|_2$ from query): high values indicate more convincing deviation;
  • Plausibility (nearest neighbor distance to data): lower is better;
  • Confusability (ratio to trust-score core): lower is safer;
  • Robustness (local Lipschitz/stability): higher is more stable;
  • Sparsity ($\ell_0$): explanations changing a single feature ("1-diff") preferred.

Principal empirical findings:

  • MDN and Local-Region baseline methods are top-ranked for plausibility and confusability, while C2C-VAE and PIECE excel in robustness and achievable distance (Aryal et al., 2024).
  • DSER and PIECE achieve best sparsity (largest % of explanations changing only 1 feature).
  • No single method dominates across all metrics; method selection should be guided by context-specific desiderata.
  • In RL, SGRL-Advance and SGRL-Rewind dominate in validity, temporal parsimony, and diversity compared to supervised baselines (Gajcin et al., 2024).
  • In reject-option XAI, DSER reliably produces diverse, sparse, and high-fidelity semifactual sets (Artelt et al., 2022).
  • Image-domain methods (e.g., PIECE) preserve on-manifold plausibility while making maximally large input changes with class invariance, as validated by multiple plausibility proxies (Kenny et al., 2020).

6. Applications and Domain-specific Extensions

Formal Argumentation

In abstract argumentation frameworks, semifactual explanation computation is cast as finding maximal-distance labellings subject to outcome-preservation constraints for a goal argument. Answer-set programming with weak constraints provides an operational encoding (Alfano et al., 2024).
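Stripped of the ASP machinery, the optimization is: among the legal labellings, pick the one farthest in Hamming distance from the given labelling while preserving the goal argument's label. The tiny three-argument framework and its hand-picked labelling set below are invented; in real semantics (stable, preferred), deciding which labellings are legal is exactly where the complexity lives.

```python
# Brute-force argumentation semifactual over a hand-picked labelling set.
# The framework and labellings are invented for illustration.

def hamming(L1, L2):
    """Number of arguments labelled differently by L1 and L2."""
    return sum(L1[a] != L2[a] for a in L1)

def semifactual_labelling(L, legal_labellings, goal):
    """Farthest legal labelling from L that preserves the goal's label."""
    candidates = [Lp for Lp in legal_labellings
                  if Lp[goal] == L[goal] and Lp != L]
    return max(candidates, key=lambda Lp: hamming(L, Lp), default=None)

# Hand-picked "legal" labellings of a toy 3-argument framework:
labellings = [
    {"a": "in",  "b": "out", "g": "in"},
    {"a": "out", "b": "in",  "g": "in"},
    {"a": "out", "b": "out", "g": "out"},
]
L = labellings[0]
sf = semifactual_labelling(L, labellings, goal="g")
# -> the second labelling: a and b both flip while g's status is preserved
```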

Reinforcement Learning

Semi-factuals in RL operate over entire trajectories or policy decisions. Backward and forward search algorithms (SGRL-Rewind/Advance) generate temporally-aware “even-if” scenarios while optimizing over reachability, policy fidelity, diversity, and exceptionality (Gajcin et al., 2024).

Reject Option Systems

For rejection-aware classifiers, semifactuals articulate why an input remains rejected even under alternative (but plausible and non-trivial) configurations, aiding in user recourse and model trust (Artelt et al., 2022).

Interpretable Regional Descriptors

Hyperbox-based local descriptors collect all possible inputs within a region such that model prediction remains unchanged, resulting in succinct semi-factual regions. Optimization is posed as finding maximal axis-aligned boxes of high coverage and perfect local precision (Dandl et al., 2023).
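A one-dimensional caricature of the hyperbox idea: greedily widen an interval around the query for as long as probed points inside keep the model's prediction. The classifier and probe grid are toy choices; real IRD methods must certify precision over the whole box, not just a grid.

```python
# Greedy 1-D "hyperbox" expansion around x: a sketch of interpretable
# regional descriptors. `predict` and the probing scheme are invented.

def predict(x):
    return 1 if 0.0 < x < 10.0 else 0

def maximal_interval(x, step=0.25, probes=200):
    """Expand [lo, hi] around x while each newly probed endpoint keeps
    predict(x)'s label. Grid probing only -- no formal precision guarantee."""
    y, lo, hi = predict(x), x, x
    for _ in range(probes):
        if predict(lo - step) == y:
            lo -= step
        if predict(hi + step) == y:
            hi += step
    return lo, hi

lo, hi = maximal_interval(4.0)
# every probed point in [lo, hi] keeps class 1
```

The resulting interval is itself a semi-factual region: any move of the input inside it leaves the prediction unchanged.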

7. Recommendations, Limitations, and Future Directions

Consensus recommendations include (Aryal et al., 2023, Aryal et al., 2024):

  • Use MDN and Local-Region as strong baselines; prefer generative methods for high-dimensional data or tight plausibility constraints.
  • Extend benchmarking to images and time-series using latent features.
  • Develop formal, possibly user-study-based metrics of convincingness and causal-role shifts post-semi-factual exposure.
  • Incorporate ethical safeguards, including proxy auditing and robustness checks, before deployment.
  • Advance the field by seeking parameterized or hybrid algorithms that combine plausibility, diversity, and robustness.

Limitations identified:

  • Most published empirical studies focus on tabular data; clear needs exist for real-world validation on images, text, or RL agents.
  • Human-subject studies on the psychological impact and practical utility of semi-factuals remain limited or methodologically flawed.
  • Complexity barriers preclude exact methods for many model families—tractable, approximate, or FPT methods are a central open problem (Artelt et al., 14 Jan 2026).

A public repository of code, data splits, and benchmarks is maintained to foster reproducibility and method development (Aryal et al., 2023). Future research is directed toward hybridization of approaches, principled diversity/plausibility trade-offs, controlled interventions for causal-reasoning assessment, and regulatory implications for XAI transparency.
