
Leave-One-Group-Out (LOGO) Retraining

Updated 6 February 2026
  • LOGO retraining is a method that omits predefined data groups from training to assess group influence and enable counterfactual model analysis.
  • It is applied in data attribution, privacy unlearning, and structured model cross-validation to enhance model evaluation and selection.
  • Despite its insights, the approach faces computational challenges that have led to efficient approximations like targeted unlearning and teacher-student ensembles.

Leave-One-Group-Out (LOGO) retraining is a class of algorithms that facilitate group-wise data ablation for model evaluation, attribution, or unlearning. In contrast to leave-one-out (LOO) approaches that omit individual datapoints, LOGO omits pre-defined groups (e.g., users, semantic classes, spatial clusters), thereby capturing dependencies or higher-level structure not accessible via instance-wise excision. LOGO retraining underpins counterfactual reasoning about model behavior, group-level data influence, and rigorous generalization assessment in structured models, LLMs, and deep generative systems.

1. Mathematical Formulation of LOGO Retraining

Suppose the training set $\mathcal{D}$ is partitioned into $m$ disjoint groups $G = \{g_1, g_2, \ldots, g_m\}$, where each group $g_i \subset \mathcal{D}$ may correspond to a semantic class, user, region, or style. Define the group retain set $D_{-i} = \mathcal{D} \setminus g_i$. The LOGO retraining procedure forms a new model $\theta^{\text{logo}}_{-i}$ by retraining (or re-estimating) the original model on $D_{-i}$ using the original objective function (Murata et al., 30 Jan 2026, Adin et al., 2023).

For supervised or generative models with objective $\mathcal{L}(\theta; D)$, the LOGO model parameters are

$$\theta^{\text{logo}}_{-i} = \arg\min_\theta \mathcal{L}(\theta; D_{-i})$$

The LOGO counterfactual for group $g_i$ is then: how would model predictions or outputs change if $g_i$ were omitted at training time?

In structured models—such as latent Gaussian models—the formulation generalizes. Let $I_i \subset \{1, \ldots, n\}$ denote the indices of the group to be omitted for point $i$; LOGO replaces standard leave-one-out (LOO, $I_i = \{i\}$) with leave-group-out cross-validation (LGOCV), where $I_i$ is determined automatically or by design (Adin et al., 2023).
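As a minimal sketch, the group-wise retraining loop can be written directly from the definition. Here ordinary least squares stands in for an arbitrary training procedure, and all function and variable names are illustrative rather than from any cited implementation:

```python
import numpy as np

def fit(X, y):
    # Ordinary least squares as a stand-in for any training procedure.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def logo_models(X, y, group_ids):
    """Train the full model plus one counterfactual model per group,
    each counterfactual omitting exactly that group (the retain set D_{-i})."""
    theta_full = fit(X, y)
    counterfactuals = {}
    for g in np.unique(group_ids):
        keep = group_ids != g            # retain set D_{-i}
        counterfactuals[g] = fit(X[keep], y[keep])
    return theta_full, counterfactuals

# Toy data: three disjoint groups of 20 points each.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.01 * rng.normal(size=60)
groups = np.repeat(np.arange(3), 20)
theta_full, cf = logo_models(X, y, groups)
```

The loop makes the cost structure in Section 3 concrete: one full fit plus one fit per group, which is what motivates the approximations discussed there.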

2. LOGO in Data Attribution and Unlearning

LOGO retraining provides a natural counterfactual for attributing model behavior to training data at the group level. In generative diffusion models, the training dataset is divided into groups (e.g., classes or artistic styles), and for each group $g_i$, a LOGO counterfactual model is trained on $D_{-i}$ (Murata et al., 30 Jan 2026). The influence of $g_i$ on a generated sample $x$ (possibly conditional on context $c$) is then quantified by the difference in evidence lower bound (ELBO):

$$\Delta\ell_i(x, c) = \mathrm{ELBO}(x \mid c; \theta^*) - \mathrm{ELBO}(x \mid c; \theta^{\text{logo}}_{-i})$$

A large positive $\Delta\ell_i$ indicates that group $g_i$ positively contributed to the explanation of $x$; small or negative values indicate weak or negative group influence.
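The sign logic of $\Delta\ell_i$ can be illustrated with a toy stand-in: an exact Gaussian log-density plays the role of the ELBO, and group means play the role of trained models (all names here are hypothetical, not from the cited work):

```python
import numpy as np

def gaussian_loglik(x, mu, sigma=1.0):
    # Stand-in for the ELBO: exact log-density of a Gaussian "model".
    return -0.5 * ((x - mu) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma**2)

def group_influence(x, mu_full, mu_counterfactuals):
    """Delta_i = score under the full model minus score under the LOGO model
    trained without group i (larger => group i helped explain x)."""
    return {g: gaussian_loglik(x, mu_full) - gaussian_loglik(x, mu_g)
            for g, mu_g in mu_counterfactuals.items()}

# Toy: group "a" contains samples near x = 1.0, group "b" does not.
data = {"a": np.array([0.9, 1.1, 1.0]), "b": np.array([-3.0, -3.2, -2.8])}
mu_full = np.mean(np.concatenate(list(data.values())))
mu_cf = {g: np.mean(np.concatenate([v for k, v in data.items() if k != g]))
         for g in data}
scores = group_influence(x=1.0, mu_full=mu_full, mu_counterfactuals=mu_cf)
```

Removing group "a" degrades the fit to $x = 1.0$, so its influence score is positive; removing "b" actually improves the fit, so its score is negative.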

In LLM unlearning, LOGO is operationalized via teacher-student frameworks: the set of users is partitioned into blocks (shards), each teacher is trained on one block, and when forgetting a sequence from (say) user $k$, a leave-one-out ensemble is formed by omitting the teacher(s) exposed to $k$'s data (Liu et al., 2023). The student model is updated so that its predictions match the LOO ensemble (benign teachers only), minimizing a KL-divergence loss over the forget set.
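The ensemble construction and distillation target can be sketched with toy categorical distributions; the shard layout and the `user_to_shard` mapping below are illustrative assumptions, not the cited framework's actual API:

```python
import numpy as np

def loo_ensemble(teacher_probs, user_to_shard, forget_user):
    """Average predictive distributions over the "benign" teachers, i.e.,
    those whose training shard did NOT contain the forgotten user."""
    bad_shard = user_to_shard[forget_user]
    benign = [p for shard, p in enumerate(teacher_probs) if shard != bad_shard]
    return np.mean(benign, axis=0)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): the distillation loss driving the student update.
    p, q = np.clip(p, eps, 1), np.clip(q, eps, 1)
    return float(np.sum(p * np.log(p / q)))

# Three teachers trained on disjoint user shards (toy next-token distributions).
teacher_probs = np.array([[0.7, 0.2, 0.1],
                          [0.6, 0.3, 0.1],
                          [0.1, 0.1, 0.8]])   # shard 2 saw the target user
user_to_shard = {"user_k": 2}
target = loo_ensemble(teacher_probs, user_to_shard, "user_k")
student = np.array([0.5, 0.3, 0.2])
loss = kl_divergence(target, student)        # minimized during fine-tuning
```

Because teacher 2 never contributes to the target, the student is pulled toward predictions that carry no information from the forgotten user's data.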

3. Computational Complexity and Practical Approximations

Naïve LOGO retraining is computationally intensive: for $m$ groups, it requires $m+1$ full model trainings, incurring prohibitive cost for large $m$ or dataset sizes. For example, LOGO on CIFAR-10 ($m = 10$) demands $\sim$2,076 GPU-hours, with each run costing $\sim$206 GPU-hours (Murata et al., 30 Jan 2026).

To address this, practical approximations have been developed:

  • Machine Unlearning (GUDA): Instead of retraining from scratch, approximate LOGO counterfactuals are generated via targeted “forgetting” of each group from a shared full-data model. For each group $g_i$, a lightweight unlearning operator fine-tunes the original parameters $\theta^*$ using a combination of a group-forgetting loss and a distillation (preservation) loss on the retain set $D_{-i}$. This reduces preprocessing time by one to two orders of magnitude (e.g., $\sim$100$\times$ speedup on CIFAR-10) (Murata et al., 30 Jan 2026).
  • Teacher-Student Ensembles: Multiple teacher models are trained on disjoint user blocks, and for each forget request, predictions from “benign” teachers (not exposed to the target user) are averaged to guide student fine-tuning. This method avoids the need to retrain for every possible group removal, reducing both memory and computational requirements (Liu et al., 2023).
  • Structured Models (INLA LGOCV): In latent Gaussian models, LOGO retraining is rendered feasible using INLA’s “group-CV” functionality, which reuses shared factorizations and nested Laplace approximations for efficient re-estimation across many leave-group-out folds. Time complexity reduces to $O(n \cdot c)$, with $c \ll 1$ due to factor reuse (Adin et al., 2023).
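The unlearning-based approximation can be sketched in its simplest form: starting from the full-data parameters, take gradient *ascent* steps on the forgotten group's loss while descending a preservation loss on the retain set. This is a simplified sketch in the spirit of such methods, not the cited GUDA procedure; the forgetting weight `lam`, step size, and linear model are illustrative choices:

```python
import numpy as np

def unlearn_group(theta_full, X_f, y_f, X_r, y_r, lam=0.1, lr=0.05, steps=300):
    """Approximate a LOGO counterfactual by fine-tuning theta_full:
    ascend the forget-group loss, descend the retain-set loss."""
    theta = theta_full.copy()
    for _ in range(steps):
        grad_f = X_f.T @ (X_f @ theta - y_f) / len(X_f)   # forget-group gradient
        grad_r = X_r.T @ (X_r @ theta - y_r) / len(X_r)   # retain-set gradient
        theta -= lr * (grad_r - lam * grad_f)             # note the ascent term
    return theta

# Toy setup: the forget group follows a different linear relation.
rng = np.random.default_rng(1)
X_r, X_f = rng.normal(size=(200, 2)), rng.normal(size=(200, 2))
y_r = X_r @ np.array([1.0, -1.0]) + 0.05 * rng.normal(size=200)
y_f = X_f @ np.array([3.0, 2.0]) + 0.05 * rng.normal(size=200)

# Full-data model vs. the exact LOGO retrain on the retain set only.
X_all, y_all = np.vstack([X_r, X_f]), np.concatenate([y_r, y_f])
theta_full = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
theta_retrain = np.linalg.lstsq(X_r, y_r, rcond=None)[0]
theta_unlearned = unlearn_group(theta_full, X_f, y_f, X_r, y_r)
```

In this toy setting, a few hundred cheap fine-tuning steps move the full-data parameters much closer to the exact retrained-from-scratch solution, which is the trade-off the speedup figures above quantify.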

4. LOGO in Structured Model Cross-Validation

Standard leave-one-out cross-validation (LOOCV) may conflate training and test data due to latent field correlations, leading to optimistic generalization assessments in structured models. LOGO cross-validation ameliorates this by grouping held-out observations according to the inferred correlation structure (Adin et al., 2023). For each index $i$:

  • The absolute correlation vector $|C_i|$ is computed from the latent field’s covariance.
  • Groups $I_i$ are defined by the $m$ largest correlations, e.g., $I_i = \{j : |\mathrm{corr}(\eta_i, \eta_j)| \geq \tau_i\} \cup \{i\}$, where $\tau_i$ is the $m$-th largest value.
  • Hold-out prediction is performed by removing $y_{I_i}$, refitting, and evaluating $\pi(Y_i = y_i \mid y_{-I_i})$.
  • This automatic LOGO approach provides more realistic extrapolation error estimates, improves selection of models capturing global structure, and yields scientifically sensible rankings for spatial and spatio-temporal data (Adin et al., 2023).
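The automatic group construction in the steps above amounts to thresholding a row of the correlation matrix. A minimal sketch, using a toy AR(1)-style correlation structure (the function name and the choice of correlation are illustrative assumptions):

```python
import numpy as np

def auto_group(corr, i, m):
    """Return I_i: index i plus the indices of the m largest absolute
    correlations with i (the automatic LGOCV group construction)."""
    c = np.abs(corr[i]).astype(float).copy()
    c[i] = -np.inf                        # exclude self before ranking
    neighbours = np.argsort(c)[::-1][:m]  # m strongest correlations
    return set(int(j) for j in neighbours) | {i}

# Toy AR(1)-style latent field correlation: corr(i, j) = rho**|i - j|.
rho, n = 0.8, 6
corr = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
I_2 = auto_group(corr, i=2, m=2)   # the two strongest neighbours of index 2
```

For index 2 the strongest correlations are its immediate neighbours 1 and 3, so the held-out block is $I_2 = \{1, 2, 3\}$: exactly the observations whose latent values would leak information under plain LOOCV.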

Empirical applications to spatial multivariate counts, spatial compositional data, and geospatial processes demonstrate that LOGO cross-validation can favor models with shared components or more flexible covariance structure, sometimes differing markedly from DIC/WAIC/LOOCV-based selections.

5. Applications in Attribution, Privacy, and Model Selection

LOGO retraining underpins a diverse class of applications:

| Domain | Purpose | LOGO Implementation |
| --- | --- | --- |
| Diffusion models | Group attribution | Retrain/unlearn on $D_{-i}$; ELBO difference |
| LLMs | Unlearning user sequences | LOO-ensemble teacher-student |
| Structured models | Cross-validation, model selection | Automatic group formation (LGOCV) |

In diffusion models, LOGO enables group-wise data attribution, facilitating counterfactual queries of the form: “How would image generation change if all samples of a given class or style were omitted from training?” (Murata et al., 30 Jan 2026). In LLMs, it provides a practical paradigm for precise data removal by enabling per-user or sequence-wise unlearning, while preserving global model utility (Liu et al., 2023).

In spatial statistics and structured regression, LOGO cross-validation (LGOCV) overcomes the limitations of LOO by accounting for the dependence structure, accurately estimating out-of-sample predictive skill (Adin et al., 2023).

6. Limitations and Open Challenges

LOGO retraining, in its exact form, is computationally infeasible for large $m$ or extremely large datasets. Unlearning-based approximations (e.g., GUDA) are subject to biases owing to finite unlearning steps, approximate distillation, and limited capacity to forget overlapping or multi-membership groups (Murata et al., 30 Jan 2026). For LLMs, the LOO-ensemble technique assumes mutually disjoint data blocks and may not generalize to overlapping user data (Liu et al., 2023).

In structured models, LOGO’s accuracy depends on the appropriateness of automatically constructed groups and on the latent field covariance estimation. Posterior correlations provide greater fidelity than prior-based ones, but incur greater computational cost (Adin et al., 2023).

Further, most current LOGO methodologies rely on the assumption of disjoint groups; mechanisms for handling overlapping or fuzzy group membership remain undeveloped in these frameworks.

7. Comparative Analysis and Empirical Insights

Across applications, LOGO retraining (or its effective approximations) provides direct measurement of causal group influence:

  • In diffusion models, group unlearning (GUDA) using the ELBO difference yields higher agreement with the LOGO oracle (Top-1 match: 72.7% on CIFAR-10) and outperforms CLIP similarity and gradient-based approaches, while achieving $\sim$100$\times$ preprocessing speedup (Murata et al., 30 Jan 2026).
  • In LLM privacy unlearning, the LOGO (LOO ensemble) approach matches retraining-from-scratch in removing canary sequences (0% recovery), while preserving utility (low perplexity) and markedly outperforming PATE and gradient-ascent alternatives (Liu et al., 2023).
  • In structured model selection, automatic LOGO cross-validation produces more realistic generalization error estimates than LOO, often selecting more flexible and scientifically sensible models (Adin et al., 2023).

A plausible implication is that LOGO retraining and its principled approximations are essential components in data attribution, privacy unlearning, and robust generalization diagnostics, particularly where structured data or group-wise interventions are of primary interest.
