
CT-SFT: Efficient Cross-Lingual Fine-Tuning

Updated 20 January 2026
  • CT-SFT is a method that selectively fine-tunes a sparse set of causally-relevant attention heads ('the circuit') to transfer task competence between languages.
  • It employs label-balanced baseline estimation and task-directional relevance scoring to update less than 1% of parameters while maintaining model stability.
  • Empirical results on NusaX-Senti demonstrate enhanced cross-lingual accuracy and reduced catastrophic forgetting compared to full-model fine-tuning.

Circuit-Targeted Supervised Fine-Tuning (CT-SFT) is a parameter-efficient methodology for adaptively transferring task competence between languages in LLMs, particularly under low-resource, cross-lingual regimes. CT-SFT operates by discovering and selectively updating a sparse set of causally-relevant attention heads—termed the "circuit"—derived from a proxy-language checkpoint with established task competence, and then fine-tuning only this circuit (along with all LayerNorm parameters) using a small set of target-language labeled data. This approach enables robust cross-lingual adaptation, sharply limits catastrophic forgetting of the source/proxy language, and achieves superior accuracy compared with full-model fine-tuning or naive methods, despite modifying less than 1% of the model's parameters (Nur'aini et al., 13 Jan 2026).

1. Conceptual Foundations

CT-SFT was introduced to address central impediments in adapting LLMs to new, low-resource languages: extreme scarcity of labeled data, instability of full-model fine-tuning, and cross-lingual destructive interference (catastrophic forgetting). Building on previous circuit-based analysis frameworks—specifically the Contextual Decomposition Transformer (CD-T)—CT-SFT is distinguished by its counterfactual-free, relevance-driven circuit identification. It does not require interventions or counterfactual inputs for causal discovery; instead, it utilizes a label-balanced mean baseline and directional relevance scoring. The approach is predicated on the hypothesis that source language mechanisms for a given task are transferable across typologically-related languages, provided that the adaptation targets only the most salient computational elements (the "circuit") for the task.

2. Dataset Context and Experimental Setup

Experiments with CT-SFT were performed on NusaX-Senti, a multilingual parallel sentiment analysis corpus covering ten Indonesian local languages and Indonesian itself. Each instance is labeled as negative, neutral, or positive. Five languages were selected for analysis: Indonesian (proxy source) and Acehnese, Buginese, Javanese, Minangkabau (targets). Sentiment classification was cast as autoregressive single-token prediction. Each language split comprised 500 training examples (with a 400-sample "discovery pool" for circuit analysis and a 100-sample "held-out pool" for adaptation), plus 100 validation and 400 test examples. Label-balancing was enforced when forming baseline subsets to minimize bias in circuit discovery (Nur'aini et al., 13 Jan 2026, Winata et al., 2022).
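Under the split sizes reported above, a per-language split builder might look like the following (a hypothetical helper for illustration; the actual NusaX-Senti loading code is not given in the source):

```python
import random

def make_splits(examples, seed=0):
    """Partition one language's examples into the reported layout:
    500 train (400 discovery pool + 100 held-out pool),
    100 validation, and 400 test."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    discovery, held_out = shuffled[:400], shuffled[400:500]
    validation, test = shuffled[500:600], shuffled[600:1000]
    return discovery, held_out, validation, test
```

In practice the discovery pool feeds circuit analysis, while the held-out pool is reserved for the adaptation step.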

3. Methodological Workflow

CT-SFT involves two principal phases: proxy-language competence tuning and circuit discovery (Phase A), followed by mechanism-guided transfer adaptation (Phase B).

A. Baseline and Relevance Estimation

  • Label-balanced mean baseline: For each attention head $s$, the "neutral" activation $\mu(s)$ is computed by averaging over a small, class-balanced subset $M$ (size $\approx 50$), i.e. $\mu(s) = \mathbb{E}_{x \in M}[a_x(s)]$, where $a_x(s)$ is the pre-activation of $s$ on input $x$. This baseline helps isolate the task-relevant signal.
  • Task-directional relevance scoring: For every head, contributions are decomposed into an irrelevant component ($\gamma_s = \mu(s)$) and a relevant component ($\beta_s = a_x(s) - \mu(s)$). Relevance scores $R(s)$ are computed at the logit layer as $R(s) = \beta_{s \to \mathrm{logit}}(y_c) - \frac{1}{|Y_{\mathrm{other}}|} \sum_{y \in Y_{\mathrm{other}}} \beta_{s \to \mathrm{logit}}(y)$, and at intermediate layers via projection onto the task-direction vector in label-unembedding space. This quantifies how much each head promotes the correct sentiment class relative to the alternatives.
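The two estimation steps above can be sketched in plain Python (a toy, hypothetical rendering: real CT-SFT operates on per-head transformer activations, and the function names here are illustrative):

```python
from statistics import mean

def balanced_baseline(activations, labels, per_class):
    """Label-balanced mean baseline mu(s) for one head: average the
    head's activation over an equal number of examples per class."""
    by_class = {}
    for a, y in zip(activations, labels):
        by_class.setdefault(y, []).append(a)
    subset = []
    for y in sorted(by_class):
        subset.extend(by_class[y][:per_class])
    return mean(subset)

def directional_relevance(beta_to_logit, correct_class):
    """R(s): the head's beta contribution to the correct class logit
    minus its mean contribution to the other class logits."""
    others = [v for y, v in beta_to_logit.items() if y != correct_class]
    return beta_to_logit[correct_class] - mean(others)
```

For example, `directional_relevance({"pos": 1.5, "neg": -0.5, "neu": 0.1}, "pos")` is positive exactly when the head promotes the correct class above the average of the alternatives.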

B. Circuit Selection and Sparsity Control

Circuit selection proceeds via iterative backward expansion:

  • Depth 0: Score all heads at the logit layer; select the top $p = 2\%$ by average $R(s)$ (e.g., 6 heads, $\sim$0.23% of parameters).
  • Depth 1+: For each subsequent depth, expand the circuit to include heads that are most directionally relevant to those previously chosen, again at $p = 2\%$ sparsity per layer ($\sim$12 heads at depth 1; $\sim$18 at depth 2).
  • Termination: Depth 2 suffices for most settings, as deeper circuits yield diminishing returns and increased parameter cost.
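The backward-expansion loop above can be sketched as follows (a simplified, hypothetical version in which score dictionaries stand in for the per-head relevance computations):

```python
def select_top_heads(scores, p):
    """Keep the top-p fraction of heads by relevance score."""
    k = max(1, int(len(scores) * p))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def expand_circuit(logit_scores, upstream_scores, p, depth):
    """Depth 0: top-p heads scored at the logit layer.
    Each further depth adds the upstream heads most relevant to the
    current circuit; upstream_scores maps each candidate head to its
    per-downstream-head relevance contributions."""
    circuit = select_top_heads(logit_scores, p)
    for _ in range(depth):
        relevance_to_circuit = {
            h: sum(v for t, v in contrib.items() if t in circuit)
            for h, contrib in upstream_scores.items()
            if h not in circuit
        }
        circuit |= select_top_heads(relevance_to_circuit, p)
    return circuit
```

Stopping at `depth=2` matches the termination rule described above.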

C. Surgical Fine-Tuning via Gradient Masking

Once the circuit $C$ is finalized, CT-SFT updates only:

  • The weights (query, key, value, and output projections) of attention heads in $C$;
  • All LayerNorm parameters in the transformer.

All other model components—including MLPs, token embeddings, unembedding matrices, and non-circuit heads—are frozen. Gradient masking enforces this constraint at the head level.
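A head-level trainability mask could look like this (the parameter naming scheme is hypothetical; real implementations typically mask slices of fused Q/K/V projections rather than separate per-head modules):

```python
def is_trainable(param_name, circuit_heads):
    """CT-SFT freeze rule: train only the Q/K/V/O projections of
    circuit heads plus all LayerNorm parameters; freeze the rest
    (MLPs, embeddings, unembedding, non-circuit heads)."""
    if "layernorm" in param_name.lower():
        return True
    parts = param_name.split(".")
    if parts[-1] in {"q_proj", "k_proj", "v_proj", "o_proj"}:
        # Assumed naming scheme: "layers.<layer>.heads.<head>.<proj>".
        layer, head = int(parts[1]), int(parts[3])
        return (layer, head) in circuit_heads
    return False
```

During training, gradients for parameters where `is_trainable` is false are zeroed (or their `requires_grad` flags are disabled), so everything outside the circuit stays fixed.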

4. Empirical Results and Comparative Analysis

CT-SFT demonstrates significant cross-lingual accuracy improvements on NusaX-Senti relative to direct fine-tuning and continued full-model fine-tuning. Key results (test set, $n = 25$ and $n = 100$ target samples, circuit depth 2):

  • Acehnese (hard transfer, baseline 0.471): Direct FT 0.441 / Full FT 0.346 → CT-SFT 0.486 (n=25); Direct FT 0.537 / Full FT 0.428 → CT-SFT 0.547 (n=100)
  • Buginese: CT-SFT 0.402 vs. Full FT 0.376 (n=25); CT-SFT 0.493 vs. 0.454 (n=100)
  • Javanese: CT-SFT 0.592 vs. Full FT 0.384 (n=25); CT-SFT 0.636 vs. 0.475 (n=100)
  • Minangkabau: CT-SFT 0.568 vs. 0.394 (n=25); CT-SFT 0.612 vs. 0.510 (n=100)

CT-SFT also substantially ameliorates catastrophic forgetting of Indonesian (proxy) competence: post-transfer, Indonesian accuracy is retained near the original checkpoint (0.757), whereas continued full FT degrades performance sharply (to 0.341–0.435). This suggests that selective tuning of minimal circuits is critical for stability in cross-lingual transfer.

Editing–Preserving Trade-Off

Transfer difficulty, indexed by baseline zero-shot proxy accuracy $A_0(\ell)$, modulates the optimal update style:

  • Hard transfers (lower $A_0$): Favor editing the identified circuit heads.
  • Easier transfers (higher $A_0$): Prefer mechanism-preserving updates—restrict changes to near-zero-relevance heads—even as circuit depth increases.

Depth 1 circuits provide a balance between stability and plasticity; depth 2 further aids harder transfers but risks diluting effects on easier pairs.

5. Best Practices and Diagnostic Protocols

Optimal application of CT-SFT is governed by practical recommendations:

  • Adopt a two-phase protocol: build proxy task competence first, then discover and transfer its circuit.
  • Label-balanced baseline estimation ($\approx$ 50 examples per class) is crucial to avoid spurious circuit selection.
  • Directional (via label-unembedding projection) rather than magnitude-based relevance scores yields more faithful circuits.
  • Maintain strong sparsity constraints ($p = 2\%$) to keep parameter updates minimal (< 1%).
  • Use hyperparameters: learning rate $5 \times 10^{-5}$, 5 epochs, batch size 16, max sequence length 128.
  • A plausible implication is that for extremely low-sample or new task scenarios, circuit stability may not be guaranteed; sufficient proxy-task skill must precede circuit discovery.
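Collected as a configuration fragment, the reported hyperparameters might look like this (key names are illustrative, not from an official release):

```python
# Hypothetical CT-SFT training configuration mirroring the
# hyperparameters reported in the text.
CTSFT_CONFIG = {
    "learning_rate": 5e-5,
    "epochs": 5,
    "batch_size": 16,
    "max_seq_length": 128,
    "head_sparsity": 0.02,    # p = 2% of heads per layer
    "baseline_examples": 50,  # label-balanced baseline subset size
    "circuit_depth": 2,       # depth at which expansion terminates
}
```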

Diagnostics involve mean-ablation protocols on held-out validation sets to verify circuit faithfulness (preservation of accuracy and logit margins).
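Such a mean-ablation check can be sketched as follows (a toy version; `predict_fn` stands in for a hypothetical hook-based forward pass that substitutes the given head activations):

```python
def mean_ablate_accuracy(predict_fn, examples, ablated_heads, baselines):
    """Replace each listed head's activation with its mean baseline
    and measure accuracy on a held-out set; a faithful circuit should
    preserve accuracy when non-circuit heads are ablated this way."""
    overrides = {h: baselines[h] for h in ablated_heads}
    hits = [predict_fn(x, overrides) == y for x, y in examples]
    return sum(hits) / len(hits)
```

A matching check on logit margins would compare the correct-class margin before and after ablation in the same loop.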

6. Comparison with Related Methods

CT-SFT is directly compared to composable sparse fine-tuning approaches such as DeFT-X, which employs denoising and magnitude-based pruning on fine-tuned weight deltas, followed by retraining under global sparsity constraints. While both approaches selectively update parameters, CT-SFT restricts adaptation to causally relevant attention heads, whereas DeFT-X operates on arbitrary sparse vectors identified via singular value decomposition and thresholding (Simon et al., 21 May 2025). DeFT-X demonstrates robust zero-shot transfer when composing task- and language-specific sparse vectors, and achieves strong performance on NusaX-Senti across multiple low-resource languages.

Unlike full-model, adapter-based (MAD-X), or magnitude-pruned (LT-SFT) methods, CT-SFT's causal interpretability and minimal circuit size yield superior cross-lingual accuracy and retention of original task competence at a fraction of the parameter cost.

7. Limitations and Applicability

CT-SFT's performance depends on the underlying quality of the proxy-language competence checkpoint. In situations where labeled data for the source/proxy language is insufficient (<50 examples), or where task-to-task transfer exhibits high instability, circuit discovery may be unreliable. The transferability of mechanisms is most effective when the proxy and target languages are closely related or when baseline proxy accuracy is moderate to high; otherwise, more extensive circuit editing is necessary. The approach is not a substitute for initial competence-tuning, but a post hoc adaptation and stabilization protocol.

The findings indicate that causally identified subnetwork adaptation—embodied in CT-SFT—offers a rigorous and data-efficient paradigm for robust low-resource cross-lingual transfer, with strict control over destructive interference and stable editing-preserving dynamics (Nur'aini et al., 13 Jan 2026).
