Trans-LoRA: Efficient Adapter Transfer
- Trans-LoRA is a parameter-efficient fine-tuning method that uses synthetic data and distillation to transfer LoRA adapters across different base models.
- It constructs a filtered synthetic dataset that mimics the original training distribution, enabling effective adapter migration and preserving task accuracy.
- Empirical evaluations reveal that Trans-LoRA achieves lossless or improved performance across diverse tasks and architectures even in restricted data settings.
Trans-LoRA is a parameter-efficient fine-tuning (PEFT) transfer method enabling lossless or positive transfer of low-rank adapters (LoRA) across distinct base models without requiring access to proprietary client data. The framework circumvents the central limitation of classical LoRA—its strict coupling to the pre-trained base weights—by leveraging synthetic data generation and distillation. Trans-LoRA enables adapters trained on one base model to be migrated to new base models, or even across different PEFT classes, preserving accuracy on downstream tasks even in cloud environments where the original training data is inaccessible (Wang et al., 2024).
1. Motivation and Problem Setting
PEFT methods such as LoRA attach a set of low-rank adapter weights to a fixed pre-trained model. On model deprecation or replacement (for instance, upgrading from Llama-2-7B to Llama-2-13B or changing to a different architecture like Gemma), all client-specific adapters must be re-trained on the original data—a process often infeasible for privacy, scalability, or legal reasons. Because LoRA's weight updates are strongly bound to the exact pre-trained anchor weights $W_0$, naive transplantation into a new base model degrades performance or fails to capture the intended downstream behavior (Wang et al., 2024).
Trans-LoRA addresses this by constructing a filtered synthetic dataset, closely mimicking the data distribution that the original adapters experienced, thus sidestepping the need for access to actual user data.
2. LoRA Adapter Recap
LoRA (Low-Rank Adaptation) replaces a full-rank parameter update with a low-rank factorization, $\Delta W = BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ for $r \ll \min(d, k)$. During traditional fine-tuning, the adapter weights $\theta = (A, B)$ are optimized on the actual task dataset $D$ using a loss function such as $\mathcal{L}(\theta) = \mathbb{E}_{(x, y) \sim D}\,\ell\big(f_{W_0 + BA}(x),\, y\big)$. At inference, only the compact low-rank update is applied to the frozen base model parameters $W_0$, allowing rapid deployment and storage efficiency (Wang et al., 2024).
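The low-rank factorization above can be sketched in a few lines of numpy; the dimensions here are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 32, 4           # layer dims and adapter rank, r << min(d, k)

W0 = rng.normal(size=(d, k))  # frozen base weight (the "anchor")
B = np.zeros((d, r))          # LoRA factor B, conventionally zero-initialized
A = rng.normal(size=(r, k)) * 0.01  # LoRA factor A, small random init

delta_W = B @ A               # low-rank update: rank(delta_W) <= r
W = W0 + delta_W              # effective weight applied at inference

# With B initialized to zero, the adapter is a no-op before any training:
assert np.allclose(W, W0)

# Storage savings: only r*(d+k) adapter params vs d*k for a full update
print(r * (d + k), "adapter params vs", d * k, "full-rank params")
```

The zero initialization of $B$ means training starts exactly at the base model's behavior, which is also why the update is inseparable from the specific $W_0$ it was trained against.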
3. Synthetic Data Generation and Filtering
The core challenge is approximating the marginal data distribution of the (unavailable) original training set using only limited accessible information. Trans-LoRA addresses this with a two-stage synthetic data pipeline:
3.1 Synthetic Data Generation via In-Context LLM Synthesis
An instruction-tuned LLM (commonly the target model $M_t$, or any suitably aligned open-source model) is selected as the generator. A small set of public or permissible seed examples demonstrates the I/O format and task style, serving an illustrative purpose only. The generator is queried with a prompt patterned as:
```text
Here are 5 examples of the task (prompts and correct completions).
Now generate 1 new example following the same format:
```
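A minimal sketch of assembling such an in-context prompt from seed examples; the `build_prompt` helper and the JSON record format are illustrative assumptions, not the paper's exact implementation:

```python
import json
import random

def build_prompt(seed_pool, n_shots=5):
    """Format a few seed examples in-context, then request one new example.

    `seed_pool` is a list of {"prompt": ..., "completion": ...} records
    (a hypothetical seed format; the paper only fixes the general pattern).
    """
    shots = random.sample(seed_pool, min(n_shots, len(seed_pool)))
    lines = [f"Here are {len(shots)} examples of the task "
             "(prompts and correct completions)."]
    for ex in shots:
        lines.append(json.dumps(ex))
    lines.append("Now generate 1 new example following the same format:")
    return "\n".join(lines)

seeds = [{"prompt": f"Q{i}", "completion": f"A{i}"} for i in range(8)]
prompt = build_prompt(seeds)
# `prompt` would be sent to the generator LLM (e.g., the target model M_t);
# each completion is parsed back into a synthetic (prompt, completion) pair.
```

Repeating this call many times, with shots resampled each time, yields the large raw synthetic pool that the next stage filters.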
3.2 Discriminative Filtering
To ensure that the synthetic distribution matches the relevant subspace of the original real dataset, a lightweight PEFT discriminator $\phi$ is trained concurrently with the original LoRA adapters. The discriminator distinguishes between real examples $x \sim D_{\text{real}}$ and synthetic examples $x \sim D_{\text{syn}}$ by optimizing a binary cross-entropy objective, $\mathcal{L}_{\phi} = -\,\mathbb{E}_{x \sim D_{\text{real}}}\big[\log \phi(x)\big] - \mathbb{E}_{x \sim D_{\text{syn}}}\big[\log\big(1 - \phi(x)\big)\big]$. At transfer, the discriminator filters $D_{\text{syn}}$ to obtain $D_{\text{filt}}$, comprising only synthetic examples judged sufficiently similar to real training data by exceeding a confidence threshold (Wang et al., 2024).
| Stage | Input → Output | Purpose |
|---|---|---|
| LLM In-Context Generation | Seeds → $D_{\text{syn}}$ | Create a large synthetic dataset in the required task format |
| Discriminative Filtering | $D_{\text{syn}}$ → $D_{\text{filt}}$ | Select synthetic samples resembling true training data |
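The filtering stage reduces to thresholding a score. The sketch below assumes a generic scoring callable standing in for the trained PEFT discriminator $\phi$; the toy length-based score exists only to make the example runnable:

```python
def filter_synthetic(samples, disc_score, threshold=0.5):
    """Keep synthetic samples the discriminator judges sufficiently real-like.

    `disc_score(x)` returns an estimate of P(real | x) from the discriminator;
    here it is a stand-in callable, since the real phi is a trained PEFT model.
    """
    return [x for x in samples if disc_score(x) >= threshold]

# Toy stand-in score: similarity of sample length to the seed examples'.
seed_len = 12.0
score = lambda x: 1.0 / (1.0 + abs(len(x) - seed_len) / seed_len)

synthetic = ["short", "about twelve!",
             "a much longer synthetic example than seeds"]
kept = filter_synthetic(synthetic, score, threshold=0.8)
print(kept)  # only the sample whose score clears the confidence threshold
```

The threshold trades off dataset size against fidelity: a stricter cut keeps fewer, more in-distribution samples for the distillation step.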
4. Distillation-Based Transfer Algorithm
Given the source model $M_s$ with trained adapters $\theta_s$ and a target base model $M_t$, Trans-LoRA learns new adapters $\theta_t$ for $M_t$ by distilling knowledge via $\mathcal{L}_{\text{distill}}(\theta_t) = \mathbb{E}_{x \sim D_{\text{filt}}}\,\mathrm{CE}\big(M_t(x; \theta_t),\, M_s(x; \theta_s)\big)$. Here, $M_s(\cdot\,; \theta_s)$ acts as teacher and $M_t(\cdot\,; \theta_t)$ as student. The process is standard iterative gradient descent over $D_{\text{filt}}$, with $\theta_t = (A_t, B_t)$ initialized at random and no extra regularization beyond lightweight weight decay (usually set to zero).
The complete transfer loop pseudocode is as follows:
```text
Input: M_s, θ_s, M_t, φ, seeds, N_syn
1. D_filt = SYNTH_FILTER(M_t, seeds, φ, N_syn)
2. initialize θ_t = (A_t, B_t) at random
3. while not converged:
       sample batch B ⊂ D_filt
       L ← CE(M_t(x; θ_t), M_s(x; θ_s))   # ∀ x ∈ B
       θ_t ← θ_t − η ∇_{θ_t} L
4. return θ_t
```
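The teacher–student loop can be exercised end to end on a toy problem. Everything below is a stand-in under stated assumptions: the "teacher" is a fixed linear softmax head playing the role of $M_s$ with its adapters, and the "student" parameters play the role of the target model's new adapter weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-ins (not the paper's models): feature dim, classes, |D_filt|
d, v, n = 8, 5, 256
W_teacher = rng.normal(size=(d, v))     # fixed teacher head (role of M_s, θ_s)
theta_t = np.zeros((d, v))              # student params, step 2 of the loop

X = rng.normal(size=(n, d))             # filtered synthetic inputs D_filt
P = softmax(X @ W_teacher)              # teacher soft labels M_s(x; θ_s)

eta = 0.5
for _ in range(500):                    # step 3: minimize CE(student, teacher)
    Q = softmax(X @ theta_t)
    grad = X.T @ (Q - P) / n            # gradient of mean cross-entropy
    theta_t -= eta * grad               # plain SGD step, no regularization

Q = softmax(X @ theta_t)
kl = float(np.mean(np.sum(P * (np.log(P) - np.log(Q)), axis=1)))
print(f"final mean KL(teacher || student) = {kl:.4f}")
```

Because the cross-entropy against the teacher's soft labels is minimized exactly when the student reproduces the teacher's output distribution, the KL divergence shrinks toward zero over the loop, which is the sense in which the new adapters inherit the source adapters' behavior.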
5. Empirical Results and Ablations
Trans-LoRA evaluations consider Llama and Gemma model families, including cross-family and cross-size transfers, and multiple PEFT variants. Benchmarks include BBH (27 tasks), MMLU (57 subjects), MBPP/MBPP+ (code tasks), and GSM8K (math).
5.1 Main Transfer Results
Representative BBH transfer results (average task accuracy):
| Source → Target → Disc | Source LoRA | Target no LoRA | Trans-LoRA |
|---|---|---|---|
| Llama2-7B → Llama2-13B → Llama2-7B | 43.32% | 37.85% | 43.41% |
| Gemma2B → Gemma7B → Gemma2B | 31.84% | 37.75% | 43.61% |
| Llama2-7B → Gemma7B → Gemma2B | 43.32% | 37.75% | 45.41% |
MMLU (57 tasks):
| Source → Target → Disc | Source LoRA | Target no LoRA | Trans-LoRA |
|---|---|---|---|
| Llama2-7B → Llama2-13B → Llama2-7B | 45.89% | 53.72% | 55.09% |
| Gemma2B → Gemma7B → Gemma2B | 42.34% | 60.45% | 61.23% |
Comparable or improved transfer holds for MBPP/MBPP+ and GSM8K. Across nearly 90 diverse tasks, Trans-LoRA enables lossless or enhanced transfer, even when jumping across pre-training regimes or PEFT methods (Wang et al., 2024).
5.2 Ablation Studies
- Distillation Data Choice: Filtering synthetic samples with the discriminator gives superior transfer (BBH: 43.41%) compared to Wikipedia text (37.3%), unfiltered synthetic (41.95%), or seed-only (39.82%).
- PEFT Method Transfer: Transfers between LoRA, DoRA, and Prompt-Tuning on Gemma 2B→7B remain effective (40–44% BBH).
- Multi-Hop Transfer: Chaining transfers (e.g., 7B→13B→Gemma-7B) yields no material degradation (ending at 45.04% vs. 43.32% source on BBH).
- Synthetic Dataset Size: Performance increases smoothly with $N_{\text{syn}}$ for a fixed number of updates.
6. Limitations and Future Prospects
Trans-LoRA incurs additional, but modest, compute for synthetic data generation and filtering. Direct (dataless) adapter mapping remains an open target for future research. In specific high-complexity or ambiguous domains (e.g., Disambiguation-QA), synthetic generation may yield invalid samples, mitigated by increasing the seed count or by tailored prompt engineering. Reliance on base LLM alignment can propagate generator hallucination or distributional drift; adversarial filtering or more advanced synthetic data generation strategies are plausible future directions (Wang et al., 2024).
7. Relation to Alternative Data-Free Transfer Methods
Alternative frameworks such as Cross-LoRA (Xia et al., 7 Aug 2025) provide entirely data-free, training-free adapter transfer by analytical subspace alignment via truncated SVD and Frobenius-optimal projections. In contrast, Trans-LoRA's reliance on synthetic data and distillation enables transfer across broader architectural and methodological boundaries, including cross-family and cross-PEFT settings. Empirical results indicate that in scenarios where direct subspace alignment is impractical, Trans-LoRA delivers a robust, scalable solution for adapter migration in proprietary or privacy-critical environments.