Trans-LoRA: Efficient Adapter Transfer

Updated 20 January 2026
  • Trans-LoRA is a parameter-efficient fine-tuning method that uses synthetic data and distillation to transfer LoRA adapters across different base models.
  • It constructs a filtered synthetic dataset that mimics the original training distribution, enabling effective adapter migration and preserving task accuracy.
  • Empirical evaluations reveal that Trans-LoRA achieves lossless or improved performance across diverse tasks and architectures even in restricted data settings.

Trans-LoRA is a parameter-efficient fine-tuning (PEFT) transfer method enabling lossless or positive transfer of low-rank adapters (LoRA) across distinct base models without requiring access to proprietary client data. The framework circumvents the central limitation of classical LoRA—its strict coupling to the pre-trained base weights—by leveraging synthetic data generation and distillation. Trans-LoRA enables adapters trained on one base model to be migrated to new base models, or even across different PEFT classes, preserving accuracy on downstream tasks even in cloud environments where the original training data is inaccessible (Wang et al., 2024).

1. Motivation and Problem Setting

PEFT methods such as LoRA attach a set of low-rank adapter weights to a fixed pre-trained model. On model deprecation or replacement (for instance, upgrading from Llama-2-7B to Llama-2-13B, or switching to a different architecture such as Gemma), all client-specific adapters must be re-trained on the original data, a process often infeasible for privacy, scalability, or legal reasons. Because LoRA's weight update $\Delta W = AB^{\top}$ is strongly bound to the exact anchor weights $W_0$, naive transplantation into a new base model degrades performance or fails to capture the intended downstream behavior (Wang et al., 2024).

Trans-LoRA addresses this by constructing a filtered synthetic dataset, closely mimicking the data distribution that the original adapters experienced, thus sidestepping the need for access to actual user data.

2. LoRA Adapter Recap

LoRA (Low-Rank Adaptation) replaces a full-rank parameter update with a low-rank factorization:

$$W = W_0 + \Delta W, \qquad \Delta W = AB^{\top}$$

where $A, B \in \mathbb{R}^{d \times r}$ with $r \ll d$. During traditional fine-tuning, the adapter weights $\theta_s = \{A, B\}$ are optimized on the actual task dataset $D$ using a loss function such as:

$$\mathcal{L}_{\text{LoRA}} = \mathbb{E}_{(x, y) \sim D}\left[\ell\left(f(x; W_0 + AB^{\top}), y\right)\right]$$

At inference, only the compact low-rank update is applied to the frozen base model parameters, allowing rapid deployment and storage efficiency (Wang et al., 2024).
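The factorization above can be made concrete with a minimal numpy sketch (illustrative only; the dimensions and initialization scheme here are conventional choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 16, 2                        # model dim d, adapter rank r << d
W0 = rng.standard_normal((d, d))    # frozen base weight

# Low-rank adapter factors: only A and B are trained.
A = 0.01 * rng.standard_normal((d, r))
B = np.zeros((d, r))                # B starts at zero so ΔW = 0 initially

def lora_forward(x, W0, A, B):
    """y = x @ (W0 + A B^T): frozen base path plus low-rank update."""
    return x @ (W0 + A @ B.T)

x = rng.standard_normal((4, d))
y = lora_forward(x, W0, A, B)

# With B = 0 the adapted model matches the frozen base exactly.
assert np.allclose(y, x @ W0)
```

The storage saving is the point: the update costs $2dr$ parameters instead of $d^2$ (here 64 vs. 256), which is why per-client adapters are cheap to keep but expensive to re-train when the base model changes.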

3. Synthetic Data Generation and Filtering

The core challenge is approximating the marginal data distribution $P(x)$ of the (unavailable) original training set $D$ using only limited accessible information. Trans-LoRA addresses this with a two-stage synthetic data pipeline:

3.1 Synthetic Data Generation via In-Context LLM Synthesis

An instruction-tuned LLM (commonly the target model $M_t$, or any suitably aligned open-source model) is selected as the generator. A small set of $k = 5$ public or permissible seed examples $\{(x_j, y_j)\}$ demonstrates the I/O format and task style, serving an illustrative purpose only. The generator is prompted with a pattern such as:

```
Here are 5 examples of the task (prompts and correct completions).
Now generate 1 new example following the same format:
```

The generator then produces synthetic prompt–completion pairs, iterating until a synthetic pool $D_{\text{syn}}$ of the desired cardinality (typically $|D_{\text{syn}}| \geq |D|$) is constructed (Wang et al., 2024).
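The generation loop amounts to repeated few-shot prompting with a basic format check. A minimal sketch follows; `llm_complete` is a hypothetical stand-in for a real call to the generator model, replaced here by a toy stub so the loop runs:

```python
# `llm_complete` is a placeholder for a real LLM completion call.
def llm_complete(prompt):
    return "Q: 2+3=?\tA: 5"   # toy stub: a real model returns varied samples

def make_prompt(seeds):
    shots = "\n".join(f"Q: {q}\tA: {a}" for q, a in seeds)
    return (f"Here are {len(seeds)} examples of the task "
            f"(prompts and correct completions).\n{shots}\n"
            "Now generate 1 new example following the same format:")

def synthesize(seeds, n_syn):
    """Build a synthetic pool D_syn by repeated in-context generation."""
    pool, prompt = [], make_prompt(seeds)
    while len(pool) < n_syn:
        line = llm_complete(prompt)
        if "\t" in line:                       # discard malformed outputs
            q, a = line.split("\t", 1)
            pool.append((q.removeprefix("Q: "), a.removeprefix("A: ")))
    return pool

seeds = [("1+1=?", "2"), ("4+3=?", "7")]
D_syn = synthesize(seeds, n_syn=10)
```

In practice the generator's sampling temperature and the format check are what keep the pool diverse yet parseable; the stub above trivially satisfies both.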

3.2 Discriminative Filtering

To ensure that the synthetic distribution matches the relevant subspace of the original real dataset, a lightweight PEFT discriminator $\phi$ is trained concurrently with the original LoRA adapters. The discriminator distinguishes real from synthetic $x$ by optimizing:

$$\phi^* = \arg\max_\phi \; \mathbb{E}_{x \sim D}\left[\log p_\phi(\texttt{yes} \mid x)\right] + \mathbb{E}_{x \sim M_s}\left[\log p_\phi(\texttt{no} \mid x)\right]$$

At transfer time, the discriminator filters $D_{\text{syn}}$ to obtain $D_{\text{filt}}$, comprising only synthetic examples judged sufficiently similar to the real training data by exceeding a confidence threshold (Wang et al., 2024).
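The filtering step itself is a simple thresholded selection. In this sketch, `p_yes` is a hypothetical stand-in for the trained discriminator's confidence $p_\phi(\texttt{yes} \mid x)$, and the threshold `tau` is an assumed hyperparameter:

```python
# `p_yes` stands in for the trained discriminator p_phi(yes | x);
# this toy version simply prefers arithmetic-style prompts.
def p_yes(x):
    return 0.9 if "=?" in x else 0.1

def filter_pool(d_syn, tau=0.5):
    """Keep only synthetic examples the discriminator deems real-like."""
    return [(x, y) for (x, y) in d_syn if p_yes(x) >= tau]

pool = [("2+3=?", "5"), ("tell me a joke", "..."), ("7-4=?", "3")]
d_filt = filter_pool(pool, tau=0.5)
# → [("2+3=?", "5"), ("7-4=?", "3")]
```

Raising `tau` trades pool size for fidelity; the ablations in Section 5.2 suggest this filtering is what separates useful synthetic data from generic text.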

| Stage | Input → Output | Purpose |
|---|---|---|
| LLM in-context generation | Seeds → $D_{\text{syn}}$ | Create a large synthetic dataset in the required task format |
| Discriminative filtering | $D_{\text{syn}}$ → $D_{\text{filt}}$ | Select synthetic samples resembling the true training data |

4. Distillation-Based Transfer Algorithm

Given the source model $M_s$ with trained adapters $\theta_s = \{A_s, B_s\}$ and a target base model $M_t$, Trans-LoRA learns new adapters $\theta_t$ for $M_t$ by distilling knowledge via:

$$\mathcal{L}_{\text{distill}}(\theta_t) = \mathbb{E}_{x \sim D_{\text{filt}}} \left[\ell_{\text{CE}}\left(f(x; W_0^t + A_tB_t^{\top}),\ f(x; W_0^s + A_sB_s^{\top})\right)\right]$$

Here, $(M_s, \theta_s)$ acts as teacher and $(M_t, \theta_t)$ as student. The transfer is standard iterative gradient descent over $D_{\text{filt}}$, with $\theta_t$ initialized at random and no regularization beyond standard weight decay (usually set to zero).
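The inner loss is the familiar soft-target cross-entropy from knowledge distillation. A minimal numpy sketch of that term (dimensions are illustrative; real logits come from $M_s$ and $M_t$):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_ce(student_logits, teacher_logits):
    """Cross-entropy of student predictions against teacher soft targets."""
    p_t = softmax(teacher_logits)
    log_p_s = np.log(softmax(student_logits) + 1e-12)
    return -(p_t * log_p_s).sum(axis=-1).mean()

rng = np.random.default_rng(1)
teacher = rng.standard_normal((8, 32))   # (batch, vocab) logits from M_s
student = teacher + 0.1 * rng.standard_normal((8, 32))

# The loss is minimized (up to the teacher's entropy) when the
# student distribution matches the teacher distribution exactly.
assert distill_ce(teacher, teacher) <= distill_ce(student, teacher)
```

Minimizing this term drives the student's output distribution toward the teacher's on every filtered synthetic input, which is how task behavior survives the change of base weights.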

The complete transfer loop pseudocode is as follows:

```
Input:   M_s, θ_s, M_t, φ, seeds, N_syn
1. D_filt = SYNTH_FILTER(M_t, seeds, φ, N_syn)
2. initialize θ_t = (A_t, B_t) at random
3. while not converged:
       sample batch B ⊂ D_filt
       L ← CE(M_t(x; θ_t), M_s(x; θ_s))   # ∀ x ∈ B
       θ_t ← θ_t − η ∇_{θ_t} L
4. return θ_t
```
(Wang et al., 2024)
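The loop above can be exercised end to end on a toy linear "model": the teacher is a frozen base plus a trained source adapter, the student has a different frozen base, and only the student's adapter factors are updated by gradient descent on the distillation cross-entropy. This is a sketch under toy assumptions (linear logits, full-batch descent), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, v = 8, 2, 5                       # input dim, rank, "vocab" size

# Teacher M_s: frozen base plus trained source adapter A_s B_s^T.
W0_s = rng.standard_normal((d, v))
A_s, B_s = rng.standard_normal((d, r)), rng.standard_normal((v, r))

# Student M_t: a *different* frozen base; only A_t, B_t are learned.
W0_t = rng.standard_normal((d, v))
A_t = 0.1 * rng.standard_normal((d, r))
B_t = np.zeros((v, r))

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

X = rng.standard_normal((256, d))       # stands in for D_filt inputs
P_t = softmax(X @ (W0_s + A_s @ B_s.T)) # teacher soft targets

def ce(logits):
    return -(P_t * np.log(softmax(logits) + 1e-12)).sum(-1).mean()

ce0, eta = ce(X @ W0_t), 0.1
for _ in range(500):
    logits = X @ (W0_t + A_t @ B_t.T)
    G = (softmax(logits) - P_t) / len(X)   # d(CE)/d(logits)
    dW = X.T @ G                            # gradient w.r.t. ΔW = A_t B_t^T
    gA, gB = dW @ B_t, dW.T @ A_t           # chain rule to the factors
    A_t -= eta * gA
    B_t -= eta * gB
ce1 = ce(X @ (W0_t + A_t @ B_t.T))

assert ce1 < ce0   # distillation moved the student toward the teacher
```

Note that the base weights `W0_s` and `W0_t` are never touched; all task behavior transferred from teacher to student lives in the low-rank factors, mirroring the paper's setting.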

5. Empirical Results and Ablations

Trans-LoRA evaluations consider Llama and Gemma model families, including cross-family and cross-size transfers, and multiple PEFT variants. Benchmarks include BBH (27 tasks), MMLU (57 subjects), MBPP/MBPP+ (code tasks), and GSM8K (math).

5.1 Main Transfer Results

Representative BBH transfer results (average task accuracy):

| Source → Target → Disc | Source LoRA | Target no LoRA | Trans-LoRA |
|---|---|---|---|
| Llama2-7B → Llama2-13B → Llama2-7B | 43.32% | 37.85% | 43.41% |
| Gemma2B → Gemma7B → Gemma2B | 31.84% | 37.75% | 43.61% |
| Llama2-7B → Gemma7B → Gemma2B | 43.32% | 37.75% | 45.41% |

MMLU (57 tasks):

| Source → Target → Disc | Source LoRA | Target no LoRA | Trans-LoRA |
|---|---|---|---|
| Llama2-7B → Llama2-13B → Llama2-7B | 45.89% | 53.72% | 55.09% |
| Gemma2B → Gemma7B → Gemma2B | 42.34% | 60.45% | 61.23% |

Comparable or improved transfer holds for MBPP/MBPP+ and GSM8K. Across nearly 90 diverse tasks, Trans-LoRA enables lossless or enhanced transfer, even when jumping across pre-training regimes or PEFT methods (Wang et al., 2024).

5.2 Ablation Studies

  • Distillation Data Choice: Filtering synthetic samples with the discriminator gives superior transfer (BBH: 43.41%) compared to Wikipedia text (37.3%), unfiltered synthetic (41.95%), or seed-only (39.82%).
  • PEFT Method Transfer: Transfers between LoRA, DoRA, and Prompt-Tuning on Gemma 2B→7B remain effective (40–44% BBH).
  • Multi-Hop Transfer: Chaining transfers (e.g., 7B→13B→Gemma-7B) yields no material degradation (ending at 45.04% vs. 43.32% source on BBH).
  • Synthetic Dataset Size: Performance increases smoothly with $|D_{\text{filt}}|$ for a fixed number of updates.

6. Limitations and Future Prospects

Trans-LoRA incurs additional, but modest, compute for synthetic data generation and filtering. Direct (dataless) adapter mapping remains an open target for future research. In specific high-complexity or ambiguous domains (e.g., Disambiguation-QA), synthetic generation may yield invalid samples, mitigated by increasing the seed count (from $k = 5$ to $k = 15$) or by tailored prompt engineering. Reliance on base LLM alignment can propagate generator hallucination or distributional drift; adversarial filtering or more advanced synthetic data generation strategies are plausible future directions (Wang et al., 2024).

7. Relation to Alternative Data-Free Transfer Methods

Alternative frameworks such as Cross-LoRA (Xia et al., 2025) provide entirely data-free, training-free adapter transfer by analytical subspace alignment via truncated SVD and Frobenius-optimal projections. In contrast, Trans-LoRA's reliance on synthetic data and distillation enables transfer across broader architectural and methodological boundaries, including cross-family and cross-PEFT settings. Empirical results indicate that in scenarios where direct subspace alignment is impractical, Trans-LoRA delivers a robust, scalable solution for adapter migration in proprietary or privacy-critical environments.
