Determine the optimal LoRA scaling factor α relative to rank
Determine the optimal choice of the LoRA scaling factor α as a function of the adapter rank r for low‑rank adaptation in decoder‑only large language model fine‑tuning, clarifying whether α should be constant across ranks, proportional to r (e.g., α = r or α = 2r), proportional to √r, or follow another scaling rule that yields the best training stability and downstream performance.
References
Kalajdzievski (2023) argued that α should scale with the square root of r rather than linearly (α ∝ r), though the optimal α configuration remains unclear.
— Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning
(2602.04998 - Lee et al., 4 Feb 2026) in Appendix C, On LoRA Scaling Factor
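The two conventions at issue can be made concrete with a minimal NumPy sketch. Standard LoRA multiplies the low-rank update by α/r, while rsLoRA (Kalajdzievski, 2023) multiplies by α/√r; the `lora_delta` helper and the initialization choices below are illustrative assumptions, not code from either paper.

```python
import numpy as np

def lora_delta(x, A, B, alpha, r, rs=False):
    """Compute the LoRA update (scaling * B @ A) @ x.

    Standard LoRA uses scaling = alpha / r; rsLoRA uses
    scaling = alpha / sqrt(r) (Kalajdzievski, 2023).
    """
    scaling = alpha / np.sqrt(r) if rs else alpha / r
    return scaling * (B @ (A @ x))

rng = np.random.default_rng(0)
d, r = 16, 4
A = rng.normal(0, 1.0 / r, size=(r, d))    # down-projection (random init)
B = rng.normal(0, 0.01, size=(d, r))       # up-projection (small for illustration;
                                           # usually zero-initialized in practice)
x = rng.normal(size=d)

# With alpha = r, the standard scaling is exactly 1.0,
# while the rsLoRA scaling is sqrt(r) = 2.0 at r = 4.
std = lora_delta(x, A, B, alpha=r, r=r)           # scaling = 1.0
rsl = lora_delta(x, A, B, alpha=r, r=r, rs=True)  # scaling = 2.0
```

The sketch shows why the choice matters as r grows: under α = r the effective multiplier stays constant, whereas under rsLoRA the same α yields a √r-times larger update, which changes how learning rate and rank interact during fine-tuning.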