Effectiveness of the discrepancy distribution across data volumes

Determine and validate how the effectiveness of the discrepancy distribution Q_diff varies with the amount of training preference data used to compute it. In DEFT, Q_diff is constructed by subtracting the normalized token-frequency distributions of chosen and rejected responses. The analysis should cover both the performance and the stability of Q_diff under different data volumes.

Background

DEFT introduces two components: a discrepancy distribution, Q_diff, obtained by subtracting the normalized token-frequency distributions of chosen and rejected answers; and a distribution reward, which is used both to filter high-quality data and to guide training. Experiments show that DEFT enhances alignment and can preserve or improve generalization while reducing training cost.
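The construction of Q_diff can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DEFT's implementation: it uses whitespace tokenization in place of a model tokenizer, pools token counts over all responses on each side, and takes the sign convention "chosen minus rejected". The function names `token_distribution` and `discrepancy_distribution` are hypothetical. The distribution reward is not reproduced because its definition is not given here.

```python
from collections import Counter


def token_distribution(responses: list[str]) -> dict[str, float]:
    """Normalized token-frequency distribution over a list of responses.

    Assumption: whitespace tokenization stands in for the model tokenizer.
    """
    counts = Counter(tok for text in responses for tok in text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in counts.items()}


def discrepancy_distribution(chosen: list[str], rejected: list[str]) -> dict[str, float]:
    """Hypothetical Q_diff: chosen-response distribution minus rejected-response distribution."""
    p_chosen = token_distribution(chosen)
    p_rejected = token_distribution(rejected)
    vocab = set(p_chosen) | set(p_rejected)
    # A token absent from one side contributes probability 0 on that side.
    return {tok: p_chosen.get(tok, 0.0) - p_rejected.get(tok, 0.0) for tok in vocab}


if __name__ == "__main__":
    chosen = ["the answer is correct", "here is a clear answer"]
    rejected = ["i do not know", "the answer is unclear"]
    q_diff = discrepancy_distribution(chosen, rejected)
    # Sort by score (descending), then alphabetically, so ties print deterministically.
    for tok, score in sorted(q_diff.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{tok}: {score:+.3f}")
```

Because both inputs are probability distributions, the entries of Q_diff in this sketch sum to zero: positive scores mark tokens over-represented in chosen answers, negative scores mark tokens over-represented in rejected ones.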

The authors identify as a limitation that Q_diff's effectiveness may depend on the volume of data from which it is extracted. Systematic analysis and validation across dataset sizes is needed to establish how much data is required for robust distribution guidance.
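One way to begin probing the stability half of this question is to subsample the preference pairs at several sizes, recompute Q_diff on each subsample, and measure how far it lies from the Q_diff computed on all the data. The sketch below does this with an L1 distance averaged over repeated random draws. Everything here is a hypothetical illustration: the metric, the helper names (`stability_curve`, `l1_distance`, `make_synthetic_pairs`), the whitespace tokenization, and the toy data are assumptions, not the authors' protocol.

```python
import random
import statistics
from collections import Counter


def token_distribution(responses):
    """Normalized token frequencies; whitespace tokenization is an assumption."""
    counts = Counter(tok for text in responses for tok in text.split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def discrepancy_distribution(pairs):
    """Hypothetical Q_diff from (chosen, rejected) pairs: P_chosen - P_rejected."""
    p_c = token_distribution([chosen for chosen, _ in pairs])
    p_r = token_distribution([rejected for _, rejected in pairs])
    return {t: p_c.get(t, 0.0) - p_r.get(t, 0.0) for t in set(p_c) | set(p_r)}


def l1_distance(q1, q2):
    """L1 distance between two sparse token->score maps (missing tokens count as 0)."""
    return sum(abs(q1.get(t, 0.0) - q2.get(t, 0.0)) for t in set(q1) | set(q2))


def stability_curve(pairs, sizes, n_draws=5, seed=0):
    """Map each subsample size to (mean, std) of the L1 distance to the full-data Q_diff.

    Each size must not exceed len(pairs); sampling is without replacement.
    """
    rng = random.Random(seed)
    reference = discrepancy_distribution(pairs)
    curve = {}
    for n in sizes:
        dists = [
            l1_distance(discrepancy_distribution(rng.sample(pairs, n)), reference)
            for _ in range(n_draws)
        ]
        curve[n] = (statistics.mean(dists), statistics.pstdev(dists))
    return curve


def make_synthetic_pairs(n, seed=1):
    """Toy preference data: chosen responses draw from 'good' words, rejected from 'bad' ones."""
    rng = random.Random(seed)
    shared = ["the", "answer", "is", "a", "to"]
    good = ["clear", "correct", "step", "because", "example"]
    bad = ["sorry", "unsure", "maybe", "cannot", "error"]
    return [
        (" ".join(rng.choices(shared + good, k=8)), " ".join(rng.choices(shared + bad, k=8)))
        for _ in range(n)
    ]


if __name__ == "__main__":
    pairs = make_synthetic_pairs(2000)
    for n, (mean, std) in stability_curve(pairs, [10, 100, 1000, 2000]).items():
        print(f"n={n:5d}  L1 to full-data Q_diff: {mean:.4f} +/- {std:.4f}")
```

On real data, the synthetic pairs and whitespace tokenizer would be replaced by the actual preference set and model tokenizer. A shrinking distance only indicates that Q_diff converges; whether it is *effective* at a given size would still need downstream evaluation, for example by running Q_diff-guided filtering and training at each data volume and comparing alignment results.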

References

The effectiveness of the discrepancy distribution extracted under different data volumes needs further analysis and validation.

DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment (2604.01787 - Zhu et al., 2 Apr 2026), Section: Limitations