Effect of sum vs. mean reduction in supervised contrastive loss with variable instance sizes
Determine whether the "sum" reduction in the supervised contrastive loss captures variation in the number of positive instances per batch, and whether this explains the observed stability and performance gains over the "mean" reduction when training the multilingual Siamese two-tower embedding retriever on three-level e-commerce query–item relevance labels.
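To make the distinction concrete, below is a minimal NumPy sketch of a supervised contrastive (SupCon) loss supporting both reductions. This is an illustration under assumptions, not the authors' implementation: the function name, shapes, and temperature value are hypothetical. The key point it demonstrates is that with reduction = "sum" the batch loss scales with the number of anchors that have positives, so batches with more labeled instances contribute proportionally larger gradients, whereas "mean" normalizes this away.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1, reduction="sum"):
    """Supervised contrastive loss with a configurable batch reduction.

    z:      (N, d) embedding matrix (will be L2-normalised)
    labels: (N,) integer class labels
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                     # pairwise similarities / temperature
    np.fill_diagonal(sim, -np.inf)          # exclude self-similarity
    # log softmax over each row (denominator excludes the anchor itself)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    per_anchor = []
    idx = np.arange(len(labels))
    for i in idx:
        pos = (labels == labels[i]) & (idx != i)   # positives share the label
        if pos.any():
            # average over this anchor's positive set (its size varies per batch)
            per_anchor.append(-log_prob[i, pos].mean())
    per_anchor = np.array(per_anchor)

    # "sum" grows with the number of anchors; "mean" is invariant to it
    return per_anchor.sum() if reduction == "sum" else per_anchor.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss_sum = supcon_loss(z, labels, reduction="sum")
loss_mean = supcon_loss(z, labels, reduction="mean")
```

Here every anchor has exactly one positive, so `loss_sum == 8 * loss_mean`; in general the ratio equals the count of anchors with at least one positive, which is exactly the per-batch quantity the "mean" reduction discards.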
References
Additionally, we observed consistent findings: reduction = sum yields more stable convergence and better performance than reduction = mean, and our conjecture is that reduction = sum captures the varied instance sizes across batches.
— Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval
(2602.17654 - Xi et al., 19 Feb 2026) in Section 4.3 (Training: SupCon)