Impact of unpaired misspelled query variants on training noise

Establish whether training the multilingual Siamese two-tower embedding retriever for e-commerce search on misspelled query variants, without their corresponding clean queries, predominantly introduces noise because those variants lack well-formed semantic structure.

Background

The authors evaluate several strategies for spelling robustness, including regularization between clean and misspelled queries and in-place substitution. These underperform their additive augmentation approach, which retains the original clean queries.
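The difference between in-place substitution and additive augmentation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example (query, item) pairs, and the adjacent-character-swap noise model are all hypothetical, and it assumes that "in-place substitution" means the misspelled variant replaces the clean query in the training set.

```python
import random


def misspell(query: str, rng: random.Random) -> str:
    """Hypothetical noise model: swap one pair of adjacent characters.

    Swapping two identical characters leaves the query unchanged, so the
    result is not guaranteed to differ from the input.
    """
    if len(query) < 2:
        return query
    i = rng.randrange(len(query) - 1)
    chars = list(query)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def substitute_in_place(pairs, rng):
    """Replace each clean query with a misspelled variant; the clean query is dropped."""
    return [(misspell(query, rng), item) for query, item in pairs]


def augment_additively(pairs, rng):
    """Keep every clean pair and append a misspelled variant paired with the same item."""
    return list(pairs) + [(misspell(query, rng), item) for query, item in pairs]


# Illustrative (query, item) training pairs; not taken from the paper.
pairs = [("running shoes", "item_1"), ("usb cable", "item_2")]

in_place = substitute_in_place(pairs, random.Random(0))
additive = augment_additively(pairs, random.Random(0))

print(len(in_place), len(additive))
```

Under substitution the training set keeps its size but loses every clean query, whereas under additive augmentation each misspelled variant appears alongside the clean query it was derived from.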

They conjecture that presenting spelling variants without corresponding clean queries primarily injects noise because such variants lack well-formed semantic structure for the model to learn from.

References

We conjecture that spelling variants presented without their corresponding clean queries introduce predominantly noise, as they lack a well formed semantic structure from which the model can learn.

Mine and Refine: Optimizing Graded Relevance in E-commerce Search Retrieval (2602.17654 - Xi et al., 19 Feb 2026) in Section 5.3 (Ablation Studies: Spelling Variation Augmentation of Training Queries)