Training for Token-Utility Maximization in Late Interaction Retrieval

Determine whether explicitly training ColBERT-style MaxSim late-interaction retrieval models with objectives that maximize the utility of each document token in multi-vector representations leads to improved retrieval performance across multimodal datasets.

Background

The paper analyzes how often document tokens participate in MaxSim matches during late interaction and observes that, in multimodal settings, only about 1% of tokens are typically active in a single evaluation pass. This underutilization suggests significant redundancy and motivates developing methods that encourage broader token participation.
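The utilization measurement described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `maxsim_score` and `token_utilization` are hypothetical helper names, and the embedding shapes are invented for the example. A document token is counted as "active" if at least one query token selects it as its MaxSim winner.

```python
import numpy as np

def maxsim_score(Q, D):
    """ColBERT-style late-interaction MaxSim score.

    Q: (n_q, d) query token embeddings
    D: (n_d, d) document token embeddings
    Returns the score and, for each query token, the index of the
    document token that won the max-similarity match.
    """
    sims = Q @ D.T                 # (n_q, n_d) token-level similarities
    matched = sims.argmax(axis=1)  # winning document token per query token
    score = sims.max(axis=1).sum() # sum of per-query-token maxima
    return score, matched

def token_utilization(Q, D):
    """Fraction of document tokens that win at least one MaxSim match."""
    _, matched = maxsim_score(Q, D)
    return len(set(matched.tolist())) / D.shape[0]

# Toy example: 8 query tokens against 512 document tokens (e.g., patch
# embeddings of a video). At most 8/512 ~ 1.6% of document tokens can be
# active in a single pass, which illustrates why utilization is so low.
rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))
D = rng.normal(size=(512, 16))
print(token_utilization(Q, D))
```

Note that with a single short query, the number of active document tokens is bounded above by the number of query tokens, so long multimodal documents are structurally prone to low utilization.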

Empirical analysis on MSR-VTT shows strong correlations between retrieval performance and evenness measures (e.g., Coefficient of Variation and Gini) of the distribution of maximum similarity matches across document positions. Based on these correlations, the authors hypothesize that training objectives which explicitly increase token-level utilization could improve retrieval effectiveness, but they do not test such objectives here due to scope limitations.
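The evenness measures mentioned above are standard statistics and can be computed over the per-position match counts. The sketch below is an assumption about how such an analysis might be set up (the helper `match_counts`, the query batch, and the shapes are all illustrative, not from the paper): it tallies how often each document token position wins a MaxSim match across a set of queries, then computes the Coefficient of Variation and the Gini coefficient of that distribution.

```python
import numpy as np

def match_counts(queries, D):
    """Per-position counts of MaxSim wins across a set of queries.

    queries: list of (n_q, d) query embedding matrices
    D: (n_d, d) document token embeddings
    """
    counts = np.zeros(D.shape[0], dtype=int)
    for Q in queries:
        winners = (Q @ D.T).argmax(axis=1)
        np.add.at(counts, winners, 1)  # unbuffered add handles repeats
    return counts

def coefficient_of_variation(x):
    """Std / mean; 0 for a perfectly even distribution."""
    x = np.asarray(x, dtype=float)
    return x.std() / x.mean()

def gini(x):
    """Gini coefficient; 0 = perfectly even, -> 1 = highly concentrated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    shares = np.cumsum(x) / x.sum()
    return (n + 1 - 2 * shares.sum()) / n

# Illustrative usage: 100 random queries against one 512-token document.
rng = np.random.default_rng(1)
queries = [rng.normal(size=(8, 16)) for _ in range(100)]
D = rng.normal(size=(512, 16))
counts = match_counts(queries, D)
print(coefficient_of_variation(counts), gini(counts))
```

Under the paper's hypothesis, lower CoV and Gini values (a more even spread of matches across positions) would correlate with stronger retrieval performance.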

References

These results suggest that training late-interaction methods to maximize the utility of each token in the document representation would lead to strong retrieval performance; exploring such training objectives is left for future work.

Multi-Vector Index Compression in Any Modality  (2602.21202 - Qin et al., 24 Feb 2026) in Section 6 (Experiments), Subsection "Index Utilization" — paragraph "Predicting Performance with Utilization"