Test generalization of SmartSearch to other domains and languages

Determine whether SmartSearch's deterministic retrieval and ranking pipeline generalizes beyond English conversational memory tasks to other domains, such as document search and code retrieval, and to other languages. Rigorously evaluate its performance in these settings and identify any adaptations they require.

Background

The paper evaluates SmartSearch on two English conversational memory benchmarks (LoCoMo-10 and LongMemEval-S) and discusses benchmark properties that may favor the approach, including synthetic or semi-structured conversations and high named-entity density. The authors acknowledge that these conditions may not hold in other domains or languages.

Within the threats to validity, the authors explicitly note that the generalization of their method to other types of retrieval tasks (e.g., document search, code retrieval) or to non-English languages has not been tested. Establishing whether SmartSearch maintains its performance and efficiency in these broader contexts remains an unresolved question.
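One way to frame such an evaluation is to score the same retriever separately on each domain and language slice, so that a drop outside English conversational data is visible rather than averaged away. The sketch below is a minimal illustration under stated assumptions: the `EvalQuery` type, the `group` labels, the `retrieve` callable, and the choice of recall@k as the metric are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class EvalQuery:
    """One evaluation query (hypothetical structure, not from the paper)."""
    text: str
    relevant_ids: Set[str]  # gold passage/document ids for this query
    group: str              # slice label, e.g. "code/en" or "docs/de"


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def evaluate_by_group(
    queries: List[EvalQuery],
    retrieve: Callable[[str], List[str]],  # stand-in for any retriever
    k: int = 5,
) -> Dict[str, float]:
    """Mean recall@k per domain/language slice."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for q in queries:
        ranked_ids = retrieve(q.text)
        scores[q.group].append(recall_at_k(ranked_ids, q.relevant_ids, k))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}
```

Reporting one score per slice, rather than a single pooled number, makes it easier to see which settings would need adaptations. Efficiency measures such as latency or token usage could be tracked in the same per-slice fashion.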

References

Generalization to other domains (e.g., document search, code retrieval) or languages remains untested.

SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval (2603.15599 - Derehag et al., 16 Mar 2026) in Threats to Validity – Benchmark Limitations