
NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation

Published 14 Sep 2023 in cs.IR | arXiv:2309.07705v3

Abstract: Large foundational models, through upstream pre-training and downstream fine-tuning, have achieved immense success in the broad AI community due to improved model performance and significant reductions in repetitive engineering. By contrast, the transferable one-for-all models in the recommender system field, referred to as TransRec, have made limited progress. The development of TransRec has encountered multiple challenges, among which the lack of large-scale, high-quality transfer learning recommendation datasets and benchmark suites is one of the biggest obstacles. To this end, we introduce NineRec, a TransRec dataset suite that comprises a large-scale source domain recommendation dataset and nine diverse target domain recommendation datasets. Each item in NineRec is accompanied by a descriptive text and a high-resolution cover image. Leveraging NineRec, we enable the implementation of TransRec models by learning from raw multimodal features instead of relying solely on pre-extracted off-the-shelf features. Finally, we present robust TransRec benchmark results with several classical network architectures, providing valuable insights into the field. To facilitate further research, we will release our code, datasets, benchmarks, and leaderboards at https://github.com/westlake-repl/NineRec.


Summary

  • The paper presents a novel benchmark dataset suite, NineRec, that enhances evaluation and training of transferable recommendation models with multimodal features.
  • It reports significant pre-training benefits: models pre-trained on the source data show marked improvements on the target datasets and alleviate cold-start challenges.
  • The study signals a shift from classic ID-based methods to modality-driven approaches, paving the way for universal recommendation systems.

NineRec: Benchmark Dataset Suite for Transferable Recommendation

The paper "NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation" addresses a significant challenge in the field of recommender systems: the limited progress of transferable recommendation models, or "TransRec." These models are aimed at the development of one-for-all recommendation systems that leverage learning from one domain to predict in others. In stark contrast to the generalized success of foundational models in other AI fields, TransRec has lagged, hindered by the lack of large-scale benchmark datasets and the dominance of the ID-based recommendation paradigm.

Dataset Development

NineRec is introduced as a comprehensive dataset suite designed to advance research on transferable recommendation by removing this dataset bottleneck. The suite comprises a substantial source domain dataset and nine diverse target domain datasets, providing a common benchmark for analyzing TransRec models.

  • Source Dataset (Bili_2M): It contains millions of user-item interactions and is rich in multimodal features, with each item represented by a high-resolution cover image and a descriptive text.
  • Target Datasets: These include datasets collected from different vertical channels on a single platform and cross-platform datasets, ensuring a broad scope for evaluating transferability across diverse domains.

NineRec is a pioneering contribution for TransRec because it emphasizes learning from raw modality features (cover images and descriptive text), whereas earlier datasets generally shipped only static, pre-extracted features. Key distinctions of the suite include its high semantic complexity and its suitability for studies of modality-based recommendation.
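To make the data layout concrete, below is a minimal sketch, not the official loader, of how a NineRec-style dataset could be consumed so that a model sees each item's raw cover image and descriptive text rather than pre-extracted features. The file names, the tab-separated column order, and the one-sequence-per-user interaction format are illustrative assumptions:

```python
# Minimal sketch of a NineRec-style multimodal dataset (file layout assumed).
import csv

import torch
from PIL import Image
from torch.utils.data import Dataset


class MultimodalSeqDataset(Dataset):
    def __init__(self, interactions_path, item_meta_path,
                 tokenizer, image_transform, max_len=20):
        # Assumed metadata layout: one TSV row per item,
        # columns = item_id, descriptive text, cover-image path.
        self.items = {}
        with open(item_meta_path, newline="", encoding="utf-8") as f:
            for item_id, text, image_path in csv.reader(f, delimiter="\t"):
                self.items[item_id] = (text, image_path)
        # Assumed interaction layout: one whitespace-separated,
        # chronologically ordered item-id sequence per user.
        with open(interactions_path, encoding="utf-8") as f:
            self.sequences = [line.split() for line in f if line.strip()]
        self.tokenizer = tokenizer              # e.g. a Hugging Face BERT tokenizer
        self.image_transform = image_transform  # e.g. torchvision transforms
        self.max_len = max_len

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        seq = self.sequences[idx][-self.max_len:]  # most recent interactions
        texts, images = [], []
        for item_id in seq:
            text, image_path = self.items[item_id]
            texts.append(text)
            images.append(self.image_transform(Image.open(image_path).convert("RGB")))
        tokens = self.tokenizer(texts, padding="max_length", truncation=True,
                                max_length=32, return_tensors="pt")
        return tokens, torch.stack(images)  # raw modalities, no ID embeddings
```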

Implications of TransRec and Findings

TransRec paradigms fundamentally diverge from classical ID-based recommendation (IDRec). By representing items through their modality features rather than opaque IDs, TransRec naturally acquires cross-domain capabilities, a step towards universal recommendation models paralleling foundation models in NLP and CV. The paper empirically demonstrates that models pre-trained on NineRec achieve notable improvements when fine-tuned on the target datasets.
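As a concrete illustration of this paradigm, here is a minimal sketch of a modality-based sequential recommender in the spirit of the classical architectures the paper benchmarks: a text encoder (BERT) and a vision encoder (ResNet-50) produce item embeddings end-to-end, and a small Transformer over those embeddings models the user's interaction sequence. The exact wiring, dimensions, and fusion by summation are assumptions for illustration, not the paper's definitive implementation:

```python
# Sketch of a modality-based TransRec model (wiring and sizes assumed).
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import BertModel


class TransRecSketch(nn.Module):
    def __init__(self, dim=256, max_len=20):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        vision = resnet50()        # pretrained weights would normally be loaded
        vision.fc = nn.Identity()  # expose the 2048-d pooled feature
        self.image_encoder = vision
        self.text_proj = nn.Linear(768, dim)
        self.image_proj = nn.Linear(2048, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.user_encoder = nn.TransformerEncoder(layer, num_layers=2)

    def encode_items(self, tokens, images):
        # tokens: dict of (N, T) tensors; images: (N, 3, H, W); N = batch * seq_len.
        text = self.text_encoder(**tokens).last_hidden_state[:, 0]  # [CLS] vector
        return self.text_proj(text) + self.image_proj(self.image_encoder(images))

    def forward(self, tokens, images, batch_size, seq_len):
        items = self.encode_items(tokens, images).view(batch_size, seq_len, -1)
        pos = self.pos_emb(torch.arange(seq_len, device=items.device))
        h = self.user_encoder(items + pos)  # causal masking omitted for brevity
        return h[:, -1]                     # user representation at the last step
```

Scoring a candidate is then just a dot product between this user representation and the candidate's modality-derived embedding. Because no ID embedding table is involved anywhere, the same weights can, in principle, be applied unchanged to items from a previously unseen domain.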

  • Pre-Training Effectiveness: Models pre-trained on the NineRec source data show significant performance improvements over models trained directly on the target datasets, particularly in text-based scenarios (see the transfer-loop sketch after this list).
  • Cold-Start Reduction: TransRec markedly reduces cold-start problems; even without overlapping user or item IDs, NineRec-trained models adapt effectively across datasets.
  • Comparison with ID-based Models: Although the field has traditionally been dominated by IDRec, TransRec surpasses IDRec even in non-cold-start settings when textual features are used, suggesting a pivotal shift.
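
The transfer protocol behind these findings can be summarized in a short sketch. The two-stage structure, pre-training on the large source domain and then fine-tuning all weights on a small target domain, follows the paper's setup; the BPR-style pairwise loss, the learning rates, and the batch interface (a `model(batch)` call returning user, positive-item, and negative-item embeddings) are assumptions for illustration:

```python
# Sketch of the pre-train-then-fine-tune transfer protocol (details assumed).
import torch
import torch.nn.functional as F


def bpr_loss(user, pos, neg):
    """Pairwise BPR loss: the observed item should outrank a sampled negative."""
    return -F.logsigmoid((user * pos).sum(-1) - (user * neg).sum(-1)).mean()


def run_stage(model, loader, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for batch in loader:
        user, pos, neg = model(batch)  # assumed interface: three embedding tensors
        loss = bpr_loss(user, pos, neg)
        opt.zero_grad()
        loss.backward()
        opt.step()


def transfer(model, source_loader, target_loader):
    # Stage 1: pre-train end-to-end on the large source domain (Bili_2M).
    run_stage(model, source_loader, lr=1e-4)
    # Stage 2: fine-tune ALL weights on the small target domain. Items are
    # represented by raw modalities, so no overlapping user or item IDs are
    # required for the transfer to work.
    run_stage(model, target_loader, lr=1e-5)  # smaller fine-tuning LR: an assumption
```

Evaluation on the target domain then typically uses standard top-K ranking metrics such as HR@10 and NDCG@10.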

Challenges and Future Directions

Despite its promise, building universal TransRec models involves open challenges: aligning and effectively fusing different modalities, scaling models and data, and the inherently high computational cost of end-to-end training of large modality encoders. Moreover, while NineRec contributes substantially, a broader set of datasets and larger-scale pre-training could further unlock emergent capabilities, which remains a speculative but exciting avenue.

Conclusion

NineRec is a substantial advance for TransRec research, notable for its real-world applicability and its potential to foster developments in recommender systems akin to foundation models in NLP and CV. By enabling pre-training on diverse multimodal data, NineRec provides a foundation for developing adaptable, one-for-all recommendation models and encourages the exploration of cross-domain and cross-platform recommendation. As the field progresses, collaboration among NLP, CV, and recommendation researchers leveraging NineRec can catalyze more robust, generalized recommender systems.

Overall, this paper not only introduces a highly valuable benchmark suite but also empirically demonstrates its potential, setting the stage for future advancements in transferable recommendation models.
