
Content-Based Collaborative Generation for Recommender Systems

Published 27 Mar 2024 in cs.IR | arXiv:2403.18480v2

Abstract: Generative models have emerged as a promising utility to enhance recommender systems. It is essential to model both item content and user-item collaborative interactions in a unified generative framework for better recommendation. Although some existing LLM-based methods contribute to fusing content information and collaborative signals, they fundamentally rely on textual language generation, which is not fully aligned with the recommendation task. How to integrate content knowledge and collaborative interaction signals in a generative framework tailored for item recommendation is still an open research challenge. In this paper, we propose content-based collaborative generation for recommender systems, namely ColaRec. ColaRec is a sequence-to-sequence framework which is tailored for directly generating the recommended item identifier. Precisely, the input sequence comprises data pertaining to the user's interacted items, and the output sequence represents the generative identifier (GID) for the suggested item. To model collaborative signals, the GIDs are constructed from a pretrained collaborative filtering model, and the user is represented as the content aggregation of interacted items. To this end, ColaRec captures both collaborative signals and content information in a unified framework. Then an item indexing task is proposed to conduct the alignment between the content-based semantic space and the interaction-based collaborative space. Besides, a contrastive loss is further introduced to ensure that items with similar collaborative GIDs have similar content representations. To verify the effectiveness of ColaRec, we conduct experiments on four benchmark datasets. Empirical results demonstrate the superior performance of ColaRec.


Summary

  • The paper introduces ColaRec, a novel end-to-end framework that integrates collaborative signals and content information using generative models.
  • The methodology employs generative identifiers (GIDs), T5 integration, and an auxiliary indexing task to align semantic representations effectively.
  • Evaluation on Amazon datasets demonstrates improved recommendation accuracy, robust performance on sparse data, and potential for broader applicability.


Introduction

This paper introduces a novel approach to generative recommendation, termed Content-based Collaborative generation for Recommender systems (ColaRec). The primary novelty of the proposal lies in unifying collaborative signals from user-item interactions and item content information in a coherent end-to-end framework. By exploiting generative models, which have been applied successfully to tasks such as language generation and image synthesis, ColaRec aims to improve recommendation performance by leveraging the combined strength of content and collaboration.

Model Components

ColaRec's architecture incorporates several key components designed to robustly handle both collaborative and content-based information within a generative recommendation paradigm.

  1. Generative Identifiers (GIDs): Items are assigned GIDs constructed from a pretrained collaborative filtering model via hierarchical clustering. This construction lets the identifiers capture item interaction patterns and content features, addressing the traditional difficulty of aligning collaborative signals with item content (Figure 1).

    Figure 1: Comparison between conventional itemIDs and GIDs.

  2. User-Item Interaction Representation: ColaRec represents each user as an unordered collection of the content descriptions of that user's historically interacted items, capturing user preferences enriched by content semantics.
  3. Language Model Integration: ColaRec builds on T5, an encoder-decoder Transformer, to encode item content within user interaction modeling, allowing a nuanced understanding of item attributes.
  4. Item Indexing Task: An auxiliary task maps each item's side information to its GID, aligning the content-based semantic space with the interaction-based collaborative space. A contrastive loss further ensures that items with similar collaborative GIDs have similar content representations (Figure 2).

    Figure 2: Illustration of collaborative signals and content information.
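The GID construction in component 1 can be pictured as recursive clustering over pretrained collaborative-filtering item embeddings: at each level, items are partitioned into clusters and the cluster index becomes the next GID token. The sketch below is a minimal illustration of this idea, not the authors' implementation; the plain k-means partitioner, the cluster count, and the depth are all assumptions.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Plain k-means (Lloyd's algorithm) used as the per-level partitioner."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared distance from every point to every center, then reassign.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def build_gids(item_emb, n_clusters=4, depth=3):
    """Assign each item a generative identifier (GID): a length-`depth`
    sequence of cluster indices obtained by recursively partitioning the
    pretrained collaborative-filtering embedding space."""
    gids = [[] for _ in range(len(item_emb))]

    def recurse(indices, level):
        if level == depth or len(indices) == 0:
            return
        k = min(n_clusters, len(indices))
        labels = kmeans(item_emb[indices], k)
        for idx, lab in zip(indices, labels):
            gids[idx].append(int(lab))
        for c in range(k):
            recurse(indices[labels == c], level + 1)

    recurse(np.arange(len(item_emb)), 0)
    return gids
```

Because the clustering is run on collaborative embeddings, items that users co-interact with tend to share GID prefixes, which is what lets the identifier itself carry collaborative signal.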

Algorithm Performance and Evaluation

ColaRec was evaluated on four benchmark datasets, including real-world Amazon data. The superior performance of this method is evident across several key metrics:

  • Recommendation Accuracy: ColaRec consistently outperformed both traditional CF and advanced generative baselines, achieving notable improvements, especially on user subgroups with sparse data.
  • Alignment and Integration: Effective alignment between collaborative and content representations within the generative framework yielded significant gains over models that cannot harmonize these two sources of information (Figure 3).

    Figure 3: Overview of ColaRec's architecture and dual-task framework.
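Since the decoder emits a GID token by token, only token sequences that correspond to real items should be producible. A standard device for this in generative retrieval is constrained decoding over a prefix trie of all valid GIDs; whether ColaRec uses exactly this mechanism is an assumption, but the sketch shows how such a constraint works:

```python
def build_gid_trie(gids):
    """Prefix trie over all valid item GIDs, so decoding can be
    restricted to token sequences that correspond to real items."""
    trie = {}
    for gid in gids:
        node = trie
        for tok in gid:
            node = node.setdefault(tok, {})
    return trie

def valid_next_tokens(trie, prefix):
    """Tokens the decoder may emit after `prefix`; an empty list means
    the prefix is already a complete GID (or is invalid)."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []
    return sorted(node.keys())
```

At each decoding step, the model's vocabulary distribution is masked down to `valid_next_tokens(trie, prefix)`, guaranteeing that beam search can only terminate on an existing item's identifier.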

Key Findings and Implications

Critical to ColaRec's success is its construction of GIDs and the way it aligns content information with user-item interactions. By integrating an auxiliary item indexing task and a contrastive loss, the model aligns the content-based semantic space with the interaction-based collaborative space, which significantly improves its recommendation quality.
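The contrastive term can be illustrated with an InfoNCE-style loss in which items sharing a GID are treated as positives. The numpy sketch below conveys the intent only; the batch semantics, temperature, and exact positive/negative definitions are assumptions, not the paper's precise formulation:

```python
import numpy as np

def gid_contrastive_loss(content_reps, gids, temperature=0.1):
    """InfoNCE-style loss: for each item, other items with the same GID
    are positives and the rest of the batch are negatives, pushing the
    content representations of collaboratively similar items together."""
    # Cosine similarity matrix of L2-normalized content representations.
    z = content_reps / np.linalg.norm(content_reps, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    # Row-wise log-softmax over the batch.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    losses = []
    for i, gid in enumerate(gids):
        pos = [j for j, g in enumerate(gids) if j != i and g == gid]
        if pos:
            losses.append(-np.mean([log_prob[i, j] for j in pos]))
    return float(np.mean(losses)) if losses else 0.0
```

Minimizing this quantity is low only when items sharing a GID already have similar content representations, which is exactly the alignment property the paper attributes to its contrastive loss.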

Two primary implications emerge from these findings:

  1. Enhanced Generative Modeling: By incorporating collaborative and content-based modeling into a unified generation process, ColaRec delivers a marked improvement over conventional item representation techniques.
  2. Potential for Broader Applicability: Given the demonstrated efficacy across diverse datasets, there is potential for ColaRec to be adapted into various domains beyond the tested benchmarks, especially those where item content plays a crucial role in user engagement.

Impact of Model Design Choices

  1. Length of GIDs: Experiments indicate that a GID length of 3 best balances performance and computational cost, providing sufficient expressiveness without unnecessary complexity (Figure 4).

Figure 4: Impact of the length of GIDs on performance metrics.

  2. Clustering Parameters: The number of clusters also affects performance; a moderate setting yields the best results by balancing the size of the search space against identifier specificity (Figure 5).

Figure 5: Impact of the number of clusters on ColaRec's performance.

Conclusion

ColaRec's development marks a significant step in refining generative recommendation systems by holistically integrating content and collaborative signals. Future advancements may focus on enhancing these integrations with larger LLMs and more sophisticated negative sampling strategies. Furthermore, scalability and computational efficiency remain critical areas for ongoing research, ensuring the viability of such solutions in real-world applications.
