Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model

Published 31 Mar 2025 in cs.CV, cs.CL, cs.LG, cs.MM, and cs.SI | (2503.23746v2)

Abstract: Short-video platforms have gained immense popularity, captivating the interest of millions, if not billions, of users globally. Recently, researchers have highlighted the significance of analyzing the propagation of short-videos, which typically involves discovering commercial values, public opinions, user behaviors, etc. This paper proposes a new Short-video Propagation Influence Rating (SPIR) task and aims to promote SPIR from both the dataset and method perspectives. First, we propose a new Cross-platform Short-Video (XS-Video) dataset, which aims to provide a large-scale and real-world short-video propagation network across various platforms to facilitate the research on short-video propagation. Our XS-Video dataset includes 117,720 videos, 381,926 samples, and 535 topics across 5 biggest Chinese platforms, annotated with the propagation influence from level 0 to 9. To the best of our knowledge, this is the first large-scale short-video dataset that contains cross-platform data or provides all of the views, likes, shares, collects, fans, comments, and comment content. Second, we propose a Large Graph Model (LGM) named NetGPT, based on a novel three-stage training mechanism, to bridge heterogeneous graph-structured data with the powerful reasoning ability and knowledge of LLMs. Our NetGPT can comprehend and analyze the short-video propagation graph, enabling it to predict the long-term propagation influence of short-videos. Comprehensive experimental results evaluated by both classification and regression metrics on our XS-Video dataset indicate the superiority of our method for SPIR.


Summary

The paper introduces SPIR with the XS-Video dataset and NetGPT model, integrating GNNs and LLMs to predict long-term short-video influence.
It leverages multi-platform data from five major Chinese platforms, capturing rich interactive metrics over a two-week observation period.
Experimental results highlight NetGPT's superior accuracy and reduced error metrics, emphasizing the merit of combining graph structures with language reasoning.

Introduction

The paper presents a new task, Short-video Propagation Influence Rating (SPIR), together with a dataset, XS-Video, designed for studying how short videos propagate across multiple online platforms. Short-video platforms have produced vast propagation networks, yet existing research focuses mainly on single popularity metrics such as views or likes. SPIR instead aims to predict the long-term influence of a newly released short video from several kinds of interaction, including shares, collects, and comments. This makes SPIR a more holistic measure of a video's impact than any one popularity count.

Figure 1: Short-video Propagation Influence Rating (SPIR): predicting the influence level that a newly posted short video can reach over a long period.

XS-Video Dataset

XS-Video sets itself apart by incorporating data from five major Chinese platforms (Douyin, Kuaishou, Xigua, Toutiao, and Bilibili), offering a breadth that single-platform datasets lack. It includes 117,720 videos and 381,926 samples, with detailed interactions tracked for two weeks after posting. These multi-dimensional indicators are used to annotate each video with a propagation influence level from 0 to 9, giving a richer picture of the factors that drive propagation across platforms.

Figure 2: An example of short-video states/samples collected in our XS-Video dataset. The text is translated into English.

The dataset is built through daily updates that collect new videos and refresh the interaction metrics of videos already collected, so that annotations reflect each video's accumulated influence. The broad coverage of interactions (views, likes, shares, collects, fans, and comments) lets researchers study propagation dynamics along several dimensions rather than through a single popularity count.

Figure 3: Brief construction procedure of our XS-Video dataset: (1) Daily update of new short-videos and the interactive information of already collected short-videos; (2) Alignment of multi-dimensional interactive indicators (collected 2 weeks later than the publication of videos) for annotating the video propagation influence levels.
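The summary does not state how the aligned indicators are turned into a level from 0 to 9. As a rough illustration only, the sketch below assumes a hypothetical rule: take the order of magnitude of each indicator, average them, and clamp the result to the 0 to 9 range. The function name, the indicator keys, and the rule itself are assumptions, not the authors' annotation procedure.

```python
import math

# Indicator names follow the interactions listed in the text; the keys
# themselves are hypothetical.
INDICATORS = ("views", "likes", "shares", "collects", "comments", "fans")


def influence_level(indicators: dict) -> int:
    """Map raw indicator counts to a level in 0..9 (illustrative rule only)."""
    # Order of magnitude per indicator: log10(count + 1), so a count of 0 maps to 0.
    magnitudes = [math.log10(indicators.get(name, 0) + 1) for name in INDICATORS]
    score = sum(magnitudes) / len(magnitudes)
    # Clamp to the 0..9 range used by the dataset's labels.
    return max(0, min(9, int(score)))


if __name__ == "__main__":
    sample = {name: 500 for name in INDICATORS}
    print(influence_level(sample))
```

A log scale is a natural choice for such a sketch because interaction counts on video platforms typically span many orders of magnitude, but any monotone binning of the indicators would serve the same illustrative purpose.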

Proposed Model: NetGPT

SPIR's complexity necessitates sophisticated models capable of leveraging large-scale data. The authors introduce NetGPT, a Large Graph Model (LGM) integrating Graph Neural Networks (GNNs) with LLMs like Qwen2-VL. NetGPT employs a three-stage training mechanism: heterogeneous graph pretraining, supervised language fine-tuning, and task-oriented predictor fine-tuning. This design enables NetGPT to bridge the gap between graph data's structural nuances and LLMs' reasoning capabilities.

Figure 4: Framework of our proposed NetGPT model: (1) Pretrain a heterogeneous GNN to obtain the features of the video nodes; (2) Train a graph projector to bridge the GNN feature space and the LLM embedding space by supervised instruction fine-tuning; (3) Fine-tune the model with an additional predictor to obtain the final influence level of the short-videos.
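The data flow described above can be sketched in a few lines of plain Python. This is a toy illustration under several assumptions: the dimensions are arbitrary, the weights are random rather than trained, the LLM is replaced by a mean-pooling stand-in (`llm_stub`), and the predictor is assumed to be a 10-way classification head. None of these names or choices come from the paper.

```python
import random

GNN_DIM, LLM_DIM, NUM_LEVELS = 8, 16, 10  # toy sizes (assumptions)


def make_linear(in_dim, out_dim, seed):
    """Create a random weight matrix of shape (out_dim, in_dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(weights, x):
    """Apply a bias-free linear layer to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


projector = make_linear(GNN_DIM, LLM_DIM, seed=0)     # would be trained in stage 2
predictor = make_linear(LLM_DIM, NUM_LEVELS, seed=1)  # would be trained in stage 3


def llm_stub(sequence):
    """Stand-in for the LLM: mean-pools the input vectors into one hidden state."""
    dim = len(sequence[0])
    return [sum(vec[i] for vec in sequence) / len(sequence) for i in range(dim)]


def rate_video(gnn_feature, text_embeddings):
    """Project a video-node feature, prepend it to the text, and predict a level."""
    graph_token = linear(projector, gnn_feature)  # GNN space -> LLM embedding space
    sequence = [graph_token] + text_embeddings    # graph token joins the prompt
    hidden = llm_stub(sequence)
    logits = linear(predictor, hidden)
    return max(range(NUM_LEVELS), key=lambda i: logits[i])


if __name__ == "__main__":
    feature = [0.5] * GNN_DIM                       # pretend stage-1 GNN output
    text = [[0.0] * LLM_DIM for _ in range(3)]      # pretend token embeddings
    print(rate_video(feature, text))
```

The point of the sketch is the interface between components: the projector is what lets a graph-derived vector sit in the same sequence as token embeddings, which is why it is trained as its own stage.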

Experimental Results

Experiments on XS-Video show NetGPT's superiority over current approaches. The model outperforms GNNs, LLMs, and multimodal LLMs on the SPIR task by capturing complex interactions within video propagation graphs (Table 1). Its higher accuracy and lower error underscore the value of combining graph-structured data with LLMs for video influence analysis.

Table 1: The results of Short-video Propagation Influence Rating (SPIR) on the XS-Video dataset. $\uparrow$ denotes the higher the better and $\downarrow$ denotes the lower the better.
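Because the influence levels are ordered integers from 0 to 9, predictions can be scored both as a classification (is the level exactly right?) and as a regression (how far off is it?). The sketch below computes accuracy, mean squared error, and mean absolute error as common representatives of the two families; which exact metrics Table 1 reports is not shown in this summary, so the selection and the function name `spir_metrics` are assumptions.

```python
def spir_metrics(true_levels, predicted_levels):
    """Return accuracy (higher is better) and MSE/MAE (lower is better)."""
    if not true_levels or len(true_levels) != len(predicted_levels):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_levels)
    errors = [p - t for t, p in zip(true_levels, predicted_levels)]
    return {
        "acc": sum(e == 0 for e in errors) / n,   # exact-match rate
        "mse": sum(e * e for e in errors) / n,    # penalizes large misses more
        "mae": sum(abs(e) for e in errors) / n,   # average distance in levels
    }


if __name__ == "__main__":
    print(spir_metrics([0, 1, 2, 9], [0, 1, 3, 7]))
    # {'acc': 0.5, 'mse': 1.25, 'mae': 0.75}
```

Reporting both families is useful for an ordinal label: two models with the same accuracy can differ greatly in how far their wrong predictions land from the true level.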

Furthermore, ablation studies suggest that adding video content features and keeping the graph structure intact both substantially improve the model's predictions. When predictions are evaluated across different observation periods, NetGPT's performance improves consistently with longer observation durations.

Figure 5: Results of long-, median-, and short-term prediction with the observation times of $\leq 3$ days, $\leq 7$ days, and $>7$ days.
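The pairing in the caption can look inverted at first: a short observation window leaves more of the future to predict, so it corresponds to long-term prediction. A minimal sketch of that grouping follows, assuming the "$\leq 7$ days" group means more than 3 and at most 7 days; the function name and the bucket labels are hypothetical.

```python
def prediction_horizon(observed_days: float) -> str:
    """Group a sample by how long it was observed before prediction."""
    if observed_days < 0:
        raise ValueError("observed_days must be non-negative")
    if observed_days <= 3:
        return "long-term"    # little observed, most of the propagation is ahead
    if observed_days <= 7:
        return "medium-term"  # assumed to mean 3 < days <= 7
    return "short-term"       # most of the two-week window already observed


if __name__ == "__main__":
    for days in (2, 5, 10):
        print(days, prediction_horizon(days))
```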

Conclusion

The introduction of XS-Video alongside the SPIR task represents a significant step in understanding short-video dynamics. By combining GNNs and LLMs, NetGPT shows how graph-structured propagation data and LLM reasoning can jointly improve predictions of video influence in complex propagation networks. This work sets the stage for further exploration of cross-platform video analysis and its applications in areas such as advertising, content recommendation, and social network dynamics. The open availability of the dataset and code will make these findings easier to build on and encourage continued research in this domain.
