Forecasting of Events by Tweet Data Mining

Published 13 Oct 2013 in cs.SI, cs.CL, and cs.CY | (1310.3499v1)

Abstract: This paper describes the analysis of quantitative characteristics of frequent sets and association rules in the posts of Twitter microblogs related to different event discussions. For the analysis, we used a theory of frequent sets, association rules and a theory of formal concept analysis. We revealed the frequent sets and association rules which characterize the semantic relations between the concepts of analyzed subjects. The support of some frequent sets reaches its global maximum before the expected event but with some time delay. Such frequent sets may be considered as predictive markers that characterize the significance of expected events for blogosphere users. We showed that the time dynamics of confidence in some revealed association rules can also have predictive characteristics. Exceeding a certain threshold may be a signal for corresponding reaction in the society within the time interval between the maximum and the probable coming of an event. In this paper, we considered two types of events: the Olympic tennis tournament final in London, 2012 and the prediction of Eurovision 2013 winner.




Summary

The paper explores using tweet data mining techniques like frequent sets and formal concept analysis to identify semantic relationships for event forecasting.
Case studies demonstrate the method's effectiveness, correctly predicting the 2012 Olympic women's tennis final winner and the 2013 Eurovision Song Contest winner.
The research offers practical applications in predictive analytics for industries such as marketing, politics, and finance, with potential for future integration of sentiment analysis.

Overview of "Forecasting of Events by Tweet Data Mining"

The paper "Forecasting of Events by Tweet Data Mining" by Bohdan Pavlyshenko explores the potential of using Twitter microblogs for event forecasting, employing methodologies rooted in data mining techniques such as frequent sets, association rules, and formal concept analysis. By scrutinizing tweets, which are characterized by their brevity and high density of contextually relevant keywords, this research uncovers the semantic relationships and predictive markers associated with significant events.

Methodology and Analysis

The analysis rests on frequent sets and association rules. Each tweet is converted into a transaction composed of key terms, and frequent sets are then extracted from these transactions using the Apriori algorithm. The support of the frequent sets and the confidence of the association rules derived from them reveal potential semantic connections and predictive indicators.
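To make support and confidence concrete, here is a minimal sketch of a level-wise (Apriori-style) search over a handful of transactions. The tweets, the terms, the `min_support` threshold, and the function names are all hypothetical illustrations and are not taken from the paper; the paper's own preprocessing and thresholds may differ.

```python
from itertools import combinations

# Hypothetical toy tweets, already reduced to sets of key terms (transactions).
transactions = [
    {"olympics", "tennis", "final", "williams"},
    {"olympics", "tennis", "williams", "win"},
    {"olympics", "tennis", "final", "sharapova"},
    {"tennis", "williams", "win"},
    {"olympics", "final", "williams", "win"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every term in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_sets(transactions, min_support):
    """Level-wise search: a k-set is a candidate only if all of its
    (k-1)-subsets were frequent at the previous level."""
    items = sorted({term for t in transactions for term in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        prev = set(level)
        # Join frequent sets from the previous level into k-sized candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule A -> B: supp(A and B together) / supp(A)."""
    a = frozenset(antecedent)
    return support(a | frozenset(consequent), transactions) / support(a, transactions)


freq = frequent_sets(transactions, min_support=0.6)
rule_conf = confidence({"win"}, {"williams"}, transactions)
```

In this toy data every transaction containing "win" also contains "williams", so the rule "win" -> "williams" has the highest possible confidence. Tracking such values over successive time windows is one way the time dynamics described in the abstract could be examined.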

Formal concept analysis enriches this investigation by representing semantic relations within a concept lattice. The lattice is visualized as a Hasse diagram, which allows each concept's extent (the tweets it covers) and intent (the key terms those tweets share) to be examined. According to the paper, the concepts with maximum extent are taken to be the most accurate reflections of real-world events.
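The extent and intent of a formal concept can be illustrated with a small brute-force sketch. The context below (tweet ids mapped to key terms) and the helper names are hypothetical, and the enumeration closes every subset of tweets, which is only practical for tiny examples; it is not the algorithm used in the paper.

```python
from itertools import combinations

# Hypothetical formal context: tweet id -> set of key terms (attributes).
context = {
    "t1": {"tennis", "final", "williams"},
    "t2": {"tennis", "williams", "win"},
    "t3": {"tennis", "final", "sharapova"},
    "t4": {"tennis", "williams", "win"},
}
all_terms = set().union(*context.values())


def common_terms(tweets):
    """Intent: the terms shared by every tweet in the given set."""
    if not tweets:
        return set(all_terms)
    return set.intersection(*(context[t] for t in tweets))


def matching_tweets(terms):
    """Extent: the tweets that contain every given term."""
    return {t for t, attrs in context.items() if terms <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of tweets.
    Exponential in the number of tweets, so only suitable for tiny contexts."""
    concepts = set()
    tweets = sorted(context)
    for r in range(len(tweets) + 1):
        for subset in combinations(tweets, r):
            intent = common_terms(set(subset))
            extent = matching_tweets(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts()
# List concepts from largest to smallest extent.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the concepts by inclusion of their extents gives the lattice that a Hasse diagram would draw. In this toy context, the concept with intent {"tennis", "williams"} has a larger extent than the one with intent {"tennis", "final"}, which mirrors the idea of comparing concepts by extent size.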

Case Studies

Two case studies are provided to validate the model: the 2012 Olympic tennis finals and the 2013 Eurovision Song Contest. For the Olympic Games, the analysis correctly identified, through the extent values, that Williams won the women's final on August 4, 2012. This outcome suggests that the model can capture both the anticipated and the actual results.

In the Eurovision case, the prediction was performed a day before the event by analyzing 2,400 tweets. The constructed association rules highlighted Denmark as the leading candidate, which aligns with the actual event outcome, where Denmark won the contest.

Implications and Future Directions

This research has implications for both the theory and the practice of predictive analytics. From a theoretical standpoint, it advances the understanding of how semantic relationships can be extracted from microblogging datasets to anticipate event outcomes. On the practical front, it showcases a viable approach to event forecasting that could benefit industries such as marketing, politics, and finance.

Moving forward, the approach could be expanded to include more complex models that incorporate sentiment analysis or machine learning to further refine predictive capabilities. The exploration of real-time data mining techniques and enhancements in natural language processing could also augment the accuracy and applicability of these findings across broader domains.

In summary, the paper demonstrates that Twitter data, when analyzed through established data mining frameworks, can serve as a plausible source for event forecasting. The case studies support this assertion, illustrating how semantic lattice models can map expectations and outcomes. As social media use continues to grow, such methods offer a promising avenue for timely event prediction.
