- The paper introduces a hybrid methodology combining crowdsourced annotations and machine learning classifiers to detect personal attacks at scale.
- It validates the approach on over 63 million Wikipedia comments, achieving an AUC of 97.19% and revealing that comments from anonymous users are six times more likely to be attacks.
- The analysis highlights attack clustering in time and concentration among a few high-toxicity users, with only 12.2% of attacks resulting in moderator intervention.
Ex Machina: Personal Attacks Seen at Scale
The paper "Ex Machina: Personal Attacks Seen at Scale" addresses the critical problem of identifying and analyzing personal attacks within large-scale online discussion platforms. The authors combine crowdsourcing and machine learning to develop a scalable and effective methodology for detecting personal attacks, focusing on English Wikipedia as a case study.
Motivation and Methodology
Personal attacks undermine the quality of online discourse and community health. Despite this, studying personal attacks at scale is difficult due to the vast number of comments and the nuanced nature of what constitutes an attack. This paper contributes by developing a hybrid methodology leveraging crowdsourcing to create a high-quality dataset and machine learning to classify large corpora of comments.
The authors began by crowdsourcing labels for a subset of Wikipedia comments, collecting annotations from multiple Crowdflower workers per comment and checking reliability through inter-annotator agreement. From this annotated dataset, they trained and evaluated machine learning classifiers, comparing logistic regression (LR) and multi-layer perceptron (MLP) models built on word- and character-level n-gram features. They also explored two ways of synthesizing the crowd annotations into training targets: one-hot (OH) labels and empirical distribution (ED) labels, finding that ED labels, which preserve annotator disagreement, yielded better performance.
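The flavor of this pipeline can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it trains a character n-gram logistic regression on toy comments, approximating ED-style soft labels by duplicating each example into both classes weighted by the (hypothetical) fraction of annotators who flagged it.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy comments with hypothetical fractions of annotators who flagged each
# comment as an attack (the "empirical distribution" over labels).
comments = [
    "thanks for fixing the citation",
    "you are an idiot, stop editing",
    "please see the talk page guidelines",
    "nobody wants your garbage edits here",
]
attack_frac = np.array([0.0, 0.9, 0.1, 0.8])

# Character n-gram features (1- to 5-grams), one common variant in the paper.
vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 5))
X = vec.fit_transform(comments)

# Approximate ED training: each comment appears once as class 1 and once as
# class 0, weighted by the annotator fractions.
X_dup = sp.vstack([X, X])
y_dup = np.concatenate([np.ones(len(comments)), np.zeros(len(comments))])
w_dup = np.concatenate([attack_frac, 1.0 - attack_frac])

clf = LogisticRegression().fit(X_dup, y_dup, sample_weight=w_dup)
probs = clf.predict_proba(X)[:, 1]  # per-comment attack probability
```

The sample-weight duplication trick is one standard way to feed soft labels to a classifier whose API only accepts hard classes; the paper's MLP variant instead minimizes cross-entropy against the empirical label distribution directly.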
Evaluation and Results
The classifier's performance was validated against a human baseline: its predictions were compared with the aggregated judgments of groups of human annotators. Measured by Area Under the ROC Curve (AUC) and Spearman correlation, the classifier performed about as well as the aggregate of three annotators, achieving an AUC of 97.19% and a Spearman correlation of 66.02%.
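Both metrics are straightforward to compute with standard libraries. The sketch below uses invented scores and labels purely to show the mechanics: AUC is computed against binary (majority-vote) labels, while Spearman correlation compares model scores with graded mean annotator ratings.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

# Hypothetical evaluation data for six comments.
y_true = np.array([0, 0, 1, 1, 0, 1])                    # majority-vote labels
mean_rating = np.array([0.0, 0.4, 0.9, 0.3, 0.1, 0.8])   # mean annotator score
model_score = np.array([0.1, 0.6, 0.8, 0.4, 0.2, 0.9])   # classifier output

auc = roc_auc_score(y_true, model_score)       # ranking quality vs. labels
rho, _ = spearmanr(model_score, mean_rating)   # rank agreement with ratings
```

AUC rewards ranking every attack above every non-attack, while Spearman correlation additionally credits agreement with the *degree* of perceived toxicity, which is why the paper reports both.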
The model was subsequently applied to the entire corpus of Wikipedia discussion comments, labeling over 63 million comments. The authors ensured the quality of automated annotations by setting an appropriate threshold based on precision and recall metrics, maintaining a balance between false positives and false negatives.
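One common way to pick such an operating threshold is to sweep the precision-recall curve and take the first threshold meeting a precision target. The example below is a sketch on toy data; the 90% target is an illustrative choice, not a value from the paper.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical labels and classifier scores for six comments.
y_true = np.array([0, 1, 0, 1, 1, 0])
scores = np.array([0.2, 0.9, 0.7, 0.8, 0.6, 0.3])

prec, rec, thresh = precision_recall_curve(y_true, scores)
# prec and rec have one more entry than thresh. Find the first threshold
# whose precision meets the target (argmax returns 0 if none qualifies,
# so a real pipeline should check that the target is reachable).
idx = int(np.argmax(prec[:-1] >= 0.9))
threshold = thresh[idx]
recall_at_t = rec[idx]  # the recall sacrificed to hit that precision
```

Reading off the recall at the chosen threshold makes the false-positive/false-negative tradeoff explicit: a stricter precision target silently discards a larger share of true attacks.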
Analysis Using Machine-Labeled Data
Using the machine-labeled corpus, the authors investigated several aspects of personal attacks on Wikipedia:
- Impact of Anonymity: It was found that while anonymous users contributed approximately 9.6% of the comments, their comments were six times more likely to be attacks compared to registered users. Nonetheless, anonymous contributions accounted for less than half of the total attacks.
- Correlation with Contribution Quantity: Frequent contributors generally made fewer personal attacks. However, users at both extremes (very low and very high activity) were responsible for a significant proportion of attacks.
- Concentration of Attacks: Most attacks came from the large pool of users who each attacked only occasionally, so toxicity was broadly diffused. At the same time, a small number of highly toxic users accounted for a disproportionately large share of attacks, indicating that moderation targeted at those users could remove a meaningful fraction of attacks.
- Moderation Effectiveness: Only about 12.2% of the expected true attacks resulted in moderator action, implying substantial room for improvement in moderation practices.
- Timing and Clustering of Attacks: Attacks were found to cluster in time, with the likelihood of subsequent attacks increasing significantly following an initial attack. This suggests the potential utility of proactive moderation.
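The temporal-clustering observation amounts to comparing a conditional attack rate with the base rate. The toy sketch below (invented 0/1 attack labels on a time-ordered comment sequence; not the paper's data) shows one minimal way to measure it:

```python
import numpy as np

# Hypothetical time-ordered comment labels for one discussion:
# 1 = attack, 0 = benign.
labels = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
                   0, 0, 1, 1, 1, 0, 0, 0, 0, 0])

base_rate = labels.mean()                       # P(attack) overall
# P(next comment is an attack | current comment is an attack)
follow_rate = labels[1:][labels[:-1] == 1].mean()
```

A `follow_rate` well above `base_rate`, as in this toy sequence, is the signature of clustering the paper describes, and it motivates proactive moderation: an attack is a strong signal that more are imminent.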
Implications and Future Directions
The paper's methodology and findings have significant implications. Practically, the demonstrated model could aid in real-time detection of personal attacks, enabling more effective and timely moderation. Theoretically, the results contribute to an improved understanding of the dynamics and patterns of abusive behavior online.
Future research could explore intervention strategies to minimize the occurrence and impact of personal attacks. Investigating the potential causal factors and triggers for attacks could also yield further insights.
Conclusion
In summary, this paper presents a robust and scalable methodology for detecting personal attacks in online discussions, validated through comprehensive evaluation and applied to a large-scale dataset. The findings offer valuable insights into the prevalence and nature of personal attacks, informing both community policies and moderation practices. The authors have also released their datasets and classifier publicly, paving the way for further research and advancements in the field.