- The paper introduces a novel wikiHow pretraining method that improves intent detection in dialogue systems.
- It fine-tunes transformer-based models (RoBERTa and XLM-RoBERTa) on English and multilingual benchmarks, achieving state-of-the-art accuracy in both standard and zero-shot settings.
- The study underscores the potential of instructional data for generalizing models across diverse and low-resource domains.
Overview
The paper "Intent Detection with WikiHow" (arXiv:2009.05781) proposes pretraining intent detection models for task-oriented dialogue systems on data derived from the instructional content of the wikiHow website. The aim is to improve how well intent detectors adapt to new domains and languages by leveraging wikiHow's wide-ranging instructional steps, which are available in multiple languages. With the proposed pretraining, the authors achieve state-of-the-art results on several benchmark datasets, highlighting wikiHow's potential as a data source for intent detection.
WikiHow Pretraining Methodology
The authors propose a pretraining task that involves creating a dataset where each wikiHow article's title is considered a goal or intent, and its instructional steps are treated as associated utterances. The pretraining task is formulated as a multiple-choice problem in which the model predicts the correct goal for a given step from a set of candidate goals. This approach allows models to benefit from the diversity and domain range of wikiHow articles, making them more generalizable to emerging services and uncommon tasks.
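The data construction described above can be sketched in plain Python. This is a minimal illustration of the idea, not the paper's actual pipeline: the toy articles, the function name, and the choice of four candidates are all assumptions made here for clarity.

```python
import random

def build_pretraining_examples(articles, num_candidates=4, seed=0):
    """Turn wikiHow-style articles into multiple-choice examples.

    Each article maps a title (treated as the goal/intent) to its
    instructional steps (treated as utterances). For every step, the
    true goal is mixed with distractor goals sampled from other
    articles, and the model must pick the correct one.
    """
    rng = random.Random(seed)
    goals = list(articles)
    examples = []
    for goal, steps in articles.items():
        distractor_pool = [g for g in goals if g != goal]
        for step in steps:
            distractors = rng.sample(distractor_pool, num_candidates - 1)
            candidates = distractors + [goal]
            rng.shuffle(candidates)
            examples.append({
                "step": step,
                "candidates": candidates,
                "label": candidates.index(goal),  # index of the true goal
            })
    return examples

# Toy articles standing in for wikiHow content (illustrative only).
articles = {
    "How to Book a Flight": ["Compare fares online.", "Select a departure date."],
    "How to Make Coffee": ["Grind the beans.", "Boil the water."],
    "How to Plant a Tree": ["Dig a hole twice the root width."],
    "How to Change a Tire": ["Loosen the lug nuts."],
}
examples = build_pretraining_examples(articles)
```

A model trained on such examples sees each instructional step paired with several plausible goals, which is what forces it to learn a general mapping from actions to intents.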
Experimental Setup
In their experiments, the authors fine-tune pretrained transformer language models: RoBERTa for English tasks and XLM-RoBERTa for multilingual tasks. These models, further pretrained on the generated wikiHow data, are evaluated on major intent detection benchmarks: Snips, Schema-Guided Dialogue (SGD), and Facebook's multilingual dialogue datasets in English, Spanish, and Thai. The results show significant performance improvements over baseline models in both standard and zero-shot settings.
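At inference time, a model trained on the multiple-choice objective scores each candidate intent against the utterance and selects the highest-scoring one. The sketch below illustrates only that selection step; the `score_fn` callable and the toy lexical-overlap scorer are hypothetical stand-ins for a fine-tuned RoBERTa head, not anything from the paper.

```python
import math

def predict_intent(score_fn, utterance, candidate_intents):
    """Pick the intent whose pairing with the utterance scores highest.

    `score_fn(utterance, intent) -> float` stands in for a fine-tuned
    multiple-choice model; softmax is applied only to make the scores
    readable as probabilities.
    """
    scores = [score_fn(utterance, c) for c in candidate_intents]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(candidate_intents)), key=probs.__getitem__)
    return candidate_intents[best], probs[best]

# Toy lexical-overlap scorer as a stand-in for the model (illustrative only).
def overlap_score(utterance, intent):
    return len(set(utterance.lower().split()) & set(intent.lower().split()))

intent, prob = predict_intent(
    overlap_score,
    "book me a flight to Paris",
    ["Book a Flight", "Make Coffee", "Plant a Tree"],
)
```

Because the candidate set is an argument rather than a fixed output layer, the same scoring scheme extends naturally to zero-shot settings, where unseen intents can simply be added to the candidate list.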
Results and Analysis
The models pretrained on wikiHow data achieve state-of-the-art performance on intent detection benchmarks, approaching 100% accuracy on some of them. Even with little or no in-domain training data, as in zero-shot conditions, the pretrained models maintain notable accuracy, suggesting that the pretraining effectively improves generalization. Through error analysis, the authors find that misclassifications often stem from mislabeled examples or genuinely ambiguous intents rather than from deficiencies in the models' understanding, suggesting that existing benchmarks may need to be strengthened to meaningfully challenge modern models.
Implications and Future Work
The research suggests that utilizing a diverse and well-structured dataset like wikiHow for pretraining offers significant advantages for intent detection, especially in rapidly evolving and multilingual environments. Given the models' high performance on existing benchmarks, the authors propose a shift towards more open-domain intent detection research that can better evaluate models across a broader range of intents and user scenarios. The work implies that future development in this field could focus on creating datasets with a vast array of intents from various domains, potentially using automated augmentation techniques to generate comprehensive benchmark data.
Conclusion
The paper shows that wikiHow-based pretraining substantially improves intent detection models across several benchmarks and languages, underlining the value of instructional websites as resources for data augmentation in natural language processing. The approach advances the state of the art in intent detection and opens new avenues for tackling the task in open-domain and low-resource contexts.