Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance
Abstract: We investigate the performance of LLM-based zero-shot stance detection on tweets. Using FlanT5-XXL, an instruction-tuned open-source LLM, with the SemEval 2016 Tasks 6A, 6B, and P-Stance datasets, we study how performance varies under different prompts and decoding strategies, as well as the potential biases of the model. We show that the zero-shot approach can match or outperform state-of-the-art benchmarks, including fine-tuned models. We provide insights into its performance, including its sensitivity to instructions and prompts, to decoding strategies, to the perplexity of the prompts, and to negations and oppositions present in prompts. Finally, we verify that the LLM has not been trained on the test datasets, and identify a positivity bias which may partially explain the performance differences across decoding strategies.
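To make the zero-shot setup concrete, the following is a minimal sketch of how a stance prompt might be assembled and how a model's answer might be mapped back to a label. Everything here is an assumption for illustration: the prompt wording, the label names, and the helper functions `build_prompt`, `parse_stance`, and `pick_constrained` are hypothetical and are not taken from the paper. No model is loaded; the label scores passed to `pick_constrained` stand in for whatever per-label scores a model would produce.

```python
from typing import Dict

# Hypothetical label set; the datasets' actual label names may differ.
LABELS = ("favor", "against", "neutral")


def build_prompt(tweet: str, target: str) -> str:
    """Assemble an illustrative zero-shot instruction prompt (wording is assumed)."""
    options = ", ".join(LABELS)
    return (
        f"What is the stance of the following tweet towards {target}? "
        f"Options: {options}.\n"
        f"Tweet: {tweet}\n"
        "Stance:"
    )


def parse_stance(generated: str) -> str:
    """Map free-text output (e.g. from greedy decoding) to a label, if possible."""
    text = generated.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label
    return "unparsed"


def pick_constrained(label_scores: Dict[str, float]) -> str:
    """Choose the label with the highest score among the allowed options only."""
    return max(LABELS, key=lambda label: label_scores.get(label, float("-inf")))


if __name__ == "__main__":
    print(build_prompt("We need action now!", "climate change"))
    print(parse_stance(" Favor."))
    print(pick_constrained({"favor": -1.2, "against": -0.4, "neutral": -2.0}))
```

The two selection functions illustrate why decoding choices can matter: free-text output may fall outside the label set, whereas scoring only the allowed labels always yields one of them.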