Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark
Abstract: LLMs have been shown to perform well at a variety of syntactic, discourse, and reasoning tasks. While LLMs are increasingly deployed in many forms, including conversational agents that interact with humans, we lack a grounded benchmark to measure how well LLMs understand social language. Here, we introduce a new theory-driven benchmark, SocKET, that contains 58 NLP tasks testing social knowledge, which we group into five categories: humor & sarcasm, offensiveness, sentiment & emotion, social factors, and trustworthiness. In tests on the benchmark, we demonstrate that current models attain only moderate performance but reveal significant potential for task transfer among different types and categories of tasks, as predicted from theory. Through zero-shot evaluations, we show that pretrained models already possess some innate but limited capabilities of social language understanding, and that training on one category of tasks can improve zero-shot testing on others. Our benchmark provides a systematic way to analyze model performance on an important dimension of language and points to clear room for improvement in building more socially-aware LLMs. The associated resources are released at https://github.com/minjechoi/SOCKET.