NLP Verification: Towards a General Methodology for Certifying Robustness
Abstract: Machine Learning (ML) has exhibited substantial success in the field of NLP. For example, LLMs have empirically proven capable of producing text of high complexity and cohesion. However, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety-critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Computer Vision pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While presenting sophisticated algorithms, these papers have not yet crystallised into a common methodology. They are often light on the pragmatic issues of NLP verification, and the area remains fragmented. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline that emerge from the progress in the field to date. Our contributions are twofold. Firstly, we propose a general methodology to analyse the effect of the embedding gap: the discrepancy between verification of geometric subspaces and the semantic meaning of the sentences which those subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Secondly, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.
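The embedding gap described above can be illustrated with a minimal toy sketch (this is an assumption-laden illustration, not the paper's method): a geometric subspace, here an axis-aligned box, fitted around embeddings of paraphrases may exclude a sentence whose embedding is nonetheless semantically very close under cosine similarity. All vectors below are hypothetical 2-D stand-ins for real sentence embeddings, and `bounding_box`, `contains`, and `cosine` are illustrative helpers.

```python
# Toy illustration of the embedding gap: the verified geometric region
# (an axis-aligned bounding box) and the semantic neighbourhood (cosine
# similarity) of a sentence need not coincide.
from math import sqrt

def bounding_box(embeddings):
    """Axis-aligned box enclosing a set of embedding vectors."""
    dim = len(embeddings[0])
    lo = [min(e[i] for e in embeddings) for i in range(dim)]
    hi = [max(e[i] for e in embeddings) for i in range(dim)]
    return lo, hi

def contains(box, point):
    """Membership test for the verified geometric subspace."""
    lo, hi = box
    return all(l <= p <= h for l, p, h in zip(lo, point, hi))

def cosine(a, b):
    """Cosine similarity, a proxy for semantic closeness of sentences."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical embeddings of two paraphrases of the same sentence.
paraphrases = [[0.9, 0.1], [1.0, 0.2]]
box = bounding_box(paraphrases)

# Hypothetical embedding of a third paraphrase: semantically close
# (high cosine similarity) yet outside the box, so a verifier that
# certifies only the box says nothing about it.
candidate = [1.1, 0.15]
print(contains(box, candidate))                      # False
print(round(cosine(candidate, paraphrases[1]), 3))   # 0.998
```

A more precise geometric estimate of semantic similarity, as the abstract proposes, would shape the verified subspace to better track such semantically close sentences rather than relying on loose axis-aligned bounds.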