BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing

Published 21 Jun 2022 in cs.CL (arXiv:2206.10668v2)

Abstract: Recent work has shown that generation from a prompted or fine-tuned language model can perform well at semantic parsing when the output is constrained to be a valid semantic representation. We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing, which includes context-free grammars for seven semantic parsing datasets and two syntactic parsing datasets with varied output representations, as well as a constrained decoding interface that generates only valid outputs covered by these grammars. We provide low-, medium-, and high-resource splits for each dataset, allowing accurate comparison of various language models under different data regimes. Our benchmark supports evaluation of language models using prompt-based learning as well as fine-tuning. We benchmark eight language models, including two GPT-3 variants available only through an API. Our experiments show that encoder-decoder pretrained language models can match or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
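The core mechanism the abstract describes is constrained decoding: at each generation step, the decoder may only emit tokens that keep the output derivable from a context-free grammar, so every completed sequence is guaranteed to be valid. The sketch below illustrates the idea on a toy grammar, with a stand-in scoring function in place of a real language model. It is not BenchCLAMP's actual interface; the grammar, function names, and scoring rule are all hypothetical, and validity is checked here by brute-force enumeration rather than the incremental parsing a real implementation would use.

```python
from itertools import product

# Toy CFG for a tiny SQL-like language (hypothetical, for illustration only).
# Keys are nonterminals; values are lists of productions.
GRAMMAR = {
    "S":   [["SELECT", "COL", "FROM", "TAB"]],
    "COL": [["name"], ["age"]],
    "TAB": [["people"], ["cities"]],
}

def expand(symbol):
    """Enumerate all terminal sequences derivable from `symbol`."""
    if symbol not in GRAMMAR:          # terminal symbol
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        # Cartesian product of the expansions of each symbol in the production.
        parts = [expand(s) for s in production]
        for combo in product(*parts):
            results.append([tok for part in combo for tok in part])
    return results

LANGUAGE = expand("S")                 # finite here, so we can enumerate it

def valid_next_tokens(prefix):
    """Tokens that keep `prefix` extensible to a complete valid output."""
    n = len(prefix)
    return {seq[n] for seq in LANGUAGE if seq[:n] == prefix and len(seq) > n}

def constrained_greedy_decode(score):
    """Greedy decoding restricted to grammar-valid continuations.
    `score(prefix, token)` stands in for the language model's logits."""
    prefix = []
    while True:
        allowed = valid_next_tokens(prefix)
        if not allowed:                # no continuation: output is complete
            return prefix
        prefix.append(max(allowed, key=lambda t: score(prefix, t)))

# A stand-in "model" that simply prefers shorter tokens.
out = constrained_greedy_decode(lambda p, t: -len(t))
print(" ".join(out))                   # a complete sentence of the grammar
```

Because invalid continuations are masked out at every step, the decoder can never produce an ill-formed output, no matter how the scoring function ranks tokens; a practical implementation would compute the allowed-token set incrementally with a chart parser instead of enumerating the language.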
