ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies
Abstract: Analogy-making is central to human cognition, allowing us to adapt to novel situations, an ability that current AI systems still lack. Most analogy datasets today focus on simple analogies (e.g., word analogies); datasets that include complex types of analogies are typically manually curated and very small. We believe that this holds back progress in computational analogy. In this work, we design a data generation pipeline, ParallelPARC (Parallel Paragraph Creator), that leverages state-of-the-art LLMs to create complex, paragraph-based analogies, as well as distractors, both simple and challenging. We demonstrate the pipeline by creating ProPara-Logy, a dataset of analogies between scientific processes. We publish a gold set, validated by humans, and a silver set, generated automatically. We test LLMs' and humans' analogy recognition in binary and multiple-choice settings and find that, after light supervision, humans outperform the best models by a gap of roughly 13%. We demonstrate that our silver set is useful for training models. Lastly, we show that challenging distractors confuse LLMs but not humans. We hope our pipeline will encourage research in this emerging field.
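The binary and multiple-choice recognition settings can be illustrated with a small scoring harness. This is a minimal sketch under assumptions not taken from the paper. We assume that in the binary setting a model judges whether a candidate paragraph is analogous to a base paragraph, and that in the multiple-choice setting it picks the analogous paragraph from a set that also contains distractors. The item formats (`BinaryItem`, `MultipleChoiceItem`), the helper names, and the word-overlap baseline are all hypothetical; the baseline is a deliberately naive stand-in, not one of the models the paper evaluates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BinaryItem:
    # Hypothetical format: a base paragraph, a candidate, and a gold yes/no label.
    base: str
    candidate: str
    is_analogy: bool


@dataclass
class MultipleChoiceItem:
    # Hypothetical format: one analogous paragraph plus distractors.
    base: str
    candidates: List[str]
    answer: int  # index of the analogous paragraph in `candidates`


def binary_accuracy(items: List[BinaryItem],
                    predict: Callable[[str, str], bool]) -> float:
    """Fraction of pairs whose yes/no analogy judgment matches the gold label."""
    hits = sum(predict(it.base, it.candidate) == it.is_analogy for it in items)
    return hits / len(items)


def multiple_choice_accuracy(items: List[MultipleChoiceItem],
                             choose: Callable[[str, List[str]], int]) -> float:
    """Fraction of items where the chosen index is the true analogy."""
    hits = sum(choose(it.base, it.candidates) == it.answer for it in items)
    return hits / len(items)


def overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased word sets: a purely surface-level signal."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def choose_by_overlap(base: str, candidates: List[str]) -> int:
    """Hypothetical baseline: pick the candidate sharing the most words with the base."""
    return max(range(len(candidates)), key=lambda i: overlap(base, candidates[i]))


# Toy usage: the analogous process shares structure, the distractor shares words.
base = "the heart pumps blood through vessels to the body"
items = [MultipleChoiceItem(
    base=base,
    candidates=[
        "a pump pushes water through pipes to the city",  # analogous process
        "the heart and blood of the body are red",        # surface-similar distractor
    ],
    answer=0,
)]
print(multiple_choice_accuracy(items, choose_by_overlap))  # prints 0.0
```

Because the baseline ranks candidates only by shared words, the surface-similar distractor wins and accuracy is 0.0. This is a toy illustration of why distractors that resemble the base paragraph on the surface can mislead a shallow matcher; it does not reproduce any result from the paper.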