Prompt Exploration with Prompt Regression
Abstract: With the advent of democratized LLM usage, there is growing interest in systematizing LLM prompt creation and selection beyond iterative trial and error. Prior work focuses primarily on searching the space of prompts without accounting for relations between prompt variations. We propose Prompt Exploration with Prompt Regression (PEPR), a framework that predicts the effect of prompt combinations from results for individual prompt elements, together with a simple method for selecting an effective prompt for a given use case. We evaluate our approach with open-source LLMs of different sizes on several tasks.
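The core idea of the abstract (predicting a prompt combination's effect from results for individual prompt elements) can be sketched, loosely and not as the paper's exact formulation, as a regression over binary element-inclusion features. All element names, scores, and the additive linear model below are illustrative assumptions:

```python
import numpy as np

# Hypothetical data: 4 candidate prompt elements. Each row of X marks which
# elements a tested prompt combination includes (1 = included, 0 = not).
X = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)
# Observed scores (e.g., task accuracy) for those tested combinations.
y = np.array([0.60, 0.55, 0.70, 0.50, 0.68, 0.66])

# Fit per-element weights by least squares: score ≈ X @ w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict scores for unseen combinations and select the best one --
# this stands in for the "simple selection method" at a sketch level.
candidates = np.array([
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
], dtype=float)
preds = candidates @ w
best = candidates[np.argmax(preds)]
```

The design choice here (a purely additive model over element indicators) is the simplest instance of exploiting relations between prompt variations: once per-element weights are fit on a few measured combinations, any untested combination can be scored without running the LLM on it.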