SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks
Abstract: LLMs can justify or critique their predictions through discussions with other models or humans, thereby enriching their intrinsic understanding of instances. While proactive discussions in the inference phase have been shown to boost performance, such interactions have not been extensively explored during the training phase. We hypothesize that incorporating interactive discussions into the training process can enhance the models' understanding and improve their reasoning and verbal expression abilities during inference. This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models. The learner model receives responses from the partner, and its parameters are then updated based on this discussion. This dynamic adjustment process continues throughout the training phase, responding to the evolving outputs of the learner model. Our empirical evaluation across various tasks, including math problems, commonsense reasoning, and multi-domain knowledge, demonstrates that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches. Furthermore, our method enhances the models' reasoning capabilities, improving both individual and multi-agent inference performance.
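The training procedure described above can be sketched as a minimal loop: the learner drafts an answer, a partner model replies with a supportive or adversarial remark, and the learner is updated on the resulting discussion. This is an assumed structure only; the paper's exact prompts, remark schedule, loss, and update rule are not given here, and all names (`ToyLearner`, `partner_reply`, `saie_step`) are hypothetical stand-ins.

```python
import random

class ToyLearner:
    """Stand-in for an LLM with a hypothetical generate/fine_tune interface."""
    def generate(self, prompt):
        # A real learner would decode a response conditioned on the prompt.
        return "answer: 42"
    def fine_tune(self, discussion, reference):
        # A real learner would take a gradient step on the discussion text
        # supervised by the reference answer; return a placeholder loss.
        return 0.0

def partner_reply(answer, supportive):
    """Hypothetical partner model: a supportive or adversarial remark."""
    if supportive:
        return f"I agree: '{answer}' looks reasonable given the steps shown."
    return f"I disagree: '{answer}' may be wrong; recheck the intermediate step."

def saie_step(learner, question, reference):
    """One assumed SAIE iteration: answer, discuss, then update on the discussion."""
    answer = learner.generate(question)                   # learner's draft answer
    supportive = random.random() < 0.5                    # mix supportive/adversarial turns
    remark = partner_reply(answer, supportive)            # partner's discussion turn
    revised = learner.generate(question + "\n" + remark)  # learner responds to the remark
    # Update the learner on the full exchange, supervised by the reference answer.
    loss = learner.fine_tune(question + "\n" + answer + "\n" + remark, reference)
    return revised, loss
```

Because the partner reacts to whatever the learner currently produces, the remarks change as training progresses, which matches the dynamic adjustment the abstract describes.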