
Suri: Multi-constraint Instruction Following for Long-form Text Generation

Published 27 Jun 2024 in cs.CL (arXiv:2406.19371v2)

Abstract: Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challenges associated with collecting human preference judgments on long-form texts, preference-tuning algorithms such as DPO are infeasible in our setting; thus, we propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm. Instead of receiving negative feedback from dispreferred responses, I-ORPO obtains negative feedback from synthetically corrupted instructions generated by an LLM. Using Suri, we perform supervised and I-ORPO fine-tuning on Mistral-7b-Instruct-v0.2. The resulting models, Suri-SFT and Suri-I-ORPO, generate significantly longer texts (~5K tokens) than base models without significant quality deterioration. Our human evaluation shows that while both SFT and I-ORPO models satisfy most constraints, Suri-I-ORPO generations are generally preferred for their coherent and informative incorporation of the constraints. We release our code at https://github.com/chtmp223/suri.


Summary

  • The paper presents the Suri dataset and I-ORPO method to enhance LLMs' ability to follow complex, multi-constraint instructions in long-form text generation.
  • It leverages backtranslation and synthetically corrupted instructions to fine-tune models, yielding texts of roughly 5,000 tokens with improved coherence.
  • Human evaluations and ranking experiments demonstrate that Suri-I-ORPO outperforms baseline models by at least 10% in distinguishing correct from corrupted instructions.

The paper "Suri: Multi-constraint Instruction Following for Long-form Text Generation" presents an in-depth study on enhancing the instruction-following capabilities of LLMs in the context of generating long-form text. Authored by Chau Minh Pham, Simeng Sun, and Mohit Iyyer from the University of Massachusetts Amherst, the paper explores complex, multi-constraint instruction following—a topic that has been underexplored in the domain of LLM-based text generation.

Overview

The paper introduces Suri, a dataset comprising 20,000 human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. The Suri dataset is notable for pairing long-form outputs (up to 5,024 tokens) with intricate, multi-faceted instructions. This pairing offers a unique opportunity to fine-tune LLMs to follow complex directives over extended textual spans, a feat that traditional datasets such as Alpaca have not achieved.
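A Suri example pairs one long gold response with a faithful backtranslated instruction and, for preference tuning, a minimally corrupted variant of that instruction. A hypothetical record might look like the following (field names and contents are illustrative, not the dataset's actual schema):

```python
# Hypothetical Suri-style record (illustrative only; not the dataset's
# actual schema). The corrupted instruction minimally violates one
# constraint of the faithful instruction while keeping the rest intact.
example = {
    "instruction": (
        "Write a ~5,000-token chapter of a mystery novel. Constraints: "
        "open with a flashback; maintain third-person limited point of "
        "view; end on an unresolved clue."
    ),
    "corrupted_instruction": (
        "Write a ~5,000-token chapter of a mystery novel. Constraints: "
        "open with a flashback; maintain first-person point of view; "
        "end on an unresolved clue."
    ),
    # The gold response is the original human-written long-form text.
    "response": "The rain had not stopped for three days when ...",
}
```

Note that both instructions describe the same gold response; only one constraint differs, which is what makes the corrupted variant usable as negative feedback.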

Methodology

The authors outline the creation and utilization of the Suri dataset through two primary contributions:

  1. Dataset Construction:
    • The dataset includes long human-written texts sourced from existing corpora such as ChapterBreak, Books3, and RedPajama-Data-v2.
    • Backtranslation techniques are used to generate comprehensive instructions for these texts, followed by the creation of synthetically corrupted instructions to facilitate preference tuning.
  2. Alignment Using I-ORPO:
    • The paper introduces Instructional Odds Ratio Preference Optimization (I-ORPO), a variant of the ORPO algorithm.
    • I-ORPO uses synthetically corrupted instructions instead of dispreferred responses, which proves to be a robust alignment strategy for LLMs when human preference data on long-form text are impractical to obtain.
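Conceptually, I-ORPO keeps ORPO's odds-ratio formulation but flips what varies: the gold response is held fixed, and the preferred/dispreferred pair is the correct instruction versus its corrupted counterpart. The following is a minimal numeric sketch of such a loss, taking average per-token log-probabilities as inputs; it illustrates the idea, not the authors' implementation (the weighting `lam` is an assumed hyperparameter):

```python
import math

def log_odds(avg_logprob: float) -> float:
    # odds(y|x) = p / (1 - p), from an average per-token
    # log-probability; clamp p below 1 to avoid log(0).
    p = min(math.exp(avg_logprob), 1.0 - 1e-9)
    return math.log(p) - math.log1p(-p)

def i_orpo_loss(lp_correct: float, lp_corrupted: float, lam: float = 0.1) -> float:
    """ORPO-style loss with an instructional twist (sketch):
    NLL of the gold response under the correct instruction, plus an
    odds-ratio penalty pushing the model to prefer the correct
    instruction over the corrupted one for the *same* response."""
    nll = -lp_correct  # supervised term
    ratio = log_odds(lp_correct) - log_odds(lp_corrupted)
    penalty = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid
    return nll + lam * penalty
```

The loss is smallest when the response is likely under the correct instruction and unlikely under the corrupted one, which is exactly the preference signal that dispreferred responses would otherwise provide.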

Key Findings

Evaluations and Results:

  • Both Suri-I-ORPO and Suri-SFT (supervised fine-tuning) models produced texts averaging 5,100 and 4,800 tokens respectively, markedly longer than those generated by baseline models such as Mistral-7B-Instruct-v0.2 and Llama-3-8B-Instruct.
  • Human evaluations suggest a strong preference for Suri-I-ORPO over Suri-SFT due to coherence, informativeness, and readability.
  • The models maintained low levels of n-gram repetitions, indicating sustained textual quality even with long-form outputs.
  • Ranking accuracy experiments revealed that Suri-I-ORPO achieved a significant improvement (at least 10%) over baseline models in distinguishing between correct and corrupted instructions.
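Two of these measurements are straightforward to sketch: repetition can be tracked via distinct n-gram ratios, and ranking accuracy is the fraction of pairs where the model scores the gold response higher under the correct instruction than under the corrupted one. A toy illustration (hypothetical helper names and scores, not the paper's evaluation code):

```python
def distinct_n(tokens, n):
    # Share of unique n-grams; values near 1.0 indicate little repetition.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(grams)) / max(len(grams), 1)

def ranking_accuracy(pairs):
    # pairs: (logprob under correct instruction,
    #         logprob under corrupted instruction) per example.
    wins = sum(1 for lp_good, lp_bad in pairs if lp_good > lp_bad)
    return wins / len(pairs)
```

For example, a sequence that alternates between the same two tokens has few distinct bigrams, and a model that always prefers the correct instruction scores a ranking accuracy of 1.0.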

Implications

Theoretical Implications:

  • The introduction of multi-constraint instructions combined with extended text generation challenges existing paradigms in LLM fine-tuning and necessitates more complex alignment techniques.
  • The success of I-ORPO suggests that leveraging corrupted instructions as negative feedback can be a viable approach when human preference data are lacking, a scenario often encountered in real-world applications.

Practical Implications:

  • Suri-I-ORPO's ability to generate coherent long-form text with intricate constraints could significantly benefit industries requiring detailed report generation, creative writing, and comprehensive content creation.
  • The methods proposed could be adapted for LLMs in other languages and genres, broadening the utility of advanced instruction-following models.

Future Directions

Examining how different LLM architectures respond to fine-tuning using the Suri dataset could yield further insights into model-specific intricacies. Additionally, exploring the influence of surface features, such as instruction length, and varying the degree of constraint violations could refine the I-ORPO method. Lastly, testing these models on shorter-context tasks would help understand any trade-offs associated with optimizing for long-form generation.

Conclusion

"Suri: Multi-constraint Instruction Following for Long-form Text Generation" offers a comprehensive methodology and dataset for enhancing LLM capabilities in following complex instructions over long textual spans. By introducing the Suri dataset and the I-ORPO alignment method, the authors provide valuable contributions to the field of AI-driven text generation, paving the way for more advanced and nuanced LLM applications.
