FUDGE: Controlled Text Generation With Future Discriminators

Published 12 Apr 2021 in cs.CL and cs.LG | (2104.05218v2)

Abstract: We propose Future Discriminators for Generation (FUDGE), a flexible and modular method for controlled text generation. Given a pre-existing model G for generating text from a distribution of interest, FUDGE enables conditioning on a desired attribute a (for example, formality) while requiring access only to G's output logits. FUDGE learns an attribute predictor operating on a partial sequence, and uses this predictor's outputs to adjust G's original probabilities. We show that FUDGE models terms corresponding to a Bayesian decomposition of the conditional distribution of G given attribute a. Moreover, FUDGE can easily compose predictors for multiple desired attributes. We evaluate FUDGE on three tasks -- couplet completion in poetry, topic control in language generation, and formality change in machine translation -- and observe gains in all three tasks.

Citations (277)

Summary

  • The paper introduces FUDGE, using future discriminators to recalibrate generation probabilities based on partial sequences.
  • It demonstrates enhanced control over style and topic, outperforming methods like PPLM across tasks such as poetry couplet completion and formality-controlled translation.
  • FUDGE's modular approach allows constraint composability without retraining base models, ensuring efficient and flexible text generation.

Analysis of "FUDGE: Controlled Text Generation With Future Discriminators"

The paper presents Future Discriminators for Generation (FUDGE), a method for controlled text generation that operates on partial sequences. It offers a flexible, modular way to impose constraints on the output of a pre-existing text generation model while requiring access only to that model's output logits. With FUDGE, one can generate text conditioned on a desired attribute, such as formality, at only a minor computational overhead for evaluating the constraints.

FUDGE's core innovation is re-calibrating the generative model's probabilities with a future attribute predictor, which estimates the likelihood that the desired attribute will hold in the completed sequence. The base distribution's next-token probabilities are reweighted so that sampling follows the conditional distribution given the attribute. The paper derives this adjustment from a Bayesian decomposition of the conditional distribution.
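One decoding step of this reweighting can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the `predictor` callable, the `top_k` pruning value, and the conditioning-strength weight `lam` are assumptions made for the sketch (the paper rescores only a shortlist of candidate tokens to keep the predictor passes cheap).

```python
import torch

def fudge_step(base_logits, prefix_ids, predictor, top_k=200, lam=1.0):
    """One FUDGE-style decoding step (illustrative sketch).

    base_logits: [vocab] next-token logits from the base generator G.
    predictor:   hypothetical classifier mapping a partial sequence to
                 log P(attribute | prefix) as a scalar tensor.
    Only the top_k candidate tokens are rescored, so the attribute
    predictor runs top_k forward passes rather than |vocab| of them.
    """
    log_p = torch.log_softmax(base_logits, dim=-1)
    topk_logp, topk_ids = log_p.topk(top_k)

    # Score each candidate continuation: log P(a | x_{1:t}, x_{t+1} = v).
    attr_logp = torch.stack([
        predictor(torch.cat([prefix_ids, tok.view(1)]))
        for tok in topk_ids
    ])

    # Bayes: log P(x_{t+1} | x_{1:t}, a)
    #      ∝ log P(x_{t+1} | x_{1:t}) + log P(a | x_{1:t+1})
    adjusted = topk_logp + lam * attr_logp
    probs = torch.softmax(adjusted, dim=-1)
    next_tok = topk_ids[torch.multinomial(probs, 1)]
    return next_tok
```

In this sketch the adjusted scores are renormalized over the shortlisted tokens only, which is what makes the per-step cost manageable.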

The experimental validation across three tasks (poetry couplet completion, topic control in language generation, and formality-controlled machine translation) demonstrates notable improvements over existing methods. FUDGE surpasses both fine-tuning and a prevalent gradient-based method (PPLM) at enforcing the target attribute, yielding better task-specific outputs while maintaining or improving linguistic diversity.

FUDGE offers several advantages over traditional approaches: it requires no access to the internals of the generative model, so the base model never needs retraining. Its compatibility with various pre-trained models and its computational efficiency make FUDGE a salient contribution to controlled text generation. In particular, the ability to compose multiple constraints extends its usability across applications.
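Because each attribute contributes an additive log-probability term to the adjusted scores, composing constraints amounts to summing predictor outputs. A minimal sketch, assuming hypothetical predictor callables and optional per-attribute weights (the weighting is an assumption here, not taken from the paper):

```python
import torch

def compose_attribute_scores(candidate_seqs, predictors, weights=None):
    """Combine several attribute predictors (illustrative sketch).

    Each predictor maps a partial sequence to log P(attribute | prefix).
    Treating the attributes as conditionally independent, the joint
    conditioning term is the (optionally weighted) sum of the
    individual log-probabilities for each candidate sequence.
    """
    weights = weights or [1.0] * len(predictors)
    scores = []
    for seq in candidate_seqs:
        total = sum(w * p(seq) for w, p in zip(weights, predictors))
        scores.append(total)
    return torch.stack(scores)
```

The combined scores would then be added to the base log-probabilities exactly as a single predictor's scores are, so adding a constraint never requires touching the generator or the other predictors.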

However, FUDGE is not without limitations. It assumes the availability of a viable attribute predictor, which introduces a dependency on high-quality labeled data for predictor training. While the paper claims general applicability, the predictor's accuracy, which is crucial for FUDGE, has not been exhaustively benchmarked across divergent domains with varying levels of data quality.

The implications of this research span both practical and theoretical realms. Practically, FUDGE's modularity and flexibility address pressing needs in NLP, such as style transfer and topic adherence, without fine-tuning large language models. Theoretically, it points toward modular architectures for conditional generation and, potentially, other settings that constrain pre-trained models without sacrificing computational efficiency.

Future work might augment FUDGE with rejection sampling or reranking to further improve outputs. Improving the attribute predictor's accuracy on incomplete sequences could also enable more nuanced controlled generation.

Overall, FUDGE is an adept method for controlled generation, advancing techniques for producing attribute-attuned text from pre-trained language models. Its low complexity and strong performance make it a compelling choice for researchers and practitioners working on controlled text generation.


Authors (2)
