Contrastive Search Is What You Need For Neural Text Generation

Published 25 Oct 2022 in cs.CL | (2210.14140v3)

Abstract: Generating text with autoregressive LMs is of great importance to many NLP applications. Previous solutions for this task often produce text that contains degenerative expressions or lacks semantic consistency. Recently, Su et al. introduced a new decoding method, contrastive search, based on the isotropic representation space of the LM and obtained new state of the art on various benchmarks. Additionally, Su et al. argued that the representations of autoregressive LMs (e.g. GPT-2) are intrinsically anisotropic, a view also shared by previous studies. Therefore, to ensure the LM follows an isotropic distribution, Su et al. proposed a contrastive learning scheme, SimCTG, which calibrates the LM's representations through additional training. In this study, we first answer the question: "Are autoregressive LMs really anisotropic?". To this end, we extensively evaluate the isotropy of LMs across 16 major languages. Surprisingly, we find that the anisotropic problem only exists in the two specific English GPT-2-small/medium models. On the other hand, all other evaluated LMs are naturally isotropic, which is in contrast to the conclusion drawn by previous studies. Based on our findings, we further assess the contrastive search decoding method using off-the-shelf LMs on four generation tasks across 16 languages. Our experimental results demonstrate that contrastive search significantly outperforms previous decoding methods without any additional training. More notably, on 12 out of the 16 evaluated languages, contrastive search performs comparably with human-level performance as judged by human evaluations. Our code and other related resources are publicly available at https://github.com/yxuansu/Contrastive_Search_Is_What_You_Need.

Summary

The paper evaluates contrastive search, a decoding method that balances model confidence against a degeneration penalty, on off-the-shelf LMs.
It revisits LM representation anisotropy and finds that most models are naturally isotropic, so contrastive search can be applied without additional training.
Contrastive search achieves human-comparable coherence in 12 of 16 languages, offering a robust, training-free decoding strategy.

Contrastive Search for Neural Text Generation

The paper "Contrastive Search Is What You Need For Neural Text Generation" addresses two persistent problems in neural text generation with autoregressive LMs: semantic inconsistency and degenerative expressions. Existing decoding methods, such as beam search or top-$k$ sampling, often produce repetitive or semantically incongruent outputs. The study evaluates contrastive search, a decoding strategy previously introduced by Su et al., and shows that it can exploit the isotropic representation spaces of off-the-shelf LMs to improve generation quality.

Anisotropy in LM Representations

A key aspect of this study is revisiting the claim that autoregressive LMs are anisotropic. Previous research described anisotropy as a systemic issue in LMs like GPT-2, arguing that token representations occupy only a narrow subset of the representation space. This investigation instead found that the anisotropy problem is limited to the English GPT-2-small and GPT-2-medium models. In an evaluation of 38 LMs across 16 languages, all other LMs were found to possess isotropic representations, contradicting the prior conclusions. Such isotropy makes token representations more discriminative, which is crucial for maintaining semantic consistency during generation.
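To make the notion of isotropy concrete, the following is a minimal sketch of one common proxy: one minus the average pairwise cosine similarity of the token representations in a sequence. The function names (`cosine`, `self_similarity`, `isotropy`) and the exact form of the metric are assumptions made for illustration and may differ from the measure used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def self_similarity(hidden_states):
    """Average cosine similarity over all ordered pairs of distinct tokens.

    hidden_states: list of token representation vectors (hypothetical input,
    e.g. the last-layer hidden states of one sequence).
    """
    n = len(hidden_states)
    if n < 2:
        raise ValueError("need at least two token representations")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += cosine(hidden_states[i], hidden_states[j])
    return total / (n * (n - 1))


def isotropy(hidden_states):
    """Assumed proxy: values near 1 mean the vectors are spread out,
    values near 0 mean they all point in nearly the same direction."""
    return 1.0 - self_similarity(hidden_states)
```

Under this proxy, two orthogonal token vectors give an isotropy of 1, while vectors that all point in the same direction give 0, which corresponds to the "narrow subset" picture of an anisotropic model.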

Contrastive Search: Methodology and Evaluation

Contrastive search balances model confidence against a degeneration penalty when selecting each token during decoding. Applying the method to isotropic, off-the-shelf LMs without additional training, the authors report significant improvements over previous decoding methods on four generation tasks across 16 languages. In 12 of the 16 languages tested, contrastive search performs comparably with human-level performance as judged by human evaluations, which indicates that the method extends across diverse linguistic contexts.
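The trade-off between model confidence and degeneration penalty can be sketched as a single selection step. In the formulation of Su et al., each of the top-$k$ candidate tokens $v$ is scored as $(1 - \alpha) \cdot p(v \mid x_{<t}) - \alpha \cdot \max_j s(h_v, h_{x_j})$, where $s$ is cosine similarity between the candidate's representation and each previous token's representation. The sketch below assumes the candidates, their probabilities, and their hidden states have already been computed by the model; `contrastive_step` and its input layout are hypothetical names chosen for illustration, and the `alpha` values in the usage note are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_step(candidates, context_states, alpha):
    """Pick the next token from the top-k candidates.

    candidates: list of (token, probability, hidden_state) tuples for the
        top-k tokens under the model (assumed precomputed).
    context_states: hidden states of the tokens generated so far.
    alpha: weight of the degeneration penalty, between 0 and 1.
    """
    best_token = None
    best_score = -math.inf
    for token, prob, state in candidates:
        # Degeneration penalty: similarity to the closest previous token.
        penalty = max(cosine(state, ctx) for ctx in context_states)
        # Model confidence minus the weighted penalty.
        score = (1.0 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token
```

For example, with candidates `[("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]` and a context containing only `[1.0, 0.0]`, setting `alpha=0.0` reduces the step to greedy selection and returns `"the"`, while `alpha=0.6` penalizes the candidate that duplicates the context representation and returns `"cat"`.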

Broader Implications and Future Prospects

The implications of this study are twofold. Practically, contrastive search presents an efficient strategy to enhance the quality of outputs from pre-trained LMs, notably on platforms where re-training is computationally prohibitive. Theoretically, the findings prompt a re-evaluation of isotropy as a critical factor in developing and employing LMs for text generation. The confirmation of isotropic representations in most LMs suggests additional avenues for exploiting pre-trained architectures beyond traditional enhancement methods.

The paper also alludes to future advancements where autonomous knowledge probing and dataset synthesis may leverage contrastive search to facilitate zero-shot learning and adaptive data generation. As LMs scale in complexity and application, deploying robust and efficient decoding methods will be pivotal in harnessing their full potential.

In conclusion, contrastive search emerges as a promising decoding method that reduces the semantic inconsistencies and degenerative repetition commonly encountered in neural text generation. Because it relies only on the isotropic representations that most off-the-shelf LMs already have, it gives researchers and practitioners in natural language processing a training-free tool that applies across many languages and tasks.