
FactAlign: Long-form Factuality Alignment of Large Language Models

Published 2 Oct 2024 in cs.CL and cs.AI | (2410.01691v1)

Abstract: LLMs have demonstrated significant potential as the next-generation information access engines. However, their reliability is hindered by issues of hallucination and generating non-factual content. This is particularly problematic in long-form responses, where assessing and ensuring factual accuracy is complex. In this paper, we address this gap by proposing FactAlign, a novel alignment framework designed to enhance the factuality of LLMs' long-form responses while maintaining their helpfulness. We introduce fKTO, a fine-grained, sentence-level alignment algorithm that extends the Kahneman-Tversky Optimization (KTO) alignment method. Leveraging recent advances in automatic factuality evaluation, FactAlign utilizes fine-grained factuality assessments to guide the alignment process. Our experiments on open-domain prompts and information-seeking questions demonstrate that FactAlign significantly improves the factual accuracy of LLM responses while also improving their helpfulness. Further analyses identify that FactAlign is capable of training LLMs to provide more information without losing factual precision, thus improving the factual F1 score. Our source code, datasets, and trained models are publicly available at https://github.com/MiuLab/FactAlign


Summary

  • The paper introduces FactAlign, a framework extending KTO with a fine-grained fKTO algorithm that uses sentence-level factuality evaluation to improve LLM factual precision.
  • Empirical results demonstrate FactAlign significantly enhances factual accuracy, evidenced by improved factual F1 scores on open-domain and information-seeking tasks.
  • FactAlign provides a method to mitigate hallucinations in LLMs, expanding their utility in applications requiring high factual reliability and suggesting pathways for integrating broader knowledge.

FactAlign: Long-form Factuality Alignment of LLMs

The paper "FactAlign: Long-form Factuality Alignment of LLMs" addresses the persistent challenge of factual inaccuracies, or hallucinations, in responses generated by LLMs. The problem is especially acute in long-form content, where ensuring factual precision is complicated by the many interwoven claims within a lengthy text. The authors tackle it with FactAlign, an alignment framework designed to improve the factuality of LLM outputs while preserving their usefulness across query contexts.

Conceptual Framework and Methodology

FactAlign is predicated on the extension of the Kahneman-Tversky Optimization (KTO) alignment approach, using a newly proposed algorithm designated as fKTO. This fine-grained algorithm operates at the sentence level, evaluating and aligning the factual content of each statement within a generated response. By leveraging advancements in automatic factuality evaluation, FactAlign effectively utilizes sentence-level assessments to direct the alignment process.
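To make the idea concrete, here is a rough sketch of what a sentence-level KTO-style objective could look like. This is not the authors' implementation: the hyperparameter names (`beta`, `z0`, the `lambda_*` weights) and the exact form of the value function are illustrative, borrowed from the standard KTO formulation and applied per sentence rather than per response. Each sentence contributes a term that rewards the policy for raising the likelihood of factually supported sentences and lowering it for unsupported ones.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fkto_sentence_loss(logratios, supported, beta=0.1, z0=0.0,
                       lambda_d=1.0, lambda_u=1.0):
    """Illustrative per-sentence KTO-style loss (not the paper's exact code).

    logratios: log pi_theta(s | x) / pi_ref(s | x) for each sentence s,
               i.e. how much the policy prefers the sentence vs. a
               frozen reference model
    supported: per-sentence flags from a factuality evaluator
               (True = the sentence is supported by the knowledge source)
    """
    losses = []
    for r, ok in zip(logratios, supported):
        if ok:
            # Desirable sentence: value rises as the policy favors it more.
            v = lambda_d * sigmoid(beta * (r - z0))
            losses.append(lambda_d - v)
        else:
            # Undesirable sentence: value rises as the policy disfavors it.
            v = lambda_u * sigmoid(beta * (z0 - r))
            losses.append(lambda_u - v)
    return sum(losses) / len(losses)
```

The key departure from vanilla KTO is the granularity: rather than one desirable/undesirable label per response, each sentence gets its own label from the factuality evaluator, so a mostly correct answer with one hallucinated claim is penalized only on the offending sentence.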

The study deploys fKTO alongside an automatic factuality evaluator that parses long-form outputs into atomic statements, each checked against a curated knowledge corpus to determine whether it is factually supported. The alignment objective thereby targets both factual precision and recall, as captured by the factual F1 score. The framework's efficacy is evidenced through testing on open-domain prompts and information-seeking tasks, demonstrating significant improvements in both factual accuracy and overall helpfulness of LLM responses.
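The factual F1 metric mentioned above can be sketched as follows. This is a hedged illustration assuming an F1@K-style formulation common in long-form factuality evaluation: precision is the fraction of a response's atomic claims that are supported, while recall credits the response for providing up to some target number `k` of supported facts (the parameter name `k` and the default value are assumptions, not taken from the paper).

```python
def factual_f1(num_supported: int, num_claims: int, k: int = 64) -> float:
    """Illustrative factual F1 over a response's atomic claims.

    num_supported: claims the evaluator marked as supported
    num_claims:    total atomic claims extracted from the response
    k:             assumed target count of supported facts for recall
    """
    if num_claims == 0:
        return 0.0
    precision = num_supported / num_claims        # accuracy of what was said
    recall = min(num_supported / k, 1.0)          # informativeness, capped at k
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this formulation, a model can raise its F1 either by making fewer unsupported claims (precision) or by supplying more supported ones (recall), which matches the paper's finding that FactAlign trains models to provide more information without losing factual precision.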

Empirical Findings and Analysis

The authors validate their approach through experiments employing open-domain prompts and specific information-seeking questions. Notably, FactAlign shows marked enhancement in the factual accuracy of responses, as indicated by improved factual F1 scores. These numerical evaluations highlight the algorithm's proficiency in training LLMs to deliver more information-rich responses without compromising factuality.

Furthermore, the authors conduct an ablation study to ascertain the contribution of each component, underscoring the indispensable role of fine-grained factual alignment in achieving superior factuality metrics.

Implications and Future Directions

FactAlign presents significant theoretical and practical implications for the development of LLMs. Theoretically, it informs the design of alignment frameworks by demonstrating the effectiveness of sentence-level alignment in enhancing factual precision. Practically, FactAlign contributes to the field of AI by offering a pathway to mitigate hallucinations in LLM outputs, thereby broadening their applicability in real-world settings where factual accuracy is non-negotiable.

Future work may extend FactAlign by integrating broader knowledge bases or real-time web data, enriching the factual context against which LLM outputs are evaluated. Further refinement of automatic factuality metrics could also improve the granularity and reliability of factual assessments in dynamic, evolving knowledge domains.

In conclusion, the FactAlign framework establishes a robust foundation for future work aimed at reconciling the dual objectives of informativeness and factual accuracy in LLMs, paving the way for more reliable AI-driven communication tools in sensitive applications.
