Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans

Published 29 Dec 2025 in cs.CL (arXiv:2512.23693v1)

Abstract: We present a method and dataset for fine-tuning LLMs with preference supervision using feedback-driven improvement chains. Given a model response, an annotator provides fine-grained feedback by marking "liked" and "disliked" spans and specifying what they liked or disliked about them. The base model then rewrites the disliked spans accordingly, proceeding from left to right, forming a sequence of incremental improvements. We construct preference pairs for direct alignment from each adjacent step in the chain, enabling the model to learn from localized, targeted edits. We find that our approach outperforms direct alignment methods based on standard A/B preference ranking or full contrastive rewrites, demonstrating that structured, revision-based supervision leads to more efficient and effective preference tuning.
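To make the pipeline concrete, here is a minimal sketch of how an improvement chain might be built and turned into preference pairs. The data structures, the span-offset bookkeeping, and the `rewrite_span` callback (standing in for a call to the base model) are assumptions inferred from the abstract, not the authors' released code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical span annotation: character offsets into the response plus
# the annotator's free-text critique of that span.
@dataclass
class DislikedSpan:
    start: int
    end: int
    critique: str

def build_improvement_chain(
    response: str,
    spans: List[DislikedSpan],
    rewrite_span: Callable[[str, DislikedSpan], str],
) -> List[str]:
    """Rewrite disliked spans left to right, yielding one intermediate
    response per edit (the abstract's "sequence of incremental
    improvements"). `rewrite_span` stands in for the base model."""
    chain = [response]
    current = response
    offset = 0  # rewrites change lengths, so later span offsets must shift
    for span in sorted(spans, key=lambda s: s.start):
        start, end = span.start + offset, span.end + offset
        replacement = rewrite_span(current[start:end], span)
        current = current[:start] + replacement + current[end:]
        offset += len(replacement) - (end - start)
        chain.append(current)
    return chain

def chain_to_preference_pairs(chain: List[str]) -> List[Tuple[str, str]]:
    """Each adjacent step becomes a (chosen, rejected) pair for direct
    alignment (e.g. DPO): the revised response is preferred over its
    immediate predecessor."""
    return [(chain[i + 1], chain[i]) for i in range(len(chain) - 1)]

# Toy usage with a stub rewriter in place of an LLM call.
if __name__ == "__main__":
    resp = "The sky is green. Water boils at 100 C."
    spans = [DislikedSpan(0, 17, "factually wrong")]
    chain = build_improvement_chain(
        resp, spans, rewrite_span=lambda text, s: "The sky is blue."
    )
    for chosen, rejected in chain_to_preference_pairs(chain):
        print({"chosen": chosen, "rejected": rejected})
```

Because adjacent steps in the chain differ only in one rewritten span, each resulting pair isolates a single targeted edit, which is presumably what gives the method its "localized" supervision signal compared to whole-response A/B comparisons.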
