Extend task attribution to planning abilities in language models

Extend the kernel-surrogate-based task attribution framework to planning abilities in language models, and determine how it can quantify the influence of individual training tasks on planning-related behaviors, not only on prediction tasks.

Background

The paper introduces kernel surrogate models and an efficient gradient-based estimation procedure to perform task attribution, primarily evaluating prediction tasks across modular arithmetic, in-context learning, and multi-objective reinforcement learning. While the approach demonstrates strong empirical performance and theoretical connections to influence functions and linear surrogates, its evaluation focuses on prediction-oriented settings.
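To make the idea of a kernel surrogate concrete, the sketch below fits a kernel ridge regressor that maps "which training tasks were included" to an observed evaluation metric, then scores each task by how much the surrogate's prediction drops when that task is removed. This is only an illustration under stated assumptions, not the paper's estimation procedure: the RBF kernel, the leave-one-out scoring rule, the function names, and the synthetic metric (a stand-in for something like a planning success rate) are all hypothetical choices, and the paper's gradient-based estimator is not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_surrogate(S, y, gamma=0.5, ridge=1e-3):
    # S: (n_subsets, n_tasks) binary task-inclusion matrix.
    # y: (n_subsets,) metric observed for a model trained on each subset.
    mean = y.mean()
    K = rbf_kernel(S, S, gamma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(S)), y - mean)

    def predict(S_new):
        S_new = np.atleast_2d(S_new).astype(float)
        return rbf_kernel(S_new, S, gamma) @ alpha + mean

    return predict

def attribution_scores(predict, n_tasks):
    # Score task i as the predicted metric with all tasks included
    # minus the predicted metric with task i removed (assumed scoring rule).
    full = np.ones((1, n_tasks))
    base = predict(full)[0]
    scores = np.empty(n_tasks)
    for i in range(n_tasks):
        without_i = full.copy()
        without_i[0, i] = 0.0
        scores[i] = base - predict(without_i)[0]
    return scores

# Synthetic data (assumption): tasks 0 and 2 help and interact, task 4 hurts.
rng = np.random.default_rng(0)
n_tasks, n_subsets = 6, 200
S = rng.integers(0, 2, size=(n_subsets, n_tasks)).astype(float)
true_w = np.array([0.5, 0.0, 0.3, 0.0, -0.2, 0.0])
y = S @ true_w + 0.4 * S[:, 0] * S[:, 2] + 0.01 * rng.normal(size=n_subsets)

predict = fit_kernel_surrogate(S, y)
scores = attribution_scores(predict, n_tasks)
print(np.round(scores, 2))
```

In this toy setup a nonlinear kernel can capture the interaction between tasks 0 and 2, which a purely additive surrogate would miss. For planning, `y` would have to be replaced by a measured planning metric, and how best to define that metric is part of the open question.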

The authors note that planning capabilities are a fundamental aspect of human intelligence and are increasingly relevant for modern LLMs. They explicitly state that extending their attribution framework to planning abilities remains an open question, which leaves a gap between current attribution methods and the analysis of planning behaviors. Addressing this problem would broaden the applicability of task attribution beyond predictive tasks and could help clarify generalization differences between supervised fine-tuning and reinforcement learning.

References

Extending our attribution framework to examine other aspects of LLMs such as planning abilities remains an open question, which is a fundamental aspect of human intelligence \citep{wang2024alpine}.

Efficient Estimation of Kernel Surrogate Models for Task Attribution (2602.03783 - Zhang et al., 3 Feb 2026) in Related Work (end of section)