Extend the framework beyond linear reward decompositions

Develop and analyze extensions of the framework to reward functions that do not decompose linearly into a sum of a Chain-of-Thought-only term R_cot and an output-only term R_out, including cases with interactions between CoT and final output.
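To make the target of the extension concrete, the additive form described above and one possible generalization can be written out as follows. The symbols c (Chain-of-Thought), o (final output), and the interaction term R_int are notation assumed here for illustration; they are not taken from the paper, and other non-additive forms are equally possible.

```latex
% Assumed notation: c = Chain-of-Thought, o = final output.
% Additive (linear) case described in the problem statement:
R(c, o) = R_{\mathrm{cot}}(c) + R_{\mathrm{out}}(o)

% One hypothetical extension with an explicit interaction term R_int:
R(c, o) = R_{\mathrm{cot}}(c) + R_{\mathrm{out}}(o) + R_{\mathrm{int}}(c, o)
```

Without a further constraint on R_int, this second decomposition is not unique, since any part of R_cot or R_out could be absorbed into R_int. An extension would likely need a normalization condition to pin the terms down.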

Background

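The formalism assumes that the total reward is a sum of two terms: R_cot, which depends only on the Chain-of-Thought, and R_out, which depends only on the final output. This assumption facilitates the aligned/orthogonal/in-conflict classification, but it restricts the framework to rewards with no interaction between the CoT and the final output.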
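As a rough illustration of what the additive assumption rules out, the sketch below checks whether a reward given as a finite table (rows indexed by CoT, columns by final output) can be written as a row term plus a column term. It uses double centering: the residual is zero everywhere exactly when the table is additively separable. The function names, the toy reward values, and the tabular setting are all assumptions made for this example; the paper does not prescribe this test.

```python
def interaction_residual(table):
    """Double-center a reward table: table[i][j] is the reward for CoT i, output j.

    The residual is zero everywhere exactly when table[i][j] = a[i] + b[j]
    for some row terms a and column terms b (additive separability).
    """
    n_rows, n_cols = len(table), len(table[0])
    row_mean = [sum(row) / n_cols for row in table]
    col_mean = [sum(table[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    grand_mean = sum(row_mean) / n_rows
    return [[table[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(n_cols)]
            for i in range(n_rows)]


def is_additively_separable(table, tol=1e-9):
    """True if the largest absolute interaction residual is within tol."""
    residual = interaction_residual(table)
    return max(abs(x) for row in residual for x in row) <= tol


# Hypothetical toy values for a CoT-only term and an output-only term.
r_cot = [0.0, 1.0, 2.0]
r_out = [0.5, -0.5]

# Additive reward: R(c, o) = R_cot(c) + R_out(o).
additive = [[a + b for b in r_out] for a in r_cot]

# Same reward plus a multiplicative CoT-output interaction (an assumed example).
coupled = [[a + b + 0.3 * a * b for b in r_out] for a in r_cot]

print(is_additively_separable(additive))
print(is_additively_separable(coupled))
```

The residual table also indicates where the coupling is concentrated, which could be a starting point for deciding how a non-additive reward should be treated.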

The authors explicitly leave the treatment of other decompositions for future work, indicating a concrete extension path for the framework.

References

We consider this case where the two rewards act separately and linearly on the CoT and Final Output, leaving other reward decompositions for future work.

— Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought? (2603.30036 - Kaufmann et al., 31 Mar 2026) in Appendix: Mathematical Model of Aligned / In-Conflict / Orthogonal, Reward decomposition (R_cot and R_out)
