Observation-adaptive agents for the “use-when-useful” sensing regime

Characterize and develop reinforcement learning agents that operate in the “use-when-useful” sensing regime, where high-dimensional observations are available but may be selectively ignored. Doing so requires formalizing when an agent should rely primarily on internal state and open-loop structure, and when it should incorporate exteroceptive sensory inputs (e.g., dense flow-field estimates), during both training and execution.

Background

The paper shows that rich exteroceptive flow feedback can be essential for learning high-performance policies in a chaotic fluid system, even though the resulting policies can later be executed in a largely open-loop fashion with minimal or no feedback. This motivates agents that can flexibly decide when to use or ignore available observations rather than relying on fixed sensor sets determined by designers.

Within this context, the authors highlight a regime where observations arrive regardless of whether the agent uses them. The agent may benefit from selectively ignoring them when they are uninformative, noisy, or computationally burdensome, and from emphasizing them when they materially improve control. They note that this “use-when-useful” setting has not been systematically studied, which calls for methods and analyses that enable such observation adaptivity.

References

To our knowledge, this 'use-when-useful' regime remains largely unexplored.

— Using reinforcement learning to probe the role of feedback in skill acquisition (2512.08463 - Terpin et al., 9 Dec 2025) in Section 4.1, Privileged information and observation-adaptive agents
