Analysis of RNN hidden state mismatch during waypoint servoing

Analyze the discrepancy between the recurrent policy's hidden states at training and test time that arises when a closed-loop waypoint controller executes sparse periods at deployment while training relies on human-demonstrated trajectories, and determine its impact on HYDRA's performance and stability.

Background

During test-time sparse periods (segments where the policy predicts a waypoint rather than dense low-level actions), HYDRA employs a closed-loop waypoint controller to reach the predicted waypoint while still querying the policy at each step to update its recurrent hidden state. Because training trajectories may follow non-optimal human paths whereas test-time execution follows controller-generated paths, the observations fed to the recurrent policy differ, so its hidden states can diverge between training and deployment.
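This divergence mechanism can be illustrated with a minimal sketch. The snippet below is not HYDRA's implementation: it uses a randomly initialized GRU-style cell in NumPy as a stand-in for the trained recurrent policy, a noisy interpolation toward a goal as a stand-in for a non-optimal human demonstration, and a straight-line interpolation as a stand-in for the controller-generated path. Both paths end at the same waypoint, yet rolling the same cell over each path yields different final hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
OBS_DIM, HID_DIM = 8, 16

# Random weights standing in for a trained recurrent policy (GRU-style cell).
Wz, Uz = rng.normal(size=(HID_DIM, OBS_DIM)), rng.normal(size=(HID_DIM, HID_DIM))
Wh, Uh = rng.normal(size=(HID_DIM, OBS_DIM)), rng.normal(size=(HID_DIM, HID_DIM))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, obs):
    """One recurrent update: the hidden state absorbs the observation."""
    z = sigmoid(Wz @ obs + Uz @ h)       # update gate
    h_cand = np.tanh(Wh @ obs + Uh @ h)  # candidate state
    return (1 - z) * h + z * h_cand

def rollout(h0, observations):
    """Query the cell on every observation along a path, as HYDRA
    queries the policy during waypoint servoing."""
    h = h0
    for obs in observations:
        h = gru_step(h, obs)
    return h

# Two paths to the same waypoint: a noisy, non-optimal "human" path
# (training-like) vs. a straight controller path (deployment-like).
T = 20
goal = rng.normal(size=OBS_DIM)
human_path = [goal * t / T + 0.3 * rng.normal(size=OBS_DIM) for t in range(T)]
controller_path = [goal * t / T for t in range(T)]

h0 = np.zeros(HID_DIM)
h_train = rollout(h0, human_path)
h_test = rollout(h0, controller_path)

# Same endpoint, different intermediate observations -> different hidden states.
drift = float(np.linalg.norm(h_train - h_test))
print(f"hidden-state drift: {drift:.3f}")
```

Whether such drift matters depends on how sensitive the downstream action head is to the hidden state, which is precisely the open question the authors defer to future work.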

Although the authors did not observe empirical failures from this mismatch, they identify the need for a broader analysis to understand when and how hidden-state discrepancies arise and whether they affect policy behavior.

References

We leave a broader analysis of the hidden state problem for future work.

Belkhale et al., 2023. "HYDRA: Hybrid Robot Actions for Imitation Learning" (arXiv:2306.17237), Appendix, Section "Labeling Modes in HYDRA," Subsection "Waypoint Controller."