Developing input–output dimension-agnostic action models

Develop action-model architectures for in-context reinforcement learning that are agnostic to the dimensionality of both the observation (input) and action (output) spaces. A single such model should transfer to tasks with previously unseen observation and action shapes across domains, without requiring task- or group-specific encoders or decoders.

Background

The Vintix II model presented in the paper partitions tasks into groups that share identical observation and action structures, and equips each group with its own encoder and decoder. While this supports broad cross-domain training, it prevents transfer to tasks whose observation and action dimensionalities were not seen during training.
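The group-partitioning scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class, the `(obs_dim, act_dim)` keying, and the fixed weight initialization are all illustrative assumptions, chosen only to make the failure mode on unseen shapes concrete.

```python
# Hypothetical sketch of a group-keyed encoder registry: tasks are
# partitioned by (obs_dim, act_dim), and each group gets its own linear
# projection into a shared model width. All names are illustrative.

class GroupedCodec:
    def __init__(self, d_model=8):
        self.d_model = d_model
        self.encoders = {}  # (obs_dim, act_dim) -> weight matrix

    def register_group(self, obs_dim, act_dim):
        # One linear encoder per group; initialized to an identity-like
        # pattern here purely so the example is deterministic.
        key = (obs_dim, act_dim)
        self.encoders[key] = [
            [1.0 if i == j else 0.0 for j in range(obs_dim)]
            for i in range(self.d_model)
        ]

    def encode(self, obs, act_dim):
        key = (len(obs), act_dim)
        if key not in self.encoders:
            # The limitation discussed above: an unseen (obs, act) shape
            # has no registered encoder, so the task cannot be processed.
            raise KeyError(f"no encoder for observation/action shape {key}")
        w = self.encoders[key]
        return [sum(row[j] * obs[j] for j in range(len(obs))) for row in w]

codec = GroupedCodec()
codec.register_group(obs_dim=3, act_dim=2)
z = codec.encode([0.5, -1.0, 2.0], act_dim=2)   # seen shape: encodes fine
try:
    codec.encode([0.5, -1.0], act_dim=2)        # unseen shape: fails
    failure = ""
except KeyError as exc:
    failure = str(exc)
```

The hard `KeyError` is exactly the obstacle the Background describes: any task whose shape falls outside the registered groups has no entry point into the model.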

During evaluation, tasks with novel input–output dimensionality (e.g., certain Bi‑DexHands configurations) could not be incorporated under the current multi-head encoder architecture, underscoring a limitation in deploying a single agent across arbitrary new domains. The authors explicitly identify eliminating this limitation—by making action models dimension-agnostic—as an open challenge to enable transfer to entirely unseen domains.
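One commonly explored route toward the dimension-agnostic models the authors call for (not a method from the paper) is to tokenize each observation dimension as its own token, tagged with a per-index code, so that a single shared sequence model consumes observations of any width. The function name, model width, and sinusoidal-style tags below are all assumptions for illustration.

```python
# Hypothetical per-dimension tokenization: an observation of ANY length
# becomes a variable-length sequence of d_model-dim tokens, which a
# shared transformer-style backbone could then process. Sketch only.

import math

def tokenize(obs, d_model=4):
    """Map an observation of any length to a list of d_model-dim tokens."""
    tokens = []
    for idx, value in enumerate(obs):
        # Token = scalar value plus a fixed per-index tag, so the model
        # can distinguish dimension 0 from dimension 1 at any obs width.
        tag = [math.sin(idx / (10 ** (2 * k / d_model))) for k in range(d_model)]
        tokens.append([value + t for t in tag])
    return tokens

# The same tokenizer handles two different observation widths,
# with no per-group registration step:
seq3 = tokenize([0.1, 0.2, 0.3])
seq5 = tokenize([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the output is just a longer or shorter token sequence, tasks with novel input dimensionality (such as the Bi-DexHands configurations mentioned above) would no longer require a dedicated encoder head; an analogous per-dimension decoding scheme would be needed on the action side.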

References

In addition, the challenge of developing action models that are agnostic to the input-output dimension remains open, restricting current models from transferring to entirely unseen domains and limiting their applicability in practical scenarios.

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner  (2604.05112 - Polubarov et al., 6 Apr 2026) in Conclusion