Practical construction of efficient, safe, and explainable Large Action Models without massive end-to-end training

Determine whether it is possible to construct practical Large Action Models (LAMs) that are computationally efficient, safe, and explainable without the resource-intensive training of massive new sequence models.

Background

Large Action Models aim to extend LLMs to encompass perception, reasoning, and action, but monolithic Transformer-based approaches suffer from quadratic inference complexity and are prone to action hallucinations, which pose safety hazards in robotics.

Motivated by these limitations, the paper asks whether modular architectures that compose pre-trained perception models with symbolic verification layers can achieve LAM capabilities without extensive end-to-end training. It frames this as a central open question for practical, safe, and interpretable deployment.
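The modular idea described above can be sketched minimally: a pre-trained model proposes an action, and an explicit, human-readable rule layer verifies it before execution, rejecting hallucinated or unsafe actions with a stated reason. This is an illustrative sketch only; all names (`propose_action`, `verify`, the object and action sets) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a modular LAM pipeline: a frozen pre-trained model
# proposes actions, and a symbolic verification layer filters them.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    target: str

def propose_action(observation: str) -> Action:
    # Stand-in for a pre-trained perception/policy model (no new training).
    if "cup" in observation:
        return Action("grasp", "cup")
    return Action("wait", "none")

# Symbolic verification layer: explicit rules that are cheap to check
# and yield an explainable accept/reject decision.
KNOWN_OBJECTS = {"cup", "table"}
ALLOWED_ACTIONS = {"grasp", "place", "wait"}

def verify(action: Action, observation: str) -> tuple[bool, str]:
    if action.name not in ALLOWED_ACTIONS:
        return False, f"unknown action '{action.name}'"
    if action.name == "grasp" and action.target not in KNOWN_OBJECTS:
        # Catches an "action hallucination": acting on a nonexistent object.
        return False, f"hallucinated target '{action.target}'"
    if action.name == "grasp" and action.target not in observation:
        return False, f"target '{action.target}' not perceived"
    return True, "verified"

def act(observation: str) -> str:
    action = propose_action(observation)
    ok, reason = verify(action, observation)
    return f"{action.name}({action.target})" if ok else f"rejected: {reason}"
```

Because the rules are symbolic, every rejection carries a stated reason, which is one route to the explainability the question asks about.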

References

Consequently, a major open question is whether it is possible to construct practical LAMs that are computationally efficient, safe, and explainable without the resource-intensive training of massive new sequence models.

Architecting Large Action Models for Human-in-the-Loop Intelligent Robots  (2512.11620 - Sangchai et al., 12 Dec 2025) in Section 1, Introduction