Comparison of different Unique hard attention transformer models by the formal languages they can recognize
Published 3 Jun 2025 in cs.LG, cs.CL, and cs.FL | (2506.03370v1)
Abstract: This note is a survey of various results on the capabilities of unique hard attention transformer encoders (UHATs) to recognize formal languages. We distinguish between masked vs. non-masked, finite vs. infinite image, and general vs. bilinear attention score functions. We recall some relations between these models, as well as a lower bound in terms of first-order logic and an upper bound in terms of circuit complexity.
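To make the abstract's terminology concrete, here is a minimal sketch of a single unique-hard-attention head: it attends to exactly one position, the one maximizing the attention score, with ties broken deterministically. The sketch also illustrates two of the distinctions the survey draws, a bilinear score function and a (causal) mask. The function names, the leftmost tie-breaking rule, and the toy dimensions are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def bilinear_scores(q, K, W):
    # Bilinear attention score s(q, k_j) = q^T W k_j for each key k_j.
    # A "general" score function could instead be any map (q, k) -> R.
    return K @ (W.T @ q)

def unique_hard_attention(scores, mask=None, tie_break="leftmost"):
    # Unique hard attention selects the single position with maximal
    # score; ties are broken deterministically (here: leftmost/rightmost).
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # masked positions never win
    best = scores.max()
    candidates = np.flatnonzero(scores == best)
    return candidates[0] if tie_break == "leftmost" else candidates[-1]

# Toy example: one query over 5 key vectors, with a causal mask
# hiding positions after index 2 (the "masked" UHAT variant).
rng = np.random.default_rng(0)
d = 4
q = rng.normal(size=d)
K = rng.normal(size=(5, d))
W = rng.normal(size=(d, d))

scores = bilinear_scores(q, K, W)
mask = np.arange(5) <= 2               # causal masking
j = unique_hard_attention(scores, mask=mask)
print("attended position:", j)         # the one position the head reads from
```

Because the head's output is the value at a single position rather than a weighted average, a UHAT layer computes a discrete selection, which is what makes comparisons to first-order logic and circuit classes tractable.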