Tightness of padding bounds for transformer recognition of context-free languages
Determine whether the current padding-token upper bounds for transformer-based recognition of context-free languages are tight. Specifically, ascertain the minimal number of padding tokens required for averaging hard-attention transformers with logarithmically looped layers and log-precision arithmetic to recognize (i) all context-free languages, for which the current upper bound is $O(n^6)$ padding, (ii) unambiguous context-free languages, currently $O(n^3)$ padding, and (iii) unambiguous linear context-free languages, currently $O(n^2)$ padding; either establish matching lower bounds or construct recognition algorithms that use asymptotically fewer padding tokens for these classes.
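One way to make the target quantity precise is sketched below; the notation $p_L$, the padding symbol $\square$, and the convention that a single transformer must work for all input lengths are introduced here for illustration and are not taken from the source. For a language $L$ over an alphabet $\Sigma$, let
$$
p_L(n) \;=\; \min\Bigl\{\, p(n) \;:\; p:\mathbb{N}\to\mathbb{N}, \text{ and some averaging hard-attention transformer with } O(\log n) \text{ looped layers and log-precision arithmetic accepts } w\,\square^{\,p(|w|)} \text{ if and only if } w \in L, \text{ for every } w \in \Sigma^* \,\Bigr\}.
$$
Under this reading, the cited upper bounds state that $p_L(n) = O(n^6)$ for every context-free $L$, $O(n^3)$ for every unambiguous context-free $L$, and $O(n^2)$ for every unambiguous linear context-free $L$; tightness would mean exhibiting, in each class, a language $L$ whose $p_L(n)$ matches the corresponding upper bound up to constant factors.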
References
Although it is not possible to improve our $\log$-depth recognition algorithm to fixed depth unless $\mathsf{TC}^0 = \mathsf{NC}^1$, our padding bounds are not known to be tight.