Neural AMR: Sequence-to-Sequence Models
- Neural AMR models are deep sequence-to-sequence frameworks that linearize AMR graphs for both parsing and generation, incorporating advanced neural architectures.
- They employ various linearization strategies, such as Penman and triple-based encodings, to convert complex graph structures into manageable token sequences.
- Structure-aware modifications and transition-based systems improve the handling of reentrancies and multi-hop relations, driving state-of-the-art performance.
Neural sequence-to-sequence (seq2seq) models constitute the dominant paradigm for mapping between natural language and Abstract Meaning Representation (AMR) graphs. In these approaches, both parsing (text → AMR) and generation (AMR → text) are reduced to conditional sequence generation via deep neural architectures, typically using recurrent or Transformer-based encoder–decoders with attention. Since AMR is a rooted, directed acyclic graph formalism, the graph must be linearized for compatibility with standard seq2seq architectures, leading to a spectrum of linearization strategies and associated inductive biases. Recent research evaluates the trade-offs of various graph serialization methods, advanced pretraining, data augmentation, and structure-aware modifications, achieving state-of-the-art performance on both parsing and generation tasks.
1. Linearization Approaches for AMR Graphs
Seq2seq models require AMR graphs to be serialized as string sequences. The Penman linearization, based on nested parentheses, is the standard:
- Each node is rendered as (var / concept), and edges appear as :role child-subtree in preorder. The full graph is a tree of parentheses, forming a single rooted structure.
- Reentrancy (multi-parent nodes) is encoded by duplicating the child subtree and introducing inverse roles (:role-of), doubling the relation inventory: for a base role set $R$, Penman requires $2|R|$ role types (direct and inverse) (Kang et al., 13 May 2025).
- Variables encode pointer identity and preserve cross-graph references.
Alternative approaches, such as triple-based encoding, flatten the AMR graph into a sequence of (parent, relation, child) triples, concatenated with a delimiter (e.g., "|"). This strategy eliminates the need for inverse roles (by representing all arcs in direct form only) and maintains adjacency between parent and child. However, it produces markedly longer sequences, increasing output sparsity and reducing the model's ability to infer hierarchical scope (Kang et al., 13 May 2025).
Variants exist for both Penman and triple formats, toggling the presence or absence of variables and inverse roles. Dropping variables decreases output sparsity but impedes recovery of coreference structure.
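The contrast between the two serializations can be made concrete with a toy graph. The sketch below (hypothetical helper names, not from any cited system) linearizes a small reentrant AMR for "The boy wants to go" both ways; note how the Penman form re-uses the variable for the reentrant node, while the triple form spells out every arc and grows markedly longer.

```python
# Toy AMR: (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
# "b" is reentrant: it is an argument of both want-01 and go-02.
graph = {
    "root": "w",
    "nodes": {"w": "want-01", "b": "boy", "g": "go-02"},
    "edges": [("w", ":ARG0", "b"), ("w", ":ARG1", "g"), ("g", ":ARG0", "b")],
}

def penman_linearize(g, var, seen=None):
    """Depth-first Penman rendering; reentrant nodes re-use the bare variable."""
    seen = set() if seen is None else seen
    if var in seen:                      # reentrancy: emit variable only, no subtree
        return var
    seen.add(var)
    parts = [f"({var} / {g['nodes'][var]}"]
    for src, role, tgt in g["edges"]:
        if src == var:
            parts.append(f"{role} {penman_linearize(g, tgt, seen)}")
    return " ".join(parts) + ")"

def triple_linearize(g):
    """Flat (parent, relation, child) triples joined by a '|' delimiter."""
    inst = [f"{v} instance {c}" for v, c in g["nodes"].items()]
    rels = [f"{s} {r} {t}" for s, r, t in g["edges"]]
    return " | ".join(inst + rels)

print(penman_linearize(graph, graph["root"]))
# (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
print(triple_linearize(graph))
```

Even on this three-node graph the triple sequence is nearly twice as long as the Penman string, illustrating the verbosity cost discussed above.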
2. Seq2Seq Architecture, Data, and Preprocessing
The canonical architectures are attention-augmented encoder–decoders:
- Early models use bidirectional LSTM (BiLSTM) encoders and unidirectional LSTM decoders with global or Bahdanau attention. Character-level models operate over an input/output vocabulary of 150–200 symbols and directly model the character stream of linearized AMRs, exploiting "super-characters" for multi-character sequences (e.g., ":ARG0") to reduce output length and increase symbol discriminability (Noord et al., 2017).
- Contemporary systems adopt pre-trained Transformers (e.g., mBART-large with 12 layers, cross-attention), tokenizing both sentences and AMR sequences via subword segmentation (e.g., SentencePiece or BPE, typically with 20–50k merges) (Kang et al., 13 May 2025, Xu et al., 2020). Variables and inverse roles are optionally retained or suppressed at preprocessing time.
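The "super-character" idea mentioned above can be sketched with a simple regex-based tokenizer (a hypothetical illustration, not the cited system's preprocessing): the linearized AMR is split into individual characters, except that role labels and structural symbols are kept as atomic tokens.

```python
import re

# Role labels (":ARG0", ":mod", ...) and structural symbols are treated as
# single "super-characters"; everything else becomes individual characters.
SUPER = re.compile(r"(:[A-Za-z0-9\-]+|\(|\)|/)")

def super_char_tokenize(s):
    tokens = []
    for piece in SUPER.split(s):
        if not piece:
            continue
        if SUPER.fullmatch(piece):
            tokens.append(piece)          # atomic super-character
        else:
            tokens.extend(piece)          # individual characters (incl. spaces)
    return tokens

toks = super_char_tokenize("(w / want-01 :ARG0 (b / boy))")
print(toks)
```

The role label ":ARG0" thus costs one output step instead of five, which is the length reduction the character-level models exploit.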
Data augmentation is standard: AMR sibling orderings are permuted to match sentence word order, expanding the effective training data. Further, large volumes of "silver-standard" AMR–sentence pairs produced by automatic parsers on external corpora (e.g., Gigaword or Groningen Meaning Bank) are incorporated to mitigate data sparsity (Noord et al., 2017, Konstas et al., 2017).
For sparsity reduction, infrequent AMR tokens are replaced by coarse-grained categories (anonymization)—DATE, NE_type, -VERB-, -RET-, etc.—with instance indices to preserve compositionality. At inference, mappings are recovered by lookups built during training (Peng et al., 2017).
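A minimal sketch of this anonymization scheme follows (the category names and helper functions are illustrative assumptions, not the exact pipeline of Peng et al.): rare tokens are mapped to typed placeholders with instance indices, and the mapping is retained for de-anonymization at inference.

```python
def anonymize(tokens, rare_map):
    """Replace rare tokens by CATEGORY_i placeholders; return tokens + lookup."""
    counts, lookup, out = {}, {}, []
    for tok in tokens:
        if tok in rare_map:
            cat = rare_map[tok]
            idx = counts.get(cat, 0)       # per-category instance index
            counts[cat] = idx + 1
            placeholder = f"{cat}_{idx}"
            lookup[placeholder] = tok      # stored for recovery at inference
            out.append(placeholder)
        else:
            out.append(tok)
    return out, lookup

def deanonymize(tokens, lookup):
    return [lookup.get(t, t) for t in tokens]

# Hypothetical rare-token table built from training-set frequency counts.
rare = {"2017-05-13": "DATE", "Groningen": "NE_location"}
src = "the meeting in Groningen on 2017-05-13".split()
anon, lookup = anonymize(src, rare)
print(anon)  # ['the', 'meeting', 'in', 'NE_location_0', 'on', 'DATE_0']
```

The instance indices keep distinct mentions of the same category separable, which is what preserves compositionality during decoding.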
3. Structure-Aware Modeling and Transition-Based Systems
To overcome the limitations of flat linearizations, several works introduce structure-aware or transition-based mechanisms:
- Structure-aware Transformers augment vanilla self-attention by integrating path-aware vectors computed from the AMR-graph path between token pairs; this path representation is injected into the key and value projections, encoding multi-hop semantic relations and reentrancies directly in attention scores. This yields gains of 4–6 BLEU over baselines on AMR-to-text generation, with ablations confirming multi-hop paths contribute ≈2 BLEU (Zhu et al., 2019).
- Recent transition-based approaches encode parser states as sequences of discrete actions (e.g., SHIFT, COPY, NODE(label), arc-attachment ops, ROOT), enforcing graph well-formedness by design. These systems fine-tune pre-trained encoder–decoder models (e.g., BART) on oracle action sequences, using hard-masked cross-attention to the sentence cursor and a dedicated self-attention head for pointer-based arc targeting. The decoder’s hidden state encodes the action stack and buffer non-parametrically (Zhou et al., 2021).
- Losses sum token-level cross-entropy over actions and (where applicable) pointer indices. This design guarantees graph validity and allows arc reentrancy to be captured structurally within the seq2seq framework.
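To make the transition-based idea concrete, here is a deliberately simplified executor (not the StructBART action set) that replays an oracle action sequence into a graph: NODE creates a concept aligned to the current token, ARC attaches two existing nodes by pointer indices, and SHIFT advances the token cursor. Because arcs only ever reference already-created nodes, every decoded sequence yields a well-formed graph.

```python
def run_transitions(tokens, actions):
    cursor, nodes, edges = 0, [], []
    for act in actions:
        if act == "SHIFT":
            cursor += 1                       # advance the sentence cursor
        elif act[0] == "NODE":
            nodes.append((act[1], cursor))    # (concept, aligned token index)
        elif act[0] == "ARC":
            _, role, head, dep = act          # pointer indices into `nodes`
            edges.append((head, role, dep))
    return nodes, edges

tokens = "the boy wants to go".split()
actions = [
    "SHIFT",                      # skip "the"
    ("NODE", "boy"),              # node 0, aligned to "boy"
    "SHIFT",
    ("NODE", "want-01"),          # node 1, aligned to "wants"
    ("ARC", ":ARG0", 1, 0),       # want-01 :ARG0 boy
    "SHIFT", "SHIFT",
    ("NODE", "go-02"),            # node 2, aligned to "go"
    ("ARC", ":ARG1", 1, 2),       # want-01 :ARG1 go-02
    ("ARC", ":ARG0", 2, 0),       # reentrancy: go-02 :ARG0 boy
]
nodes, edges = run_transitions(tokens, actions)
```

Note how the reentrant node (boy, node 0) receives two incoming arcs via pointers, so reentrancy is captured structurally rather than by subtree duplication.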
4. Empirical Evaluation and Results
AMR parsing efficacy is primarily benchmarked by SMATCH F1 scores; text generation uses BLEU:
| Linearization/Model | AMR3.0 SMATCH | AMR2.0 SMATCH | LDC2017T10 BLEU |
|---|---|---|---|
| Penman_O_var_O_invrole (full Penman) | 80.9 ± 0.2 | — | — |
| Triple_O_var_X_invrole (drop inverses) | 80.3 ± 0.1 | — | — |
| StructBART-J transition-based | — | 84.2 ± 0.1 | — |
| Seq2Seq Transformer pre-trained (MTL) | — | 80.2 | — |
| Structure-aware Transformer (CNN) | — | — | 31.82 |
Penman outperforms triple-based encodings by ≈0.5 SMATCH even on deep graphs, with the gap widening as sequence length increases due to the verbosity and loss of grouping information in the triple encoding. Structure-aware augmentations robustly improve model performance, especially for handling reentrancies and multi-hop relations, as evidenced by performance breakdowns showing consistent improvements in reentrancy, argument, and coreference-oriented categories (Kang et al., 13 May 2025, Zhu et al., 2019, Zhou et al., 2021).
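For intuition about what SMATCH measures, the sketch below computes a simplified triple-overlap F1 under the assumption that variable names are already aligned; real SMATCH additionally searches over variable mappings (e.g., by hill-climbing), which this toy version omits.

```python
def triple_f1(pred, gold):
    """F1 over matched (source, role, target) triples, gold-aligned variables."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

gold = {("w", "instance", "want-01"), ("w", ":ARG0", "b"), ("b", "instance", "boy")}
pred = {("w", "instance", "want-01"), ("w", ":ARG0", "b"), ("b", "instance", "girl")}
print(round(triple_f1(pred, gold), 3))  # 0.667: one mislabeled concept
```

A single wrong concept label costs one triple on each side, which is why SMATCH degrades gracefully rather than all-or-nothing.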
Transition-based systems reach state-of-the-art on AMR 2.0 (e.g., SMATCH 84.2–84.7), without the need for complex recategorization or external structural constraints (Zhou et al., 2021). Pretraining on machine translation, syntactic parsing, and silver-parsed AMRs produces strong initializations for both parsing and generation, with multi-task fine-tuning yielding further gains (Xu et al., 2020).
5. Qualitative Analysis, Error Patterns, and Limitations
Model analyses consistently identify several bottlenecks:
- Penman’s nested format places semantically related nodes at significant linear distances in cases of intervening deep subtrees, complicating learning of local dependencies.
- Triple-based linearization, while adjacency-preserving, is degraded by sequence verbosity and the lack of explicit subgraph boundaries, obscuring nested logical structure (Kang et al., 13 May 2025).
- Character-level seq2seq models, despite strong inductive biases, are susceptible to output ill-formedness (e.g., parenthesis mismatches), and lag in precision/recall for long input sequences unless postprocessing (e.g., bracket correction, wikification) is applied (Noord et al., 2017).
- Most linearization-based approaches fail to uniquely signal reentrancy: tree-based DFS encodings simply duplicate reentrant nodes. Graph-structured or structure-aware encoders recover this by architectural means (e.g., GCN layers or attention path biasing) (Damonte et al., 2019, Zhu et al., 2019).
- Data sparsity remains acute; anonymization and aggressive silver-data augmentation are standard and critical (Peng et al., 2017, Konstas et al., 2017).
- For transition-based models, action sequence errors and pointer mispredictions can still propagate, but gold action oracle decoding ensures output well-formedness (Zhou et al., 2021).
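The bracket-correction postprocessing mentioned for character-level models can be sketched as a single linear pass (an illustrative repair strategy, not necessarily the exact procedure of Noord et al.): unmatched closing parentheses are dropped and missing closers are appended, so the output at least parses as a Penman tree.

```python
def repair_brackets(s):
    """Drop unmatched ')' and append missing ')' so brackets balance."""
    out, depth = [], 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                continue          # drop unmatched close
            depth -= 1
        out.append(ch)
    out.extend(")" * depth)       # close any still-open spans
    return "".join(out)

broken = "(w / want-01 :ARG0 (b / boy)"   # decoder forgot the final ")"
print(repair_brackets(broken))            # (w / want-01 :ARG0 (b / boy))
```

This guarantees syntactic well-formedness but not semantic correctness: a repaired bracket may still close the wrong subtree, which is why structure-aware decoders are preferable to post-hoc fixes.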
6. Directions for Improvement and Research Opportunities
Progress over the last five years suggests several potential enhancements:
- Hybrid graph encodings: Introducing pseudo-brackets or hierarchical signals into triple-based serializations to better capture nested structure, while exploiting their adjacency fidelity (Kang et al., 13 May 2025).
- Enhanced pretraining: Scalable, multi-task pretraining on diverse syntactic, semantic, and cross-lingual corpora, followed by careful multi-task fine-tuning to preserve knowledge (Xu et al., 2020).
- Explicit structural supervision: Integrating relation-type hierarchies or distance-aware embeddings to recover structural cues lost in fully flattened encodings (Kang et al., 13 May 2025).
- Parser–generator integration: Joint modeling of parsing and generation, possibly with shared representations or paired self-training (Konstas et al., 2017).
- Improved anonymization and pointer mechanisms: More robust strategies for rare concept handling and variable re-insertion, reducing reliance on token-level heuristics (Peng et al., 2017, Noord et al., 2017).
This suggests that future neural AMR models will likely combine explicit graph linearizations for neighborhood precision with concise, structure-preserving signals to maximize seq2seq model comprehension while leveraging large-scale pretraining and data augmentation.
7. Summary of Key Design Insights
Seq2seq models for neural AMR parsing and generation have advanced through iterative improvements to AMR graph serialization, model architecture, data augmentation, and structure awareness:
- Penman’s concise, bracketed structure remains optimal for single-task settings, while triple-based adjacency is promising for hybrid schemes (Kang et al., 13 May 2025).
- Data-driven anonymization, silver data, and multi-source pretraining have dramatically closed the gap between plain seq2seq and graph-native or feature-rich systems (Xu et al., 2020, Konstas et al., 2017).
- Structure-aware modifications (e.g., path-injected attention, transition-based decoding) enable seq2seq models to natively capture reentrancies and non-tree structure, pushing state-of-the-art performance without specialized graph-based modules (Zhou et al., 2021, Zhu et al., 2019).
- Ongoing challenges include efficiently representing long, deeply nested graphs, robustly training on limited gold data, and enhancing output well-formedness while accommodating the full diversity of AMR semantics.