UMR Parsing & Generation Methods

Updated 10 February 2026

UMR parsing and generation methodologies are graph-based frameworks that map natural language to semantic representations and vice versa.
They incorporate fine-tuning of pretrained models, rule-based conversions, and hybrid strategies to optimize semantic fidelity and performance.
These approaches are crucial for multilingual processing, document-level understanding, and low-resource language applications in NLP.

Uniform Meaning Representation (UMR) Parsing and Generation comprise a suite of formal, algorithmic, and neural methodologies for transforming natural language sentences into UMR graphs and for generating natural language text from those graphs. UMR is a graph-based semantic formalism designed for broad multilingual and low-resource applicability, extending the scope of Abstract Meaning Representation (AMR) through document-level phenomena, cross-linguistic uniformity, and flexible annotation schemas. UMR parsing refers to the process of mapping sentences to rooted, labeled semantic graphs, while UMR generation addresses the inverse task. Research in this area includes deterministic, rule-based pipelines, neural fine-tuning of pretrained sequence-to-sequence models, and hybrid graph-completion strategies, all evaluated using metrics attuned to the structural properties and semantic fidelity of UMR graphs.

1. UMR Formalism and Notational Conventions

At the sentence level, a UMR graph $G$ is defined as $G = (V, E, L_V, L_E)$, where $V$ is a set of concept-labeled nodes (predicates, entities, frames), $E$ is a set of directed, labeled edges encoding semantic and grammatical roles (e.g., :ARG0, :aspect, :refer-number), and $L_V$ and $L_E$ are the node- and edge-labeling functions. UMR graphs are typically linearized in PENMAN notation, which lets sequence-based models ingest and emit them as flattened strings.

This formalism supports a uniform treatment of diverse linguistic phenomena and is designed to accommodate additional annotation layers for document context and cross-language transfer (Markle et al., 8 Dec 2025).
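To make the correspondence between a PENMAN string and $G = (V, E, L_V, L_E)$ concrete, here is a minimal reader for a single graph. It is a simplified sketch written for illustration, not the API of an existing PENMAN library. The example graph is hypothetical rather than drawn from a UMR corpus, and giving each constant value (such as 3rd) its own node is an assumption of this sketch.

```python
import re

# Tokens: "(", ")", a double-quoted string, or any other run of non-space characters.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse_penman(text):
    """Parse one PENMAN graph into (V, E, L_V, L_E).

    V   : set of node ids (variables, plus one fresh id per constant value)
    E   : list of directed edges (source, target)
    L_V : dict node id -> concept or constant value
    L_E : dict edge index -> role label such as ":ARG0"
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0
    L_V = {}
    raw_edges = []  # (source variable, role, target variable or constant token)

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def node():
        if take() != "(":
            raise ValueError("expected '('")
        var = take()
        if take() != "/":
            raise ValueError(f"expected '/' after variable {var!r}")
        L_V[var] = take()
        while pos < len(tokens) and tokens[pos] != ")":
            role = take()
            if not role.startswith(":"):
                raise ValueError(f"expected a role, got {role!r}")
            target = node() if pos < len(tokens) and tokens[pos] == "(" else take()
            raw_edges.append((var, role, target))
        if take() != ")":
            raise ValueError("expected ')'")
        return var

    node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root node")

    E, L_E = [], {}
    for src, role, tgt in raw_edges:
        if tgt not in L_V:  # a constant (e.g. 3rd), not a re-entrant variable
            const_id = f"_const{len(L_V)}"  # hypothetical naming scheme for constant nodes
            L_V[const_id] = tgt
            tgt = const_id
        L_E[len(E)] = role
        E.append((src, tgt))
    return set(L_V), E, L_V, L_E


EXAMPLE = """
(s / sing-01
   :ARG0 (p / person
            :refer-person 3rd
            :refer-number singular)
   :aspect performance)
"""

V, E, L_V, L_E = parse_penman(EXAMPLE)
for i, (src, tgt) in enumerate(E):
    print(L_V[src], L_E[i], L_V[tgt])
```

Printing the labeled edges yields four lines, from `person :refer-person 3rd` to `sing-01 :aspect performance`. A variable that reappears as a bare target (a re-entrancy) resolves to the existing node rather than becoming a constant.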

2. Approaches to UMR Parsing

Several methodologies are established for constructing UMR graphs from sentences:

2.1 Fine-tuning AMR Parsers

State-of-the-art neural parsers originally trained for AMR can be fine-tuned on UMR data without architectural modifications. Models such as amrlib (T5-based), SPRING (BART-based dual-task parser/generator), BiBL (bidirectional Bayesian learning of parsing and generation), LeakDistill (adapter-augmented), and AMRBART (BART with graph pre-training) are trained on UMR-annotated datasets, with the decoder learning to emit PENMAN-linearized UMR graphs directly. Training minimizes cross-entropy loss with standard hyperparameters (e.g., 10 epochs, the AdamW optimizer, learning rate $4 \times 10^{-5}$, batch size 16) (Markle et al., 8 Dec 2025).
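The recipe above is described without code. A minimal, framework-agnostic sketch of its data side, assuming one PENMAN graph per sentence: it flattens each graph onto a single line as the decoder target and computes how many optimizer updates the quoted hyperparameters imply. The `FineTuneConfig` class and the helper names are hypothetical, and the model, tokenizer, and AdamW training loop are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values quoted in the text; the field names themselves are hypothetical.
    epochs: int = 10
    optimizer: str = "AdamW"
    learning_rate: float = 4e-5
    batch_size: int = 16


def linearize(penman_graph: str) -> str:
    """Collapse a pretty-printed PENMAN graph onto one line (the decoder target).

    Note: this also collapses repeated spaces inside quoted strings.
    """
    return " ".join(penman_graph.split())


def make_pairs(examples):
    """(sentence, PENMAN graph) pairs -> (source text, one-line target) pairs."""
    return [(sentence.strip(), linearize(graph)) for sentence, graph in examples]


def total_updates(num_examples: int, cfg: FineTuneConfig) -> int:
    """Optimizer steps over all epochs, keeping the last partial batch."""
    return cfg.epochs * math.ceil(num_examples / cfg.batch_size)


pairs = make_pairs([("She sang.", "(s / sing-01\n    :ARG0 (p / person))")])
print(pairs[0][1])                              # (s / sing-01 :ARG0 (p / person))
print(total_updates(1000, FineTuneConfig()))    # 630
```

Some parsers apply their own special tokenization to parentheses and variables; the plain whitespace flattening here is the simplest choice, not necessarily what any particular system does.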

2.2 Rule-based UD-to-UMR Pipelines

Another class of methods uses Universal Dependencies (UD) parses as linguistic scaffolding. A UD parse is first converted deterministically into a partial UMR graph by mapping lexical, syntactic, and morphosyntactic UD annotations to UMR node labels and semantic roles. A sequence-to-sequence model (typically T5), again trained with a cross-entropy objective on UMR pairs, then completes the graph (Markle et al., 8 Dec 2025).
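A toy illustration of the deterministic first stage, assuming CoNLL-U-style token dictionaries. The rule tables, the `:UD-<deprel>` placeholder convention, and the function names are hypothetical, not the rules used by Markle et al. Concepts are left as bare lemmas (sing rather than sing-01), and unmapped relations stay as placeholders; resolving both is the job of the completion model.

```python
# Hypothetical rule tables for illustration; a real converter needs far more rules.
DEPREL_TO_ROLE = {"nsubj": ":ARG0", "obj": ":ARG1", "nsubj:pass": ":ARG1", "iobj": ":ARG2"}
PERSON = {"1": "1st", "2": "2nd", "3": "3rd"}
NUMBER = {"Sing": "singular", "Plur": "plural"}


def parse_feats(feats):
    """'Number=Sing|Person=3' -> {'Number': 'Sing', 'Person': '3'}; '_' -> {}."""
    if feats in ("", "_"):
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def ud_to_partial_umr(tokens):
    """Map CoNLL-U-style tokens to a partial UMR graph, returned as (root, triples)."""
    var = {t["id"]: f"x{t['id']}" for t in tokens}
    root, triples = None, []
    for t in tokens:
        if t["upos"] == "PUNCT":
            continue
        v = var[t["id"]]
        feats = parse_feats(t["feats"])
        # Simplification: every pronoun becomes a 'person' node.
        concept = "person" if t["upos"] == "PRON" else t["lemma"]
        triples.append((v, ":instance", concept))
        if t["upos"] == "PRON" and feats.get("Person") in PERSON:
            triples.append((v, ":refer-person", PERSON[feats["Person"]]))
        if t["upos"] in ("PRON", "NOUN") and feats.get("Number") in NUMBER:
            triples.append((v, ":refer-number", NUMBER[feats["Number"]]))
        if t["head"] == 0:
            root = v
        else:
            role = DEPREL_TO_ROLE.get(t["deprel"], f":UD-{t['deprel']}")
            triples.append((var[t["head"]], role, v))
    return root, triples


SENTENCE = [  # "She sang songs ."
    {"id": 1, "lemma": "she", "upos": "PRON", "head": 2, "deprel": "nsubj",
     "feats": "Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs"},
    {"id": 2, "lemma": "sing", "upos": "VERB", "head": 0, "deprel": "root",
     "feats": "Mood=Ind|Tense=Past|VerbForm=Fin"},
    {"id": 3, "lemma": "song", "upos": "NOUN", "head": 2, "deprel": "obj", "feats": "Number=Plur"},
    {"id": 4, "lemma": ".", "upos": "PUNCT", "head": 2, "deprel": "punct", "feats": "_"},
]
root, triples = ud_to_partial_umr(SENTENCE)
```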

2.3 Baseline and Hybrid Methods

Baseline pipeline systems cascade UD parsers with AMR parsers and conversion scripts. They lose UMR-specific relations and perform worse on genre- or domain-specific data (e.g., Minecraft dialogues). The best fine-tuned AMR models (notably BiBL) substantially outperform these approaches, reaching AnCast F1 = 84.35 and SMATCH++ = 90.98 on English (Markle et al., 8 Dec 2025).

3. UMR Generation Methodologies

Text generation from UMR relies on three principal strategies (Markle et al., 17 Feb 2025):

3.1 Pipeline Conversion (UMR→AMR→Text)

A deterministic, rule-based graph-to-graph conversion first maps UMR to AMR: it drops UMR-only relations, reassigns roles, and normalizes linguistic realizations (e.g., collapsing pronouns). Off-the-shelf AMR-to-text sequence models (amrlib, SPRING, BiBL, Smelting) then produce textual renderings. The quality of the conversion step, assessed via SMATCH (average 0.63 on English UMR→AMR pairs), directly limits the fluency and adequacy of the final text.
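The three kinds of rules named above can be sketched over (source, role, target) triples. This is a hypothetical subset chosen for illustration, not the conversion rules of Markle et al.: it drops a few UMR-only relations, renames UMR's general participant roles to AMR numbered arguments, and collapses person nodes to AMR pronoun concepts where person and number determine the pronoun. Rule sets this coarse are one reason the converted graphs score well below a perfect SMATCH.

```python
# Hypothetical subset of conversion rules, for illustration only.
UMR_ONLY = {":aspect", ":modal-strength", ":refer-person", ":refer-number"}
ROLE_MAP = {":actor": ":ARG0", ":undergoer": ":ARG1"}
PRONOUN = {("1st", "singular"): "i", ("1st", "plural"): "we",
           ("2nd", "singular"): "you", ("2nd", "plural"): "you",
           ("3rd", "plural"): "they"}  # 3rd singular would need gender cues, so it is left alone


def umr_to_amr(triples):
    """Convert UMR triples (source, role, target) to AMR-style triples."""
    person_feats = {}
    for s, r, t in triples:
        if r in (":refer-person", ":refer-number"):
            person_feats.setdefault(s, {})[r] = t
    out = []
    for s, r, t in triples:
        if r in UMR_ONLY:
            continue  # relation has no AMR counterpart, so it is dropped
        if r == ":instance" and t == "person" and s in person_feats:
            key = (person_feats[s].get(":refer-person"), person_feats[s].get(":refer-number"))
            t = PRONOUN.get(key, t)  # pronoun collapse, only when a rule applies
        out.append((s, ROLE_MAP.get(r, r), t))
    return out


UMR_TRIPLES = [
    ("s", ":instance", "sing-01"), ("s", ":ARG0", "p"),
    ("s", ":aspect", "activity"), ("s", ":modal-strength", "full-affirmative"),
    ("p", ":instance", "person"), ("p", ":refer-person", "1st"),
    ("p", ":refer-number", "singular"),
]
print(umr_to_amr(UMR_TRIPLES))
# [('s', ':instance', 'sing-01'), ('s', ':ARG0', 'p'), ('p', ':instance', 'i')]
```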

3.2 Direct Fine-Tuning of Pretrained LLMs

Pretrained multilingual sequence-to-sequence models (mT5, mBART) and the decoder-only Gemma-2B are fine-tuned directly on pairs of PENMAN-linearized UMR graphs and text (UMR→text). This approach avoids any reliance on AMR-specific biases, but it underuses the structured information encoded in UMR graphs and underperforms on adequacy and fluency, particularly in low-resource settings or when trained on multiple languages simultaneously (the "curse of multilinguality").

3.3 Fine-Tuning AMR-to-Text Models on UMR Data

Existing AMR-to-text generation models are further fine-tuned directly on UMR graphs. This method achieves the highest quality across metrics (English BERTScore = 0.825, BLEU = 0.358, METEOR = 0.601; Chinese BERTScore = 0.882). Document-level training further enhances generation adequacy, capitalizing on UMR’s multi-sentence representational strengths (Markle et al., 17 Feb 2025).

Approach	English BERTScore	Chinese BERTScore
Pipeline (UMR→AMR→Text, BiBL)	0.784	0.767
Fine-tuned mT5	0.772	0.853
Fine-tuned mBART	0.767	0.837
Fine-tuned AMR-to-Text (BiBL)	0.825	0.882

4. Evaluation Metrics for UMR Parsing and Generation

Evaluation of UMR parsing and generation emphasizes both structural and semantic faithfulness:

SMATCH (Cai & Knight, 2013): Graph triple overlap (Precision, Recall, F1) under optimal node alignment.
SMATCH++ (Opitz, 2023): Enhanced with finer alignment search and sub-scores (concept, role, attribute, coreference).
AnCast / AnCast++ (Sun & Xue, 2024/2025): Anchor-broadcast alignment F1 incorporating broader graph and document-level features.
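BERTScore: Token-level similarity between output and reference computed from contextual embeddings. With $x$ the reference tokens, $\hat{x}$ the generated tokens, and embeddings normalized so that inner products are cosine similarities,

$P_{\mathrm{BERT}} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} \mathbf{x}_i^{\top} \hat{\mathbf{x}}_j, \quad R_{\mathrm{BERT}} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} \mathbf{x}_i^{\top} \hat{\mathbf{x}}_j, \quad F_{\mathrm{BERT}} = \frac{2 P_{\mathrm{BERT}} R_{\mathrm{BERT}}}{P_{\mathrm{BERT}} + R_{\mathrm{BERT}}}$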

BLEU, METEOR: Standard n-gram-based generation metrics used to supplement graph-aware criteria (Markle et al., 17 Feb 2025, Markle et al., 8 Dec 2025).
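The SMATCH family scores triple overlap under the best variable alignment. SMATCH itself approximates that alignment with hill-climbing and random restarts; the sketch below instead enumerates every injective mapping of test variables to gold variables, which is exact but factorial in graph size, so it only suits toy graphs. It assumes graphs arrive as (source, role, target) triples with `:instance` marking concepts, omits the TOP triple, and the function names are hypothetical.

```python
from itertools import permutations


def variables(triples):
    """Variables are the sources of ':instance' triples."""
    return sorted({s for s, rel, _ in triples if rel == ":instance"})


def smatch_exhaustive(test, gold):
    """Precision, recall, F1 of triple overlap under the best variable mapping."""
    test, gold = set(test), set(gold)
    tv, gv = variables(test), variables(gold)
    # Pad with None so surplus test variables can stay unmapped (they then match nothing).
    pool = gv + [None] * max(0, len(tv) - len(gv))
    best = 0
    for image in permutations(pool, len(tv)):
        m = dict(zip(tv, image))
        hits = sum((m.get(s, s), rel, m.get(t, t)) in gold for s, rel, t in test)
        best = max(best, hits)
    p = best / len(test) if test else 0.0
    r = best / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = [("s", ":instance", "sing-01"), ("s", ":ARG0", "p"),
        ("p", ":instance", "person"), ("p", ":refer-number", "singular")]
test = [("a", ":instance", "sing-01"), ("a", ":ARG0", "b"),
        ("b", ":instance", "person"), ("b", ":refer-number", "plural")]
print(smatch_exhaustive(test, gold))   # (0.75, 0.75, 0.75)
```

The best mapping sends `a` to `s` and `b` to `p`; only the `:refer-number` attribute disagrees, so three of four triples match on each side.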

For generation, COMET (neural reference-based similarity) is shown to correlate strongly with human semantic judgement, while BLEU does not, especially for English output (Wang et al., 2023).
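The BERTScore entry in the metric list rests on greedy token matching. A minimal sketch of that matching step, assuming the token embeddings are already available: the short hand-written vectors below are stand-ins for the contextual embeddings a pretrained encoder would produce, and the sketch omits the optional idf weighting and baseline rescaling of the full metric.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bertscore_greedy(candidate, reference):
    """Greedy-matching precision, recall, F1 over lists of token embeddings."""
    sim = [[cosine(c, r) for r in reference] for c in candidate]
    # Precision: each candidate token takes its best-matching reference token.
    precision = sum(max(row) for row in sim) / len(candidate)
    # Recall: each reference token takes its best-matching candidate token.
    recall = sum(max(col) for col in zip(*sim)) / len(reference)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Toy 2-d "embeddings": the candidate covers only the first reference token.
p, r, f = bertscore_greedy([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(p, r, round(f, 3))   # 1.0 0.5 0.667
```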

5. Error Analysis and Limitations

Error analyses of UMR parsing and generation highlight:

Structural ill-formedness: Unbalanced parentheses or tokenization issues leading to invalid graph serialization.
Role and predicate errors: Wrong argument role, concept sense, or incomplete attachment, often due to out-of-vocabulary items.
Omissions and hallucinations: Models may omit fine-grained semantic modifiers or add spurious content, e.g., extraneous emoticon nodes in parser output, or miss information that depends on document-level context.
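The first error class, structural ill-formedness, is cheap to catch before scoring. A minimal pre-check, assuming parser output arrives as a raw PENMAN string: it reports unbalanced parentheses, unterminated quotes, and more than one top-level graph. It does not handle backslash-escaped quotes or validate concepts and roles, and the function name is hypothetical.

```python
def check_penman_structure(text):
    """Return a list of structural problems in a PENMAN string (empty if none found)."""
    problems = []
    depth, roots, in_quote = 0, 0, False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quote = not in_quote  # parentheses inside string literals are ignored
        elif in_quote:
            continue
        elif ch == "(":
            if depth == 0:
                roots += 1  # a new top-level graph starts here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at offset {i}")
                depth = 0
    if in_quote:
        problems.append("unterminated string literal")
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    if roots > 1:
        problems.append(f"{roots} top-level graphs (expected 1)")
    return problems


print(check_penman_structure("(s / sing-01 :ARG0 (p / person)"))   # ["1 unclosed '('"]
```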

UD-based graph completion approaches are prone to under-specifying modality or mislabeling arguments, while pipeline approaches lose UMR-specific compositionality through their coarse rule sets (Markle et al., 8 Dec 2025, Markle et al., 17 Feb 2025).

6. Extensions, Integrations, and Future Directions

Research avenues include developing larger and more diverse UMR-annotated corpora, especially for low-resource and indigenous languages, and exploring cross-lingual transfer via language-neutral UMR representations. Neuro-symbolic or neural UMR→AMR conversion frameworks are proposed to preserve richer UMR detail in pipeline systems. Integrating parsing and generation in a multitask or dual-task setting, and developing evaluation metrics sensitive to UMR’s document-level scope, are active areas of investigation. Document- and discourse-level text generation from UMR, as well as tighter integration with world knowledge modules for semantic realization, represent further extensions (Markle et al., 17 Feb 2025).

7. Theoretical Perspectives: UMR, Parsers, and Generators

From a more abstract, programming-languages perspective, parsing and generation can be framed as dual operations over free generators. A free generator represents an applicative generator as a data structure, so the same object can be interpreted either as a random generator or as a parser of choice sequences. Any applicative generator can then be factored into a parser over choice sequences and a distribution over those sequences, which weakly unifies parsing and random generation. The construction also admits Brzozowski-style derivatives for exploring the output space under constraints, and these derivatives underlie choice gradient sampling, a search method for constrained structure generation that substantially improves sampling efficiency over naive rejection sampling (Goldstein et al., 2022).
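A minimal Python sketch of the free-generator idea, under assumptions chosen for this illustration: a reduced constructor set (`Pure`, `Select`, `Map`, `Pair`) and uniform random choices. It is not the authors' implementation. Parsing a choice sequence is done by taking one Brzozowski-style derivative per choice, and random generation walks the same derivatives with random choices, so both interpretations share one structure. The UMR-like strings are illustrative only, and choice gradient sampling itself is not implemented.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Pure:          # no further choices: a fixed value
    value: Any


@dataclass
class Select:        # one labeled choice among sub-generators (must be non-empty)
    alts: Dict[str, Any]


@dataclass
class Map:           # apply f to the value produced by g
    f: Callable[[Any], Any]
    g: Any


@dataclass
class Pair:          # determine g1, then g2, and pair their values
    g1: Any
    g2: Any


def simplify(g):
    """Collapse sub-generators that need no further choices into Pure."""
    if isinstance(g, Map):
        inner = simplify(g.g)
        return Pure(g.f(inner.value)) if isinstance(inner, Pure) else Map(g.f, inner)
    if isinstance(g, Pair):
        a, b = simplify(g.g1), simplify(g.g2)
        if isinstance(a, Pure) and isinstance(b, Pure):
            return Pure((a.value, b.value))
        return Pair(a, b)
    return g


def derivative(g, c):
    """The generator left after making choice c, or None if c is not allowed."""
    g = simplify(g)
    if isinstance(g, Pure):
        return None
    if isinstance(g, Select):
        return g.alts.get(c)
    if isinstance(g, Map):
        d = derivative(g.g, c)
        return None if d is None else Map(g.f, d)
    if isinstance(g.g1, Pure):  # Pair whose left side is already determined
        a = g.g1.value
        d = derivative(g.g2, c)
        return None if d is None else Map(lambda b: (a, b), d)
    d = derivative(g.g1, c)     # otherwise the choice goes to the left side
    return None if d is None else Pair(d, g.g2)


def next_choices(g):
    """Labels that are valid as the next choice."""
    g = simplify(g)
    if isinstance(g, Pure):
        return []
    if isinstance(g, Select):
        return list(g.alts)
    if isinstance(g, Map):
        return next_choices(g.g)
    return next_choices(g.g2 if isinstance(g.g1, Pure) else g.g1)


def parse(g, choices):
    """Parser interpretation: consume a choice sequence via repeated derivatives."""
    for c in choices:
        g = derivative(g, c)
        if g is None:
            return None
    g = simplify(g)
    return g.value if isinstance(g, Pure) else None


def generate(g, rng):
    """Generator interpretation: make random valid choices; return (value, choice trace)."""
    trace = []
    while not isinstance(simplify(g), Pure):
        c = rng.choice(next_choices(g))
        trace.append(c)
        g = derivative(g, c)
    return simplify(g).value, trace


# Toy generator of UMR-like strings (illustrative only).
concept = Select({"sing": Pure("sing-01"), "walk": Pure("walk-01")})
aspect = Select({"perf": Pure("performance"), "act": Pure("activity")})
toy = Map(lambda ca: f"(x / {ca[0]} :aspect {ca[1]})", Pair(concept, aspect))

print(parse(toy, ["sing", "act"]))        # (x / sing-01 :aspect activity)
value, trace = generate(toy, random.Random(0))
assert parse(toy, trace) == value         # both interpretations agree on the same choices
```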