
Autoregressive Schema Generation

Updated 2 February 2026
  • Autoregressive schema generation is a probabilistic approach that sequentially constructs structured objects such as graphs and database schemas.
  • The LO-ARM model introduces a dynamic, data-dependent order-policy that adapts the generation sequence to enhance sample fidelity and diversity.
  • Quantitative evaluations on benchmarks like QM9 and ZINC250k demonstrate significant improvements in validity, uniqueness, and reduced negative log-likelihood.

Autoregressive schema generation encompasses probabilistic models that sequentially construct structured objects such as graphs, database schemas, and knowledge-graph triples by iteratively selecting both what to generate and in which order. Unlike standard autoregressive models, which impose a fixed ordering, advanced models such as the Learning-Order Autoregressive Model (LO-ARM) introduce a dynamic, data-dependent order-policy that adapts the generation order at each step, unlocking improved sample fidelity and diversity for high-dimensional domains lacking a natural canonical ordering (Wang et al., 7 Mar 2025).

1. Factorization and Order-Policy Design

Classic autoregressive factorization decomposes $p(\mathbf{x})$ for a discrete object $\mathbf{x} = (x_1, \dots, x_L)$ by choosing a permutation $\sigma \in S_L$ and modeling

$$p_{\phi}(\mathbf{x}\mid\sigma) = \prod_{i=1}^{L} p_{\phi}(x_{\sigma_i} \mid x_{\sigma_{<i}})$$

where $\sigma_{<i} = (\sigma_1, \dots, \sigma_{i-1})$. This rigid ordering is effective for sequential data but ill-suited for structured objects such as graphs or schemas. LO-ARM generalizes this approach by introducing a latent permutation $z = (z_1, \dots, z_L)$, drawn sequentially from an order-policy $p_{\theta}(z_i \mid z_{<i}, x_{z_{<i}})$, so that

$$p_{\theta,\phi}(\mathbf{x}, z) = \prod_{i=1}^{L} p_{\theta}(z_i \mid z_{<i}, x_{z_{<i}}) \; p_{\phi}(x_{z_i} \mid x_{z_{<i}})$$

with the marginal likelihood $p_{\theta,\phi}(\mathbf{x})$ obtained by summing over all $L!$ possible orderings.
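The joint factorization above can be evaluated directly for a given ordering. The sketch below, a minimal NumPy illustration, computes $\log p_{\theta,\phi}(\mathbf{x}, z)$ by alternating the order-policy term (a softmax over still-masked slots) and the classifier term; the two `*_logits_fn` callables are hypothetical stand-ins for the trained networks.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a))); tolerates -inf entries."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def joint_log_prob(x, z, order_logits_fn, token_logits_fn):
    """log p(x, z): at each step i, add the order-policy term (softmax
    renormalized over the still-masked slots) and the classifier term for
    the value at the chosen slot, then reveal that slot.

    x: length-L array of token ids; z: a permutation of range(L).
    """
    MASK = -1
    prefix = np.full(len(x), MASK)
    logp = 0.0
    for k in z:
        # Order-policy: distribution over unrevealed slots only.
        ol = np.where(prefix == MASK, order_logits_fn(prefix), -np.inf)
        logp += ol[k] - logsumexp(ol)
        # Classifier: categorical over token values at slot k.
        tl = token_logits_fn(prefix, k)
        logp += tl[x[k]] - logsumexp(tl)
        prefix[k] = x[k]
    return logp
```

Under uniform (all-zero) logits, summing $\exp$ of this quantity over all $L!$ orderings recovers the order-independent marginal, matching the $L!$-sum definition of $p_{\theta,\phi}(\mathbf{x})$.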

The trainable order-policy $\pi_{\theta}(z \mid \mathbf{x})$ adapts to the partial state, represented by

$$\bar{x}_{z_{<i}} = [x_{z_1}, \dots, x_{z_{i-1}}, \mathsf{MASK}, \dots, \mathsf{MASK}].$$

A typical parametrization ("shared-torso") sets

$$p_{\theta}(z_i = k \mid z_{<i}, x_{z_{<i}}) = \frac{\exp\!\big(h_{\theta,k}(\bar{x}_{z_{<i}})\big)}{\sum_{k' \notin z_{<i}} \exp\!\big(h_{\theta,k'}(\bar{x}_{z_{<i}})\big)}$$

where $h_{\theta,k}$ is a per-dimension output head. An alternative, entropy-based variant instead uses the entropy of the classifier's predictive distribution at each remaining slot as the logits.
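The shared-torso softmax above is just a masked renormalization over the not-yet-generated slots; a minimal sketch, with illustrative names:

```python
import numpy as np

def order_policy_probs(head_logits, revealed):
    """Shared-torso order-policy: softmax of the per-slot head outputs
    h_{theta,k}(x_bar), restricted to slots not yet generated.

    head_logits: shape (L,) array of h_{theta,k} values.
    revealed: boolean mask, True for slots k already in z_{<i}.
    """
    logits = np.where(revealed, -np.inf, head_logits)
    logits = logits - logits[~revealed].max()   # numerical stability
    p = np.exp(logits)                          # exp(-inf) = 0 for revealed slots
    return p / p.sum()
```

Revealed slots receive exactly zero probability, so the policy can never re-emit a filled slot, matching the $k' \notin z_{<i}$ restriction in the denominator.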

2. Learning via Variational Inference and Gradient Estimation

Optimizing $p_{\theta,\phi}(\mathbf{x})$ is intractable, necessitating amortized variational inference with

$$q_{\psi}(z \mid \mathbf{x}) = \prod_{i} q_{\psi}(z_i \mid z_{<i}, \mathbf{x}).$$

Standard importance sampling yields a stochastic lower bound

$$\log p_{\theta,\phi}(\mathbf{x}) \geq \mathbb{E}_{z \sim q_{\psi}} \left[ \log p_{\theta,\phi}(\mathbf{x}, z) - \log q_{\psi}(z \mid \mathbf{x}) \right].$$

Expanding both the policy and the classifier terms yields the ELBO

$$\mathcal{L}(\theta, \phi, \psi) = \mathbb{E}_{z \sim q_{\psi}} \left[ \sum_{i=1}^{L} \log p_{\theta}(z_i \mid z_{<i}, x_{z_{<i}}) + \log p_{\phi}(x_{z_i} \mid x_{z_{<i}}) - \log q_{\psi}(z_i \mid z_{<i}, \mathbf{x}) \right].$$

Classifier parameters $\phi$ are updated via standard softmax cross-entropy; order-policy and variational parameters $(\theta, \psi)$ are updated through REINFORCE policy gradients using a leave-one-out baseline (RLOO). Gradients are estimated with pairs of sampled paths, a uniformly chosen step index, and expectations of functionals $F_{\theta,\phi}(z_{<i}, \mathbf{x})$, enabling unbiased and efficient optimization.
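The leave-one-out baseline is simple to state in code. A minimal sketch (function name illustrative): each sampled ordering's advantage is its ELBO estimate minus the mean of the other $K-1$ samples' estimates, which keeps the REINFORCE estimator unbiased while reducing variance.

```python
import numpy as np

def rloo_advantages(returns):
    """RLOO advantages for K sampled orderings: subtract from each return
    the mean of the other K-1 returns (a per-sample baseline that does
    not depend on the sample itself, so the estimator stays unbiased)."""
    r = np.asarray(returns, dtype=float)
    K = len(r)
    baselines = (r.sum() - r) / (K - 1)   # mean over the other samples
    return r - baselines
```

The policy gradient is then approximated as the average of $\text{advantage}_k \cdot \nabla_\psi \log q_{\psi}(z^{(k)} \mid \mathbf{x})$ over the $K$ samples; note the advantages always sum to zero.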

3. Generation Workflow and Algorithm

Once trained, autoregressive schema generation operates as follows for $L$ slots:

  1. Initialize $\bar{\mathbf{x}} \leftarrow \{\mathsf{MASK}\}^{L}$ and $z_{<1} = \emptyset$.
  2. For $i = 1$ to $L$:
    • Sample $z_i$ from $p_{\theta}(z_i \mid z_{<i}, \bar{x}_{z_{<i}})$.
    • Sample $\hat{x}_{z_i}$ from $p_{\phi}(x_{z_i} \mid \bar{x}_{z_{<i}})$.
    • Set $\bar{x}_{z_i} \leftarrow \hat{x}_{z_i}$.
    • Append $z_i$ to obtain $z_{<i+1}$.
  3. The output $\bar{\mathbf{x}}$ is the generated schema.
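The steps above can be sketched as a short ancestral sampling loop. This is a minimal NumPy illustration, with the two `*_logits_fn` callables as hypothetical stand-ins for the trained order-policy and classifier networks:

```python
import numpy as np

MASK = -1
rng = np.random.default_rng(0)

def sample_categorical(logits):
    """Draw an index from softmax(logits); -inf entries get zero mass."""
    p = np.exp(logits - logits.max())
    return int(rng.choice(len(p), p=p / p.sum()))

def generate(L, order_logits_fn, token_logits_fn):
    """At each step: pick which slot to fill from the order-policy
    (restricted to still-masked slots), fill it from the classifier,
    and repeat until no MASK remains."""
    x_bar = np.full(L, MASK)
    order = []
    for _ in range(L):
        ol = np.where(x_bar == MASK, order_logits_fn(x_bar), -np.inf)
        z_i = sample_categorical(ol)                       # which slot
        x_bar[z_i] = sample_categorical(token_logits_fn(x_bar, z_i))
        order.append(z_i)
    return x_bar, order
```

Because both samplers condition on the current `x_bar`, each step sees the partial structure emitted so far, exactly as in the algorithm above.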

The order-policy dynamically adapts, conditioning on the masked partial structure and previously emitted tokens, ensuring each step leverages prior context to augment generative coherence.

4. Quantitative Evaluation: Molecular Graph Generation

LO-ARM has demonstrated state-of-the-art performance on the QM9 and ZINC250k molecular graph benchmarks (Wang et al., 7 Mar 2025). Each molecule is modeled as a graph with $n$ nodes (atoms) and an $n \times n$ adjacency matrix encoding four bond types, padded to length $L = n + n^2$. The Graph Transformer backbone provides $L$ softmax heads for atom/bond prediction, and separate heads for order-policy and variational logits.
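The flattening into $L = n + n^2$ discrete slots can be sketched as follows; the function names and the encoding details (row-major adjacency, 0 for no bond) are illustrative assumptions, not the paper's exact serialization.

```python
import numpy as np

def graph_to_slots(atom_types, bonds):
    """Flatten a molecular graph into L = n + n^2 slots: n atom-type
    slots followed by the row-major n x n bond-type matrix (0 = no bond)."""
    n = len(atom_types)
    return np.concatenate([np.asarray(atom_types),
                           np.asarray(bonds).reshape(n * n)])

def slots_to_graph(slots, n):
    """Inverse mapping: recover atom types and the bond matrix."""
    return slots[:n], slots[n:].reshape(n, n)
```

Every slot is then a categorical variable (atom type or bond type), so the same order-policy and classifier machinery applies uniformly to nodes and edges.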

Table 1. QM9 Performance

Method                        NLL      Validity%   Uniqueness%   FCD
AO-ARM (uniform)              ≤24.7    98.9        99.1          0.67
LO-ARM (ent. & shared)        ≤24.1    99.0        99.1          0.65
LO-ARM (st-torso & st-sep)    ≤21.4    99.8        98.9          0.24

Table 2. ZINC250k Performance

Method                        NLL      Validity%   Uniqueness%   FCD
AO-ARM (uniform)              ≤80.2    32.9        100.0         6.54
Biased-AO-ARM (edge→node)     ≤77.9    34.2        100.0         5.03
LO-ARM (st-torso & st-sep)    ≤68.3    96.3        100.0         3.23
LO-ARM + Top-0.9 sampler               96.7        100.0         3.86

Learning a context-dependent generation order significantly enhances sample quality (as measured by Fréchet ChemNet Distance), validity, and uniqueness compared to fixed or uniform orderings. Ablation studies indicate that entropy-based and shared-torso policies outperform a uniform order, with the highest-capacity variational network achieving the best results.

5. Generalization to Diverse Schema Generation Tasks

The LO-ARM framework applies to any structured domain where the task involves filling discrete slots without intrinsic ordering, such as:

  • Database schema synthesis—the ordering of columns, tables, or constraints informed by the partial schema.
  • Knowledge-graph completion—sequential addition of entity or relation triples responding to the current connectivity.
  • Software-API call graphs—dynamic function-call ordering for program synthesis.

In all cases, learning a latent slot permutation zz and training LO-ARM to discover informative slot sequences enhances generative accuracy and coherence. A plausible implication is that learned order-policies may prioritize 'easier' slots or positions that confer greater downstream generative fidelity.

6. Limitations and Open Directions

Several challenges and open questions persist:

  • Scalability: The $O(L)$ stepwise procedure becomes computationally prohibitive in high-dimensional settings (e.g., large images). Block-wise unmasking and chunk-level policies may offer improvements.
  • Gradient Variance: Discrete order sequence sampling (REINFORCE) can suffer high variance. Future work may explore control variates, continuous relaxations such as Gumbel-softmax, or stabilized training protocols.
  • Domain Constraints: Many real-world schemas require satisfaction of hard constraints; integrating constraint-aware policies or mixed continuous-discrete orderings remains unaddressed.
  • Adaptive Block Sizes: Automatically learning to unmask contiguous slot clusters (not only single slots) could accelerate sampling and training.
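As one concrete instance of the continuous-relaxation idea mentioned above, a Gumbel-softmax sample over the remaining slots can be sketched as follows; this is a generic technique, not part of LO-ARM itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    """Gumbel-softmax relaxation of a hard slot choice: perturb logits
    with Gumbel(0, 1) noise and apply a temperature-tau softmax. As
    tau -> 0 the sample approaches one-hot; for tau > 0 it remains
    differentiable with respect to the logits."""
    g = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + g) / tau
    y = y - y.max()          # numerical stability
    e = np.exp(y)
    return e / e.sum()
```

Replacing the hard categorical draw of $z_i$ with such a relaxed sample would allow pathwise gradients to flow into the order-policy, at the cost of a bias controlled by the temperature $\tau$.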

LO-ARM demonstrates the power of jointly learning what to generate and in which order, providing a likelihood-driven paradigm for autoregressive schema generation—an approach adaptable across any domain where canonical emission order is ambiguous or detrimental to sample quality (Wang et al., 7 Mar 2025).
