Directed Acyclic Graphs (DAGs)
- Directed acyclic graphs (DAGs) are finite directed graphs with no cycles, used to encode conditional independence in probabilistic models.
- They employ methodologies like topological ordering and smooth acyclicity constraints (e.g., NOTEARS) to facilitate scalable structure learning.
- DAGs are widely applied in causal inference, scheduling, dynamic networks, and deep learning, demonstrating broad practical utility.
A directed acyclic graph (DAG) is a finite directed graph with no directed cycles: formally, a pair $G=(V,E)$ with a finite set of nodes $V$ and a set of directed edges $E \subseteq V \times V$ such that no sequence of edges $v_1 \to v_2 \to \cdots \to v_k \to v_1$ returns to its starting node. In the context of probabilistic graphical models, a DAG is used to encode conditional independence structure among random variables, enforcing that the joint distribution factorizes as $p(x_1,\ldots,x_d) = \prod_{j=1}^{d} p(x_j \mid x_{\mathrm{pa}(j)})$, where $\mathrm{pa}(j)$ denotes the parent set of node $j$ in $G$. DAGs are fundamental in causal inference, Bayesian networks, scheduling, biological network analysis, and a range of modern machine learning and optimization problems.
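The factorization can be made concrete with a toy example. The sketch below (a hypothetical three-node chain with made-up conditional probability tables, not drawn from any cited work) represents a DAG by its parent sets and checks that the product of conditionals defines a valid joint distribution:

```python
from itertools import product

# Hypothetical 3-node chain X1 -> X2 -> X3 with binary variables.
# parents[j] lists the parent set pa(j); cpd[j](x, pa) returns p(x_j | x_pa(j)).
parents = {1: (), 2: (1,), 3: (2,)}
cpd = {
    1: lambda x, pa: 0.6 if x == 1 else 0.4,        # P(X1)
    2: lambda x, pa: 0.7 if x == pa[0] else 0.3,    # P(X2 | X1)
    3: lambda x, pa: 0.9 if x == pa[0] else 0.1,    # P(X3 | X2)
}

def joint(assign):
    """p(x) = prod_j p(x_j | x_pa(j)) -- the DAG factorization."""
    p = 1.0
    for j, pa in parents.items():
        p *= cpd[j](assign[j], tuple(assign[k] for k in pa))
    return p

# A valid factorization must sum to 1 over all 2^3 assignments.
total = sum(joint(dict(zip((1, 2, 3), xs))) for xs in product((0, 1), repeat=3))
print(round(total, 10))
```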
1. Foundational Concepts and Characterizations
A DAG admits several equivalent characterizations:
- Acyclicity: No directed path starts and ends at the same node. Formally, an adjacency matrix $A$ is acyclic iff $\operatorname{tr}(A^k) = 0$ for all $k \geq 1$ (Wang et al., 2014).
- Factorization: A probabilistic DAG model factorizes the joint as $p(x) = \prod_{j=1}^{d} p(x_j \mid x_{\mathrm{pa}(j)})$ (Zhou et al., 2021), ensuring that the conditional independence properties are compatible with the graph.
- Markov Equivalence: Multiple DAGs may encode the same set of conditional independencies (giving rise to equivalence classes).
- Topological Ordering and Layering: Any DAG admits a (not necessarily unique) topological ordering of nodes, and can also be decomposed into unique "topological layers" $\mathcal{A}_0, \mathcal{A}_1, \ldots, \mathcal{A}_T$, with edges only flowing from lower to higher layers; this property underpins efficient learning algorithms by reducing search complexity, especially in shallow graphs where $T \ll d$ (Zhou et al., 2021).
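Two of the characterizations above, the trace-of-powers acyclicity test and the unique layer decomposition, can be sketched directly on a small adjacency matrix (a hypothetical four-node diamond, chosen here for illustration):

```python
import numpy as np

# Adjacency of a 4-node DAG: edges 0->1, 0->2, 1->3, 2->3 (a diamond).
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

def is_acyclic(A):
    """A is acyclic iff tr(A^k) = 0 for k = 1..d (no closed walks)."""
    d = len(A)
    P = np.eye(d)
    for _ in range(d):
        P = P @ A
        if np.trace(P) != 0:
            return False
    return True

def topological_layers(A):
    """Peel current sources repeatedly: layer t collects the nodes whose
    remaining parents all lie in layers < t. Layers are unique, unlike
    topological orders."""
    remaining = set(range(len(A)))
    layers = []
    while remaining:
        sources = {j for j in remaining
                   if all(A[i, j] == 0 for i in remaining)}
        layers.append(sorted(sources))
        remaining -= sources
    return layers

print(is_acyclic(A))           # True
print(topological_layers(A))   # [[0], [1, 2], [3]]
```

Note that the diamond has two topological orders (0,1,2,3 and 0,2,1,3) but a single layering, which is what layer-based learning algorithms exploit.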
2. Structure Learning and Optimization Approaches
Learning the DAG structure from observational data is a central and computationally challenging problem due to the super-exponential growth of the number of DAGs on $d$ nodes.
- Score-Based Methods and Integer Programming: The problem can be cast as penalized likelihood maximization (e.g., for linear SEMs: $\min_B \tfrac{1}{2n}\|X - XB\|_F^2 + \lambda\|B\|_1$ subject to $B$ being acyclic, where $B$ is the coefficient matrix) (Manzour et al., 2019). The "Layered Network" (LN) MIQO formulation introduces integer variables and continuous "layer" variables to enforce acyclicity via inequalities on node layers, and demonstrates empirically superior efficiency for sparse graphs, achieving optimality in practical regimes (Manzour et al., 2019).
- Constraint-Based and Continuous Optimization: The NOTEARS framework encodes acyclicity as a smooth algebraic constraint ($h(W) = \operatorname{tr}(e^{W \circ W}) - d = 0$), enabling continuous optimization with standard gradient-based solvers, while generalizations support dynamic graphs and additional structure (Fan et al., 2022).
- Two-Stage Hybrid Algorithms: In the QVF-DAG setting (covering non-Gaussian exponential families with quadratic variance), TLDAG first constructs the unique topological layers (via a ratio test exploiting the variance-to-mean relationship) and then recovers directed edges via sparse regression (GLM or $\ell_1$-regularized GLM), substantially reducing computational complexity in shallow graphs (Zhou et al., 2021).
- Bootstrap Aggregation: The DAGBag method generates an ensemble of DAGs on bootstrap resamples and aggregates them via median-graph selection under structural Hamming distance, substantially reducing false positives in high-dimensional, low-sample-size settings via variance reduction (Wang et al., 2014).
- Bayesian and Variational Inference: ProDAG introduces a projected variational inference approach, defining priors/posteriors with exact support on DAGs by projecting arbitrary continuous distributions onto the acyclic, sparse manifold via continuous relaxations (e.g., DAGMA/NOTEARS-style acyclicity constraints), enabling both point estimation and uncertainty quantification (Thompson et al., 2024).
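The NOTEARS constraint mentioned above is easy to evaluate numerically. The sketch below approximates $h(W) = \operatorname{tr}(e^{W \circ W}) - d$ with a truncated power series of the matrix exponential (the truncation level is an assumption for illustration; for a DAG the series terminates exactly because $W \circ W$ is nilpotent):

```python
import numpy as np

def notears_h(W, terms=30):
    """NOTEARS acyclicity function h(W) = tr(exp(W o W)) - d, with the
    matrix exponential approximated by its truncated power series.
    h(W) = 0 iff the weighted adjacency W has no directed cycles."""
    M = W * W                            # elementwise square: W o W
    d = W.shape[0]
    P, trace_exp = np.eye(d), float(d)   # k = 0 term contributes tr(I) = d
    for k in range(1, terms):
        P = P @ M / k                    # P accumulates M^k / k!
        trace_exp += np.trace(P)
    return trace_exp - d

W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])   # single edge 0 -> 1: acyclic
W_cyc = np.array([[0.0, 1.5], [0.8, 0.0]])   # edges both ways: a 2-cycle

print(abs(notears_h(W_dag)) < 1e-9)   # True: constraint satisfied
print(notears_h(W_cyc) > 0)           # True: the 2-cycle incurs a penalty
```

Because $h$ is smooth in $W$, it can serve as an equality constraint or penalty inside a continuous optimizer, which is the key idea behind the NOTEARS family.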
3. DAGs in Functional, Dynamic, and Generative Settings
- Functional Data: MultiFun-DAG generalizes DAGs to the setting where each node is a multivariate functional object (a vector of real-valued functions) by formulating the inter-node relationships via bilinear function-to-function regression, reducing to finite-dimensional block-sparse linear SEMs after basis expansion. Structure learning is performed via a regularized EM algorithm under a NOTEARS-style acyclicity constraint, and identifiability, consistency, and convergence are rigorously established (Lan et al., 2024).
- Dynamic Graphs: DAG structure learning for dynamic graphs involves separating intra-slice (instantaneous) and inter-slice (time-lagged) causal effects. GraphNOTEARS combines continuous score-based optimization and smooth acyclicity constraints to jointly estimate both contemporaneous and lagged edges, achieving superior F1 and SHD in simulations and real-world multi-slice data (Fan et al., 2022).
- Sequence-Based Generative Models: Grammar-based approaches encode a DAG as a unique sequence of production rules under an unambiguous edge-directed graph grammar (edNCE), enabling lossless conversion between DAGs and sequences. This bijection supports generative modeling, property prediction, and Bayesian optimization within the sequence modeling paradigm, with guarantees of invertibility, unambiguity, and representational compactness (2505.22949).
- Deep Reinforcement Learning: Deep Q-learning constructions for DAG generation define an action space over parent-set choices for each new node, with a graph-convolutional Q-network enforcing acyclicity by construction, since incoming edges are allowed only into newly added nodes. The approach achieves perfect reconstruction of small ground-truth DAGs and illustrates tractability limitations due to the growth of the action space (D'Arcy et al., 2019).
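The dynamic-graph setting described above separates intra-slice and inter-slice effects. The sketch below simulates data from a hypothetical two-part linear SEM of the kind such methods consume: an acyclic intra-slice matrix $W$ and a lagged inter-slice matrix $A$ (the specific matrices and noise scale are illustrative assumptions, not taken from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 200

# Hypothetical ground truth: W holds intra-slice (instantaneous) edges and
# must be acyclic; A holds inter-slice (time-lagged) edges and may be dense.
W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.6],
              [0.0, 0.0, 0.0]])     # DAG: 0 -> 1 -> 2
A = 0.4 * np.eye(d)                  # each variable depends on its own lag

X = np.zeros((T, d))
for t in range(1, T):
    noise = rng.normal(scale=0.1, size=d)
    # Linear SEM per slice: x_t = W^T x_t + A^T x_{t-1} + e,
    # solved in closed form since (I - W^T) is invertible for a DAG.
    X[t] = np.linalg.solve(np.eye(d) - W.T, A.T @ X[t - 1] + noise)

print(X.shape)   # (T, d) panel a dynamic structure learner would consume
```

A GraphNOTEARS-style estimator would then jointly optimize over candidate $(W, A)$ pairs with the smooth acyclicity penalty applied only to $W$.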
4. Algorithmic, Sampling, and Enumeration Aspects
- Exact Counting and Sampling: The enumeration of DAGs, up to isomorphism or labeling, is an area of combinatorial importance. For classical labeled DAGs on $d$ nodes, the count grows super-exponentially. The directed ordered acyclic graph (DOAG) model enriches the classical definition by requiring a total order on sources and outgoing edges. Optimal anticipated rejection and recursive sampling algorithms achieve uniform sampling of (D)OAGs with a prescribed number of edges/sources in expected time matching information-theoretic lower bounds (Pépin et al., 2023).
- Layering and Topological Decomposition: Layered representations allow the reduction of the acyclicity constraint to constraints on layers or topological depth variables. This approach is key to the computational efficiency of certain learning algorithms, including layered MIQO for linear SEMs (Manzour et al., 2019) and TLDAG for QVF-DAGs (Zhou et al., 2021).
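The super-exponential growth of labeled DAG counts can be computed exactly. The sketch below uses Robinson's classical recurrence for the number $a_n$ of labeled DAGs (a standard result, distinct from the DOAG model of the cited work), which conditions on the $k$ source nodes of in-degree zero:

```python
from math import comb

def count_labeled_dags(n):
    """Robinson's recurrence for the number a_n of labeled DAGs on n nodes:
    a_n = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a_(n-k),
    where k counts the nodes with in-degree zero (the sources)."""
    a = [1]   # a_0 = 1 (the empty graph)
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k)
                     * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

print([count_labeled_dags(n) for n in range(1, 6)])
# [1, 3, 25, 543, 29281] -- already 29281 structures at just 5 nodes
```

This explosion is precisely why exhaustive score-based search is infeasible and why layering, relaxation, and sampling techniques matter.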
5. Extensions: Diffusion, Neural Modeling, and Practical Applications
- DAG Diffusion Modeling: In information or epidemic spreading, DAGs provide a natural substrate for uni-directional (irreversible) diffusion processes. DAGs constructed via spectral embedding (e.g., with LOBPCG eigenvectors of a Laplacian-regularized manifold graph) admit a directional Laplacian with desirable spectral properties (real, nonnegative eigenvalues), allowing efficient computation of diffusion processes via matrix exponentials or ODE integration with provable convergence (Dinesh et al., 2023).
- Graph Neural Networks and Transformers: Specialized DAG adaptations of Transformer architectures leverage the partial order by restricting each node's attention neighborhood to its ancestors and descendants, and by encoding depth via sinusoidal positional encodings. This yields computationally efficient graph Transformers (linear complexity on sparse, DAG-structured graphs), improving empirical performance for source code analysis, neural architecture regression, and citation benchmarks compared to both GNNs and generic Transformers (Luo et al., 2022).
- Causal and Scientific Applications: DAGs are instrumental across domains—e.g., in reconstructing causal relationships in NBA player statistics or e-commerce sales data (Zhou et al., 2021), modeling urban traffic congestion with multivariate functional nodes (Lan et al., 2024), and estimating causal flow in viral information propagation (Dinesh et al., 2023).
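The ancestor/descendant attention restriction above reduces to computing the transitive closure of the DAG. A minimal sketch, again on the illustrative four-node diamond:

```python
import numpy as np

# Diamond DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=bool)

def reachability(A):
    """Transitive closure R: R[i, j] is True iff a directed path i -> j exists."""
    R = A.copy()
    for k in range(len(A)):               # Floyd-Warshall style closure
        R = R | (R[:, [k]] & R[[k], :])
    return R

R = reachability(A)
# A DAG-Transformer style attention mask: node i may attend to its
# ancestors and descendants (plus itself), but not to unrelated nodes.
mask = R | R.T | np.eye(len(A), dtype=bool)
print(mask[1, 0], mask[1, 3], mask[1, 2])   # True True False: 1, 2 unrelated
```

Nodes 1 and 2 are incomparable in the partial order, so they are masked out of each other's attention, which is the source of the sparsity savings.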
6. Computational Complexity and Empirical Performance
DAG learning is computationally intractable in the general case (finding an optimal structure is NP-hard, and the number of candidate structures grows super-exponentially), but layered, relaxation-based, and variational approaches enable scaling to hundreds or even thousands of nodes:
| Method | Complexity (best case) | Regime Best Suited For | Empirical Advantages |
|---|---|---|---|
| TLDAG (Zhou et al., 2021) | Reduced layer-wise search (shallow graphs, $T \ll d$) | Non-Gaussian QVF DAGs | Low SHD, high F1, 20–25× faster than ODS/MRS |
| LN MIQO (Manzour et al., 2019) | Tractable given a sparse super-structure | Linear SEMs, moderate $d$ | Optimal or near-optimal solutions on sparse instances |
| DAGBag (Wang et al., 2014) | $O(B \cdot S)$, $B$ = bootstraps, $S$ = cost of one search | High-dimensional, low sample size | Order-of-magnitude reduction in false positives |
| GraphNOTEARS (Fan et al., 2022) | Polynomial per iteration | Dynamic graphs, time series | Outperforms NOTEARS, DYNOTEARS on temporal/neighbor features |
| ProDAG (Thompson et al., 2024) | One DAG projection per posterior sample (practical) | Bayesian, uncertainty quantification | Best SHD/F1/AUROC on flow cytometry, robust for small $n$ |
7. Theoretical Properties and Guarantees
- Identifiability: Under mild conditions, certain classes of DAGs (e.g., QVF-DAGs) are fully identifiable from observational data (Zhou et al., 2021). MultiFun-DAG ensures identifiability up to orthogonal rotation under a functional basis (Lan et al., 2024).
- Consistency and Error Bounds: Learning algorithms based on EM, score-based objectives with acyclicity constraints, or variational inference provide statistical guarantees, including consistency and high-probability error bounds for recovery of the true graph as $n \to \infty$ (Lan et al., 2024, Thompson et al., 2024).
- Computational Scalability: Layer-based algorithms, projection onto the DAG manifold, and DAG-structured attention mechanisms exploit the acyclic property for scalability both in structure inference and graph representation learning (Luo et al., 2022, Zhou et al., 2021).
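The "projection onto the DAG manifold" idea can be illustrated with a deliberately crude stand-in: greedily zeroing the weakest edge until acyclicity holds. This is only a sketch of the concept; ProDAG's actual projection solves a continuous relaxed program rather than this greedy heuristic:

```python
import numpy as np

def is_acyclic(W):
    """Trace-of-powers test on the binary support of W."""
    d = len(W)
    P, S = np.eye(d), (W != 0).astype(float)
    for _ in range(d):
        P = P @ S
        if np.trace(P) != 0:
            return False
    return True

def project_to_dag(W):
    """Crude stand-in for a DAG projection: repeatedly zero out the
    smallest-magnitude remaining edge until the graph is acyclic."""
    W = W.copy()
    while not is_acyclic(W):
        nz = np.argwhere(W != 0)
        i, j = min(nz, key=lambda e: abs(W[e[0], e[1]]))
        W[i, j] = 0.0
    return W

W = np.array([[0.0, 2.0], [0.3, 0.0]])   # 2-cycle with one weak edge
W_proj = project_to_dag(W)
print(is_acyclic(W_proj))                 # True: the weak edge was removed
```

Even this toy version shows the appeal of projection-based inference: any continuous distribution over matrices induces, through the projection, a distribution supported exactly on DAGs.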
Directed acyclic graphs represent a central, unifying mathematical structure across probabilistic modeling, causal inference, optimization, sampling, and deep learning frameworks. The continual refinement of theory and algorithms that exploit acyclicity, via topological layering, exact algebraic constraints, and sequential generative grammars, ensures DAGs remain core to cutting-edge methodological advances and multi-domain scientific applications (Zhou et al., 2021, Wang et al., 2014, Manzour et al., 2019, Thompson et al., 2024, Pépin et al., 2023, Dinesh et al., 2023, Luo et al., 2022, Lan et al., 2024, D'Arcy et al., 2019, Fan et al., 2022, 2505.22949).