State-Industry Bipartite Network

Updated 26 January 2026

A state-industry bipartite network is a two-mode graph connecting states and industries, with edges that record economic activity such as investment or industrial presence.
The analytical framework employs methods like block modeling, mixed-membership models, and entropy-based null models to reveal clustering and influence patterns.
Key metrics including diversification, ubiquity, and economic complexity indices provide quantitative insights for evaluating state-level industrial performance and guiding policy.

A state-industry bipartite network is a two-mode graph where one node set represents geographic or administrative states and the other represents industries, with edges indicating a specified relationship, most commonly the presence of economic activity, investment, or establishments of a given industry in a given state. This structure underlies many empirical studies of economic complexity, industrial diversification, network clustering, and longitudinal influence. The mathematical framework, algorithmic procedures, and inferential techniques for analyzing state-industry bipartite networks draw from optimal blockmodeling, mixed-membership stochastic blockmodels, entropy-based null models, economic complexity indices, longitudinal influence models, and projected network theory.

1. Construction and Mathematical Framework

A state-industry bipartite network consists of two disjoint node sets, states $S = \{s_1, \ldots, s_n\}$ and industries $I = \{i_1, \ldots, i_m\}$, together with a bipartite edge set $E \subseteq S \times I$. The network's biadjacency matrix $B \in \{0,1\}^{n \times m}$ is defined by $B_{si}=1$ if state $s$ is linked to industry $i$ under the chosen criterion, and $B_{si}=0$ otherwise.

Empirical specification varies depending on the research context:

Firm-level capital aggregation: $x_{sp}$ , the paid-up capital of all firms from state $s$ in industry $p$ (Thomas et al., 18 Jan 2026).
Revealed Comparative Advantage (RCA): $RCA_{sp} = \dfrac{x_{sp} / \sum_{p'} x_{sp'}}{\sum_{s'} x_{s'p} / \sum_{s',p'} x_{s'p'}}$, binarized as $M_{sp}=1$ if $RCA_{sp} \ge 1$ and $M_{sp}=0$ otherwise (Thomas et al., 18 Jan 2026).
Time-evolving investment flows, cosponsorship, or other relational quantities may be captured by a sequence of matrices $Y_t \in \mathbb{R}^{n \times m}$ (states by industries), where $t$ indexes time (Marrs et al., 2018).
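The RCA construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited authors' code: the function names `rca_matrix` and `binarize_rca` and the capital figures are hypothetical, and the sketch assumes every state and every industry has a strictly positive total so that no division by zero occurs.

```python
import numpy as np

def rca_matrix(x):
    """RCA for a non-negative states-by-industries array x (assumes no all-zero row or column)."""
    x = np.asarray(x, dtype=float)
    state_totals = x.sum(axis=1, keepdims=True)      # sum over p' of x_{sp'}
    industry_totals = x.sum(axis=0, keepdims=True)   # sum over s' of x_{s'p}
    grand_total = x.sum()                            # sum over s', p' of x_{s'p'}
    # Share of industry p within state s, divided by the overall share of industry p.
    return (x / state_totals) / (industry_totals / grand_total)

def binarize_rca(rca, threshold=1.0):
    """M_sp = 1 where RCA_sp >= threshold, else 0."""
    return (rca >= threshold).astype(int)

# Hypothetical paid-up capital for 3 states and 3 industries (illustrative numbers only).
x = np.array([[8.0, 1.0, 1.0],
              [1.0, 8.0, 1.0],
              [2.0, 2.0, 6.0]])
rca = rca_matrix(x)
M = binarize_rca(rca)
```

In this toy input each state is specialized in a different industry, so the binarized matrix is the identity.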

A formal bipartite stochastic block model (BSBM) parametrizes the link probability between state $i$ and industry $j$ according to their latent block memberships, i.e., $A_{ij} \sim \mathrm{Bernoulli}(P_{y_i, z_j})$ for unknown labels $y_i \in \{1, \dots, K\}$ , $z_j \in \{1, \dots, L\}$ (Zhou et al., 2018).
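To make the generative side of the BSBM concrete, the following sketch draws a synthetic biadjacency matrix from given labels and a block probability matrix. The helper name `sample_bsbm`, the probability values, and the block sizes are assumptions chosen for illustration; labels are zero-indexed here, whereas the text indexes them from 1.

```python
import numpy as np

def sample_bsbm(y, z, P, rng):
    """Draw A with A_ij ~ Bernoulli(P[y_i, z_j]) independently for every state-industry pair."""
    y = np.asarray(y)
    z = np.asarray(z)
    probs = P[np.ix_(y, z)]                 # n x m matrix of link probabilities
    return (rng.random(probs.shape) < probs).astype(int)

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])                  # K = L = 2, illustrative values
y = np.repeat([0, 1], 20)                   # block labels for 40 states
z = np.repeat([0, 1], 15)                   # block labels for 30 industries
A = sample_bsbm(y, z, P, rng)
```

Synthetic matrices of this kind are convenient for checking whether a clustering procedure recovers planted labels before applying it to real data.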

2. Network Metrics: Diversity, Ubiquity, and Complexity

Key metrics derived from the binary biadjacency matrix $M_{sp}$ include:

Diversification ( $k_{s,0}$ ): Number of industries where state $s$ is competitive, $k_{s,0} = \sum_p M_{sp}$ (Thomas et al., 18 Jan 2026).
Ubiquity ( $k_{p,0}$ ): Number of states active in industry $p$ , $k_{p,0} = \sum_s M_{sp}$ (Thomas et al., 18 Jan 2026).
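Economic Complexity Index (ECI): Form the similarity matrix $\widetilde M_{ss'} = \sum_p \frac{M_{sp} M_{s'p}}{k_{s,0} k_{p,0}}$; take the eigenvector $K_s$ associated with its second-largest eigenvalue (the leading eigenvector is constant because the rows of $\widetilde M$ sum to one), and standardize it to obtain $ECI_s = (K_s - \langle K \rangle) / \mathrm{sd}(K)$ (Thomas et al., 18 Jan 2026).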
Fitness–Complexity Algorithm: Iteratively, $F_s^{(n)} = \sum_p M_{sp} Q_p^{(n-1)}$ and $Q_p^{(n)} = \left( \sum_s M_{sp} / F_s^{(n-1)} \right)^{-1}$ , normalizing after each step (Thomas et al., 18 Jan 2026).
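As a rough sketch of the first three metrics, the code below computes diversification, ubiquity, and an ECI from a small binary matrix. It follows the standard Hidalgo-Hausmann construction, in which the ECI is the standardized eigenvector for the second-largest eigenvalue of $\widetilde M$; the symmetrization trick, the sign convention (positive correlation with diversification), and the toy matrix are assumptions of this sketch rather than details taken from the cited study.

```python
import numpy as np

def diversification_ubiquity(M):
    """Return (k_s, k_p): row sums (diversification) and column sums (ubiquity)."""
    M = np.asarray(M, dtype=float)
    return M.sum(axis=1), M.sum(axis=0)

def eci(M):
    """Standardized eigenvector of M_tilde for its second-largest eigenvalue."""
    M = np.asarray(M, dtype=float)
    k_s, k_p = diversification_ubiquity(M)
    # W[s, s'] = sum_p M[s, p] * M[s', p] / k_p[p]; M_tilde = diag(1/k_s) @ W.
    W = (M / k_p[None, :]) @ M.T
    # M_tilde is similar to the symmetric matrix D^{-1/2} W D^{-1/2}, so eigh can be used.
    d_inv_sqrt = 1.0 / np.sqrt(k_s)
    sym = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(sym)          # eigenvalues in ascending order
    K = d_inv_sqrt * eigvecs[:, -2]                 # map back to an eigenvector of M_tilde
    # Sign convention (assumption): ECI should correlate positively with diversification.
    if np.corrcoef(K, k_s)[0, 1] < 0:
        K = -K
    return (K - K.mean()) / K.std()

# Toy nested matrix: 4 states x 4 industries (illustrative only).
M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
k_s, k_p = diversification_ubiquity(M)
eci_scores = eci(M)
```

The sketch assumes every state and industry has at least one link; zero rows or columns would need to be dropped first.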

Empirical findings indicate high complexity correlates strongly with per-capita gross state product (Thomas et al., 18 Jan 2026).
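The Fitness–Complexity iteration listed among the metrics can be sketched as follows. The source only says that values are normalized after each step; dividing by the mean, the uniform initial values, and the fixed iteration count are assumptions of this sketch, as is the toy matrix.

```python
import numpy as np

def fitness_complexity(M, n_iter=20):
    """Iterate F_s = sum_p M_sp Q_p and Q_p = 1 / sum_s (M_sp / F_s), mean-normalizing each step."""
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])                 # state fitness, initialised uniformly
    Q = np.ones(M.shape[1])                 # industry complexity, initialised uniformly
    for _ in range(n_iter):
        F_new = M @ Q                       # uses Q from the previous step
        Q_new = 1.0 / (M.T @ (1.0 / F))     # uses F from the previous step
        F = F_new / F_new.mean()
        Q = Q_new / Q_new.mean()
    return F, Q

# Toy nested matrix: state 0 is active everywhere, state 3 only in the most ubiquitous industry.
M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
F, Q = fitness_complexity(M)
```

On a perfectly nested matrix like this one, fitness decreases with diversification rank and complexity increases as ubiquity falls, which matches the intuition that industries present even in weak states carry little information.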

3. Clustering, Biclustering and Block Models

Community detection and latent group inference in state-industry bipartite networks employ:

Spectral Initialization: Compute a rank-$r$ truncated SVD of $A$ to obtain $\hat U_r$ and $\hat V_r$, then apply $k$-means to the rows of $\hat U_r$ for initial state labels $\tilde y$ and to the rows of $\hat V_r$ for initial industry labels $\tilde z$ (Zhou et al., 2018).
Pseudo-Likelihood EM Updates: Aggregate block-compressed counts, estimate Poisson means $\hat\Lambda_{k\ell}$ , and class priors $\hat\pi_k$ , then maximize log pseudo-likelihood for label refinement (Zhou et al., 2018).
Sub-block Partitioning: Partition nodes into $2Q \times Q$ sub-blocks for provable independence and distributed implementation (Zhou et al., 2018).
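The spectral initialization step can be illustrated with a short NumPy sketch. It is not the algorithm of Zhou et al. in full: the rank choice $r = \min(K, L)$, the scaling of singular vectors by singular values, the hand-rolled Lloyd's $k$-means with farthest-point initialization, and the planted test matrix are all assumptions made here for a self-contained example. The pseudo-likelihood refinement and sub-block partitioning are not shown.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_init(A, K, L):
    """Initial row and column labels from a truncated SVD followed by k-means."""
    r = min(K, L)                                   # rank choice is an assumption
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    row_embed = U[:, :r] * s[:r]                    # one r-dimensional point per state
    col_embed = Vt[:r, :].T * s[:r]                 # one r-dimensional point per industry
    return kmeans(row_embed, K), kmeans(col_embed, L)

# Planted two-block example (illustrative probabilities and sizes).
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
y_true = np.repeat([0, 1], 20)
z_true = np.repeat([0, 1], 15)
A = (rng.random((40, 30)) < P[np.ix_(y_true, z_true)]).astype(int)
y_hat, z_hat = spectral_init(A, K=2, L=2)
```

Recovered labels are only defined up to a permutation of block names, so any comparison with planted labels has to account for label switching.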

Optimal bipartite clustering achieves weak consistency, meaning that the misclassification rate converges to zero, under cluster-balance, degree-sparsity, and block-separability conditions. The two-pass PL refinement yields exponential-rate guarantees and is minimax optimal up to logarithmic factors (Zhou et al., 2018).

Mixed-membership blockmodels (biMMSBM) further capture group overlap and covariate effects; latent membership vectors are parameterized using Dirichlet priors modulated by node-level covariates, while dyad-level covariates modulate edge formation via logistic affinity matrix $B_{gh}$ (Lo et al., 2023). Variational EM and stochastic inference allow for scalable fitting and model selection based on predictive likelihood or ROC-AUC.

4. Projected Networks and Structural Properties

The one-mode projection compresses the bipartite network onto states, yielding a state-state graph $G_S$:

Unweighted Projection: $A_S = \mathrm{sign}^+(B B^T)$, with $a_{u,v}=1$ if distinct states $u$ and $v$ share at least one industry (Banerjee et al., 2017).
Weighted Projection: $w_{u,v} = \sum_k B_{u,k} B_{v,k}$, the number of industries shared by states $u$ and $v$ (Banerjee et al., 2017).

Theoretical properties include:

Cliques: If an industry connects to $d_k \geq 2$ states, those states form a clique in $G_S$ (Banerjee et al., 2017).
Connectedness: $G_S$ can be disconnected if there exist pendant states or industries with degree one (Banerjee et al., 2017).
Edge Weight Bounds: $1 \leq w_{u,v} \leq |I|$ for $w_{u,v}>0$ (Banerjee et al., 2017).
Total Weight: $\sum_{u<v} w_{u,v} = \sum_{k \in I} \binom{|N(k)|}{2}$, where $N(k)$ is the set of states linked to industry $k$ (Banerjee et al., 2017).
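The projections and the properties listed above can be checked numerically on a small example. The biadjacency matrix below is made up for illustration, and removing the diagonal of $B B^T$ (which holds each state's own degree) is a convention assumed here so that the projection has no self-loops.

```python
import numpy as np
from math import comb

# Illustrative biadjacency matrix: 4 states (rows) x 4 industries (columns).
B = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]])

W = B @ B.T                       # W[u, v] = number of industries shared by u and v
np.fill_diagonal(W, 0)            # drop self-loops (assumed convention)
A_S = (W > 0).astype(int)         # unweighted projection

d = B.sum(axis=0)                 # industry degrees d_k = |N(k)|
total_weight = int(np.triu(W, k=1).sum())
binomial_total = sum(comb(int(dk), 2) for dk in d)

# Industry 0 links states 0, 1, 2, so they form a clique in the projection.
clique_block = A_S[:3, :3]
# State 3 is attached only to an industry of degree one, so it is isolated.
isolated_degree = int(A_S[3].sum())
```

Here the total projected weight and the binomial sum both equal 5, every positive weight lies between 1 and the number of industries, and the isolated state shows how a degree-one industry can leave the projection disconnected.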

Edge weights in the projection serve as interpretable measures of economic similarity between states; thresholding or backbone extraction clarifies strong regional alignments.

5. Entropy-Based Null Models and Pattern Testing

Entropy-based models provide null hypotheses for structural features such as nestedness and motif statistics:

Canonical Ensemble: Constrain the expected state and industry degrees through the Hamiltonian $H(G)=\sum_{s,i}(\theta_s+\phi_i)m_{si}$, which yields independent Bernoulli link probabilities $p_{si} = \frac{x_s y_i}{1 + x_s y_i}$ with $x_s = e^{-\theta_s}$ and $y_i = e^{-\phi_i}$ (Straka et al., 2017).
Calibration: Fit multipliers $x_s$ , $y_i$ via iterative proportional fitting or Newton–Raphson to match observed degree sequences (Straka et al., 2017).
Structural Pattern Testing: Compute expected nestedness (NODF), checkerboards, or modularity under the null; compare observed values via $z$ -scores and Gaussian approximations for hypothesis testing (Straka et al., 2017).
Validated Projections: Assess significance of state–state similarity links via Poisson-binomial tests on shared industries (Straka et al., 2017).
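A minimal sketch of the calibration and validated-projection steps is given below, assuming link probabilities of the form $p_{si} = x_s y_i / (1 + x_s y_i)$. The fixed-point update is one common way to solve the degree-matching equations and is used here in place of Newton–Raphson; the function names, starting values, iteration limits, and toy matrix are assumptions, and the sketch requires that no state or industry has degree zero or full degree.

```python
import numpy as np

def fit_bicm(M, n_iter=10000, tol=1e-12):
    """Fixed-point iteration for multipliers x_s, y_i matching expected degrees to observed ones."""
    M = np.asarray(M, dtype=float)
    k_s, k_i = M.sum(axis=1), M.sum(axis=0)
    x = k_s / np.sqrt(M.sum())              # heuristic starting point (assumption)
    y = k_i / np.sqrt(M.sum())
    for _ in range(n_iter):
        x_new = k_s / (y[None, :] / (1.0 + np.outer(x, y))).sum(axis=1)
        y_new = k_i / (x_new[:, None] / (1.0 + np.outer(x_new, y))).sum(axis=0)
        done = max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < tol
        x, y = x_new, y_new
        if done:
            break
    xy = np.outer(x, y)
    return xy / (1.0 + xy)                  # matrix of link probabilities p_si

def poisson_binomial_pvalue(probs, observed):
    """P(X >= observed) for a sum of independent Bernoulli(probs) variables, by exact convolution."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return float(pmf[observed:].sum())

# Illustrative 4 states x 4 industries matrix without empty or full rows and columns.
M = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
P = fit_bicm(M)

# Are the 2 industries shared by states 0 and 1 more than the null model expects?
shared = int(M[0] @ M[1])
p_value = poisson_binomial_pvalue(P[0] * P[1], shared)
```

Under the null, states 0 and 1 co-occur in industry $i$ with probability $p_{0i} p_{1i}$, so the number of shared industries is Poisson-binomial; a small p-value would mark the projected link as statistically validated, usually after a multiple-testing correction.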

The observed triangular structure in state-industry matrices, with highly diversified states occupying ubiquitous and specialized industries and low-diversification states restricted to basic sectors, is validated as a robust feature of capability accumulation (Thomas et al., 18 Jan 2026).

6. Longitudinal Analysis and Influence Networks

For temporal or panel state-industry data, the Bipartite Longitudinal Influence Network (BLIN) model characterizes dynamic dependencies:

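Model Specification: $Y_t = A^T X_t + Z_t B + E_t$, where $X_t$ and $Z_t$ are covariate matrices with the same dimensions as $Y_t$ (for example, lagged observations), $E_t$ is an error term, and $A$ and $B$ are influence networks among states and industries, respectively (Marrs et al., 2018).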
Estimation: Alternating least squares, Lasso regularization for sparsity, or reduced-rank decomposition; $A$ and $B$ estimated so as to minimize $\sum_t \| Y_t - A^T X_t - Z_t B \|_F^2$ (Marrs et al., 2018).
Theoretical Properties: Identifiability up to diagonal shifts; consistency under correct specification and stationarity (Marrs et al., 2018).
Interpretation: Influences recover dynamic economic leadership or imitation among states and industries over specified lags.
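The alternating least squares option can be sketched as two closed-form updates that are repeated until the squared-error criterion stops decreasing. This is an illustration under stated assumptions rather than the estimator of Marrs et al.: the function name `blin_als`, the zero initialization, the fixed iteration count, and the simulated noiseless data are all hypothetical, and neither the Lasso nor the reduced-rank variant is shown.

```python
import numpy as np

def blin_als(Y, X, Z, n_iter=200):
    """Alternating least squares for Y_t ~ A^T X_t + Z_t B; Y, X, Z have shape (T, S, I)."""
    T, S, I = Y.shape
    A = np.zeros((S, S))
    B = np.zeros((I, I))
    Sxx = np.einsum('tai,tbi->ab', X, X)        # sum_t X_t X_t^T  (S x S)
    Szz = np.einsum('tsi,tsj->ij', Z, Z)        # sum_t Z_t^T Z_t  (I x I)
    losses = []
    for _ in range(n_iter):
        R = Y - Z @ B                           # residual after removing the industry term
        Sxr = np.einsum('tai,tsi->as', X, R)    # sum_t X_t R_t^T
        A = np.linalg.lstsq(Sxx, Sxr, rcond=None)[0]
        R = Y - A.T @ X                         # residual after removing the state term
        Szr = np.einsum('tsi,tsj->ij', Z, R)    # sum_t Z_t^T R_t
        B = np.linalg.lstsq(Szz, Szr, rcond=None)[0]
        losses.append(float(((Y - A.T @ X - Z @ B) ** 2).sum()))
    return A, B, losses

# Simulated noiseless panel: 50 periods, 4 states, 3 industries (illustrative only).
rng = np.random.default_rng(1)
T, S, I = 50, 4, 3
A_true = 0.3 * rng.standard_normal((S, S))
B_true = 0.3 * rng.standard_normal((I, I))
X = rng.standard_normal((T, S, I))
Z = rng.standard_normal((T, S, I))
Y = A_true.T @ X + Z @ B_true
A_hat, B_hat, losses = blin_als(Y, X, Z)
```

Each update solves its subproblem exactly, so the loss cannot increase from one iteration to the next. When $X_t$ and $Z_t$ coincide, $A$ and $B$ are determined only up to offsetting diagonal shifts, so fitted values rather than raw coefficients should be compared across runs.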

This approach enables direct inference on the evolution of inter-state and inter-industry influence, avoiding projection artifacts and accommodating measurement uncertainty.

7. Policy, Application, and Software Implementation

State-industry bipartite networks have direct policy and analytical applications:

Quantitative regional benchmarking using complexity indices and fitness scores (Thomas et al., 18 Jan 2026).
Capability-driven industrial policy formulation to move states up the triangular hierarchy by addressing basic infrastructure and skill gaps before diversification (Thomas et al., 18 Jan 2026).
Model-based cluster assignment and prediction using R packages like NetMix for biMMSBM fitting, including covariate incorporation, simulation-based validation, and standard-error estimation (Lo et al., 2023).
Null model calculation and motif-testing with generic Python and R code (iterative proportional fitting for configuration model calibration) (Straka et al., 2017).
Projected network interpretation for mapping state similarities, strategy development, and backbone extraction (Banerjee et al., 2017).

Empirical studies confirm 11.2% annual exponential growth of Indian firms, and clear positive correlations between complexity metrics derived from bipartite structure and regional income (Thomas et al., 18 Jan 2026). This approach is validated for contexts where national or regional industrial capacity, strategic economic alliances, or innovation policies require precise multi-level network analysis.

For large-scale empirical application, scalable implementations of entropy-based fitting and inference, robust biclustering, regularization, and variational EM are available in contemporary software ecosystems (Lo et al., 2023, Straka et al., 2017).

References