Co-Occurrence Graphs for Labor Markets
- Co‐Occurrence Graphs for Labor Markets are network representations that model joint occurrences of labor-market entities like occupations, skills, firms, and products using empirical data.
- They use proximity measures, the Jaccard index, and centrality analyses to quantify opportunity flows, transitions, and diversification patterns.
- Applications include GDP and employment forecasting, targeted job training, and policy interventions, demonstrating their practical impact on workforce development.
Co‐occurrence graphs in labor markets are network representations that encode the empirical tendency of labor‐market entities—occupations, skills, firms, or products—to appear jointly or flow together in economic units, transitions, or postings. Such graphs provide a quantitative framework to study the structure of opportunity and information flows, predict diversification paths, and support workforce development analysis.
1. Formal Construction of Labor-Market Co-Occurrence Graphs
A co‐occurrence graph is typically defined as a weighted, undirected (or directed) network $G = (V, E, W)$, in which the nodes $V$ represent labor‐market activities or attributes such as occupations, firms, skills, or products. Edges $E$ record empirical co‐occurrence or transition linkage, and the edge weights $W$ quantify the statistical strength or frequency of joint appearance.
Key Construction Schemes
| Graph Type | Node Identity | Edge Construction |
|---|---|---|
| City–job / Country–product | Occupation / Export product | Above-chance co-presence |
| Skill co-occurrence | Discrete skill tags | Jobs listing both skills |
| Firm labor-flow | Firms (employers) | Persistent job transitions |
| Occupation similarity | ISCO/ESCO codes | Shared transition destinations |
The empirical edge strength is specified by a proximity, co-occurrence, or similarity measure. For city–job and country–product graphs, Almaatouq defines a proximity between activities using “prominence” indicators derived from population-share or RCA thresholds (Almaatouq, 2016). Skill graphs are constructed from advert-level co-occurrence counts, normalized via the Jaccard index, PMI, or cosine similarity (Liu et al., 2024). Occupation transition graphs use bipartite projections and a variety of similarity measures (cosine, Jaccard, Adamic–Adar, conditional probability) (Boškoski et al., 2022).
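As an illustration of the skill-graph construction, the following minimal sketch counts advert-level skill pairings and normalizes them with the Jaccard index. The function name `skill_jaccard` and the toy adverts are hypothetical, and the sketch assumes skills have already been mapped to canonical tags; it is not the pipeline of Liu et al. (2024).

```python
from collections import Counter
from itertools import combinations


def skill_jaccard(adverts):
    """Return {(skill_a, skill_b): Jaccard weight} for skills that share an advert."""
    skill_count = Counter()  # number of adverts mentioning each skill
    pair_count = Counter()   # number of adverts mentioning both skills of a pair
    for skills in adverts:
        unique = sorted(set(skills))  # ignore repeated tags within one advert
        skill_count.update(unique)
        pair_count.update(combinations(unique, 2))
    # Jaccard = |adverts with both| / |adverts with either|
    return {
        (a, b): n_ab / (skill_count[a] + skill_count[b] - n_ab)
        for (a, b), n_ab in pair_count.items()
    }


# Hypothetical toy adverts, each a list of canonical skill tags.
adverts = [
    ["python", "sql", "statistics"],
    ["python", "sql"],
    ["sql", "excel"],
    ["python", "statistics"],
]
weights = skill_jaccard(adverts)
print(weights[("python", "sql")])  # 2 shared adverts out of 4 with either skill
```

Pairs are keyed in alphabetical order, so each undirected edge appears once. Skill pairs that never share an advert are simply absent, which keeps the result sparse.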
2. Data Sources, Preprocessing, and Operationalization
Construction of co‐occurrence graphs requires curated high‐resolution datasets, standardized entity codes, and event-level granularity.
Data sources and operationalization differ by graph type:
- City–job graphs: built from U.S. Bureau of Labor Statistics MSA–SOC tables. For each city $c$ and occupation $o$, the share of the city's workers employed in $o$ is computed, and a binary prominence indicator is set when that share exceeds a threshold. Aggregate job–job proximity is then computed across all cities (Almaatouq, 2016).
- Country–product graphs: built from UN COMTRADE data, with Balassa’s RCA formula operationalizing prominence (Almaatouq, 2016).
- Skill graphs: built by processing millions of job adverts, de-duplicating text, mapping raw skill strings to canonical taxonomies (Lightcast), and counting advert-level skill pairings. Skill standardization removes ambiguous tokens (Liu et al., 2024).
- Firm labor-flow graphs: built from direct job-to-job transitions recorded in employment histories (Mexico, Finland), filtered to persistent edges, and aggregated at the firm level (López et al., 2015).
- Occupation similarity graphs: transitions are modeled as a bipartite graph whose origin and destination nodes are standardized occupation codes. Multiple projection formulas yield alternative similarity graphs (Boškoski et al., 2022).
Normalization, sparse matrix storage, self-loop treatment, rare entity filtering, and thresholding are standard preprocessing steps.
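The prominence and proximity steps can be sketched as follows. Balassa's RCA is the standard ratio of a product's share in a country's exports to its share in world exports. The threshold of 1.0 and the proximity form (the minimum of the two conditional probabilities of co-presence, as commonly used in the product-space literature) are assumptions made for illustration; the text above does not state the exact choices in Almaatouq (2016). All function names and the toy export matrix are hypothetical.

```python
def rca_matrix(exports):
    """exports[c][p] is the export value of product p by country c.

    Returns Balassa's revealed comparative advantage for every (c, p).
    Assumes every country and every product has a positive total.
    """
    n_c, n_p = len(exports), len(exports[0])
    country_total = [sum(row) for row in exports]
    product_total = [sum(exports[c][p] for c in range(n_c)) for p in range(n_p)]
    world_total = sum(country_total)
    return [
        [
            (exports[c][p] / country_total[c]) / (product_total[p] / world_total)
            for p in range(n_p)
        ]
        for c in range(n_c)
    ]


def prominence(rca, threshold=1.0):
    """Binary prominence: 1 where RCA reaches the (assumed) threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in rca]


def proximity(m):
    """phi[i][j] = min(P(i | j), P(j | i)), estimated across units (rows of m)."""
    n_u, n_a = len(m), len(m[0])
    ubiquity = [sum(m[u][a] for u in range(n_u)) for a in range(n_a)]
    phi = [[0.0] * n_a for _ in range(n_a)]
    for i in range(n_a):
        for j in range(n_a):
            if i == j or ubiquity[i] == 0 or ubiquity[j] == 0:
                continue  # no self-loops; skip activities prominent nowhere
            both = sum(m[u][i] * m[u][j] for u in range(n_u))
            # min of the two conditionals equals the joint count over the larger ubiquity
            phi[i][j] = both / max(ubiquity[i], ubiquity[j])
    return phi


# Hypothetical toy export values: 3 countries x 3 products.
exports = [
    [10, 0, 5],
    [2, 8, 0],
    [4, 4, 4],
]
m = prominence(rca_matrix(exports))
phi = proximity(m)
```

The same `proximity` function applies unchanged to a city-by-occupation prominence matrix, since it only needs a binary unit-by-activity table.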
3. Mathematical Formulation and Network Metrics
Co-occurrence graphs are equipped with a variety of metrics to analyze topology, centrality, modularity, and opportunity flows.
- Edge weight definitions:
  - City–job/country–product: the proximity defined above, which measures empirical opportunity/information transfer (Almaatouq, 2016).
  - Skill graphs: the Jaccard index $J_{ij} = |A_i \cap A_j| / |A_i \cup A_j|$, where $A_i$ is the set of adverts listing skill $i$, as well as PMI and cosine similarity (Liu et al., 2024).
  - Occupation similarity: symmetric (cosine, generalized Jaccard), asymmetric (conditional probability), neighborhood-based (Adamic–Adar), and collaborative-filtering measures (Boškoski et al., 2022).
- Complexity indices:
  - Method of Reflection: iteratively defined complexity scores for units and activities, converging to ECI/CCI/PCI/JCI. In the matrix view, these are (rescaled) second eigenvectors of the normalized activity–unit bipartite matrix (Almaatouq, 2016).
- Centrality measures:
  - Degree, eigenvector centrality, closeness, and betweenness (Liu et al., 2024).
- Community detection:
  - Louvain/Leiden modularity maximization, Markov Stability, and spectral clustering (Almaatouq, 2016; Liu et al., 2024; Boškoski et al., 2022).
- Diffusion and percolation:
  - SI and edge-percolation simulations examine local/global diffusion rates, with reciprocal edges supporting rapid intra-community spread and unilateral edges bridging modules (“weak-tie” effect) (Almaatouq, 2016).
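The iterative scheme behind the Method of Reflection can be sketched as below, following the standard formulation in the economic complexity literature: each unit's score is the average of the previous-round scores of its activities, and vice versa, starting from diversity and ubiquity. The function name, the symbols, and the toy matrix are hypothetical, and the sketch assumes every row and column of the binary matrix has at least one nonzero entry.

```python
def method_of_reflections(m, n_iter=2):
    """Iterate unit and activity scores on a binary unit-by-activity matrix m.

    Round 0: units get their diversity (row sums), activities their
    ubiquity (column sums). Each later round averages the previous-round
    scores of a node's neighbours on the other side of the bipartite graph.
    """
    n_u, n_a = len(m), len(m[0])
    diversity = [sum(row) for row in m]
    ubiquity = [sum(m[u][a] for u in range(n_u)) for a in range(n_a)]
    k_u, k_a = diversity[:], ubiquity[:]
    for _ in range(n_iter):
        next_u = [
            sum(m[u][a] * k_a[a] for a in range(n_a)) / diversity[u]
            for u in range(n_u)
        ]
        next_a = [
            sum(m[u][a] * k_u[u] for u in range(n_u)) / ubiquity[a]
            for a in range(n_a)
        ]
        k_u, k_a = next_u, next_a  # update both sides simultaneously
    return k_u, k_a


# Hypothetical nested toy matrix: unit 0 is present in every activity,
# unit 2 only in the most ubiquitous one.
m = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
k_units, k_activities = method_of_reflections(m, n_iter=2)
```

After one round, a unit's score is the average ubiquity of its activities; after two, it is the average diversity of units sharing its activities. Because repeated averaging shrinks the spread of raw scores, implementations typically standardize them or work with the eigenvector form mentioned in the list above.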
4. Empirical Results and Applications
Empirical deployment of co-occurrence graphs demonstrates their utility in mapping labor-market structure and forecasting outcomes.
- Complexity–performance relationships:
  - The City Complexity Index (CCI) is reported to predict city GDP. Country ECI strongly predicts GDP per capita, with higher initial ECI forecasting faster GDP growth across 5–20 year horizons (Almaatouq, 2016).
- Skill clusters and dynamics:
  - In UK job adverts, multiscale community detection identifies robust skill partitions ranging from 4 to 215 clusters. Core clusters show high closeness, while specialized clusters exhibit low containment. Semantic coherence varies widely with cluster size and content. Cross-cluster requirements and average skills per advert increased from 2016 to 2022, reflecting demand for broader skill sets (Liu et al., 2024).
- Labor flow and firm-level analysis:
  - Persistent co-occurrence structures in firm labor-flow graphs capture nearly all real job-to-job transitions and yield steady-state employment/unemployment predictions consistent with observed statistics. Degree and edge-weight distributions are heavy-tailed, with a small backbone channeling dominant flows (López et al., 2015).
- Occupation similarity tools:
  - Multiple explainable similarity measures produce differentiated career path maps: Jaccard highlights lateral mobility, cosine emphasizes volume, and Adamic–Adar identifies niche pathways (Boškoski et al., 2022). Empirical evaluation on half a million transitions (Slovenia) shows that classification of rare versus common transitions reaches an area under the ROC curve of 0.65–0.70.
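To make the differences between these occupation similarity measures concrete, the sketch below computes Jaccard, cosine, and Adamic–Adar scores between origin occupations from their shared transition destinations. These are the textbook definitions of the three measures, not necessarily the exact projection formulas of Boškoski et al. (2022); the function names and the toy occupation labels are hypothetical.

```python
import math


def build_profiles(transitions):
    """Map each origin occupation to {destination: number of transitions}."""
    profiles = {}
    for origin, dest in transitions:
        counts = profiles.setdefault(origin, {})
        counts[dest] = counts.get(dest, 0) + 1
    return profiles


def jaccard(profiles, a, b):
    """Overlap of destination sets, ignoring transition volume."""
    da, db = set(profiles[a]), set(profiles[b])
    return len(da & db) / len(da | db)


def cosine(profiles, a, b):
    """Angle between destination count vectors, sensitive to volume."""
    pa, pb = profiles[a], profiles[b]
    dot = sum(pa[d] * pb[d] for d in pa if d in pb)
    norm_a = math.sqrt(sum(v * v for v in pa.values()))
    norm_b = math.sqrt(sum(v * v for v in pb.values()))
    return dot / (norm_a * norm_b)


def adamic_adar(profiles, a, b):
    """Sum over shared destinations of 1 / log(number of distinct origins)."""
    in_degree = {}
    for dests in profiles.values():
        for d in dests:
            in_degree[d] = in_degree.get(d, 0) + 1
    shared = set(profiles[a]) & set(profiles[b])
    # A shared destination has at least two origins, so log(in_degree) > 0.
    return sum(1.0 / math.log(in_degree[d]) for d in shared)


# Hypothetical (origin, destination) job transitions.
transitions = [
    ("cashier", "sales"), ("cashier", "sales"), ("cashier", "warehouse"),
    ("waiter", "sales"), ("waiter", "cook"),
    ("driver", "warehouse"), ("driver", "courier"),
]
profiles = build_profiles(transitions)
j = jaccard(profiles, "cashier", "waiter")
c = cosine(profiles, "cashier", "waiter")
aa = adamic_adar(profiles, "cashier", "waiter")
```

In this toy data, the repeated cashier-to-sales transitions raise the cosine score but leave the Jaccard score unchanged, while Adamic–Adar would down-weight a shared destination fed by many origins.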
Table: Key Empirical Features
| Graph Type | Principal Metric | Application Domain |
|---|---|---|
| City–job | Proximity, CCI | GDP prediction, urban planning |
| Country–product | Proximity, ECI | Trade, diversification |
| Skill co-occur. | Jaccard, closeness | Training, curriculum design |
| Firm flow | Centrality | Unemployment, shocks |
| Occupation sim. | Cosine/Jacc./Adamic–Adar | Career guidance, retraining |
5. Theoretical Implications and Policy Recommendations
Co-occurrence graphs offer a parsimonious method to infer latent capabilities and opportunity structures, overcoming aggregation biases of canonical growth models (Almaatouq, 2016).
Policy implications include:
- Targeted investment: Prioritize sectors and activities “close” in co-occurrence space to existing capabilities—maximizing spillovers and minimizing retraining costs.
- Job training: Match unemployed individuals to adjacent occupations with high co-occurrence, shortening unemployment duration.
- Skills planning: Use advert-level skill clusters to update regional skills plans, inform curriculum design, and anticipate future labor demand (Liu et al., 2024).
- Shock analysis and interventions: Simulate network modifications (subsidizing links, boosting hiring rates) to anticipate effects on unemployment, firm employment, and occupational resilience (López et al., 2015).
- Career transitions: Provide job-seekers with explainable recommendations based on multiple occupation similarity graphs (Boškoski et al., 2022).
6. Methodological Variants and Open Issues
The notion of a “single” occupation similarity metric is limiting; research establishes a family of plausible and explainable measures—each suited to distinct analytical purposes (Boškoski et al., 2022). Choice of measure, data preprocessing, and graph filtering can substantially affect inferred career pathways or diversification predictions.
A plausible implication is that co-occurrence-based analysis should be supplemented by careful metric selection, parameter sensitivity analysis, and context-aware community detection. Variations in skill cluster semantic coherence and multiscale modular structure reveal that co-occurrence does not always coincide with expert taxonomies or intrinsic thematic consistency (Liu et al., 2024). Edge persistence filtering is critical for reducing noise in labor-flow graphs (López et al., 2015).
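One simple way to think about edge persistence filtering is sketched below. It assumes, purely for illustration, that an undirected firm-to-firm edge counts as persistent when job moves between the two firms are observed in at least `min_periods` distinct time periods; the precise criterion in López et al. (2015) is not given in the text above, and the function name, parameter, and toy records are hypothetical.

```python
from collections import defaultdict


def persistent_edges(moves, min_periods=2):
    """Keep undirected firm pairs with moves in at least `min_periods` periods.

    `moves` is an iterable of (period, origin_firm, destination_firm) records.
    Returns {(firm_a, firm_b): total number of moves} for the retained pairs.
    """
    periods_seen = defaultdict(set)
    flow = defaultdict(int)
    for period, src, dst in moves:
        if src == dst:
            continue  # drop self-loops (moves recorded within one firm)
        edge = tuple(sorted((src, dst)))  # treat A->B and B->A as one edge
        periods_seen[edge].add(period)
        flow[edge] += 1
    return {
        edge: count
        for edge, count in flow.items()
        if len(periods_seen[edge]) >= min_periods
    }


# Hypothetical job-to-job moves: (year, origin firm, destination firm).
moves = [
    (2019, "A", "B"), (2020, "B", "A"),
    (2020, "A", "C"), (2020, "A", "C"),
    (2021, "C", "D"), (2021, "D", "C"), (2022, "C", "D"),
]
backbone = persistent_edges(moves, min_periods=2)
```

Here the A–C pair carries two moves but only within a single year, so it is filtered out, whereas A–B and C–D recur across years and are retained. Varying `min_periods` is one way to probe the parameter sensitivity discussed above.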
7. Future Research Directions
Open questions include integrating wage/demand overlays into co-occurrence graphs for tightness prediction, refining multidimensional clustering methods, and modeling real-time drift in labor-flow networks. Systematic comparison of explainable occupation similarity measures remains an active area, with empirical validation frameworks expanding to new geographies and labor-market systems. Quantifying network modularity evolution and its impact on retraining policy constitutes a significant research agenda.
Additional research may further connect co-occurrence graphs to agent-based modeling, economic complexity theory, and adaptive policy simulation frameworks, as suggested by recent computational social science perspectives (Almaatouq, 2016).