- The paper reveals that LLMs excel in structured, rule-based skills but falter in unstructured, communication-intensive domains.
- It introduces the Skill Automation Feasibility Index (SAFI), built by benchmarking LLMs on tasks mapped to 35 O*NET skills spanning 1,016 occupations, and relates it to real-world AI exposure.
- The study’s AI Impact Matrix informs targeted retraining by distinguishing between skill displacement risks and augmentation opportunities.
Empirical Mapping of Skill Automation Feasibility in the LLM Era
Introduction
"The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era" (2604.06906) delivers an empirical, skill-level analysis of labor displacement and augmentation potential induced by LLMs across the global workforce. The centerpiece is the Skill Automation Feasibility Index (SAFI), computed from rigorous LLM performance benchmarking over U.S. Department of Labor O*NET's 35 skills spanning 1,016 occupations. The study cross-correlates SAFI with real-world adoption data from Anthropic’s Economic Index, proceeding beyond prior theoretical exposure indices or survey-based forecasts. The result is a nuanced AI Impact Matrix that informs direct workforce transition strategies.
Skill Importance and Categorization
O*NET’s taxonomy partitions occupational skills into Process, Content, Technical, Social, Complex Problem Solving, Systems, and Resource Management categories. The analysis establishes that Process and Content skills are rated as universally important across occupations, while Technical skills are highly specialized and less essential in most roles.
Figure 1: Process and Content skills dominate importance across occupations; Technical skills are highly specialized.
AI Exposure Landscape
Anthropic Economic Index data reveals a sharply skewed distribution of AI exposure: most occupations have negligible exposure, but a minority—computing, data, and customer-facing roles—exceed a 0.30 exposure threshold. The 25 most AI-exposed occupations center around software development, customer service, and data entry.
Figure 2: AI exposure is highly right-skewed; select occupations exhibit high rates of adoption.
Figure 3: Computing, data, and customer-facing roles are most AI-exposed.
LLM Benchmarks and SAFI Construction
The study administers 263 text-based tasks, mapped to O*NET skills at three difficulty levels, to four LLMs: LLaMA 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash. The task battery is designed to elicit the cognitive and communicative aspects of each skill. Responses are scored heuristically on completeness, depth, and reasoning quality, with a bonus for task difficulty.
SAFI quantifies mean normalized performance across all tasks and models per skill; crucially, it confines measurement to the text-amenable dimension of each skill and does not imply holistic occupational automation feasibility.
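The aggregation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `compute_safi`, the 0-10 raw score scale, the rescaling to 0-100, and the pooling of all task-model pairs into one mean are assumptions, and the records are made-up placeholders.

```python
from collections import defaultdict
from statistics import mean


def compute_safi(records, max_score=10.0):
    """Average normalized scores per skill over all tasks and models.

    records: iterable of (skill, model, raw_score) tuples.
    max_score: assumed upper bound of the raw scoring scale (hypothetical).
    Returns a dict mapping skill -> score on a 0-100 scale.
    """
    by_skill = defaultdict(list)
    for skill, _model, raw in records:
        # Normalize each raw score to 0-100 before pooling.
        by_skill[skill].append(100.0 * raw / max_score)
    return {skill: round(mean(vals), 1) for skill, vals in by_skill.items()}


# Placeholder scores for illustration only; not data from the paper.
records = [
    ("Mathematics", "model_a", 8.0),
    ("Mathematics", "model_b", 7.0),
    ("Speaking", "model_a", 5.0),
    ("Speaking", "model_b", 4.0),
]
safi = compute_safi(records)
print(safi)
```

Pooling every task-model pair weights each response equally; averaging per model first and then across models would give the same result only when every model answers the same number of tasks per skill.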
SAFI Results and Model Profiles
Mathematics (73.2) and Programming (71.8) obtain the highest SAFI scores, well ahead of the other skills. Active Listening, Reading Comprehension, Speaking, and Social Perceptiveness, all in the Content or Social categories, score lowest despite their universal labor market importance.
Figure 4: Mathematics and Programming top SAFI scores; Content and Social skills (Active Listening, Reading Comprehension, etc.) are benchmarked lowest.
The complete SAFI heatmap indicates that automation feasibility depends on the skill rather than the model: performance differs across models by only a narrow 3.6-point spread.
Figure 5: SAFI heatmap across all skills and models; color gradient from low (green) to high (red) automation feasibility.
Figure 6: Technical skills register consistently high SAFI across models; Content skills remain lowest.
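One way to read the cross-model spread mentioned above is as the gap between the highest and lowest per-model mean score. The sketch below assumes that definition, which the summary does not confirm; `model_spread` is a hypothetical helper and the model names and values are placeholders.

```python
def model_spread(model_means):
    """Return (best_model, worst_model, spread) for a dict of per-model mean scores."""
    best = max(model_means, key=model_means.get)
    worst = min(model_means, key=model_means.get)
    # Spread is taken here as max minus min, rounded to one decimal place.
    return best, worst, round(model_means[best] - model_means[worst], 1)


# Hypothetical per-model means; not the paper's values.
model_means = {"model_a": 58.0, "model_b": 59.5, "model_c": 60.0, "model_d": 61.0}
best, worst, spread = model_spread(model_means)
print(best, worst, spread)
```

A small spread relative to the gap between the highest and lowest skills is what would support a skill-dependent rather than model-dependent reading.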
Benchmarking across task difficulty levels yields higher SAFI scores for harder tasks. This is a known scoring artifact: harder tasks elicit lengthier, more structured outputs, which score well, and the pattern does not reflect genuine improvement on complex reasoning.
Figure 7: Models score higher as tasks escalate in difficulty, with Mistral Large leading.
Skill-Exposure Correlations and Capability-Demand Inversion
Correlating skill importance with AI exposure shows that Programming (+0.455) and Content skills (Reading, Writing, Listening) are concentrated in AI-exposed roles, whereas physical technical skills cluster in low-exposure occupations.
Figure 8: Programming is maximally concentrated in AI-exposed occupations; physical technical skills are least affected.
Plotting SAFI scores against these real-world exposure correlations, however, uncovers a weak negative trend (r=−0.196), labeled the "capability-demand inversion": the skills most integral to AI-exposed jobs tend to be those on which current LLMs perform least well.
Figure 9: Negative correlation between SAFI and AI exposure; skills important in exposed jobs have lower feasibility scores.
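The reported r is a correlation coefficient computed across skills. Assuming it is a standard Pearson coefficient (the summary does not say which estimator is used), it can be computed as in the sketch below. The helper `pearson_r` and both input lists are illustrative placeholders, not the paper's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy per-skill values (hypothetical): SAFI score and the skill's
# importance-exposure correlation.
toy_safi = [70.0, 65.0, 50.0, 45.0]
toy_exposure_corr = [0.10, 0.05, 0.30, 0.25]
r = pearson_r(toy_safi, toy_exposure_corr)
print(round(r, 3))
```

A negative r means that skills with higher benchmark scores tend to be less concentrated in AI-exposed occupations. With only 35 skills, a coefficient near −0.2 indicates a weak tendency rather than a strong relationship.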
Automation vs. Augmentation Patterns
Analysis of 3,364 task interactions confirms that augmentation (collaborative AI use) dominates: 78.7% of observed AI task interactions are augmentation, with only 21.3% representing directive automation.
Figure 10: Nearly four in five AI interactions are augmentation, not outright replacement.
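As a quick arithmetic check, the reported shares can be converted into approximate interaction counts. The counts below are derived from the rounded percentages and are therefore estimates, not figures reported in the paper; the variable names are illustrative.

```python
TOTAL_INTERACTIONS = 3364   # task interactions analyzed
AUGMENTATION_SHARE = 0.787  # collaborative AI use
AUTOMATION_SHARE = 0.213    # directive automation

# Approximate counts implied by the rounded shares.
augmentation_count = round(TOTAL_INTERACTIONS * AUGMENTATION_SHARE)
automation_count = round(TOTAL_INTERACTIONS * AUTOMATION_SHARE)

# Augmentation interactions per automation interaction.
ratio = AUGMENTATION_SHARE / AUTOMATION_SHARE

print(augmentation_count, automation_count, round(ratio, 2))
```

Under these assumptions, roughly 2,647 interactions are augmentation and about 717 are automation, or between three and four augmentation interactions for every automation interaction.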
Skill Category Profiles Across Models
Radar visualizations profile each model’s SAFI across skill categories. Mistral Large achieves the most balanced capability profile. All models, regardless of architecture or origin, converge on similar overall shapes, reinforcing the finding that, at current frontier scales, feasibility depends more on the skill than on the model.
Figure 11: Balanced performance across skill categories for Mistral Large; shape convergence for all four models.
The AI Impact Matrix
The authors synthesize SAFI scores and AI exposure correlations into a matrix with four quadrants—
Discussion and Implications
Capability-Demand Inversion
Empirical evidence demonstrates that LLMs are strongest on structured, rule-based skills and weakest on unstructured, communication-intensive skills. Consequently, AI is currently deployed as an augmentation tool in roles demanding nuanced human interaction, contrary to expectations that these roles would be displaced first.
Skill Quadrant-Driven Workforce Response
The AI Impact Matrix enables precise workforce transition planning: retraining, upskilling, and redeployment efforts can be tailored by quadrant. Corporations and educational institutions are urged to focus on AI collaboration meta-skills and system-level thinking. Policy interventions should differentiate skill transition pathways, not treat AI exposure as monolithic.
Model Convergence
Narrow SAFI variance across models suggests that skill-level assessments are robust to differences in model architecture and origin. One implication is that workforce planners need not track individual model releases at the current scale.
Structural Limitations
The authors transparently acknowledge limitations: SAFI scores are confined to text-based domains; physical, real-time, and interpersonal skill dimensions are outside scope. Scoring artifacts, sample size constraints, and the reliance on Anthropic-specific data are noted. The AI Impact Matrix is heuristic, not predictive.
Future Directions
The paper outlines the need for longitudinal SAFI tracking as new foundation models proliferate, mapping skill adjacency for retraining programs, benchmarking emergent AI-era skills (e.g., prompt engineering), and integrating BLS employment data to project affected workforce scale.
Conclusion
The analysis in "The AI Skills Shift" (2604.06906) advances empirical, skill-level measurement of LLM impact on labor markets, replacing speculation with measurable capability mapping. The revealed "capability-demand inversion" underscores that AI is, as of now, augmenting rather than automating the core skills of the most AI-exposed occupations. This shifts the understanding of near-term impact on workforce structure and highlights the urgency of targeted retraining and upskilling interventions. The public release of all benchmarking data and code establishes a foundation for ongoing empirical tracking as the automation frontier advances.