Working with AI: Measuring the Occupational Implications of Generative AI

Published 10 Jul 2025 in cs.AI, cs.CY, econ.GN, and q-fin.EC | (2507.07935v3)

Abstract: Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society's most important questions. In this work, we take a step toward that goal by analyzing the work activities people do with AI, how successfully and broadly those activities are done, and combine that with data on what occupations do those activities. We analyze a dataset of 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot, a publicly available generative AI system. We find the most common work activities people seek AI assistance for involve gathering information and writing, while the most common activities that AI itself is performing are providing information and assistance, writing, teaching, and advising. Combining these activity classifications with measurements of task success and scope of impact, we compute an AI applicability score for each occupation. We find the highest AI applicability scores for knowledge work occupation groups such as computer and mathematical, and office and administrative support, as well as occupations such as sales whose work activities involve providing and communicating information. Additionally, we characterize the types of work activities performed most successfully, how wage and education correlate with AI applicability, and how real-world usage compares to predictions of occupational AI impact.

Summary

The paper introduces an empirical methodology that computes AI applicability scores by analyzing 200k Bing Copilot conversations mapped to O*NET work activities.
It uses a GPT-4o classification pipeline to differentiate between user goals and AI actions, revealing that information gathering and writing are the activities users most often seek help with.
The study finds a strong correlation (r = 0.73) between AI applicability scores and prior predictions of occupational AI impact, emphasizing AI's role in augmenting knowledge and communication work.

Occupational Impact Measurements of Generative AI

This paper (2507.07935) presents an empirical analysis of how generative AI is used in real-world scenarios and its potential impact on various occupations. By examining anonymized conversations between users and Microsoft Bing Copilot, the study identifies the work activities that people seek AI assistance for, as well as the activities that the AI itself performs in those conversations. The authors classify these activities using the O*NET database and measure task success and scope of impact to compute an AI applicability score for each occupation.

Methodology and Data Analysis

The study utilizes two datasets: Copilot-Uniform, a representative sample of 100k conversations, and Copilot-Thumbs, a dataset of 100k conversations with user feedback (thumbs up/down). A key distinction is made between user goals (tasks users seek assistance with) and AI actions (tasks performed by the AI). Conversations are classified into O*NET's intermediate work activities (IWAs) using a GPT-4o-based LLM classification pipeline. The AI applicability score for each occupation is calculated based on the activity share, task completion rate, and scope of AI impact. The formula to compute AI applicability score $a_i^{\text{user}}$ for occupation $i$ is:

$a_i^{\text{user}} = \sum_{j \in \text{IWAs}(i)} w_{ij} \, \mathbf{1}[f_j^{\text{user}} \ge 0.0005] \, c_j^{\text{user}} \, s_j^{\text{user}}$,

where $\text{IWAs}(i)$ is the set of IWAs performed by occupation $i$, $w_{ij} \in [0, 1]$ is the importance- and relevance-weighted fraction of work in $i$ composed of IWA $j$, $f_j^{\text{user}} \in [0, 1]$ is the user goal activity share of $j$, $c_j^{\text{user}}$ is the task completion rate of conversations with IWA $j$ as a user goal, and $s_j^{\text{user}}$ is the fraction of conversations with user goal $j$ in which the scope classification is moderate or higher. The indicator $\mathbf{1}[\cdot]$ excludes any IWA whose activity share falls below 0.05%, so rarely observed activities do not contribute to the score.
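As a rough illustration of how the formula above combines its terms, the following Python sketch computes the user-goal score for one occupation. The names `IwaStats` and `applicability_score`, the IWA labels, and all numbers are hypothetical and are not taken from the paper; treating an IWA with no observed conversations as having zero activity share is also an assumption.

```python
from dataclasses import dataclass

# Activity-share cutoff that appears inside the indicator in the formula.
SHARE_THRESHOLD = 0.0005


@dataclass
class IwaStats:
    """Per-IWA measurements for user goals (hypothetical container)."""
    share: float       # f_j: user goal activity share of IWA j
    completion: float  # c_j: task completion rate for IWA j
    scope: float       # s_j: fraction of conversations with moderate-or-higher scope


def applicability_score(weights, stats, threshold=SHARE_THRESHOLD):
    """Sum w_ij * 1[f_j >= threshold] * c_j * s_j over the occupation's IWAs.

    weights maps each IWA of the occupation to w_ij; stats maps IWAs to IwaStats.
    An IWA missing from stats is treated as having zero share (an assumption).
    """
    total = 0.0
    for iwa, w in weights.items():
        s = stats.get(iwa)
        if s is None or s.share < threshold:
            continue  # the indicator is 0, so this IWA contributes nothing
        total += w * s.completion * s.scope
    return total


# Toy inputs, invented purely for illustration.
weights = {"gather_info": 0.5, "write_docs": 0.3, "operate_machinery": 0.2}
stats = {
    "gather_info": IwaStats(share=0.10, completion=0.8, scope=0.5),
    "write_docs": IwaStats(share=0.05, completion=0.9, scope=0.4),
    "operate_machinery": IwaStats(share=0.0001, completion=0.5, scope=0.2),
}

score = applicability_score(weights, stats)
print(round(score, 3))
```

In this toy case the third activity falls below the share threshold and is dropped, so only the first two terms (0.5 × 0.8 × 0.5 and 0.3 × 0.9 × 0.4) contribute.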

Figure 1: This figure shows the frequency of O*NET Generalized Work Activities (GWAs) in Copilot usage.

Key Findings on Work Activities

The analysis reveals that information gathering, writing, and communicating are the most common user goals. AI actions most frequently involve providing information and assistance, writing, teaching, and advising. Notably, 40% of conversations exhibit disjoint sets of user goals and AI actions, indicating that the AI often plays a supporting role rather than directly replicating the user's task. There is a strong correlation between positive user feedback and task completion rates, with writing and researching activities receiving the most positive feedback and data analysis and visual design receiving the least.
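The disjointness statistic can be made concrete with a small sketch. It assumes each conversation has already been labeled with a set of user-goal IWAs and a set of AI-action IWAs; the function name `disjoint_share` and the example labels are hypothetical, and the toy data is not drawn from the paper.

```python
def disjoint_share(conversations):
    """Fraction of conversations whose user-goal and AI-action label sets do not overlap.

    conversations is a sequence of (user_goals, ai_actions) pairs of label collections.
    """
    if not conversations:
        raise ValueError("need at least one conversation")
    disjoint = sum(
        1 for goals, actions in conversations if set(goals).isdisjoint(actions)
    )
    return disjoint / len(conversations)


# Toy labels, invented for illustration only.
example = [
    ({"gather_info"}, {"provide_info"}),              # disjoint: AI supports the goal
    ({"write_docs"}, {"write_docs"}),                 # overlapping: AI does the same activity
    ({"gather_info", "write_docs"}, {"write_docs"}),  # overlapping
    ({"learn_skill"}, {"teach", "advise"}),           # disjoint
]

print(disjoint_share(example))
```

A conversation counts as disjoint only when no label is shared, which matches the reading that the AI is doing something different from what the user is ultimately trying to accomplish.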

Figure 2: This figure shows the frequency of top IWAs.

Occupational Impact and Socioeconomic Correlates

The study identifies knowledge work and communication-focused occupations as having the highest AI applicability scores. Specifically, Sales; Computer and Mathematical; Office and Administrative Support; Community and Social Service; Arts, Design, Entertainment, Sports, and Media; Business and Financial Operations; and Educational Instruction and Library occupations show the greatest potential for AI impact. Conversely, occupations involving physical labor, machinery operation, and manual tasks have the lowest scores. Compared against existing occupation-level predictions of AI labor impact, the AI applicability score shows a correlation of $r = 0.73$. A weak correlation is found between AI applicability scores and educational requirements, with occupations requiring a Bachelor's degree slightly more affected. Similarly, average AI applicability is only slightly higher for high-wage occupations.
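For readers who want to see what an occupation-level correlation of this kind involves, here is a minimal sketch assuming a plain, unweighted Pearson correlation; this summary does not state whether the paper applies any weighting, so that is an assumption. The function name `pearson_r` and the toy values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Unweighted Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy per-occupation values, invented for illustration only.
applicability = [0.10, 0.20, 0.30, 0.40]  # usage-based scores
predicted = [0.2, 0.1, 0.4, 0.3]          # prior exposure predictions

print(pearson_r(applicability, predicted))
```

Each position in the two lists stands for the same occupation, so the result measures how closely the usage-based score tracks the prior prediction across occupations.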

Figure 3: This figure compares the AI applicability score to the human-rated E1 exposure from Eloundou et al. (2024).

Discussion and Implications

The paper's findings suggest that generative AI is primarily augmenting knowledge work and communication-related tasks. The authors emphasize that their data reflect AI usage patterns and do not directly measure downstream business impacts, such as job displacement or wage changes. They caution against assuming that occupations with high AI action overlap will necessarily face automation-induced job losses, as new technologies can also create new job roles and shift work responsibilities.

One notable aspect of the analysis is the separation of work activities into AI actions versus user goals. By separately measuring the tasks that the AI performs and the tasks it assists with, the study provides a nuanced view of the automation versus augmentation debate. The paper acknowledges several limitations, including the focus on a single AI platform (Bing Copilot), the difficulty of determining whether a conversation takes place in a work context, and the reliance on O*NET data, which may lag behind real-world workplace activities.

Conclusion

This paper (2507.07935) provides valuable empirical insights into the occupational implications of generative AI by analyzing real-world usage data and classifying work activities. The study's methodology and findings contribute to the ongoing discussion about the future of work in the age of AI, highlighting the importance of considering both AI's capabilities and its potential to augment human tasks. Future research should focus on tracking how occupations adapt to AI, measuring the emergence of new job roles, and understanding the evolving landscape of AI capabilities and their overlap with various occupations.