AcademiaOS: Automating Grounded Theory Development in Qualitative Research with Large Language Models
Abstract: AcademiaOS is a first attempt to automate grounded theory development in qualitative research with large language models (LLMs). Drawing on the language understanding, generation, and reasoning capabilities of recent LLMs, AcademiaOS codes curated qualitative raw data, such as interview transcripts, and develops themes and dimensions from which it derives a grounded theoretical model, affording novel insights. A user study (n=19) suggests that the system finds acceptance in the academic community and exhibits the potential to augment humans in qualitative research. AcademiaOS has been released as open source for others to build upon and adapt to their use cases.