
Enhancing Higher Education with Generative AI: A Multimodal Approach for Personalised Learning

Published 11 Feb 2025 in cs.HC and cs.AI | (2502.07401v1)

Abstract: This research explores the opportunities of Generative AI (GenAI) in the realm of higher education through the design and development of a multimodal chatbot for an undergraduate course. Leveraging the ChatGPT API for nuanced text-based interactions and Google Bard for advanced image analysis and diagram-to-code conversions, we showcase the potential of GenAI in addressing a broad spectrum of educational queries. Additionally, the chatbot presents a file-based analyser designed for educators, offering deep insights into student feedback via sentiment and emotion analysis, and summarising course evaluations with key metrics. These combinations highlight the crucial role of multimodal conversational AI in enhancing teaching and learning processes, promising significant advancements in educational adaptability, engagement, and feedback analysis. By demonstrating a practical web application, this research underlines the imperative for integrating GenAI technologies to foster more dynamic and responsive educational environments, ultimately contributing to improved educational outcomes and pedagogical strategies.

Summary

  • The paper introduces a multimodal GenAI chatbot that integrates text, image, and file inputs to create personalized learning experiences.
  • The paper employs state-of-the-art APIs like ChatGPT and Google Bard to convert diagrams to code and analyze coursework documents.
  • The paper demonstrates scalable educational insights and adaptive feedback through fine-tuned NLP techniques and robust analytical metrics.

This paper presents a focused exploration of Generative AI (GenAI) in higher education through the development of a multimodal chatbot. The central aim is to enrich personalized learning experiences by combining text, image, and file inputs. The chatbot is designed to address a wide spectrum of educational queries, helping to bridge gaps in conventional educational technologies.

The authors implemented state-of-the-art GenAI technologies, using the ChatGPT API for text-based interactions and Google Bard for image analysis and diagram-to-code conversion. The integration of multimodal input capabilities is the primary contribution of this research, enabling the chatbot to process and respond to complex educational queries. This is particularly relevant in disciplines that rely heavily on visual information, such as STEM fields.
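The routing just described can be sketched as a simple dispatcher. The handler functions below are illustrative stand-ins only (the paper does not publish its implementation); in the real system they would call the ChatGPT and Google Bard APIs respectively:

```python
from typing import Optional

def handle_text(query: str) -> str:
    # Stand-in: the real system forwards the query to the ChatGPT API.
    return f"[text-model] answer to: {query}"

def handle_image(image_bytes: bytes, prompt: str) -> str:
    # Stand-in: the real system sends the image to Google Bard for
    # analysis or diagram-to-code conversion.
    return f"[image-model] {prompt} ({len(image_bytes)} bytes)"

def dispatch(query: str, image_bytes: Optional[bytes] = None) -> str:
    """Route a request to the appropriate backend based on its modality."""
    if image_bytes is not None:
        return handle_image(image_bytes, query)
    return handle_text(query)
```

The point of the sketch is the modality-based routing, not the handlers themselves: each input type is matched to the model best suited to it, which is the core design choice the paper describes.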

The paper further introduces a file-based analyzer to enhance the teaching process. This component of the system supports uploading coursework-related documents and provides nuanced sentiment and emotion analysis. Such functionality equips educators with the ability to gain comprehensive insights into student feedback and course evaluations. Key metrics such as sentiment scores and keyword summaries offer educators a substantive tool for pedagogical assessment and improvement.
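As a rough illustration of the metrics mentioned (sentiment scores and keyword summaries), the sketch below uses tiny hand-made word lists. The actual analyzer relies on GenAI-backed NLP; the lexicons here are invented purely for demonstration:

```python
from collections import Counter
import re

# Illustrative word lists only -- not the paper's models or vocabulary.
POSITIVE = {"great", "clear", "helpful", "engaging", "excellent"}
NEGATIVE = {"confusing", "boring", "unclear", "slow", "difficult"}
STOPWORDS = {"the", "was", "were", "a", "and", "is", "to", "of", "i", "it", "very"}

def sentiment_score(comment: str) -> int:
    """Crude score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def keyword_summary(comments: list, top_n: int = 3) -> list:
    """Most frequent non-stopword terms across all comments."""
    words = re.findall(r"[a-z']+", " ".join(comments).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]
```

A model-based analyzer would replace the word-list lookups with classifier or LLM calls, but the reported outputs take the same shape: a per-comment score plus a ranked keyword list for the course evaluation as a whole.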

The methodology involves the design of three primary modules: text-based, image-based, and file-based components. The text-based module applies fine-tuning to adapt the ChatGPT API to specific educational contexts. The image-based module employs Google Bard's capabilities to interpret diagrammatic content and convert it into executable code, a notable advancement given the challenges inherent in such conversions. The file-based analyzer module draws on NLP methodologies and Plutchik's emotion wheel to generate detailed analyses of feedback data.
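A minimal sketch of tagging feedback against Plutchik's eight primary emotions might look as follows. The keyword cues are invented for illustration; the paper's analyzer derives emotions with NLP models, not a fixed word list:

```python
from collections import Counter
import re

# Plutchik's eight primary emotions, each with a few invented cue words.
PLUTCHIK_LEXICON = {
    "joy": {"enjoyed", "fun", "loved"},
    "trust": {"reliable", "supportive"},
    "fear": {"worried", "anxious"},
    "surprise": {"unexpected", "surprised"},
    "sadness": {"disappointed", "sad"},
    "disgust": {"awful", "terrible"},
    "anger": {"frustrated", "angry"},
    "anticipation": {"excited", "curious"},
}

def emotion_profile(feedback: list) -> Counter:
    """Count cue-word hits for each primary emotion across comments."""
    words = set(re.findall(r"[a-z]+", " ".join(feedback).lower()))
    return Counter({emotion: len(words & cues)
                    for emotion, cues in PLUTCHIK_LEXICON.items()
                    if words & cues})
```

The resulting profile gives educators an at-a-glance emotional distribution over a batch of comments, which is the kind of aggregate view the file-based analyzer is described as providing.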

The authors demonstrate the proof of concept using Gradio, an open-source Python library for building web interfaces around machine learning models, showcasing each module's functionality. This demonstration underscores the feasibility of an integrated GenAI system that addresses complex educational requirements within a user-friendly and scalable web application.

The implications of this research are substantial, heralding a shift toward more dynamic and responsive educational environments facilitated by GenAI technologies. The combination of multimodal input capabilities and scalable, granular analysis marks significant progress toward more personalized, adaptable, and efficient educational processes. For future research, the integration of additional modalities such as voice and haptics could be explored to further augment interactive capabilities.

In conclusion, while the exploration of multimodal conversational AI in education remains in its formative stages, the developments presented in this paper indicate a promising trajectory. This research provides a foundational study into the practical applications of GenAI within educational settings, with the potential for significant contributions to both theoretical advancements and practical applications in adaptive learning technologies.


Authors (2)
