LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning

Published 4 May 2025 in cs.CL and cs.AI (arXiv:2505.02078v1)

Abstract: Evaluating the quality of slide-based multimedia instruction is challenging. Existing methods like manual assessment, reference-based metrics, and LLM evaluators face limitations in scalability, context capture, or bias. In this paper, we introduce LecEval, an automated metric grounded in Mayer's Cognitive Theory of Multimedia Learning, to evaluate multimodal knowledge acquisition in slide-based learning. LecEval assesses effectiveness using four rubrics: Content Relevance (CR), Expressive Clarity (EC), Logical Structure (LS), and Audience Engagement (AE). We curate a large-scale dataset of over 2,000 slides from more than 50 online course videos, annotated with fine-grained human ratings across these rubrics. A model trained on this dataset demonstrates superior accuracy and adaptability compared to existing metrics, bridging the gap between automated and human assessments. We release our dataset and toolkits at https://github.com/JoylimJY/LecEval.

Summary

Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning

The paper "An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning" presents a sophisticated approach to evaluating the effectiveness of multimedia learning through slide-based presentations. The authors highlight the challenges of current evaluation methods such as manual assessment, reference-based metrics, and LLM evaluators, noting issues with scalability, context capture, and bias. To address these shortcomings, they propose a novel automated metric inspired by Mayer's Cognitive Theory of Multimedia Learning. This metric is designed to assess multimodal knowledge acquisition by utilizing four rubrics: Content Relevance (CR), Expressive Clarity (EC), Logical Structure (LS), and Audience Engagement (AE).
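
To make the rubric structure concrete, here is a minimal Python sketch of a score container for the four rubrics. It is hypothetical: the field names mirror the rubrics, but the 1-to-5 scale and the unweighted aggregation are illustrative assumptions, not the paper's specification.

```python
from dataclasses import dataclass

@dataclass
class LecEvalScore:
    """Hypothetical container for the four LecEval rubric scores.

    Field names mirror the rubrics named in the paper; the assumed
    1-to-5 scale and the aggregation below are illustrative only.
    """
    content_relevance: float    # CR: alignment of slide and narration with the topic
    expressive_clarity: float   # EC: how clearly the material is conveyed
    logical_structure: float    # LS: coherence and ordering of the content
    audience_engagement: float  # AE: how well the delivery holds attention

    def overall(self) -> float:
        # Unweighted mean; the paper does not specify an aggregation rule.
        return (self.content_relevance + self.expressive_clarity
                + self.logical_structure + self.audience_engagement) / 4.0


score = LecEvalScore(4.0, 4.5, 4.0, 3.5)
print(score.overall())  # 4.0
```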

The study curates a comprehensive dataset of over 2,000 slides drawn from more than 50 online course videos, with each slide annotated by human raters across the four rubrics. A model trained on this dataset surpasses existing metrics in accuracy and adaptability, narrowing the gap between automated and human assessment.

Key Contributions

  1. Introduction of a Novel Metric: The paper introduces a metric that offers an innovative approach to evaluating slide-based learning by focusing on multimodal knowledge acquisition, rooted in Mayer's principles of effective multimedia learning.
  2. Dataset Creation: A large-scale dataset is curated from 56 academic lecture videos, yielding 2,097 annotated slide-text pairs (a hypothetical record format is sketched just after this list). The dataset is publicly released, providing a valuable resource for further research on multimedia learning environments.
  3. Model Training and Evaluation: The authors train a model on this dataset that outperforms baseline evaluation methods and correlates strongly with human judgments, supporting the theoretical grounding of the metric.
  4. Implications for Automated Evaluations: The research emphasizes the potential for this metric to offer robust, scalable solutions in evaluating multimodal educational content, thus advancing both theoretical and practical aspects of automated learning assessments.
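
For illustration, one annotated slide-text pair might look like the record below. The keys, file paths, and rating scale are assumptions made for this sketch; the actual schema ships with the released dataset at https://github.com/JoylimJY/LecEval.

```python
# Hypothetical record format for one annotated slide-text pair.
# Keys and the 1-to-5 rating scale are illustrative assumptions.
example_record = {
    "video_id": "course_video_001",          # one of the 56 lecture videos
    "slide_index": 12,                       # position of the slide in the video
    "slide_image": "slides/001_012.png",     # path to the slide image
    "transcript": "In this section we ...",  # spoken narration for the slide
    "ratings": {                             # fine-grained human annotations
        "CR": 4,  # Content Relevance
        "EC": 5,  # Expressive Clarity
        "LS": 4,  # Logical Structure
        "AE": 3,  # Audience Engagement
    },
}
```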

Research Outcomes

The experiments show that the proposed metric substantially outperforms traditional reference-based metrics such as BLEU, ROUGE, and METEOR, as well as LLM evaluators such as GPT-4V, producing more effective and nuanced evaluations. By integrating multimodal data, the metric captures more of the contextual structure of educational material than conventional methods.
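
Agreement between an automated metric and human raters is conventionally quantified with correlation between the two sets of scores. The sketch below shows the standard computation with placeholder numbers, not figures reported in the paper.

```python
from scipy.stats import pearsonr, spearmanr

# Placeholder scores for one rubric; not results from the paper.
human_ratings = [4, 5, 3, 4, 2, 5, 3]
metric_scores = [3.8, 4.6, 3.1, 4.2, 2.4, 4.9, 2.7]

rho, _ = spearmanr(human_ratings, metric_scores)  # rank correlation
r, _ = pearsonr(human_ratings, metric_scores)     # linear correlation
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```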

Future Directions

The paper opens avenues for further research into automated educational evaluation metrics, particularly in adapting them to different educational contexts and modalities. By releasing the dataset and toolkits, the authors invite follow-up work that refines computational models to align even more closely with human judgment.

The work makes a compelling case for comprehensive, context-aware automated metrics in multimedia learning environments, a step toward improving the quality and effectiveness of AI-based educational assessment. It lays a foundation for future inquiry into how automated metrics can aid educators and learners with evaluations that are consistent, scalable, and reliable across learning platforms.
