
An Investigation of the Test-Retest Reliability of the miniPXI

Published 28 Jul 2024 in cs.HC (arXiv:2407.19516v1)

Abstract: Repeated measurements of player experience are crucial in games user research, assessing how different designs evolve over time. However, this necessitates lightweight measurement instruments that are fit for purpose. In this study, we examine the test-retest reliability of the \emph{miniPXI} -- a short variant of the \emph{Player Experience Inventory} (\emph{PXI}), an established instrument for measuring player experience. We analyzed test-retest reliability using four games and 100 participants, comparing the miniPXI with four established multi-item measures and with single-item indicators such as the Net Promoter Score (\emph{NPS}) and overall enjoyment. The findings show mixed outcomes: the \emph{miniPXI} demonstrated varying levels of test-retest reliability, with some constructs showing good to moderate reliability while others were less consistent. The multi-item measures, in contrast, exhibited moderate to good test-retest reliability, demonstrating their effectiveness in measuring player experiences over time. The employed single-item indicators (\emph{NPS} and overall enjoyment) also demonstrated good reliability. Our results highlight the complexity of evaluating player experience over time with single-item and multi-item measures. We conclude that single-item measures may not be appropriate for long-term investigations of more complex PX dimensions and provide practical considerations for the applicability of such measures in repeated measurements.

Summary

  • The paper demonstrates that the miniPXI shows variable test-retest reliability, with ICC values ranging from 0.365 to 0.704 across different player experience constructs.
  • The study uses a rigorous method by assessing 100 participants over two sessions with a three-week interval and comparing miniPXI with measures like PXI and GEQ.
  • Implications highlight that while miniPXI offers efficiency, single-item measures may be less reliable for complex experiences, suggesting the need for complementary multi-item scales.

Evaluation of the Test-Retest Reliability of the miniPXI for Player Experience Assessment

The research paper presents a comprehensive study on the test-retest reliability of the mini Player Experience Inventory (miniPXI), a streamlined measure for assessing Player Experience (PX). As PX continues to be crucial in games user research (GUR), the need for expedient and reliable measurement tools becomes paramount, especially in iterative game development cycles. This paper scrutinizes the miniPXI's consistency across repeated measures, offering insights into its applicability and reliability.

Overview and Methodology

The study evaluates the miniPXI, a condensed version of the Player Experience Inventory (PXI) that reduces the original measure to one item per construct, for a total of 11 items. The authors assess the miniPXI's test-retest reliability over a three-week interval and compare it with several established multi-item measures: the PXI, the Player Experience of Need Satisfaction (PENS), the Game Engagement Questionnaire (GEQ), and AttrakDiff. Drawing on a pool of 100 participants, each participant completed the assessments after playing one of four different games in each of two sessions separated by three weeks. This design enables an evaluation of the miniPXI's reliability across diverse gaming contexts and a direct comparison with other PX measurement tools.

Key Findings

  1. Overall Test-Retest Reliability: The miniPXI demonstrated varied test-retest reliability, with Intraclass Correlation Coefficient (ICC) values ranging from 0.365 to 0.704 for different constructs. Constructs such as Enjoyment, Clarity of Goals, and Progress Feedback exhibited moderate reliability. In contrast, constructs like Immersion and Challenge showed lower reliability, raising questions about the instrument's consistency across more complex PX dimensions.
  2. Comparison with Multi-Item Measures: Multi-item measures, including the PXI and GEQ, generally exhibited moderate to good test-retest reliability, with ICCs typically surpassing those of the miniPXI. Notably, the GEQ, which includes dimensions like Flow and Presence, showed robust reliability metrics, supporting the effectiveness of multi-item scales in capturing nuanced player experiences over time.
  3. NPS and 'Appreciation' Item: The Net Promoter Score (NPS) and a general 'appreciation' item both exhibited good test-retest reliability, suggesting their potential as reliable single-item proxies for overall satisfaction and recommendation likelihood. However, it is important to note that these items capture broader satisfaction metrics rather than specific aspects of the player experience.

Implications and Future Directions

The study highlights the limitations and contextual applicability of single-item measures like the miniPXI. While offering advantages in terms of brevity and ease of administration, their reliability is inconsistent, particularly for intricate experiences such as immersion. This finding suggests caution in using single-item measures for longitudinal studies or scenarios requiring high reliability.
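
As a point of reference when reading the ICC values above, widely used guidelines (Koo & Li, 2016) label estimates below 0.5 as poor, 0.5 to 0.75 as moderate, 0.75 to 0.9 as good, and above 0.9 as excellent. The summary does not state which convention the paper follows, so this mapping is an assumption:

```python
# Qualitative interpretation of ICC estimates per the Koo & Li (2016)
# guidelines -- assumed here, as the summary does not name the convention used.

def interpret_icc(icc):
    """Map an ICC estimate to a qualitative reliability label."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"
```

Under this convention, the reported range of 0.365 to 0.704 spans poor to moderate reliability, which matches the mixed picture the paper paints.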

From a practical standpoint, games user researchers should critically evaluate the use of single versus multi-item measures based on the game's genre and the specific research questions. For comprehensive assessments of PX, especially in iterative development, the integration of multi-item measures or additional items for key dimensions may be warranted.

Furthermore, the study underscores the potential for single-item metrics like the NPS to serve as supplemental tools for assessing general satisfaction and recommendation tendencies. However, the specific and dynamic nature of PX necessitates continued exploration and possibly alternative formulations that better encapsulate the complex constructs involved.
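
For readers unfamiliar with the metric, the standard NPS calculation is the percentage of promoters (ratings 9-10 on a 0-10 likelihood-to-recommend scale) minus the percentage of detractors (ratings 0-6). A minimal sketch, with an illustrative function name not taken from the paper:

```python
# Standard Net Promoter Score: % promoters (9-10) minus % detractors (0-6)
# on a 0-10 likelihood-to-recommend scale. Ratings of 7-8 ("passives") do
# not affect the score.

def net_promoter_score(ratings):
    """ratings: iterable of integer responses on a 0-10 scale."""
    ratings = list(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)
```

Because the score collapses an 11-point scale into three bands, it is well suited to tracking broad recommendation tendency over time, but, as the study notes, it cannot substitute for construct-level PX measures.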

Conclusion

In summary, while the miniPXI offers a practical and efficient means to gauge PX, its test-retest reliability varies significantly across constructs and contexts. As GUR evolves, the study underlines the importance of balancing brevity with reliability, recommending that researchers carefully assess the choice of measurement tools based on the scope and objectives of their evaluative endeavors. Future research should further investigate the dynamics of PX and explore enhancements to single-item measures that address current limitations.
