- The paper introduces a fusion approach that combines eye tracking with user priors to enhance user modeling in AI-assisted decision making.
- It demonstrates that integrating dynamic gaze metrics with stable user traits significantly improves prediction robustness across different AI advice conditions.
- Findings highlight that adaptive user models, which consider both behavior and prior knowledge, can better balance cognitive load and decision confidence.
Condition-Aware Fusion of Eye Tracking and User Priors in AI-Assisted Decision Making
Introduction
"Eyes Can't Always Tell: Fusing Eye Tracking and User Priors for User Modeling under AI Advice Conditions" (2604.01741) addresses the limits of inferring users' cognitive states from gaze in AI-assisted factual verification tasks. The authors systematically investigate how the presence and correctness of AI advice modulate the relationship between eye-tracking signals and two self-reported cognitive states (cognitive load and decision confidence), as well as objective decision accuracy. The paper shows that stable user priors (demographics, AI literacy, and propensity to trust technology) need to be integrated with dynamic interaction signals to achieve robust cross-participant generalization, especially under variable AI reliability.
Figure 1: Three-step experimental workflow collects eye-tracking and self-report data, extracts trial-level behavioral/physiological and participant priors, and evaluates prediction models conditioned on AI advice.
Empirical Study Design and Eye Tracking Analysis
The study employs a within-subject lab protocol on factual verification, manipulating AI assistance across three conditions: baseline (No-AI), Correct-AI, and Incorrect-AI advice. Each participant completes 12 trials, encountering counterbalanced sequences of true/false claims with or without AI assistance. Eye tracking is performed at 60 Hz, with areas of interest (AOIs) defined for the evidence/context, AI advice, and user rating panels. Self-reported cognitive load and confidence (Likert scale), together with manipulation checks, provide ground truth for post-trial modeling. Demographic information, AI literacy/experience, and trust propensity are collected via a pre-study survey.
Figure 2: Gaze heatmaps show denser visual attention in No-AI and Incorrect-AI trials, with more distributed fixations across evidence and response panels.
Condition-Sensitive Effects on Cognitive States and Gaze Patterns
Mixed effects modeling and repeated measures ANOVA reveal strong, statistically significant modulation of cognitive states by AI advice:
- Cognitive Load: Correct-AI advice yields the lowest cognitive load (mean=3.18) compared with No-AI (3.56, p=0.010) and Incorrect-AI (3.38, p=0.040).
- Decision Confidence: Both AI conditions increase confidence over No-AI, with Correct-AI showing the highest values (No-AI=5.22, Correct-AI=5.93, p<0.001).
- Decision Accuracy: No significant main effect, indicating cognitive states can shift independently of objective correctness.
Figure 3: Mean fixation/saccade count, pupil diameter, and time to first fixation (TTFF) across AOIs, showing significant context- and advice-dependent differences.
Gaze metrics (fixations, saccades, pupil diameter, TTFF) show that the presence and correctness of AI advice directly influence visual processing:
- No-AI: Longer fixations, more saccades, and larger pupil diameter on evidence/context reflect elevated cognitive effort and uncertainty.
- Correct-AI: Reduced fixation metrics and faster orientation suggest facilitation; participants process the advice efficiently.
- Incorrect-AI: Increased fixations on context and delayed focus on the rating region indicate compensatory verification effort under misleading advice.
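As a rough illustration of how AOI-level metrics like these could be derived, the sketch below computes fixation count, total dwell time, and time to first fixation per AOI from a list of fixation events. The `Fixation` record, the AOI names, and the rectangle coordinates are hypothetical; the paper's actual preprocessing pipeline is not described in this summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical AOI rectangles in screen pixels: (x_min, y_min, x_max, y_max).
AOIS: Dict[str, Tuple[float, float, float, float]] = {
    "context": (0, 0, 800, 600),
    "advice": (800, 0, 1200, 300),
    "rating": (800, 300, 1200, 600),
}


@dataclass
class Fixation:
    onset_ms: float     # time since trial start
    duration_ms: float  # fixation duration
    x: float            # horizontal screen position
    y: float            # vertical screen position


def aoi_of(fix: Fixation) -> Optional[str]:
    """Return the name of the AOI containing the fixation, or None."""
    for name, (x0, y0, x1, y1) in AOIS.items():
        if x0 <= fix.x < x1 and y0 <= fix.y < y1:
            return name
    return None


def aoi_metrics(fixations: List[Fixation]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-AOI fixation count, total dwell time, and time to first fixation."""
    metrics = {
        name: {"count": 0, "dwell_ms": 0.0, "ttff_ms": None} for name in AOIS
    }
    # Process fixations chronologically so the first hit defines TTFF.
    for fix in sorted(fixations, key=lambda f: f.onset_ms):
        name = aoi_of(fix)
        if name is None:
            continue  # fixation landed outside every AOI
        m = metrics[name]
        m["count"] += 1
        m["dwell_ms"] += fix.duration_ms
        if m["ttff_ms"] is None:
            m["ttff_ms"] = fix.onset_ms
    return metrics


# Toy trial: two fixations on context, one on advice, one on rating.
trial = [
    Fixation(120, 250, 400, 300),
    Fixation(400, 180, 900, 100),
    Fixation(620, 300, 350, 200),
    Fixation(950, 200, 1000, 450),
]
result = aoi_metrics(trial)
```

An AOI that is never fixated keeps `ttff_ms` as `None`, which a downstream model would have to impute or encode explicitly.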
Robustness of Eye-Tracking-Based Predictive Modeling
Prediction tasks are framed as trial-level classification (high vs. low cognitive load/confidence/accuracy) and evaluated with leave-one-subject-out cross-validation. Several ML models are compared: logistic regression (LR), SVM, random forest (RF), extra trees (ET), AdaBoost, XGBoost (XGB), and MLP. Each is trained on one of three feature sets: gaze-only, priors-only, and multimodal fusion.
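To make the evaluation protocol concrete, here is a minimal sketch of leave-one-subject-out splitting: every trial from one participant is held out in turn, so the model is always tested on a person it has never seen. The participant IDs, labels, and the majority-class baseline are illustrative assumptions, not the paper's data or models.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def loso_splits(groups: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_participant, train_indices, test_indices) per fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def majority_label(labels: Sequence[int]) -> int:
    """Trivial baseline: predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical data: 4 participants x 3 trials, binary high (1) / low (0) labels.
groups = ["P1"] * 3 + ["P2"] * 3 + ["P3"] * 3 + ["P4"] * 3
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1]

fold_accuracy = {}
for pid, train_idx, test_idx in loso_splits(groups):
    pred = majority_label([labels[i] for i in train_idx])
    hits = sum(labels[i] == pred for i in test_idx)
    fold_accuracy[pid] = hits / len(test_idx)

mean_accuracy = sum(fold_accuracy.values()) / len(fold_accuracy)
```

Splitting by participant rather than by trial keeps person-specific gaze habits from leaking between training and test sets, which is what makes the reported numbers a measure of cross-participant generalization.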
Key numerical results:
- Eye-tracking signals alone robustly predict decision accuracy (all models except LR reach an accuracy of ∼0.79), though the best performance is often achieved by condition-specific rather than pooled models.
- Self-reported cognitive load and confidence are less reliably decoded from gaze alone (mean accuracy ∼0.66), showing marked instability across AI conditions.
- Multimodal fusion (gaze+priors) consistently outperforms both unimodal approaches, delivering peak performance especially where gaze-signal mappings are disrupted by misleading AI.
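The three feature sets compared above can be sketched as follows. This assumes the simplest form of fusion, concatenating trial-level gaze features with participant-level priors; the summary does not say which fusion mechanism the authors use, and all feature names below are hypothetical.

```python
from typing import Dict, List

# Hypothetical feature names; the paper's exact feature list is not given here.
GAZE_KEYS = ["fix_count_context", "fix_count_advice", "mean_pupil_mm", "ttff_rating_ms"]
PRIOR_KEYS = ["age", "ai_literacy", "trust_propensity"]


def build_features(
    trial_gaze: Dict[str, float],
    priors: Dict[str, float],
    feature_set: str,
) -> List[float]:
    """Assemble one trial's input vector for the chosen feature set."""
    gaze = [trial_gaze[k] for k in GAZE_KEYS]   # varies trial by trial
    prior = [priors[k] for k in PRIOR_KEYS]     # constant within a participant
    if feature_set == "gaze":
        return gaze
    if feature_set == "priors":
        return prior
    if feature_set == "fusion":
        return gaze + prior  # early fusion by concatenation (an assumption)
    raise ValueError(f"unknown feature set: {feature_set!r}")


trial_gaze = {
    "fix_count_context": 18.0,
    "fix_count_advice": 4.0,
    "mean_pupil_mm": 3.6,
    "ttff_rating_ms": 2100.0,
}
priors = {"age": 27.0, "ai_literacy": 4.2, "trust_propensity": 3.5}

gaze_vec = build_features(trial_gaze, priors, "gaze")
prior_vec = build_features(trial_gaze, priors, "priors")
fused_vec = build_features(trial_gaze, priors, "fusion")
```

Because the priors are constant within a participant, a priors-only model can only separate people, not trials; the fused vector lets a model condition its reading of the gaze features on who is looking.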
Feature Importance and Mechanistic Interpretation
Figure 4: SHAP attribution scores highlighting the top-10 features by predictive value across AI reliability, with gaze metrics dominating under Correct-AI; user priors (AI experience, demographics) rise in importance under Incorrect-AI.
SHAP analysis identifies the dominant predictors for each target variable under Correct-AI and Incorrect-AI:
- Cognitive Load: Under Correct-AI, gaze-derived features predominate; Incorrect-AI elevates user priors, notably AI experience.
- Confidence: AOI-Advice features and rating region attention are more impactful; under Incorrect-AI, user priors increase explanatory power.
- Accuracy: Context AOI features gain prominence with misleading advice, supporting the strategy-shifting hypothesis.
These findings corroborate the theoretical premise that AI reliability fundamentally alters both cognitive processes and their behavioral correlates.
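SHAP attributions approximate Shapley values, which split a prediction's deviation from a baseline among the input features. For intuition, the sketch below computes exact Shapley values by brute force for a tiny model. It is not the SHAP library and not the paper's analysis: the toy linear model, its weights, the feature names, and the choice to replace "missing" features with fixed baseline values are all simplifying assumptions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical features and weights, for illustration only.
FEATURES = ["fix_count_context", "ai_experience", "ttff_rating"]
WEIGHTS = [0.5, -0.2, 1.0]


def toy_model(z: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(WEIGHTS, z))


x = [2.0, 1.0, 3.0]
baseline = [1.0, 1.0, 1.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the model output at `x` and at `baseline`, which is the property that lets per-feature scores be compared across conditions. Exact enumeration is exponential in the number of features, which is why practical tools rely on approximations.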
Practical and Theoretical Implications
Adaptive User Modeling: The demonstrated conditional heterogeneity implies that robust inference architectures must integrate explicit AI condition features, rather than relying on pooled models or fixed gaze-to-cognition mappings. Mixture-of-experts or condition-specific heads can mitigate the degradation in prediction performance under shifting advice reliability.
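One way to picture condition-specific heads is a router that sends each trial to a predictor trained for its AI condition and falls back to a pooled predictor otherwise. The sketch below uses hard routing with hand-written threshold rules standing in for trained models; the class name, condition labels, and thresholds are hypothetical, and a true mixture-of-experts would learn soft gating weights instead.

```python
from typing import Callable, Dict, Sequence

Predictor = Callable[[Sequence[float]], int]


class ConditionRouter:
    """Dispatch a trial to the expert for its AI condition (hard routing)."""

    def __init__(self, experts: Dict[str, Predictor], pooled: Predictor) -> None:
        self.experts = experts
        self.pooled = pooled  # used when the condition has no dedicated expert

    def predict(self, condition: str, features: Sequence[float]) -> int:
        expert = self.experts.get(condition, self.pooled)
        return expert(features)


def threshold_rule(cutoff: float) -> Predictor:
    """Stand-in for a trained model: label 1 (high load) above the cutoff."""
    return lambda features: int(features[0] > cutoff)


# Hypothetical cutoffs on a single feature (fixation count on the context AOI).
router = ConditionRouter(
    experts={
        "no_ai": threshold_rule(20.0),
        "correct_ai": threshold_rule(12.0),
        "incorrect_ai": threshold_rule(16.0),
    },
    pooled=threshold_rule(16.0),
)

same_gaze = [15.0]
predictions = {
    cond: router.predict(cond, same_gaze)
    for cond in ["no_ai", "correct_ai", "incorrect_ai", "unknown"]
}
```

With these illustrative cutoffs, the same fixation count is labeled high load only under `correct_ai`, mirroring the idea that an identical gaze pattern can signal different cognitive states depending on the advice condition.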
Personalization and Cold-Start Robustness: Incorporating user priors addresses the cold-start problem by anchoring intra-individual variability, facilitating transfer and generalization. Practical deployment is enabled by lightweight surveys or behavioral proxies.
Human-AI Interaction Design: The results motivate cognitively aligned AI experiences: adaptive systems should jointly sense dynamic behavioral signals (eye tracking), encode context (AI advice, correctness), and personalize responses via stable traits (literacy, trust propensity). This is consistent with recent calls for responsible and explainable AI that curbs overreliance and supports appropriate trust [explanation_ai_overreliance, AI_assisted_decision_making].
Future developments may extend to richer advice formats, variable explanation strategies, and multi-modal physiological sensing, offering further granularity in moment-to-moment adaptation.
Conclusion
This study demonstrates that gaze-based sensing for cognitive user modeling in AI-assisted decision scenarios is strongly condition-sensitive; the same eye-tracking patterns signal different cognitive states depending on AI reliability. Fusing gaze features with stable user priors substantially improves predictive robustness and cross-participant generalization. Practical implication: adaptive systems should treat AI condition and user characteristics as integral to cognitive-state inference, enabling more effective, personalized, and trustworthy AI-assisted decision making.