External Human-Machine Interfaces (eHMIs)
- External Human-Machine Interfaces (eHMIs) are explicit signaling systems that communicate autonomous vehicle intentions through visual, auditory, and adaptive modalities.
- They utilize multiple channels—LEDs, projections, symbols, text, and audio—to bridge the communication gap and foster pedestrian trust and safety in mixed traffic environments.
- Recent research highlights the importance of participatory design, adaptive architectures, and standardization in optimizing eHMI clarity, responsiveness, and inclusivity.
External Human-Machine Interfaces (eHMIs) are explicit signaling systems integrated into autonomous vehicles (AVs), delivery robots, and various forms of automated mobility platforms to communicate intent, awareness, and behavioral state to external human road users such as pedestrians, cyclists, and conventional drivers. eHMIs are essential to bridge the communication gap left by the absence of human drivers and are critical for fostering pedestrian trust, safety, and the efficient negotiation of right-of-way in both structured and shared urban environments. This article reviews the principal modalities, guiding design principles, adaptive architectures, empirical findings, and ongoing standardization efforts as documented across recent research (Cumbal et al., 23 Jun 2025, Tran et al., 17 Aug 2025, Liu et al., 5 Feb 2025, Liu et al., 2023, Sun et al., 30 Dec 2025, Marcus et al., 28 Jul 2025).
1. Core Modalities and Taxonomies
eHMIs are realized through a spectrum of communication modalities, each with distinct cognitive, perceptual, and contextual trade-offs (Cumbal et al., 23 Jun 2025):
| Modality | Strengths | Weaknesses |
|---|---|---|
| LED Lights | Highly recognizable with standard colors | Reduced visibility in strong light/fog |
| Projections | Create dynamic road/floor markers, e.g., virtual crosswalks | Susceptible to surface/wetness limitations |
| Symbols (Icons) | Universal iconography | Potential for local convention conflict |
| Text | Explicit communication | Language barriers, reading latency |
| Speech/Audio | Direct verbal guidance | Ambient noise, accessibility variability |
| Human-like cues | Eye contact, anthropomorphic engagement | Uncanny valley, poor performance in dark |
Empirical rankings show strong preference for LEDs, projections, and symbols, with statistical support for LEDs and symbols (mean rank ≈ 3.28; significantly preferred over speech and anthropomorphic cues: p < 0.001) (Cumbal et al., 23 Jun 2025). In specialized contexts, such as delivery robots and personal mobility vehicles, text displays and scenario-aligned pictograms dominate, with lights functioning as an auxiliary/redundant channel (Kannan et al., 2021).
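The trade-offs tabulated above can be expressed as data and filtered against context. The sketch below is a minimal illustration: the preference scores and context flags are assumptions for demonstration, not values reported in the cited studies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Modality:
    name: str
    preference: float              # illustrative preference score (higher = better)
    degraded_by: frozenset         # context flags that weaken this channel

# Encodes the modality/weakness pairs from the table above; all numbers
# here are placeholders, not empirical rankings.
MODALITIES = (
    Modality("led",        0.9, frozenset({"strong_light", "fog"})),
    Modality("symbol",     0.9, frozenset({"local_convention_conflict"})),
    Modality("projection", 0.8, frozenset({"wet_surface"})),
    Modality("text",       0.6, frozenset({"language_barrier"})),
    Modality("speech",     0.4, frozenset({"ambient_noise"})),
    Modality("humanlike",  0.3, frozenset({"darkness"})),
)

def rank_candidates(context):
    """Order modalities best-first, demoting any channel degraded
    by the current set of context flags."""
    return [m.name for m in sorted(
        MODALITIES,
        key=lambda m: (bool(m.degraded_by & context), -m.preference),
    )]
```

For example, under a `fog` context flag, LEDs drop behind the non-degraded channels, mirroring the visibility weakness noted in the table.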
2. Participatory and Empirical Methods for eHMI Design
Participatory sketching, ranking surveys, and user-in-the-loop scenario prototyping are increasingly central to eHMI design methodology:
- Participatory Sketching: End-users generate design concepts for eHMIs, revealing strong preferences for multi-modal, directional, and adaptive cues while reusing familiar vehicle features (e.g., incorporating indicators into grille or windshield locations for “driver’s eyes”) (Cumbal et al., 23 Jun 2025).
- Quantitative Metrics: Crossing hesitation time, message comprehension rate, subjective trust and perceived safety, cognitive load (NASA-TLX), and component preference ranking are standard analytical endpoints.
- Task Refinement: Finer granularity is achieved by narrowing sketching tasks to singular interface regions and using reflective, scenario-triggered prompts to elicit deeper consideration for environmental and demographic constraints (Cumbal et al., 23 Jun 2025).
User-generated and co-designed eHMI concepts exhibit convergent themes, notably the integration of multiple signaling modalities, deliberate directionality (projectors or sequential LEDs targeting specific users), and dynamic adaptation to context (brightness, volume, semantic content) (Cumbal et al., 23 Jun 2025).
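Two of the quantitative endpoints listed above, crossing hesitation time and message comprehension rate, reduce to simple per-trial aggregates. The record format below is hypothetical, intended only to make the metric definitions concrete.

```python
from statistics import mean

# Hypothetical per-trial records:
# (stimulus_onset_s, crossing_start_s, answered_correctly)
trials = [
    (0.0, 2.1, True),
    (0.0, 1.8, True),
    (0.0, 3.4, False),
]

# Mean delay between eHMI cue onset and crossing initiation.
hesitation = mean(start - onset for onset, start, _ in trials)

# Fraction of trials in which the message was correctly understood.
comprehension_rate = sum(ok for *_, ok in trials) / len(trials)

print(f"mean crossing hesitation: {hesitation:.2f} s")
print(f"comprehension rate: {comprehension_rate:.0%}")
```

Subjective endpoints (trust, perceived safety, NASA-TLX load) are collected via post-trial questionnaires and aggregated in the same per-condition fashion.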
3. Principles and Best Practices in eHMI Design
Across empirical studies, four consensus principles have emerged (Cumbal et al., 23 Jun 2025, Liu et al., 2023, Colley et al., 18 Jan 2025):
- Multi-Modal Integration: Robust physical safety and accessibility are achieved through redundancy—using at least two channels (e.g., LEDs plus text). Over-signaling, however, incurs cognitive or sensory overload; a balanced pairing (a single concise phrase plus one icon) is recommended.
- Directionality: Signals must be targeted toward the intended recipient (e.g., curbside projections, sequential LED patterns on bumpers) instead of omnidirectional broadcasting to maximize clarity and minimize confusion, particularly in multi-user contexts.
- Adaptivity and Context Responsiveness: Visual/auditory outputs should auto-adjust brightness, contrast, or volume to compensate for environmental noise and lighting, ensuring legibility and reliability.
- Reuse of Familiar Vehicle Elements: Placement of symbolic cues (pedestrian icons on windshield header, use of green/red for yield/stop) leverages transfer of established road conventions to novel AV interfaces, reducing the learning curve.
Standardization proposals specify preferred color frequencies (cyan, ~3 Hz flashing), font sizes, and the maximum number of concurrent cues to minimize display clutter and overload (Colley et al., 18 Jan 2025, Marcus et al., 28 Jul 2025).
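A standardization envelope of this kind lends itself to machine checking. The validator below is a hypothetical sketch: the tolerance band around ~3 Hz and the concurrent-cue limit are illustrative placeholders, not normative values from the cited proposals.

```python
# Assumed parameter envelope (illustrative, not normative).
CYAN_RGB = (0, 255, 255)
FLASH_HZ_RANGE = (2.0, 4.0)   # band around the ~3 Hz recommendation
MAX_CONCURRENT_CUES = 2       # assumed clutter limit

def validate_cue_set(cues):
    """Check a proposed cue configuration against the envelope.
    cues: list of dicts with 'color' (RGB tuple) and 'flash_hz' keys.
    Returns a list of violation messages (empty = compliant)."""
    errors = []
    if len(cues) > MAX_CONCURRENT_CUES:
        errors.append("too many concurrent cues")
    for i, cue in enumerate(cues):
        if cue["color"] != CYAN_RGB:
            errors.append(f"cue {i}: non-standard color")
        lo, hi = FLASH_HZ_RANGE
        if not lo <= cue["flash_hz"] <= hi:
            errors.append(f"cue {i}: flash rate outside {lo}-{hi} Hz")
    return errors
```

A compliant single cyan cue flashing at 3 Hz passes with no violations; stacking more cues than the limit, or drifting outside the flash band, is flagged.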
4. Adaptive and Scalable eHMI Architectures
Recent frameworks formalize adaptive eHMI design as a cyber-physical system with three interacting layers (Tran et al., 17 Aug 2025):
Input Layer: Sensor fusion captures actor state (vision, lidar, radar), environmental context (luminance, weather, noise), and sociocultural norms, formalized as a high-dimensional feature vector.
Processing Layer: Adaptation policy selects (i) whether adaptation is needed, (ii) which communication parameters to modify (modality, scope, timing), and (iii) the recipient scope (individual, group, global). Decision models leverage learned classifiers, e.g., neural networks inferring intent probability, with cost functions balancing clarity, cognitive demand, and inclusivity constraints.
Output Layer: Parameterized messaging across multiple channels: visual (LEDs, projection, text), auditory (synthetic speech/tones), and kinematic cues (yielding expressed through motion profile and trajectory), bundled into a single output message. Adaptation internalizes actor traits (e.g., switching to audio for the visually impaired).
Scalability is formalized via constraints on aggregate cognitive load, and inclusivity is secured by embedding actor-specific weights in the cost functions. Standardized metrics include crossing hesitation times, comprehension rates, and both subjective (NASA-TLX, trust) and objective (time-to-collision) safety measures (Tran et al., 17 Aug 2025).
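The three-layer loop can be sketched end to end. The feature names, rule-based policy, and load budget below are assumptions for illustration; the framework itself envisions learned classifiers in the processing layer rather than hand-written rules.

```python
def input_layer(raw):
    """Fuse raw sensor readings into a flat feature dict
    (stand-in for the high-dimensional feature vector)."""
    return {
        "ambient_lux": raw["lux"],
        "noise_db": raw["db"],
        "visually_impaired": raw.get("visually_impaired", False),
        "n_actors": raw["n_actors"],
    }

def processing_layer(features, load_budget=2):
    """Rule-based stand-in for the adaptation policy: choose channels,
    parameters, and recipient scope under an aggregate-load budget."""
    plan = []
    if features["visually_impaired"]:
        # Actor-specific adaptation: add an audio channel above ambient noise.
        plan.append(("audio", {"volume_db": features["noise_db"] + 10}))
    # Brightness scaled to ambient light, capped at full output.
    plan.append(("led", {"brightness": min(1.0, features["ambient_lux"] / 10000)}))
    scope = "group" if features["n_actors"] > 1 else "individual"
    return plan[:load_budget], scope

def output_layer(plan, scope):
    """Emit parameterized channel commands for the actuators."""
    return [f"{scope}:{channel}:{params}" for channel, params in plan]

features = input_layer({"lux": 20000, "db": 70,
                        "visually_impaired": True, "n_actors": 3})
plan, scope = processing_layer(features)
commands = output_layer(plan, scope)
```

Truncating the plan to the load budget is a crude proxy for the aggregate cognitive-load constraint; a full implementation would instead minimize a cost function over clarity, demand, and inclusivity terms.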
5. Psychological and Behavioral Effects
Causal modeling and controlled studies provide insight into the psychological pathways by which eHMIs influence user decision-making:
- Timing of Signals: Early or synchronous cues (“I will stop” at or before deceleration begins) maximize situational awareness (Q1: ~4.57 vs. 3.28 without eHMI), trust (Q4: ~4.08 vs. 3.25), and reduce crossing hesitation (Q6: ~2.07 vs. 3.23; 25% faster crossing initiation) (Liu et al., 5 Feb 2025).
- Calibrated Trust and Risk Perception: Strengthened awareness via explicit eHMI cues leads to well-founded trust and sharply attenuates perceived danger. Too-late cues or silent displays improve trust only modestly and do not speed up decisions.
- Role-Specific Effects: eHMI benefits extend beyond pedestrians to conventional vehicle drivers and cyclists; unified designs have shown efficacy in enhancing safety perception, trust, and understandability across all major roles, supporting the case for international harmonization (Colley et al., 27 Jan 2026).
Representative structural equation models (e.g., Eq. (1) in (Liu et al., 5 Feb 2025)) describe the propagation of eHMI effects from comprehension and trust through to observable behavior (crossing initiation time).
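The causal chain of such a path model (cue timing → situational awareness → trust → crossing behavior) can be illustrated with a toy linear propagation. The coefficients below are arbitrary placeholders, not estimates from the cited study.

```python
def predict_crossing_time(cue_timing_early,
                          b_aware=1.0, b_trust=0.8, b_beh=-0.5,
                          base_time=3.2):
    """Toy path model: cue_timing_early in [0, 1], where 1 means the
    cue appears at or before deceleration onset. Effects propagate
    awareness -> trust -> crossing initiation time (seconds).
    All coefficients are illustrative."""
    awareness = b_aware * cue_timing_early
    trust = b_trust * awareness
    return base_time + b_beh * trust   # earlier cues -> faster crossing

print(f"{predict_crossing_time(1.0):.2f} s vs "
      f"{predict_crossing_time(0.0):.2f} s without early cue")
```

The qualitative behavior matches the empirical pattern above: early or synchronous cues shorten crossing initiation relative to the no-eHMI baseline, while late cues (small `cue_timing_early`) leave it nearly unchanged.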
6. Challenges in Complex, Multi-User Environments
The performance and safety contributions of eHMIs vary with environmental and traffic complexity:
- Multi-Lane Interference: In unsignalized multi-lane scenarios, allocentric (vehicle-centric) and egocentric (pedestrian-centric) eHMIs yield different cognitive loads and risk profiles. Allocentric cues induce higher cognitive demand and distraction (more gaze time on non-relevant lanes), while egocentric cues risk misleading pedestrians when signals are asymmetric across lanes, possibly raising the incidence of unsafe crossings (Ye et al., 6 Jan 2026).
- Interconnected eHMIs: Coordination (e.g., via V2X) across AVs enables unified multi-vehicle signaling, which improves perceived safety and encourages caution but can also introduce semantic ambiguities, especially in color-coded (red/green) systems (Tran et al., 2024). Users require systematic, scenario-driven instruction to correctly interpret interconnected eHMI encodings.
- Inclusion and Accessibility: DHH (Deaf and Hard-of-Hearing) participants exhibit longer vehicle gaze and rely more on visual channels. Both visual and auditory cues enhance trust and perceived safety, but only visual eHMIs reduce crossing hesitation and gaze duration (Xu et al., 20 Jan 2026). Multi-modal redundancy is essential for universal accessibility.
7. Future Directions and Standardization
The evolution of eHMI practice is toward adaptive, context-sensitive, and inclusive multi-modal systems, with open research challenges in:
- Automated Design: Pipelines integrating LLMs and action renderers to synthesize scenario-specific eHMI actions—benchmarked by human and vision-language model (VLM) judgments—reduce manual overhead and facilitate scalable deployment (Xia et al., 27 May 2025).
- Affective and Embodied Cues: Initial explorations into affective and animal-inspired eHMIs (e.g., TailCue) indicate the need for motion–emotion congruence and scenario optimization. Standalone expressive cues have limited efficacy without complementary context and multi-modality (Wang et al., 2024, Li et al., 18 Nov 2025).
- Standardized Metrics and Regulatory Input: Quantitative parameter ranges (color, intensity, blink frequency; e.g., cyan, 3 Hz flash) and efficacy data directly inform emerging ISO and SAE guidelines (Colley et al., 18 Jan 2025). Cognitive load, gaze behavior, and trust are prime candidates for scaling international minimum standards.
In sum, research converges on the primacy of legible, multi-modal, context-aware, and harmonized eHMI designs, dynamically tuned to user, situational, and accessibility concerns. Systematic participatory design, formal adaptive frameworks, and evidence-based standardization are central to the realization of universally interpretable, scalable, and safe AV–human communication in mixed traffic environments.