
Simulated Prosthetic Vision: Methods & Progress

Updated 31 January 2026
  • Simulated Prosthetic Vision (SPV) is a computational framework that models visual percepts using biologically realistic phosphene simulation and immersive VR environments.
  • It integrates experimental protocols with deep learning to optimize device design by controlling key parameters like field of view, phosphene density, and temporal dynamics.
  • SPV provides actionable insights for neuroprosthetic development through benchmarking perceptual outcomes and supporting translational research in real-world tasks.

Simulated Prosthetic Vision (SPV) refers to computational frameworks, experimental protocols, and psychophysically validated pipelines developed to model, predict, and optimize the visual percepts experienced by users of retinal, cortical, or thalamic visual neuroprostheses. SPV plays a pivotal role in guiding device design, benchmarking perceptual outcomes, and supporting translational research from algorithmic innovation to clinical evaluation. Core elements of SPV include biologically realistic phosphene modeling, parameterized control over field of view, phosphene density, and spatial/temporal characteristics, as well as the integration of deep computer vision and machine learning for perceptual optimization.

1. Biophysical Modeling of Prosthetic Vision Percepts

The canonical SPV pipeline is structured around biophysical simulation of phosphenes—the discrete, often distorted light percepts evoked by electrical stimulation of neural tissue. Spatial models utilize multi-parameter kernels that account for the geometry of the implant (epiretinal, subretinal, or cortical substrate), the spatial spread (Gaussian or exponential), and tissue-specific anisotropies, notably axonal elongation induced by stimulation of retinal ganglion axon bundles. For each electrode $i$, the perceptual contribution at $(x, y)$ is typically modeled as:

$$A_i(x, y) = I_i\,\exp\left(-\frac{d_i(x, y)}{\rho}\right)\exp\left(-\frac{\ell_i(x, y)}{\lambda}\right)$$

where $I_i$ is the stimulation current, $d_i(x, y)$ is the Euclidean distance to electrode $i$, and $\ell_i(x, y)$ is the axonal path length. Parameters $\rho$ and $\lambda$ determine phosphene diameter and elongation, respectively (Han et al., 2021).
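The spatial model above can be evaluated per location and summed over electrodes; the sketch below is a minimal illustration (parameter values are arbitrary, not taken from any specific device, and the axonal path length is crudely approximated by the Euclidean distance):

```python
import math

def phosphene_brightness(current, d, ell, rho=100.0, lam=500.0):
    """Contribution A_i(x, y) of one electrode at a single location.

    current : stimulation current I_i (arbitrary units)
    d       : Euclidean distance d_i(x, y) to the electrode (microns)
    ell     : axonal path length l_i(x, y) along the ganglion axon bundle
    rho     : decay constant controlling phosphene diameter (microns)
    lam     : decay constant controlling axonal elongation (microns)
    """
    return current * math.exp(-d / rho) * math.exp(-ell / lam)

def percept(x, y, electrodes):
    """Total percept at (x, y): sum of per-electrode contributions.
    `electrodes` is a list of (ex, ey, current) tuples; here the axonal
    path length is approximated by the Euclidean distance for brevity."""
    total = 0.0
    for ex, ey, current in electrodes:
        d = math.hypot(x - ex, y - ey)
        total += phosphene_brightness(current, d, ell=d)
    return total
```

In a full axon-map model, `ell` would be computed along traced nerve-fiber trajectories rather than set equal to `d`.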

Temporal dynamics of phosphenes—fading, persistence, and rebound—are modeled using either leaky integrator systems or more advanced frameworks such as the spectral truncated Fourier model or multi-stage non-linear decay models. For instance, under the spectral model:

$$I(t) = \begin{cases} k_1\,(t - t_1) & t \in [0,\, t_1) \\ \text{linear connector} & t \in [t_1,\, t_2) \\ \mathcal{F}^{-1}(X_m) + k_2 & t \in [t_2,\, t_3) \\ 0 & t \ge t_3 \end{cases}$$

with $X_m$ denoting the $m$ largest components of the DFT of empirical data (Hou et al., 2024). Physiologically realistic SPV thus necessitates both an accurate spatial substrate and the inclusion of the empirically observed non-linear, adaptive, and flicker-prone dynamics of prosthetic vision.
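The simpler leaky-integrator alternative mentioned above can be sketched in a few lines (time constant and step size are illustrative; the spectral model would instead splice in the inverse DFT of empirically fitted components):

```python
def leaky_integrator(stim, dt=0.01, tau=0.4):
    """Discrete leaky integrator for phosphene brightness over time:
    dB/dt = (stim - B) / tau, so brightness rises under sustained
    stimulation and fades (decays toward zero) after stimulus offset."""
    b, trace = 0.0, []
    for s in stim:
        b += dt * (s - b) / tau
        trace.append(b)
    return trace
```

Driving it with a pulse followed by silence reproduces the qualitative persistence-then-fading behavior clinical reports describe.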

2. System Architecture and Virtual Reality Integration

Modern SPV systems are predominantly realized in immersive virtual reality (VR) environments, leveraging high-end head-mounted displays (HMDs), precise head and eye tracking, and low-latency rendering pipelines. These systems reproduce the visual field constraints and the necessity for active head or eye scanning imposed by prosthesis FOV and electrode layout. For example, Sanchez-Garcia et al. used the HTC VIVE Pro with real-time C++/OpenVR and OpenCV integration to map camera frames to a quantized phosphene grid within software-defined FOVs (10°, 20°, or 50°) (Sanchez-Garcia et al., 2022).

Phosphene maps are rendered at high refresh rates (e.g., 90 Hz), simulating not only the view through the prosthesis but also the psychophysics of gaze-contingent rendering and spatial smoothing consistent with clinical observations (Kasowski et al., 2022). VR-based SPV enables controlled behavioral testing—object search, recognition, navigation—under realistic, adjustable perceptual constraints (Sanchez-Garcia et al., 28 Jan 2025).
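The camera-to-phosphene mapping stage of such a pipeline can be sketched as block averaging followed by gray-level quantization (grid size and number of gray levels below are illustrative, not those of any cited system):

```python
def frame_to_phosphenes(frame, grid_h=32, grid_w=32, levels=8):
    """Downsample a grayscale frame (list of rows, pixel values 0-255)
    to a phosphene grid by block averaging, then quantize each cell to
    one of `levels` discrete gray levels (the device's dynamic range)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // grid_h, w // grid_w
    out = []
    for gy in range(grid_h):
        row = []
        for gx in range(grid_w):
            block = [frame[y][x]
                     for y in range(gy * bh, (gy + 1) * bh)
                     for x in range(gx * bw, (gx + 1) * bw)]
            mean = sum(block) / len(block)
            row.append(round(mean / 255 * (levels - 1)))
        out.append(row)
    return out
```

A production pipeline would do this per frame at the HMD refresh rate, with gaze-contingent cropping applied before the downsampling step.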

3. Parameterization: Field of View, Density, and Performance Metrics

SPV studies formally define and independently control key design parameters:

  • Field of View (FOV): The angular span of the phosphene array, varied from 10° to 50°. Narrowing the FOV for a given electrode count increases angular resolution at the cost of peripheral context.
  • Phosphene Count and Density ($N$, $\rho$): The absolute number of electrodes $N$ and their density $\rho = N/\mathrm{FOV}^2$ (deg$^{-2}$) dictate small-target discrimination and overall acuity (Sanchez-Garcia et al., 2022). For mid-range FOV (~20°), $N$ in the 500–1000 range supports logMAR ≈ 1.3–1.4 (Snellen 20/400–20/500); performance improvements diminish rapidly if the FOV is reduced further or the electrode count is pushed above this range.
  • Visual Acuity Metrics: Acuity is measured using standardized paradigms (e.g., Landolt-C gap orientation) and scored in logMAR units, $\log_{10}(g/5')$, where $g$ is the minimal resolvable gap in arcminutes. Acuity improves logarithmically with $N$ for FOV $\geq 20^\circ$ but saturates below this FOV or above ~1000 phosphenes (Sanchez-Garcia et al., 2022). Regression fits yield, for FOV $\geq 20^\circ$,

$$\mathrm{logMAR} = \alpha - \beta \log_{10} N, \quad \beta \approx 0.2$$

Motion/position localization, RTs, and discrimination tasks are also standard assessment metrics (Sanchez-Garcia et al., 28 Jan 2025).
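These parameter relations are simple enough to compute directly; in the sketch below only the slope $\beta \approx 0.2$ comes from the cited regression, while the intercept `alpha` is a hypothetical placeholder:

```python
import math

def phosphene_density(n, fov_deg):
    """Areal phosphene density rho = N / FOV^2 in phosphenes per deg^2."""
    return n / fov_deg ** 2

def predicted_logmar(n, alpha=2.0, beta=0.2):
    """Regression form logMAR = alpha - beta * log10(N), valid for
    FOV >= 20 deg; alpha is illustrative, beta ~ 0.2 as reported."""
    return alpha - beta * math.log10(n)
```

With `alpha=2.0`, 1000 phosphenes predict logMAR 1.4, consistent with the 20/400–20/500 range quoted above.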

4. Algorithmic Advances: Deep Learning and Semantic Encoding

Recent SPV pipelines incorporate deep learning both to preprocess camera input and to optimize perceptual encoding for downstream utility:

  • End-to-End Differentiable Optimization: Wu et al. realized a fully differentiable pipeline (encoder → implant model → proxy observer), enabling gradient-based optimization of stimulation patterns for recognition accuracy rather than mere image fidelity (Wu et al., 2023). For a 6×10 electrode grid, the U-Net–based encoder improved the weighted F₁ score by 36.17% compared to naive downsampling, demonstrating the substantial advantage of data-driven encoding.
  • Semantic Scene Simplification: Scene simplification strategies—object segmentation, semantic edge detection, and attention-based selection—are critical for enhancing interpretability under bandwidth constraints. Semantic segmentation outperforms both pure saliency and depth-based models in object-presence detection (F1 = 0.68 for segmentation vs 0.46 for saliency) (Han et al., 2021, Sanchez-Garcia et al., 2018). Integration of user-centered design, as in gaze-guided segmentation overlays using SAM, increases object identification accuracy over edge maps, especially in cluttered scenes (Papadopoulos et al., 24 Sep 2025).
  • Fixation-Based Stimulation: Emulating saccadic fixation via self-attention maps (e.g., DINOv2 ViT) and encoding only the most salient patches allows a 14×14 grid to reach ≈91% classification accuracy, matching healthy controls (92.76%) and surpassing downsampling-based approaches by >40% (Wu et al., 2024).
  • Symbol Optimization for Reading: In low-bandwidth letter reading, sequential symbol presentation is distorted by spatial blur and temporal persistence (MixUp). Re-mapping letters to a heterogeneous, less confusable codebook yields a 21.6× reduction in predicted inter-symbol confusion compared to standard alphabets (Lesner et al., 24 Jan 2026).
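The fixation-based strategy above reduces, at its core, to ranking patches by attention score and encoding only the top few; a minimal sketch (in practice the scores would come from a ViT's self-attention maps, e.g. DINOv2):

```python
def select_fixations(attn_scores, k=3):
    """Return indices of the k most salient patches, highest score
    first. Only these patches are encoded into stimulation, emulating
    a sequence of saccadic fixations under a fixed bandwidth budget."""
    ranked = sorted(range(len(attn_scores)),
                    key=lambda i: attn_scores[i], reverse=True)
    return ranked[:k]
```

The same top-k selection generalizes to gaze-guided variants, where the user's own fixation replaces or reweights the attention score.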

5. Functional Task Evaluation and Behavioral Outcomes

SPV platforms support the controlled investigation of task-specific prosthetic performance:

  • Navigation and Mobility: Augmented reality overlays (paths, goals) in SPV systems using classical A* and DWA planners significantly reduced completion times, path length, and collision rate: AR-guided users (RoboticG) halved goal acquisition times and reduced collisions by >80% compared to unaugmented pure SPV (Sanchez-Garcia et al., 2021). Depth cues, appropriately mapped to phosphene brightness, outperformed edge enhancement for obstacle avoidance, supporting a collision-free navigation rate of ≈88–90% vs. ≈65% for edge-only modes (Rasla et al., 2022).
  • Scene Understanding and Wayfinding: Static and temporally multiplexed semantic overlays (SemanticEdges and SemanticRaster) both improve wayfinding in cluttered scenes. SemanticEdges maximize global scene awareness (task success rate 86% vs. baseline 78%), while SemanticRaster reduces instantaneous visual clutter, leading to fewer collisions (collision-free completions 61% vs. baseline 45%) (Kasowski et al., 14 Jul 2025).
  • Face Perception: For subretinal systems such as PRIMA, non-pixelized simulation algorithms integrating a grayscale filter, spatial low-pass filtering, and contrast-reducing tone curves accurately reproduce the clinical acuity and subjective quality of face perception in patients (resolving minimum Landolt-C gap of 0.8 px, matching ~20/417) (Park et al., 1 Mar 2025). ML-based facial landmark extraction and line-thickening, as well as pre-compensation via inverse tone curves, restore identifiable facial features and improve the semantic utility of the prosthetic percept.
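The depth-to-brightness mapping that outperformed edge enhancement for obstacle avoidance can be sketched as an inverse-depth ramp; the near/far clamp distances and gray-level count below are illustrative, not those of the cited study:

```python
def depth_to_brightness(depth_m, near=0.5, far=4.0, levels=8):
    """Map metric depth to a quantized phosphene gray level so that
    nearer obstacles render brighter; depths outside [near, far] clamp
    to the maximum and minimum brightness, respectively."""
    d = min(max(depth_m, near), far)
    frac = (far - d) / (far - near)   # 1.0 at `near`, 0.0 at `far`
    return round(frac * (levels - 1))
```

Applied per phosphene cell over a depth map, this makes looming obstacles pop out within the limited dynamic range.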

6. Design Principles and Constraints for Prosthetic System Development

Convergent findings from SPV studies inform practical prosthesis design principles:

  • Field of View and Density Trade-off: An FOV of 20–30° with ≥15 phosphenes/degree supports optimal search and recognition tasks. Widening FOV decreases per-phosphene angular resolution; excess peripheral coverage with fixed electrode budget degrades fine discrimination (Sanchez-Garcia et al., 2022, Sanchez-Garcia et al., 28 Jan 2025). Concentrating phosphenes centrally, possibly with foveal clustering, is consistently beneficial.
  • Electrode Count vs. Information Bottleneck: Performance gains saturate above ~1000 electrodes for mid-range fields, and limited dynamic range (e.g., 8–14 gray levels) and spatial low-pass filtering remain dominant perceptual bottlenecks even with higher spatial resolution (Sanchez-Garcia et al., 2022, Park et al., 1 Mar 2025).
  • Safety-Compliant Raster Scheduling: Maximum concurrent activation of 20% of electrodes (to respect charge safety limits) is best achieved by structured checkerboard raster patterns, preserving perceptual accuracy and minimizing directionally biased artifacts (Kasowski et al., 3 Jan 2025).
  • Semantic and Fixation-Aware Encoding: Dynamic, task-adaptive, or user-driven semantic overlays (e.g., gaze-guided semantic mask selection) and time-multiplexing of relevant object classes enable content-aware allocation of limited perceptual bandwidth, managing clutter and maintaining safety constraints (Kasowski et al., 14 Jul 2025, Papadopoulos et al., 24 Sep 2025).
  • Biological Realism and Clinical Validity: Anatomically and psychophysically grounded phosphene models (axon-map, patient-specific $\rho$, $\lambda$) are needed for accurate prediction and optimization. Overly simplistic “scoreboard” or dot models overestimate achievable performance (Kasowski et al., 2022, Han et al., 2021).
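The 20%-concurrency safety constraint above can be met by partitioning the electrode array into five interleaved raster frames; the diagonal checkerboard-style assignment below is an illustrative scheme, not the exact schedule of Kasowski et al.:

```python
def raster_groups(rows, cols, n_groups=5):
    """Assign each electrode (r, c) of a rows x cols array to one of
    n_groups raster frames via a diagonal interleave, so that at most
    1/n_groups of the electrodes fire concurrently and active sites
    stay spatially spread out within each frame."""
    groups = [[] for _ in range(n_groups)]
    for r in range(rows):
        for c in range(cols):
            groups[(r + 2 * c) % n_groups].append((r, c))
    return groups
```

Cycling through the five frames at the display refresh rate keeps instantaneous charge within limits while every electrode still fires each full cycle.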

7. Outlook and Current Limitations

SPV remains central to iterative visual prosthesis development, enabling rapid prototyping, user-centric algorithm evaluation, and large-scale optimization before deployment in clinical settings. Persistent challenges include modeling patient-specific perceptual distortions, quantifying long-term adaptation and learning, and integrating multi-sensory cues for scene understanding and navigation. Future directions include adaptive, closed-loop encoder optimization, hardware-in-the-loop validation, and expanding real-world SPV deployments in ecologically complex, time-varying environments (Wu et al., 2023, Neogi et al., 2011).

SPV is foundational not only for device engineering but as a platform for neuroscientific investigation, closed-loop control algorithms, and for engaging low-vision users in the iterative co-design of next-generation bionic vision systems.
