Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis

Published 19 Dec 2024 in q-bio.NC | (2412.14536v1)

Abstract: Elucidating the functional mechanisms of the primary visual cortex (V1) remains a fundamental challenge in systems neuroscience. Current computational models face two critical limitations, namely the challenge of cross-modal integration between partial neural recordings and complex visual stimuli, and the inherent variability in neural characteristics across individuals, including differences in neuron populations and firing patterns. To address these challenges, we present a multi-modal identifiable variational autoencoder (miVAE) that employs a two-level disentanglement strategy to map neural activity and visual stimuli into a unified latent space. This framework enables robust identification of cross-modal correlations through refined latent space modeling. We complement this with a novel score-based attribution analysis that traces latent variables back to their origins in the source data space. Evaluation on a large-scale mouse V1 dataset demonstrates that our method achieves state-of-the-art performance in cross-individual latent representation and alignment, without requiring subject-specific fine-tuning, and exhibits improved performance with increasing data size. Significantly, our attribution algorithm successfully identifies distinct neuronal subpopulations characterized by unique temporal patterns and stimulus discrimination properties, while simultaneously revealing stimulus regions that show specific sensitivity to edge features and luminance variations. This scalable framework offers promising applications not only for advancing V1 research but also for broader investigations in neuroscience.


Authors (6)

Summary

The paper’s main contribution is miVAE, a multi-modal identifiable variational autoencoder that disentangles its latent variables to capture both individual-specific and shared neural features in V1.
It combines directed generative modeling with variational inference to align neural and visual data in a shared latent space, improving cross-individual performance without subject-specific fine-tuning.
Experimental results show that miVAE outperforms baseline models, with ablation studies confirming the critical role of latent space design in modeling neural dynamics.

Introduction

This paper addresses two central challenges in modeling the primary visual cortex (V1) across individuals: the inherent variability in neural characteristics, such as differing neuron populations and firing patterns, and the difficulty of integrating partial neural recordings with complex visual stimuli. The study introduces a multi-modal identifiable variational autoencoder (miVAE) that maps neural activity and visual stimuli into a unified latent space, enabling robust identification of cross-modal correlations and analysis without subject-specific fine-tuning.

miVAE Architecture and Methodology

The core innovation of the paper lies in miVAE's two-level latent space disentanglement. Neural activity is modeled with idiosyncratic latent variables, which capture individual-specific details, and preserved latent variables, which capture information that is consistent across individuals. The latent variables of the visual stimuli are likewise split into activity-related and activity-unrelated components. This separation refines the latent space so that correlations between neural activity and visual stimuli can be identified more reliably.
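The two-level split can be pictured as partitioning each modality's latent vector into blocks and comparing only the shared ones. The sketch below assumes fixed block sizes (`D_IDIO`, `D_PRES`, `D_REL`, `D_UNREL`, all hypothetical) and uses a plain squared-distance `alignment_loss` as a stand-in objective; it is not the paper's actual loss, only an illustration of which blocks interact.

```python
import numpy as np

# Hypothetical latent sizes for illustration; the paper's actual dimensions are not stated here.
D_IDIO, D_PRES = 4, 8   # neural latents: idiosyncratic vs. preserved
D_REL, D_UNREL = 8, 4   # stimulus latents: activity-related vs. activity-unrelated
assert D_PRES == D_REL, "shared blocks must have equal size to be compared"


def split_neural(z_x):
    """Split neural latents (..., D_IDIO + D_PRES) into (idiosyncratic, preserved)."""
    return z_x[..., :D_IDIO], z_x[..., D_IDIO:]


def split_stimulus(z_y):
    """Split stimulus latents (..., D_REL + D_UNREL) into (activity-related, unrelated)."""
    return z_y[..., :D_REL], z_y[..., D_REL:]


def alignment_loss(z_x, z_y):
    """Mean squared distance between preserved neural and activity-related stimulus latents.

    Stand-in objective (assumption): only the shared blocks are pulled together;
    the idiosyncratic and stimulus-unrelated blocks are left unconstrained.
    """
    _, preserved = split_neural(z_x)
    related, _ = split_stimulus(z_y)
    return float(np.mean(np.sum((preserved - related) ** 2, axis=-1)))


# Usage with random latents for a batch of 16 trials.
rng = np.random.default_rng(0)
z_x = rng.normal(size=(16, D_IDIO + D_PRES))
z_y = rng.normal(size=(16, D_REL + D_UNREL))
print(alignment_loss(z_x, z_y))
```

Leaving the idiosyncratic block out of the comparison is what lets each individual keep its own variability while the preserved block is driven toward a representation shared with the stimulus.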

Figure 1: miVAE for modeling V1 neural activity $\mathbf{x}$ and corresponding visual stimuli $\mathbf{y}$.

The cross-modal approach uses directed generative modeling to capture stimulus-evoked neural dynamics, reinforcing the correlation between neural activity and visual stimuli within the latent space. Because the true posteriors are intractable, variational inference approximates them, and the model is trained by maximizing the evidence lower bound (ELBO) over both domains, neural activity and visual stimuli.
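As a rough illustration of the variational objective, the sketch below computes a single-sample ELBO with diagonal Gaussian posteriors, the reparameterization trick, and a unit-variance Gaussian likelihood. Conditioning the prior on the stimulus, in the spirit of identifiable VAEs, is an assumption here, as are the function names and the likelihood choice; miVAE's exact parameterization may differ.

```python
import numpy as np


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over latent dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0, axis=-1
    )


def elbo(x, decode, mu_q, logvar_q, mu_p, logvar_p, rng):
    """Single-sample ELBO estimate per example, with a unit-variance Gaussian likelihood.

    (mu_q, logvar_q): approximate posterior q(z | x) produced by an encoder.
    (mu_p, logvar_p): prior p(z | y); conditioning on the stimulus y is an assumption.
    decode: any callable mapping latents z to reconstructions of x (hypothetical).
    """
    eps = rng.standard_normal(mu_q.shape)
    z = mu_q + np.exp(0.5 * logvar_q) * eps  # reparameterization trick
    x_hat = decode(z)
    log_lik = -0.5 * np.sum((x - x_hat) ** 2 + np.log(2.0 * np.pi), axis=-1)
    return log_lik - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

Summing one such term per modality and negating the batch mean, e.g. `loss = -(elbo_x + elbo_y).mean()`, gives a two-domain training loss; how miVAE actually weights or couples the two terms is not specified in this summary.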

Neural Coding and Attribution Analysis

The study validates miVAE on a large-scale mouse V1 dataset, where it achieves state-of-the-art cross-individual latent representation. Both encoding and decoding performance improve as the amount of training data grows, and the score-based attribution analysis uncovers distinct neuronal subpopulations and stimulus-sensitive regions.
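One common way to score an encoding model on neural data is the per-neuron Pearson correlation between predicted and recorded responses. The sketch below assumes that metric purely for illustration; the summary does not say which metric the paper reports, and `per_neuron_correlation` is a hypothetical helper.

```python
import numpy as np


def per_neuron_correlation(pred, obs, eps=1e-8):
    """Pearson r between predicted and recorded responses, one value per neuron.

    pred, obs: arrays of shape (n_timepoints, n_neurons).
    eps guards against division by zero for neurons with constant responses.
    """
    p = pred - pred.mean(axis=0)
    o = obs - obs.mean(axis=0)
    num = np.sum(p * o, axis=0)
    den = np.sqrt(np.sum(p ** 2, axis=0) * np.sum(o ** 2, axis=0)) + eps
    return num / den
```

Averaging the returned vector gives a single score that can be tracked as the training set grows, which is one way to check the data-scaling trend described above.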

Figure 2: Neural coding in the latent space, followed by score-based attribution analysis.

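The attribution strategy uses a score function, treating the neural encoders as decoders, to trace high-level latent representations back to the original data space. On the neural side, this reveals distinct neuronal subpopulations with unique temporal patterns and stimulus discrimination properties; on the stimulus side, it highlights regions that are specifically sensitive to edge features and luminance variations.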
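To make the idea of tracing latents back to inputs concrete, the sketch below swaps in a simple gradient-times-input attribution on a toy encoder `z = tanh(W x)`; it does not reproduce the paper's score function. The encoder, the helper names, and the rule of grouping each input under the latent it influences most are all hypothetical.

```python
import numpy as np


def encode(x, W):
    """Toy stand-in encoder z = tanh(W x); inputs could be neurons or pixels."""
    return np.tanh(W @ x)


def jacobian(x, W):
    """Analytic dz/dx for the toy encoder, shape (n_latents, n_inputs)."""
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W


def attribution(x, W):
    """Gradient-times-input attribution of every latent unit to every input dimension."""
    return jacobian(x, W) * x[None, :]


def group_inputs(x, W):
    """Assign each input (e.g. a neuron) to the latent unit it influences most."""
    return np.argmax(np.abs(attribution(x, W)), axis=0)


# Usage: 3 latent units, 10 recorded neurons, one response vector.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 10))
x = rng.normal(size=10)
print(group_inputs(x, W))
```

A real analysis would presumably aggregate scores over many trials and time points before grouping; that aggregation step is likewise an assumption here.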

Experimental Results and Ablation Studies

miVAE consistently outperforms baseline models in both multi-modal and cross-modal latent alignment, without subject-specific training. Ablation studies show that the latent variable design is critical for capturing cross-individual variation, and performance continues to improve as the dataset grows, supporting the framework's applicability to broader neuroscience research.
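Latent alignment is often checked with a retrieval test: for each stimulus clip, does the latent from one source (one mouse, or the stimulus encoder) find its counterpart among all latents from the other source? The sketch below assumes a cosine-similarity top-1 retrieval metric for illustration only; it is not necessarily the evaluation used in the paper.

```python
import numpy as np


def top1_retrieval_accuracy(z_a, z_b):
    """Fraction of rows of z_a whose most cosine-similar row in z_b has the same index.

    Row i of z_a and row i of z_b are assumed to come from the same stimulus clip
    (e.g. two mice, or neural vs. stimulus latents).
    """
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    sim = a @ b.T  # (n, n) cosine similarities
    return float(np.mean(np.argmax(sim, axis=1) == np.arange(len(z_a))))
```

Chance level is `1 / n`, so a score well above that indicates the two latent spaces are aligned clip by clip.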

Figure 3: Ablations on modeling methods, showcasing the impact of loss functions and latent variable configurations.

Figure 4: Ablation studies on data scale, showing the relationship between dataset size and model performance.

Conclusion

The miVAE framework is a robust methodological advance for cross-individual modeling of V1, enabling detailed analysis of both neural activity and visual stimuli. Its scalable and interpretable architecture holds promise for broader sensory cortex research, potentially extending beyond the visual system to auditory cortex, and for deepening our understanding of complex neural dynamics and cross-sensory integration.
