Hierarchical Predictive Coding Models in a Deep-Learning Framework

This presentation explores how predictive coding—a theory of how brains predict and learn from sensory input—can be implemented using modern deep learning architectures. We examine the classical Rao-Ballard model, its Bayesian foundations, and how newer implementations like PredNet leverage convolutional LSTM networks to predict video frames. The talk highlights the tension between biological plausibility and computational scalability, showing how deep learning frameworks offer new pathways for building hierarchical models that learn by minimizing prediction errors.
Script
What if your brain is constantly predicting what it's about to see, and learning only happens when those predictions fail? Hierarchical predictive coding suggests exactly that: the brain builds layered models that forecast sensory input, with surprise driving all learning.
In predictive coding, each layer in the hierarchy predicts the activity of the layer below. When predictions miss the mark, the resulting errors climb upward, forcing higher layers to adjust their internal models. This cycle of prediction, error, and correction mirrors computational processes hypothesized to occur in the neocortex.
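The prediction-error-correction cycle can be sketched in a few lines of NumPy. This is a toy single-layer loop, not any published model: the representation `r`, weights `W`, and learning rates are illustrative choices, with inference (updating `r`) and learning (updating `W`) both driven by the same prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy predictive loop: a higher layer holds a representation r, predicts the
# input through weights W, and both r and W are nudged to shrink the error.
W = rng.normal(size=(8, 4)) * 0.1     # top-down prediction weights
r = np.zeros(4)                        # higher-layer representation
x = rng.normal(size=8)                 # "sensory" input to be explained

for step in range(500):
    prediction = W @ r                 # top-down prediction of the input
    error = x - prediction             # prediction error ("surprise")
    r += 0.05 * (W.T @ error)          # inference: adjust the representation
    W += 0.005 * np.outer(error, r)    # learning: adjust the weights

print(np.linalg.norm(x - W @ r))       # error shrinks as the loop runs
```

Both updates are gradient steps on the same squared-error objective, which is the core intuition: perception and learning are two timescales of the same error-minimization process.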
The foundation for these ideas comes from a model that paired prediction with error in a surprisingly elegant way.
The Rao-Ballard model introduces the predictive estimator, a modular unit containing both representation neurons and error neurons. Each estimator sits in a layer of a hierarchy, where top-down connections send predictions downward and bottom-up connections carry errors upward. This architecture elegantly captures how context shapes perception: higher layers encode abstract patterns, and their predictions gate what lower layers report as surprising. The model even reproduces neural phenomena like end-stopping, where cells respond to bar stimuli only when the bar terminates within their receptive field.
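The layered wiring can be made concrete with two stacked units. In this sketch (sizes and step sizes are illustrative, and weight learning is omitted), each layer holds representation neurons `r[i]`, error neurons compute the mismatch with the prediction sent down to that layer, and each representation is driven by the error below it minus the error it generates above, so the top-down prediction acts as a contextual prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two stacked Rao-Ballard-style units: errors climb upward, predictions
# flow downward. Layer widths here are arbitrary illustrative choices.
sizes = [16, 8, 4]                       # input, layer 1, layer 2 widths
W = [rng.normal(size=(sizes[i], sizes[i + 1])) * 0.1 for i in range(2)]
r = [np.zeros(sizes[1]), np.zeros(sizes[2])]
x = rng.normal(size=sizes[0])

for step in range(300):
    e0 = x - W[0] @ r[0]                 # error neurons at the input layer
    e1 = r[0] - W[1] @ r[1]              # error neurons between layers 1 and 2
    # Each representation is pushed by the error beneath it and pulled by
    # the top-down prediction above it (the -e1 term for layer 1).
    r[0] += 0.1 * (W[0].T @ e0 - e1)
    r[1] += 0.1 * (W[1].T @ e1)

print(np.linalg.norm(x - W[0] @ r[0]))   # residual error after inference
```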
Predictive coding has roots in Bayesian inference, where each layer attempts to explain the causes behind sensory data. The framework combines learned priors with incoming signals to form hierarchical representations of the world. Traditional Bayesian calculations are computationally intense, but modern deep learning frameworks sacrifice exact probabilistic operations in exchange for the ability to scale these models to real-world complexity.
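The Bayesian connection can be shown in miniature. Under simplifying assumptions not drawn from any specific paper (a linear Gaussian likelihood and a unit Gaussian prior on the representation), gradient descent on prediction error plus a prior penalty is exactly MAP inference, and it converges to the closed-form posterior mode:

```python
import numpy as np

rng = np.random.default_rng(2)

# MAP inference sketch: x ~ N(W r, I), prior r ~ N(0, I). Minimizing
# squared prediction error plus a prior penalty is maximizing log p(r | x).
W = rng.normal(size=(6, 3))
x = rng.normal(size=6)

r = np.zeros(3)
for _ in range(2000):
    r += 0.02 * (W.T @ (x - W @ r) - r)   # gradient of the log posterior

# Closed-form posterior mode: (W^T W + I) r = W^T x
r_map = np.linalg.solve(W.T @ W + np.eye(3), W.T @ x)
print(np.allclose(r, r_map, atol=1e-3))   # True: both routes agree
```

Deep-learning implementations give up this kind of exact correspondence (nonlinearities and learned architectures have no closed-form posterior), which is precisely the trade-off the script describes.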
Translating these principles into trainable neural networks required rethinking connectivity and computation.
PredNet reimagines predictive coding using convolutional Long Short-Term Memory networks to predict video sequences. Each layer forecasts the next frame based on current input and its own recurrent memory, generating prediction errors that guide updates across the hierarchy. Unlike the strict Rao-Ballard connectivity scheme, PredNet's wiring is tailored for deep learning efficiency, showing that you can preserve the spirit of predictive coding while adapting the architecture for modern hardware and training regimes. The model performs competitively on next-frame prediction tasks, demonstrating that hierarchical error minimization scales beyond toy problems.
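One concrete PredNet design choice is how errors are passed upward: the prediction error is split into positively and negatively rectified halves before being fed to the next layer. The sketch below shows only that error computation; the convolutional LSTM that would produce `pred` is omitted, and the toy frame shape is an illustrative assumption.

```python
import numpy as np

# PredNet-style error units: stack ReLU(frame - pred) and ReLU(pred - frame)
# along the channel axis, doubling the channels passed up the hierarchy.
def prednet_error(frame, pred):
    relu = lambda z: np.maximum(z, 0.0)
    return np.concatenate([relu(frame - pred), relu(pred - frame)], axis=0)

frame = np.ones((3, 4, 4))    # toy "frame": channels x height x width
pred = np.zeros((3, 4, 4))    # stand-in for a ConvLSTM layer's prediction
e = prednet_error(frame, pred)
print(e.shape)                # (6, 4, 4): ON and OFF error channels stacked
```

The ON/OFF split keeps error activations nonnegative, loosely echoing how biological neurons signal deviations in either direction with separate populations.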
The shift from classical to deep learning implementations reveals a deliberate trade-off. Traditional predictive coding models honor Bayesian rigor and biological detail, but they struggle with scale and speed. Deep learning versions like PredNet sacrifice exact probabilistic guarantees, yet they unlock the ability to process high-dimensional sensory streams like video. This tension between explanatory fidelity and practical capability defines the current frontier.
Despite progress, fundamental questions remain unanswered. We don't yet know how tightly deep predictive models align with actual neural dynamics, nor whether their hierarchical structure generalizes across different sensory modalities. Understanding how prediction errors and representations interact across layers will be crucial for building models that are both computationally effective and biologically grounded.
Predictive coding offers a compelling theory of how brains anticipate the world, and deep learning gives us the tools to test it at scale. Visit EmergentMind.com to explore more research and create your own video presentations.