
A Robust, Open-Source Framework for Spiking Neural Networks on Low-End FPGAs

Published 9 Jul 2025 in cs.NE | (2507.07284v2)

Abstract: As the demand for compute power in traditional neural networks has increased significantly, spiking neural networks (SNNs) have emerged as a potential solution to increasingly power-hungry neural networks. By operating on 0/1 spikes emitted by neurons instead of arithmetic multiply-and-accumulate operations, SNNs propagate information temporally and spatially, allowing for more efficient compute. To this end, many architectures for accelerating and simulating SNNs have been developed, including Loihi, TrueNorth, and SpiNNaker. However, these chips are largely inaccessible to the wider community. Field programmable gate arrays (FPGAs) have been explored as a middle ground between neuromorphic and non-neuromorphic hardware, but many proposed architectures require expensive high-end FPGAs or target a single SNN topology. This paper presents a framework consisting of a robust SNN acceleration architecture and a PyTorch-based SNN model compiler. Targeting any-to-any and/or fully connected SNNs, the FPGA architecture features a synaptic array that tiles across the SNN to propagate spikes. The architecture targets low-end FPGAs and requires very few resources (6358 LUTs, 40.5 BRAMs). The framework, tested on a low-end Xilinx Artix-7 FPGA at 100 MHz, achieves competitive speed in recognizing MNIST digits (0.52 ms/img). Further experiments also show accurate simulation of hand-coded any-to-any spiking neural networks on toy problems. All code and setup instructions are available at https://github.com/im-afan/snn-fpga.

Summary

  • The paper presents a resource-efficient FPGA framework that accelerates spiking neural networks on low-end devices, enabling rapid deployment in edge AI applications.
  • The methodology features a novel tiling and crossbar design, using only 6358 LUTs and 40.5 BRAM, to optimize compute and minimize hardware resource usage.
  • Key experiments, including MNIST inference at 0.52ms per image and 0.198 mJ energy per image, validate the framework's practical efficiency and scalability.


Introduction

This work presents a resource-efficient, highly adaptable framework for the acceleration of spiking neural networks (SNNs) on low-end field-programmable gate arrays (FPGAs) (2507.07284). Unlike previous approaches that target costly high-end devices or require application-specific SNN topologies, the architecture strikes a deliberate balance between flexibility and resource utilization. Targeting fully connected and any-to-any SNNs, the framework supports 8-bit quantized weights and optimizes compute through a novel tiling and crossbar methodology that can be realized efficiently even on commodity FPGA devices.

Architectural Approach

The architecture is partitioned into three principal modules: the synaptic crossbar, integrate-and-fire (IF) neuron units, and a centralized control unit. Trained networks are decomposed into "tiles," each encapsulating a block of 16 presynaptic and 16 postsynaptic neurons (up to 256 synapses). The FPGA architecture synchronously streams spike states, weights, and membrane potentials through BRAM-resident tiles, orchestrated by the control unit. A pipelined synaptic array processes presynaptic spikes and weights, propagating synaptic inputs to the IF units, which implement the integrate-and-fire dynamics in pure integer arithmetic without a leak factor, enhancing hardware simplicity.

Figure 1: The SNN accelerator architecture consists of IF neuron units, a centralized control unit, and a synaptic crossbar for parallel spike-weight accumulation.
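The no-leak integer IF dynamics described above can be sketched in a few lines. This is a minimal behavioral model, not the RTL: the threshold value and the reset-by-subtraction rule are illustrative assumptions.

```python
THRESHOLD = 128  # assumed firing threshold in int8-scaled units (illustrative)

def if_step(v, current, threshold=THRESHOLD):
    """One timestep of a no-leak integrate-and-fire neuron in pure integer math."""
    v += current                 # integrate the summed synaptic current
    if v >= threshold:           # fire when the membrane potential crosses threshold
        return v - threshold, 1  # reset by subtraction, emit a 0/1 spike
    return v, 0
```

Because there is no leak term, a subthreshold potential persists unchanged across timesteps until further input arrives.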

The crossbar employs a 16×16 weight matrix and processes binary spike activations through adder trees, yielding 16 summed currents each timestep. Input values are injected directly into neuron membrane potentials. Data movement is synchronized with FIFOs to ensure proper sequencing, especially across tile index transitions.

Figure 2: The synaptic array computes weighted accumulations for 0/1 spike activations with minimal delay and resource overhead.
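Functionally, one crossbar timestep is a matrix-vector product between the tile's weights and the binary spike vector. The sketch below models this with NumPy; the tile shape follows the paper's 16×16 size, while the weight values themselves are arbitrary placeholders.

```python
import numpy as np

# Behavioral model of the 16x16 synaptic crossbar: 0/1 spikes from 16
# presynaptic neurons select int8 weight columns, and adder trees produce
# 16 summed currents per timestep (one per postsynaptic neuron).
rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(16, 16), dtype=np.int16)  # tile weights (post x pre)
spikes = rng.integers(0, 2, size=16)                              # 0/1 presynaptic spikes

currents = weights @ spikes  # equals summing the weight columns where spikes == 1
```

Since spikes are binary, the "multiply" in multiply-and-accumulate degenerates to column selection, which is what makes the adder-tree hardware realization cheap.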

Unlike prior work (e.g., "FireFly" [FireFly, IEEE TVLSI 2023] and "SyncNN" [SyncNN, FPL 2021]) focused on large FPGAs or convolutional SNNs with high resource budgets, this design requires only 6358 LUTs and 40.5 BRAMs for SNN logic, enabling deployment on resource-constrained FPGAs like the Xilinx Artix-7.

Software and Compilation Pipeline

A PyTorch-based training interface is provided via the snnTorch library, employing surrogate-gradient learning and backpropagation through time with no-leak IF neurons and a synaptic delay of one timestep. Trained models are quantized to int8 and compiled into hardware-ready tiles. The compiler output supports direct deployment via C arrays or FPGA memory-initialization files, allowing seamless integration with the on-chip processor.
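The quantize-and-tile step can be sketched as follows. This is a hedged approximation: the symmetric per-tensor int8 scheme, the zero-padding, and the tile ordering are assumptions for illustration, and the actual compiler in the repository may differ.

```python
import numpy as np

TILE = 16  # tile edge length from the paper's 16x16 crossbar

def quantize_int8(w):
    """Symmetric per-tensor quantization of a float weight matrix to int8 (assumed scheme)."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -128, 127).astype(np.int8), scale

def tile_matrix(w8, tile=TILE):
    """Zero-pad to a multiple of the tile size and return (row, col, block) tiles."""
    rows = -(-w8.shape[0] // tile) * tile  # ceil to tile multiple
    cols = -(-w8.shape[1] // tile) * tile
    padded = np.zeros((rows, cols), dtype=np.int8)
    padded[: w8.shape[0], : w8.shape[1]] = w8
    return [
        (r // tile, c // tile, padded[r : r + tile, c : c + tile])
        for r in range(0, rows, tile)
        for c in range(0, cols, tile)
    ]
```

For the paper's 784-128-10 network, the first layer's 128×784 weight matrix tiles exactly into 8×49 = 392 blocks, each of which can then be emitted as a C array or a memory-initialization file.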

Experimental Results

Fully-Connected SNN on MNIST

The framework was evaluated by training and deploying a 784-128-10 fully connected SNN on the MNIST dataset, quantized to 8-bit weights and executed at 100 MHz on a Digilent Basys3 FPGA. The design achieves 0.52 ms per-image inference while consuming 0.381 W, corresponding to 0.198 mJ per image, or 1.95 nJ per synaptic operation. Notably, these results use only 6358 LUTs and 40.5 BRAMs for the SNN logic, demonstrating competitive efficiency in resource-constrained environments.
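The reported figures are internally consistent, which can be checked with simple arithmetic. The per-operation number assumes roughly one operation per synapse per image, an interpretation not spelled out in the summary.

```python
# Cross-check of the reported MNIST efficiency figures.
power_w = 0.381       # reported power draw
latency_s = 0.52e-3   # reported per-image latency

energy_per_image_j = power_w * latency_s  # ~1.98e-4 J, i.e. ~0.198 mJ/image

# The 784-128-10 network has 784*128 + 128*10 = 101,632 synaptic connections.
synapses = 784 * 128 + 128 * 10
energy_per_synop_j = energy_per_image_j / synapses  # ~1.95e-9 J, i.e. ~1.95 nJ/op
```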

Any-to-Any SNN and General Logic

To verify architectural generality, hand-coded any-to-any SNNs were deployed for temporal pattern problems such as spike-interval parity encoding. The recurrent architecture, operating over multiple tiles, correctly computes the required logical function, confirming robust support for arbitrary connectivity.

Figure 3: The RSNN architecture implements parity computation by chaining neuron and logic-gate motifs across tiles, utilizing the tiling strategy to support non-standard connectivity patterns.
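The model underlying such hand-coded networks can be sketched as a reference simulator: no-leak IF neurons, an arbitrary (possibly recurrent) weight matrix, and a one-timestep synaptic delay. The tiny two-neuron chain used below is hypothetical; it only demonstrates that arbitrary connectivity, including recurrence, fits this execution model.

```python
import numpy as np

def simulate(weights, inputs, threshold=1.0):
    """Simulate an any-to-any SNN of no-leak IF neurons.

    weights: (n, n) matrix, entry [j, i] connects neuron i -> neuron j
    inputs:  (T, n) externally injected currents per timestep
    Spikes propagate with a synaptic delay of one timestep.
    """
    n = weights.shape[0]
    v = np.zeros(n)
    spikes = np.zeros(n, dtype=int)
    history = []
    for t in range(inputs.shape[0]):
        v += weights @ spikes + inputs[t]            # recurrent input from last step
        spikes = (v >= threshold).astype(int)
        v = np.where(spikes == 1, v - threshold, v)  # reset by subtraction
        history.append(spikes.copy())
    return np.array(history)
```

Driving neuron 0 once in a two-neuron feed-forward chain produces a spike on neuron 0, then, one delay later, a spike on neuron 1, confirming the delay-one propagation semantics.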

Implications and Limitations

This work directly addresses accessibility in neuromorphic computing, providing an open-source pipeline deployable on low-cost, low-power hardware. The architecture fundamentally broadens the class of SNNs deployable on edge devices, embedded robotics, and IoT applications, circumventing the cost and inflexibility of ASIC-based offerings (e.g., Loihi [Loihi, IEEE Micro 2018], TrueNorth [TrueNorth, IEEE TCAD 2015]).

By decoupling SNN deployment from architectural rigidness and high resource requirements, the framework facilitates:

  • Experimentation with novel network topologies—including hand-designed logic circuits and bio-inspired architectures
  • Rapid deployment of quantized, PyTorch-trained SNNs with minimal platform porting overhead
  • Scalability to larger SNNs on platforms with more BRAM and logic, or adaptation to unconstrained embedded FPGAs

A core constraint is synchronous operation; the pipeline does not exploit idle time during non-spiking activity, which diminishes energy efficiency at large scale relative to event-driven accelerators. Furthermore, support for automatic training of arbitrary any-to-any SNNs is limited by the lack of robust training algorithms for highly recurrent or irregular architectures—though hand-crafted solutions are supported.

Future Directions

Potential enhancements include asynchronous event-driven computation to exploit input sparsity, improved training support for arbitrary connectivity (including convolutional structures), and scaling to higher-end or alternative embedded FPGAs with expanded RAM and logic resources. Integrating advanced coding schemes or compressed dataflows could yield further efficiency gains for large SNN deployments.

Conclusion

This framework represents a significant contribution toward democratizing spiking deep learning architectures on budget- and power-constrained hardware. By fusing a minimal FPGA architecture with open-source training and compilation tools, it lowers barriers for advanced neuromorphic and edge inference research. The demonstrated flexibility and low resource footprint position this approach as an effective platform for future SNN algorithmic exploration, hardware/software codesign, and real-world low-power AI tasks.
