Demystifying FPGA Hard NoC Performance

Published 13 Mar 2025 in cs.AR | (2503.10861v1)

Abstract: With the advent of modern multi-chiplet FPGA architectures, vendors have begun integrating hardened NoC to address the scalability, resource usage, and frequency disadvantages of soft NoCs. However, as this work shows, effectively harnessing these hardened NoC is not trivial. It requires detailed knowledge of the microarchitecture and how it relates to the physical design of the FPGA. Existing literature has provided in-depth analyses for NoC in MPSoC devices, but few studies have systematically evaluated hardened NoC in FPGA, which have several unique implications. This work aims to bridge this knowledge gap by demystifying the performance and design trade-offs of hardened NoC on FPGA. Our work performs detailed performance analysis of hard (and soft) NoC under different settings, including diverse NoC topologies, routing strategies, traffic patterns and different external memories under various NoC placements. In the context of Versal FPGAs, our results show that using hardened NoC in multi-SLR designs can reduce expensive cross-SLR link usage by up to 30-40%, eliminate general-purpose logic overhead, and remove most critical paths caused by large on-chip crossbars. However, under certain aggressive traffic patterns, the frequency advantage of hardened NoC is outweighed by the inefficiency in the network microarchitecture. We also observe suboptimal solutions from the NoC compiler and distinct performance variations between the vertical and horizontal interconnects, underscoring the need for careful design. These findings serve as practical guidelines for effectively integrating hardened NoC and highlight important trade-offs for future FPGA-based systems.


Summary

Overview of "Demystifying FPGA Hard NoC Performance"

The paper "Demystifying FPGA Hard NoC Performance," authored by Sihao Liu, Jake Ke, Tony Nowatzki, and Jason Cong, examines the hardened Network-on-Chip (NoC) integrated into modern FPGA architectures, specifically Versal FPGAs. The authors characterize the performance and design trade-offs of using a hardened NoC rather than a soft NoC built from general-purpose logic.

Key Findings and Contributions

The research highlights several insights into how hardened NoCs behave in practice. First, in multi-SLR designs, hardened NoCs can reduce expensive cross-SLR link usage by up to 30-40%, eliminate general-purpose logic overhead, and remove most critical paths caused by large on-chip crossbars. These gains do not hold for every workload, however: under certain aggressive traffic patterns, inefficiencies in the network microarchitecture cost enough bandwidth to outweigh the hardened NoC's frequency advantage.
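The trade-off between clock frequency and network efficiency can be made concrete with a back-of-envelope model. The sketch below is an illustration only, not the paper's methodology: it assumes sustained bandwidth is simply frequency times link width times an efficiency factor, and every number in it (frequencies, width, efficiency) is a hypothetical placeholder rather than a measured result.

```python
def effective_bandwidth_gbps(freq_mhz, width_bits, efficiency):
    """Sustained bandwidth of one link in Gbit/s.

    Simplified model (an assumption): peak bandwidth scaled by the
    fraction of cycles that carry useful data.
    """
    return freq_mhz * 1e6 * width_bits * efficiency / 1e9


def break_even_efficiency(hard_freq_mhz, soft_freq_mhz, soft_efficiency,
                          width_bits):
    """Efficiency below which the hard NoC loses to the soft NoC.

    Assumes both networks use links of the same width.
    """
    soft_bw = effective_bandwidth_gbps(soft_freq_mhz, width_bits,
                                       soft_efficiency)
    hard_peak = effective_bandwidth_gbps(hard_freq_mhz, width_bits, 1.0)
    return soft_bw / hard_peak


# Hypothetical placeholder values, not taken from the paper.
threshold = break_even_efficiency(hard_freq_mhz=1000, soft_freq_mhz=250,
                                  soft_efficiency=0.9, width_bits=128)
print(f"hard NoC must sustain more than {threshold:.1%} efficiency")
```

Under this model, a hard NoC clocked four times faster than a soft crossbar still delivers less bandwidth once its efficiency on a given traffic pattern falls below roughly a quarter of the soft design's, which is the kind of crossover the paper describes qualitatively.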

The paper also reports suboptimal solutions from the NoC compiler and distinct performance differences between the vertical and horizontal interconnects, both of which call for careful design. This caution is reinforced by findings that NoC routing configuration and source-destination proximity significantly affect read and write bandwidth and latency.

Methodological Approach

The authors run an extensive set of benchmarks covering varied NoC configurations, data movement patterns, and QoS scenarios. They use AMD's Versal FPGA platforms as a testbed and evaluate the hardened NoC integrated in these devices empirically. The characterization spans four NoC placements: local, horizontal (HNoC), vertical (VNoC), and spread, each tested under different crossbar sizes.

Quantitative evaluations explore the correlation between NoC placement strategies and their impact on FPGA resource utilization, frequency, and throughput under various traffic patterns such as nearest-neighbor, shift, tornado, reverse, uniform, and hotspot. The analysis extends to practical implementations with external memory interfaces—namely DRAM and HBM—shedding light on how NoC proximity to memory controllers can drastically influence effective bandwidth.
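For readers unfamiliar with the traffic pattern names, the sketch below gives one common textbook-style definition of each as a source-to-destination mapping over `n` endpoints arranged in a ring. These definitions are assumptions for illustration: the paper may define its patterns differently, "reverse" is taken here to mean bit reversal, and the `offset`, `hot`, and `p_hot` parameters are hypothetical.

```python
import random


def nearest_neighbor(src, n):
    """Each source sends to the next endpoint."""
    return (src + 1) % n


def shift(src, n, offset=2):
    """Each source sends a fixed number of endpoints away (assumed offset)."""
    return (src + offset) % n


def tornado(src, n):
    """Each source sends just under halfway around: ceil(n/2) - 1 hops."""
    return (src + (n + 1) // 2 - 1) % n


def bit_reverse(src, n):
    """Destination is the source index with its address bits reversed."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    bits = n.bit_length() - 1
    return int(format(src, f"0{bits}b")[::-1], 2)


def uniform(src, n, rng):
    """Destination drawn uniformly at random (may equal the source)."""
    return rng.randrange(n)


def hotspot(src, n, rng, hot=0, p_hot=0.5):
    """With probability p_hot send to one hot endpoint, else uniform."""
    return hot if rng.random() < p_hot else rng.randrange(n)


if __name__ == "__main__":
    n = 8
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for name, fn in [("nearest-neighbor", nearest_neighbor),
                     ("shift", shift),
                     ("tornado", tornado),
                     ("reverse", bit_reverse)]:
        print(name, [fn(s, n) for s in range(n)])
    print("uniform", [uniform(s, n, rng) for s in range(n)])
    print("hotspot", [hotspot(s, n, rng) for s in range(n)])
```

The deterministic patterns stress specific links in predictable ways, while uniform and hotspot traffic exercise arbitration and congestion at shared endpoints such as memory controllers.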

Implications and Future Directions

The implications of this work are multifaceted. Practically, the findings serve as guidelines for FPGA programmers aiming to harness the full potential of integrated NoCs. They emphasize strategic considerations in NoC placement and traffic pattern planning to maximize throughput while minimizing latency and resource overhead. On a theoretical level, the study prompts a reevaluation of NoC compiler techniques and microarchitectural design paradigms to enhance robustness and scalability.

Moving forward, advancements in NoC design for FPGAs could focus on resolving the bandwidth-loss challenges associated with complex traffic scenarios and improving the adaptability of NoC compilers. The development of more intelligent NoC routing algorithms and adaptive architectures could foster further efficiency and performance gains, especially as FPGA applications continue to scale in complexity and size.

In summary, this paper provides a detailed exploration of hardened NoC performance in FPGA systems, offering essential insights and practical recommendations for optimizing designs in real-world applications. The research bridges existing knowledge gaps and sets the stage for future innovations in FPGA-based NoC technology.