Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors

Published 6 Oct 2023 in cs.AR, cs.ET, and q-bio.GN | (2310.04366v2)

Abstract: Basecalling, an essential step in many genome analysis studies, relies on large Deep Neural Networks (DNNs) to achieve high accuracy. Unfortunately, these DNNs are computationally slow and inefficient, leading to considerable delays and resource constraints in the sequence analysis process. A Computation-In-Memory (CIM) architecture using memristors can significantly accelerate DNNs. However, inherent device non-idealities and architectural limitations of such designs can greatly degrade the basecalling accuracy, which is critical for accurate genome analysis. To facilitate the adoption of memristor-based CIM designs for basecalling, it is important to (1) conduct a comprehensive analysis of potential CIM architectures and (2) develop effective strategies for mitigating the possible adverse effects of inherent device non-idealities and architectural limitations. This paper proposes Swordfish, a novel hardware/software co-design framework that can effectively address the two aforementioned issues. Swordfish incorporates seven circuit and device restrictions or non-idealities from characterized real memristor-based chips. Swordfish leverages various hardware/software co-design solutions to mitigate the basecalling accuracy loss due to such non-idealities. To demonstrate the effectiveness of Swordfish, we take Bonito, the state-of-the-art (i.e., accurate and fast), open-source basecaller, as a case study. Our experimental results using Swordfish show that a CIM architecture can realistically accelerate Bonito for a wide range of real datasets by an average of 25.7x, with an accuracy loss of 6.01%.
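
To give a feel for why device non-idealities matter, here is a minimal sketch of how imperfect weight storage can perturb the matrix-vector product at the core of a DNN layer. It is not Swordfish's model: the two effects used (a limited number of evenly spaced weight levels and multiplicative Gaussian variation), the function names `ideal_mvm`, `nonideal_mvm`, and `mean_abs_error`, and all parameter values are assumptions chosen for illustration only.

```python
import random


def ideal_mvm(weights, x):
    """Ideal matrix-vector product: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def nonideal_mvm(weights, x, levels, rel_sigma, rng):
    """Same product, with each weight first snapped to one of `levels`
    (>= 2) evenly spaced values in [-w_max, w_max] and then scaled by
    (1 + Gaussian noise) to mimic device variation (assumed model)."""
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    step = 2.0 * w_max / (levels - 1)
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            idx = round((w + w_max) / step)       # nearest level index
            q = -w_max + idx * step               # quantized weight
            q *= 1.0 + rng.gauss(0.0, rel_sigma)  # multiplicative variation
            acc += q * v
        out.append(acc)
    return out


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    W = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(8)]
    x = [rng.uniform(-1, 1) for _ in range(64)]
    ref = ideal_mvm(W, x)
    for sigma in (0.0, 0.05, 0.2):
        got = nonideal_mvm(W, x, levels=16, rel_sigma=sigma, rng=rng)
        print(sigma, mean_abs_error(ref, got))
```

Running the script prints the mean absolute deviation from the ideal result for a few hypothetical variation strengths. A framework such as the one the abstract describes would use characterized device data rather than these toy distributions.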
