
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers

Published 14 Mar 2024 in cs.LG, cs.AI, and cs.PL (arXiv:2403.13839v1)

Abstract: PyTorch 2.x introduces a compiler designed to accelerate deep learning programs. For machine learning researchers, however, using the PyTorch compiler to its full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce depyf, a tool designed to demystify the inner workings of the PyTorch compiler. depyf decompiles bytecode generated by PyTorch back into equivalent source code, and establishes connections between in-memory code objects and their on-disk source code counterparts. This feature enables users to step through the source code line by line using debuggers, enhancing their understanding of the underlying processes. Notably, depyf is non-intrusive and user-friendly, relying primarily on two convenient context managers for its core functionality. The project is openly available at https://github.com/thuml/depyf and is recognized as a PyTorch ecosystem project (https://pytorch.org/ecosystem/).


Summary

  • The paper presents depyf, which decompiles opaque PyTorch bytecode into interpretable source code, enhancing compiler transparency.
  • It employs symbolic execution across nearly 200 bytecode types and function execution hijacking to enable detailed, step-by-step debugging.
  • Extensive experiments demonstrate that depyf achieves full compatibility across Python and PyTorch versions, streamlining compiler analysis and optimization.

An Academic Perspective on "depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers"

The advent of PyTorch 2.x, featuring a compiler aimed at accelerating deep learning programs, gives machine learning practitioners a potent tool. However, the compiler operates at the Python bytecode level, and this opacity often poses significant challenges. To address it, the authors introduce "depyf," a tool that decompiles PyTorch-generated bytecode back into source code, making the compiler's behavior comprehensible to researchers.

Challenges and Context

Understanding the PyTorch compiler is particularly daunting because of its frontend, Dynamo, which splits user code at the bytecode level into pure-Python segments and PyTorch-specific segments. This bytecode-level transformation creates an abstraction that is hard to penetrate, especially for readers without bytecode fluency. The backend complicates matters further by optimizing computation graphs into executables for various hardware targets, which limits the use of standard debuggers.
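To see concretely what "operating at the Python bytecode level" means, a minimal standard-library illustration (no PyTorch required): CPython's `dis` module exposes the raw bytecode that a frontend like Dynamo analyzes and rewrites, and even for a trivial function the opcode stream is far less readable than the source text.

```python
import dis

def toy(x):
    # A trivial computation; a bytecode-level frontend analyzes
    # and rewrites opcodes like these, not the source text itself.
    return x * 2 + 1

# Print the raw CPython bytecode that bytecode-level tools see.
dis.dis(toy)

# Without a tool like depyf, a reader must decode this opcode
# stream by hand: LOAD/arithmetic/RETURN instructions.
ops = [ins.opname for ins in dis.get_instructions(toy)]
print(ops)
```

Reading such listings fluently is exactly the skill the paper argues most researchers lack, which motivates decompiling the bytecode back to source.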

depyf: Features and Implementation

The primary contribution of depyf is its ability to decompile bytecode into a faithful analogue of the original source code. By symbolically executing approximately two hundred bytecode instruction types, depyf works reliably across all Python versions supported by PyTorch. The tool also replicates PyTorch's core compilation mechanisms in pure Python, elucidating processes that the C-based implementation would otherwise obscure.
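The symbolic-execution strategy can be sketched in miniature. The toy below is an illustrative stand-in, not depyf's actual implementation (which covers roughly two hundred instruction types plus control flow): it symbolically executes the bytecode of a straight-line function, maintaining an operand stack of source-level expressions instead of values.

```python
import dis

def decompile_return_expr(fn):
    """Symbolically execute straight-line bytecode, rebuilding a
    source-level expression for the function's return value."""
    stack = []  # holds expression strings instead of runtime values
    for ins in dis.get_instructions(fn):
        if ins.opname in ("LOAD_FAST", "LOAD_GLOBAL", "LOAD_NAME"):
            stack.append(ins.argval)            # variable name
        elif ins.opname == "LOAD_CONST":
            stack.append(repr(ins.argval))      # literal constant
        elif ins.opname == "BINARY_OP":         # Python >= 3.11
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(f"({lhs} {ins.argrepr} {rhs})")
        elif ins.opname in ("BINARY_ADD", "BINARY_MULTIPLY"):  # <= 3.10
            rhs, lhs = stack.pop(), stack.pop()
            op = "+" if ins.opname == "BINARY_ADD" else "*"
            stack.append(f"({lhs} {op} {rhs})")
        elif ins.opname == "RETURN_VALUE":
            return f"return {stack.pop()}"
        # other opcodes (e.g. RESUME) have no stack effect here
    raise ValueError("unsupported bytecode pattern")

def f(a, b):
    return a * 2 + b

print(decompile_return_expr(f))  # → return ((a * 2) + b)
```

Handling two opcode spellings for the same arithmetic hints at why version coverage matters: CPython's instruction set changes between releases, which is what depyf's per-version support addresses.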

Moreover, the tool performs "function execution hijacking." This approach intercepts critical function calls so that the compiled artifacts can be inspected with standard debugging techniques, permitting a line-by-line traversal of the computation graph.
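The on-disk linkage the abstract describes can be sketched with the standard library alone (the source string and file name below are hypothetical, standing in for depyf's decompiled output): writing recovered source to a file and compiling it with that filename makes the resulting in-memory code object point at real, steppable lines.

```python
import os
import tempfile

# Hypothetical "recovered" source, standing in for the decompiled
# output of a captured graph function.
src = "def graph_fn(x):\n    y = x * 2\n    return y + 1\n"

# Write the source to disk, then compile it *with that filename*.
# Debuggers and tracebacks resolve lines via co_filename, so the
# in-memory code object now maps onto debuggable on-disk code.
path = os.path.join(tempfile.mkdtemp(), "recovered.py")
with open(path, "w") as f:
    f.write(src)
ns = {}
exec(compile(src, path, "exec"), ns)

print(ns["graph_fn"].__code__.co_filename == path)  # → True
print(ns["graph_fn"](3))                            # → 7
```

Once the code object's `co_filename` points at a real file, a breakpoint set in that file behaves exactly as it would in hand-written source, which is what enables the line-by-line stepping described above.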

Experimental Validation and Results

Extensive experimentation underscores depyf's reliability and utility. As reported in the paper's decompiler comparison table, depyf outperforms existing decompilers, achieving complete compatibility across all tested Python and PyTorch versions. This correctness positions depyf as a valuable asset for researchers exploring deep learning compiler optimizations, and a continuous-integration testing strategy keeps the tool stable against evolving software versions.

Implications and Future Directions

From a practical standpoint, depyf significantly lowers the barrier for machine learning researchers to leverage hardware optimizations without intricate hardware-specific knowledge. This facilitation allows researchers to focus on algorithmic innovations without being encumbered by operational complexities. Theoretically, depyf's ability to render PyTorch operations transparent fosters a deeper understanding of compilation processes, potentially guiding further enhancements in compiler designs.

Looking forward, depyf may inspire new avenues in AI research, particularly in optimizing model training and inference for emerging hardware architectures. By demystifying compiler processes, depyf sets a foundation for crafting more intuitive and efficient machine learning tools.

In summary, depyf emerges as a critical bridge between machine learning research and hardware optimization, enabling profound insights and simplified access to PyTorch's computational advantages. Its contribution to compiler transparency not only benefits current research paradigms but also lays the groundwork for future advancements in machine learning infrastructure.
