depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Abstract: PyTorch \texttt{2.x} introduces a compiler designed to accelerate deep learning programs. However, using the PyTorch compiler to its full potential can be challenging for machine learning researchers: the compiler operates at the Python bytecode level, making it appear as an opaque box. To address this, we introduce \texttt{depyf}, a tool that demystifies the inner workings of the PyTorch compiler. \texttt{depyf} decompiles the bytecode generated by PyTorch back into equivalent source code and establishes connections between in-memory code objects and their on-disk source-code counterparts. This enables users to step through the source code line by line with a debugger, deepening their understanding of the underlying processes. Notably, \texttt{depyf} is non-intrusive and user-friendly, relying primarily on two convenient context managers for its core functionality. The project is \href{https://github.com/thuml/depyf}{openly available} and is recognized as a \href{https://pytorch.org/ecosystem/}{PyTorch ecosystem project}.