Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Abstract: Recent years have seen a surge in the popularity of commercial AI products based on generative, multi-purpose AI systems, which promise a unified approach to building ML models into technology. However, this ambition of 'generality' comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon they emit. In this work, we propose the first systematic comparison of the ongoing inference cost of various categories of ML systems, covering both task-specific models (i.e., finetuned models that carry out a single task) and 'general-purpose' models (i.e., those trained for multiple tasks). We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on a representative benchmark dataset using these models. We find that multi-purpose, generative architectures are orders of magnitude more expensive than task-specific systems for a variety of tasks, even when controlling for the number of model parameters. We conclude with a discussion of the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against their increased costs in terms of energy and emissions. All the data from our study can be accessed via an interactive demo for further exploration and analysis.
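The deployment cost the abstract describes reduces to a simple accounting: energy drawn over 1,000 inferences, multiplied by the carbon intensity of the local grid. The sketch below illustrates that arithmetic only; the power draw, latency, and grid-intensity figures are illustrative placeholders, not measurements from this study (the paper itself uses instrumented tooling such as CodeCarbon).

```python
# Back-of-the-envelope deployment-cost accounting in the spirit of the
# paper's methodology: energy and carbon for 1,000 inferences.
# All numeric inputs below are illustrative, NOT results from the paper.

def inference_cost(avg_power_watts: float,
                   latency_seconds: float,
                   n_inferences: int = 1000,
                   grid_intensity_kg_per_kwh: float = 0.4):
    """Return (energy_kwh, carbon_kg) for a batch of inferences."""
    # Energy: power (W) x time (s) gives joules; 3.6e6 J = 1 kWh.
    energy_kwh = avg_power_watts * latency_seconds * n_inferences / 3.6e6
    # Carbon: energy times the grid's carbon intensity (kg CO2eq per kWh).
    carbon_kg = energy_kwh * grid_intensity_kg_per_kwh
    return energy_kwh, carbon_kg

# Hypothetical example: a 300 W accelerator at 3 s per generative inference.
energy, carbon = inference_cost(300, 3.0)
print(f"{energy:.2f} kWh, {carbon:.3f} kg CO2eq per 1,000 inferences")
```

The same arithmetic makes the paper's headline comparison concrete: a task-specific classifier running in milliseconds at lower power draw lands orders of magnitude below a multi-second generative inference on this scale.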