
Compute and Energy Consumption Trends in Deep Learning Inference

Published 12 Sep 2021 in cs.LG and cs.AI (arXiv:2109.05472v2)

Abstract: The progress of some AI paradigms, such as deep learning, is said to be linked to an exponential growth in the number of parameters. There are many studies corroborating these trends, but does this translate into an exponential increase in energy consumption? In order to answer this question we focus on inference costs rather than training costs, as the former account for most of the computing effort, solely because of the multiplicative factors. Also, apart from algorithmic innovations, we account for more specific and powerful hardware (leading to higher FLOPS) that is usually accompanied by important energy efficiency optimisations. We also move the focus from the first implementation of a breakthrough paper towards the consolidated version of the techniques one or two years later. Under this distinctive and comprehensive perspective, we study relevant models in the areas of computer vision and natural language processing: for a sustained increase in performance we see a much softer growth in energy consumption than previously anticipated. The only caveat is, yet again, the multiplicative factor, as AI increases its penetration and becomes more pervasive.

Citations (98)

Summary

  • The paper shows that deep learning inference, which accounts for about 90% of compute costs, has become markedly more energy efficient thanks to specialized GPUs such as the Tesla V100, A100, and T4.
  • The paper shows that algorithmic innovations, including EfficientNet, enable high accuracy with fewer resources, reducing energy consumption significantly.
  • The paper emphasizes that sustainable AI development requires balancing compute scaling with energy efficiency to meet the growing demands of global inference applications.

The paper "Compute and Energy Consumption Trends in Deep Learning Inference" by Radosvet Desislavov, Fernando Martínez-Plumed, and José Hernández-Orallo presents a comprehensive analysis focusing on the inference costs associated with deep learning models. This paper differentiates itself by emphasizing inference rather than training costs since inference is responsible for a far larger share of computational effort due to its repetitive nature after initial model deployment.

Key Findings and Implications

The study centers on two primary domains: Computer Vision (CV) and NLP. Key observations indicate that, although the demand for computational resources during training has been thoroughly explored, inference, which constitutes about 90% of the computational costs, is far less documented. This work provides critical insights into the trends of energy consumption and their implications, presenting the following findings:

  1. Scaling of Compute and Efficiency Improvements: The research confirms that deep learning models, specifically in CV and NLP, have shown exponential growth in the number of parameters over the years. However, this does not directly equate to an exponential increase in energy consumption. Advances in hardware, particularly specialized graphics processing units (GPUs) with mixed precision capabilities, have significantly enhanced compute efficiency. Notably, Nvidia's Tesla V100, A100, and T4 GPUs deliver successive improvements in FLOPS per Watt, combining higher performance with better energy efficiency.
  2. Algorithmic Improvements Over Raw Compute Scaling: Algorithmic innovations have played a pivotal role in improving model performance without proportionately increasing energy usage. EfficientNet and its variants, for example, demonstrate that with the right architectural choices, models can achieve high accuracy with relatively fewer computational resources compared to their predecessors. This indicates that advancements in algorithms contribute substantially to the performance gains seen in modern neural network architectures.
  3. Inference Energy Consumption Trends: By focusing on inference rather than training, the study illustrates how energy consumption and computational efficiency trends are evolving. While cutting-edge models still show exponential growth in their compute demands, the energy consumption for models used in mainstream applications (those that integrate efficiency optimizations) demonstrates a much more moderated trend.
  4. Multiplicative Factor and Future Projections: The increasing ubiquity of AI applications suggests that the multiplicative factor (number of inferences per capita) could significantly escalate energy consumption despite efficiency improvements. This analysis highlights the necessity for sustainable approaches as AI becomes more integrally embedded in daily life.
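The interplay of points 1 and 4 above can be made concrete with a back-of-envelope calculation. The sketch below is not from the paper; the throughput and power figures are approximate public datasheet values for the GPUs mentioned, and the model size, utilization rate, and deployment volume are purely illustrative assumptions. It shows how energy per inference falls with FLOPS-per-Watt improvements, while the multiplicative factor (total inference count) scales total energy right back up:

```python
# Approximate datasheet figures (assumptions, not values from the paper):
# peak mixed-precision throughput in FLOPS and TDP in watts.
GPUS = {
    "V100": (125e12, 300),
    "T4":   (65e12, 70),
    "A100": (312e12, 400),
}

def flops_per_watt(flops: float, watts: float) -> float:
    """Hardware efficiency: operations delivered per joule."""
    return flops / watts

def joules_per_inference(model_flops: float, gpu: str,
                         utilization: float = 0.3) -> float:
    """Energy for one forward pass, assuming the GPU sustains a
    fixed fraction of peak throughput at its rated power."""
    peak, tdp = GPUS[gpu]
    seconds = model_flops / (peak * utilization)
    return seconds * tdp

# Illustrative model cost: ~4e9 FLOPs per inference (roughly ResNet-50 scale).
e_v100 = joules_per_inference(4e9, "V100")  # ~0.032 J per inference
e_t4 = joules_per_inference(4e9, "T4")      # lower: T4 has better FLOPS/Watt

# The multiplicative factor: total energy grows linearly with inference count.
daily_inferences = 1e9  # hypothetical deployment volume
total_kwh_per_day = e_t4 * daily_inferences / 3.6e6  # joules -> kWh
```

Even with the most efficient of the three GPUs, a hypothetical billion inferences per day lands in the kilowatt-hour range for this one small model, which is the paper's caveat in miniature: per-inference efficiency gains can be offset by pervasive deployment.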

Practical and Theoretical Implications

Practically, the research underscores the importance of optimizing inference efficiency to keep energy consumption from growing exponentially as AI applications expand. Theoretically, it prompts a re-evaluation of the sustainable scaling of AI models, suggesting a pivot towards maximizing algorithmic and architectural efficiency while enhancing hardware support.

Speculations on Future Developments

The field can expect further integration of specialized hardware to keep energy consumption manageable. With a focus on inference, downstream implications include the need for economy-wide considerations, as AI integration is expected to influence socio-economic structures, potentially affecting everything from energy policies to workforce adaptations.

In conclusion, "Compute and Energy Consumption Trends in Deep Learning Inference" contributes substantially to our understanding of energy dynamics in AI applications. By highlighting the nuanced relationship between algorithmic advancement, hardware specialization, and energy consumption, the research advocates for a balanced approach focused on sustainable AI development. For future inquiries, it opens avenues to explore socio-economic impacts and the potential for novel AI paradigms that strive for greater efficacy with minimized energy expenditure.
