- The paper surveys dynamic neural networks that tailor computational effort to each input, achieving up to 50% faster inference in certain cases.
- It categorizes adaptive models into early exits, computation routing, and token skimming, each optimizing performance and resource use.
- The paper highlights practical benefits for edge devices and outlines future directions for refining adaptive architectures in diverse applications.
Overview of "A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion"
Dynamic Neural Networks (DNNs) are an emerging paradigm addressing the limitations of static model architectures by introducing input-dependent adaptivity. The surveyed paper explores the methodologies and efficacy of DNNs, specifically within the realms of Computer Vision (CV) and Sensor Fusion. Model compression remains a critical challenge in deploying large-scale CV models on devices with limited computational resources. Despite their utility, traditional model compression techniques, like quantization and pruning, do not account for the computational variability that diverse inputs demand. DNNs elegantly address this shortcoming by tailoring computational effort according to input complexity, promising efficiency gains in resource-constrained applications.
Dynamic Neural Network Taxonomy
The authors systematically categorize DNNs into three core groups:
- Early-Exit Networks: These models attach auxiliary classifiers to intermediate network layers, allowing predictions to be produced before the final network output is reached. They minimize inference cost for simpler inputs by exiting early in the network's computational graph. Pioneering models such as BranchyNet illustrate the potential computational savings while mitigating "overthinking" in deeper network layers.
- Computation Routing Networks: These architectures activate specific network components (e.g., layers, channels, filters) dynamically based on the input. Approaches leveraging Mixture-of-Experts (MoE), for instance, select specialized expert sub-networks per input, adjusting the computational path to balance performance and efficiency. MoE variants of Vision Transformers (ViTs) highlight the flexibility and scalability of such systems.
- Token Skimming Techniques: Applied predominantly to Transformer architectures, these methods reduce the number of tokens processed by dropping or merging uninformative ones. Techniques like Token Merging combine similar tokens with lightweight matching algorithms, cutting computation without substantial performance penalties.
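The early-exit idea above can be sketched in a few lines: an auxiliary classifier sits partway through the network, and inference stops there whenever its prediction is confident enough. The class below is a minimal toy sketch of this mechanism (random placeholder weights, hypothetical layer sizes), not the architecture of BranchyNet or any model from the survey.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class EarlyExitMLP:
    """Toy two-stage MLP with one auxiliary (early-exit) head.

    Weights are random placeholders; this only illustrates the
    confidence-thresholded early-exit control flow.
    """

    def __init__(self, d_in=8, d_hid=16, n_cls=3, threshold=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(d_in, d_hid))
        self.exit1 = rng.normal(size=(d_hid, n_cls))  # auxiliary head
        self.w2 = rng.normal(size=(d_hid, d_hid))
        self.exit2 = rng.normal(size=(d_hid, n_cls))  # final head
        self.threshold = threshold

    def forward(self, x):
        h = np.tanh(x @ self.w1)
        p = softmax(h @ self.exit1)
        if p.max() >= self.threshold:      # confident: stop here
            return p, "exit1"
        h = np.tanh(h @ self.w2)           # otherwise keep computing
        return softmax(h @ self.exit2), "exit2"
```

In a trained model, "easy" inputs tend to be classified confidently at the first head, so the second stage is skipped entirely and its cost is never paid.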
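Computation routing can likewise be reduced to its core: a gating function scores the experts and only the selected one runs, so cost follows the chosen path rather than the full model. The snippet below is a toy top-1 MoE layer with made-up dimensions and linear "experts"; real MoE Transformers use learned gates and much larger expert networks.

```python
import numpy as np

def top1_moe(x, gate_w, experts):
    """Top-1 mixture-of-experts routing: score all experts with the
    gate, then run only the highest-scoring one on this input."""
    scores = x @ gate_w              # one gate score per expert
    k = int(np.argmax(scores))       # pick the single best expert
    return experts[k](x), k

rng = np.random.default_rng(1)
d, n_experts = 4, 3
gate_w = rng.normal(size=(d, n_experts))
# each "expert" is just a small linear map in this sketch
experts = [lambda x, w=rng.normal(size=(d, d)): x @ w
           for _ in range(n_experts)]

y, chosen = top1_moe(np.ones(d), gate_w, experts)
```

The key property is that adding more experts grows model capacity while the per-input cost stays roughly constant, since only one expert executes.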
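For token skimming, the essential operation is collapsing near-duplicate tokens into one representative. The function below is a deliberately simplified greedy stand-in for Token Merging's bipartite matching: each token is averaged into an earlier kept token when their cosine similarity crosses a threshold (the threshold value here is arbitrary).

```python
import numpy as np

def merge_similar_tokens(tokens, sim_thresh=0.95):
    """Greedily merge tokens whose cosine similarity to an already-kept
    token exceeds sim_thresh; otherwise keep the token as-is."""
    kept, counts = [], []
    for t in tokens:
        t_n = t / (np.linalg.norm(t) + 1e-8)
        merged = False
        for i, k in enumerate(kept):
            k_n = k / (np.linalg.norm(k) + 1e-8)
            if float(t_n @ k_n) >= sim_thresh:
                # running average keeps the merged token representative
                kept[i] = (k * counts[i] + t) / (counts[i] + 1)
                counts[i] += 1
                merged = True
                break
        if not merged:
            kept.append(t.copy())
            counts.append(1)
    return np.stack(kept)

# two nearly identical tokens and one distinct token: 3 -> 2
tokens = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
reduced = merge_similar_tokens(tokens)
```

Downstream attention layers then operate on the shorter token sequence, which is where the throughput gains reported for such methods come from.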
Strong Numerical Results and Claims
The paper underscores the efficacy of dynamic approaches with compelling empirical results:
- Early Exits: Demonstrated substantial reductions in computational overhead, with up to a 50% reduction in inference time in certain networks, while keeping accuracy close to that of static counterparts.
- MoE Techniques: Facilitated improvements by adaptively activating sub-network components, achieving state-of-the-art performance on multiple vision tasks with less computational demand than dense networks.
- Token Skimming: Achieved promising throughput gains by gradually merging similar tokens, retaining competitive accuracy even as the number of tokens processed decreased significantly.
Implications and Future Directions
The implications of these methodologies are far-reaching for both theoretical and practical applications. In theoretical contexts, DNNs offer insights into the input-specific computational demands of Artificial Intelligence models, inviting further exploration of adaptive architecture designs. Practically, the advantages extend to enhancing edge device efficiency, potentially transforming deployment in scenarios constrained by processing capabilities, such as autonomous vehicles and mobile devices.
From the authors’ perspective, further work could focus on making the computational savings of these models more stable and predictable, facilitating seamless integration with existing frameworks and accelerating adoption across diverse domains. Additionally, advancing adaptive routing mechanisms could allow finer-grained model scalability, reinforcing the modular design principles needed for real-world deployment.
In conclusion, DNNs promise to reshape the landscape of model efficiency by intelligently managing computational expenses in accordance with input specificity. This survey not only provides a lucid taxonomy and evaluation of DNN methodologies but also sets a roadmap for future innovations in adaptive neural network systems in computer vision and beyond.