- The paper compiles 400 activation functions over 30 years to offer a consolidated reference for neural network design.
- It categorizes functions into fixed and adaptive types, emphasizing design principles and performance trade-offs.
- The survey mitigates redundant efforts by providing a comprehensive catalog that supports future activation function innovations.
A Survey of Neural Network Activation Functions
The paper "Three Decades of Activations: A Comprehensive Survey of 400 Activation Functions for Neural Networks" by Vladimír Kunc and Jiří Kléma presents an extensive compilation of activation functions (AFs) pivotal to the field of neural networks (NNs). Surveying 400 activation functions, the work serves as a significant reference for neural network researchers, covering both classical fixed functions and adaptive activation functions (AAFs).
Overview
The review bridges an evident gap in the literature by aggregating a comprehensive list of activation functions proposed over the last thirty years. The necessity of such a list is underscored by frequent redundancy in research: identical or similar activation functions are often independently rediscovered, leading to unnecessary duplication of effort. By offering a consolidated resource, the authors aim to curb this duplication and support further advances in activation function design within the neural network community.
The paper partitions activation functions into two broad categories: fixed activation functions and adaptive activation functions. Fixed activation functions are prevalent in neural network layers, adding non-linearity without trainable parameters. Examples include the commonly used ReLU series (including the standard ReLU, Leaky ReLU, and Bounded ReLU), as well as sigmoid-based functions such as the logistic sigmoid and hyperbolic tangent.
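To make the fixed-function category concrete, here is a minimal NumPy sketch of the examples named above (ReLU, Leaky ReLU, Bounded ReLU, and the logistic sigmoid). The parameter values (`alpha=0.01`, the clipping bound `a=1.0`) are common defaults chosen for illustration, not values prescribed by the survey:

```python
import numpy as np

def relu(x):
    # Standard ReLU: max(0, x); zero for negative inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: small *fixed* slope alpha for negative inputs
    return np.where(x > 0, x, alpha * x)

def bounded_relu(x, a=1.0):
    # Bounded (clipped) ReLU: output capped at a
    return np.minimum(np.maximum(0.0, x), a)

def logistic_sigmoid(x):
    # Logistic sigmoid: squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y_relu = relu(x)            # negatives mapped to 0
y_leaky = leaky_relu(x)     # negatives scaled by 0.01 instead
y_bounded = bounded_relu(x) # positives additionally clipped at 1.0
```

Note that none of these functions carries a trainable parameter: `alpha` and `a` are hyperparameters fixed before training, which is precisely what distinguishes this category from the adaptive one.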
Conversely, adaptive activation functions incorporate tunable parameters, which can be adjusted as part of the learning process. Functions like PReLU, Swish, and the generalized transformative adaptive activation function (TAAF) fall into this category. These functions provide flexible modeling capabilities, allowing neural networks to better adapt to complex data patterns.
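The adaptive case can be sketched the same way. Below, PReLU and Swish are written with their shape parameters (`alpha`, `beta`) exposed as arguments; in a deep-learning framework these would be registered as trainable parameters and updated by gradient descent. The gradient expression shown for `alpha` is a straightforward hand derivation used for illustration, not a formula from the survey:

```python
import numpy as np

def prelu(x, alpha):
    # PReLU: like Leaky ReLU, but alpha is a *trainable* parameter
    return np.where(x > 0, x, alpha * x)

def swish(x, beta):
    # Swish: x * sigmoid(beta * x); beta may be fixed or learned
    return x / (1.0 + np.exp(-beta * x))

x = np.array([-2.0, 1.0, 3.0])
alpha = 0.25
y = prelu(x, alpha)

# d(prelu)/d(alpha) = x where x < 0, else 0 -- this is the signal
# an optimizer would use to adapt alpha during training
grad_alpha = np.where(x < 0, x, 0.0)
```

Because the slope (or, for Swish, the gating sharpness) is learned per layer or per channel, these functions can adapt their shape to the data, which is the flexibility the survey highlights for AAFs.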
Key Highlights and Numerical Results
While the paper focuses primarily on cataloging the breadth of available functions, it also touches upon the different criteria that influence the utility and efficiency of activation functions. These criteria include but are not limited to the function's capacity to introduce non-linear curvature, its computational cost, and the gradient flow capabilities it enables during model training.
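The gradient-flow criterion can be illustrated with a short comparison, assuming the familiar derivatives of the sigmoid and ReLU (this comparison is a standard observation, not a result specific to the survey): the sigmoid's gradient peaks at 0.25 and vanishes for large inputs, while ReLU's gradient stays at 1 for all positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative s(x) * (1 - s(x)): at most 0.25, and it
    # decays toward 0 as |x| grows (vanishing gradient)
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative is 1 for positive inputs: no saturation there
    return (x > 0).astype(float)

x = np.array([0.0, 5.0, 10.0])
g_sig = sigmoid_grad(x)   # shrinks rapidly away from zero
g_relu = relu_grad(x)     # stays at 1.0 for positive inputs
```

This kind of behavior, alongside computational cost (ReLU needs only a comparison, sigmoid an exponential), is exactly the sort of trade-off the criteria above are meant to capture.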
The paper references existing surveys, notably the works of Dubey et al. and Apicella et al., and expands on their function listings significantly; however, its primary aim is not empirical benchmarking. Still, it acknowledges recent empirical studies that have compared activation functions across various tasks and architectures, several of which identify functions that outperform vanilla ReLU in specific circumstances.
Practical and Theoretical Implications
Practically, this survey empowers researchers by providing a ready reference of previously developed activation functions. This can significantly reduce redundant reinvention and encourage the proposal of novel activation mechanisms tailored to emerging challenges in neural networks. The theoretical contribution lies in organizing activation functions into systematic categories and elucidating the essential design principles behind adaptive functions.
Speculation on Future Developments
As AI continues to evolve, there is a strong possibility that the demand for specialized activation functions will grow, particularly for domain-specific applications in areas such as computer vision, speech recognition, and bioinformatics. This paper could serve as a foundational reference for future work aiming at further unifying activation function theory, possibly leading to the design of universal activation functions that can be tuned to perform optimally across varied tasks.
Furthermore, with the growing emphasis on explainability and model interpretability, adaptive activation functions may play an essential role in achieving such ends, due to their flexibility and ability to imbue models with complex decision boundaries.
Conclusion
In summary, "Three Decades of Activations" is a detailed survey that collects, organizes, and offers insights into a wide array of neural network activation functions. By documenting these functions comprehensively, Vladimír Kunc and Jiří Kléma provide a pivotal resource for researchers, spurring further advancements in activation function research and neural network design.