
Fast Adaptation with Linearized Neural Networks

Published 2 Mar 2021 in stat.ML and cs.LG | (2103.01439v2)

Abstract: The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings. We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions. Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network. In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation. This inference is analytic and free of local optima issues found in standard techniques such as fine-tuning neural network weights to a new task. We develop significant computational speed-ups based on matrix multiplies, including a novel implementation for scalable Fisher vector products. Our experiments on both image classification and regression demonstrate the promise and convenience of this framework for transfer learning, compared to neural network fine-tuning. Code is available at https://github.com/amzn/xfer/tree/master/finite_ntk.

Citations (32)

Summary

  • The paper introduces a method that linearizes deep neural networks using a first-order Taylor expansion to capture key inductive biases.
  • The paper embeds these linear models into a Gaussian process framework with the Jacobian serving as the kernel for uncertainty quantification.
  • The paper shows that this approach achieves competitive performance in transfer learning tasks while reducing computational overhead.


The paper "Fast Adaptation with Linearized Neural Networks" proposes a novel approach to transfer learning using a linearization of deep neural networks (DNNs). The authors explore the benefits of using the linearization of network functions as a means to efficiently adapt models to new tasks. This technique leverages the inductive biases of DNNs by embedding them into Gaussian processes (GPs) with kernels derived from the Jacobian matrix of the network. By doing so, the method tackles common issues in transfer learning such as computational overhead and the risk of getting trapped in local optima during network adaptation.
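
The core linearization step can be illustrated in a few lines. The sketch below uses an invented two-parameter toy network and a finite-difference Jacobian (the paper's implementation uses implicit Jacobian-vector products on real networks, so this is purely illustrative): a first-order Taylor expansion around the trained weights `w0` gives a model that is linear in the parameters yet anchored to the trained network.

```python
import numpy as np

# Hypothetical toy "network": f(x; w) = w2 * tanh(w1 * x), params w = [w1, w2].
def f(x, w):
    return w[1] * np.tanh(w[0] * x)

def jacobian_wrt_params(x, w, eps=1e-6):
    """Numerical Jacobian of f(x; .) at w via central finite differences."""
    J = np.zeros_like(w)
    for i in range(len(w)):
        w_hi, w_lo = w.copy(), w.copy()
        w_hi[i] += eps
        w_lo[i] -= eps
        J[i] = (f(x, w_hi) - f(x, w_lo)) / (2 * eps)
    return J

def f_linearized(x, w, w0):
    """First-order Taylor expansion of f around trained weights w0:
    f(x; w) ~= f(x; w0) + J(x; w0) @ (w - w0)."""
    return f(x, w0) + jacobian_wrt_params(x, w0) @ (w - w0)

w0 = np.array([0.7, -1.3])                 # stand-in for trained weights
x = 0.5
w_near = w0 + 1e-3 * np.array([1.0, -2.0])
# The linear model agrees with the network at w0 and tracks it nearby.
print(abs(f_linearized(x, w0, w0) - f(x, w0)))        # exactly zero
print(abs(f_linearized(x, w_near, w0) - f(x, w_near)))  # second-order small
```

The key property is that `f_linearized` is linear in `w`, so adapting `w` to a new task becomes a linear (and, with a Gaussian likelihood, analytic) problem.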

Key Contributions

  1. Linearization of Neural Networks: The approach begins by performing a first-order Taylor expansion of the DNN, which effectively linearizes the network around its set of trained weights. This linear model captures the relevant inductive biases of the original network.
  2. Gaussian Process Embedding: These linear models are then embedded within a GP framework. Specifically, the Jacobian of the network serves as the kernel of the GP, allowing for probabilistic inference with an accompanying representation of uncertainty in predictions.
  3. Scalability Enhancements: The authors introduce computational optimizations, such as scalable Fisher vector products and implicit Jacobian computations, that make the method practical for large-scale domain adaptation tasks.
  4. Inductive Bias Analysis: Empirical analysis demonstrates that the inductive biases inherent in DNNs are well preserved by these linearized models. The study shows transferability and interpretability advantages that complement the theoretical guarantees offered by the GP's Bayesian framework.
  5. Application to Various Tasks: The experimental results show that, for tasks like image classification and regression, the linearized approach can match or exceed the performance of fine-tuning techniques. Moreover, the method is competitive with existing state-of-the-art kernel approaches in terms of classification accuracy and generalization performance.

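Contributions 1 and 2 can be sketched together: treating the parameter Jacobian as a feature map, the kernel K = J Jᵀ defines a GP whose posterior mean is available in closed form. The sketch below uses an invented two-parameter toy network with an analytic Jacobian, not the paper's `finite_ntk` code, and adapts the GP (with prior mean equal to the trained network) to new-task targets.

```python
import numpy as np

# Hypothetical toy "network" and its analytic parameter Jacobian.
def f(x, w):
    return w[1] * np.tanh(w[0] * x)

def jac(x, w):
    # d f / d w = [w2 * x * sech^2(w1 * x), tanh(w1 * x)]
    t = np.tanh(w[0] * x)
    return np.array([w[1] * x * (1 - t**2), t])

w0 = np.array([0.9, 1.1])          # stand-in for trained weights
X = np.array([-0.5, 0.8])          # adaptation inputs for the new task
y = np.sin(X)                      # new-task targets
noise = 1e-4                       # observation-noise variance

J = np.stack([jac(x, w0) for x in X])     # n x p Jacobian "feature" matrix
K = J @ J.T                               # Jacobian kernel matrix
resid = y - np.array([f(x, w0) for x in X])
alpha = np.linalg.solve(K + noise * np.eye(len(X)), resid)

def posterior_mean(x_star):
    """Analytic GP posterior mean: prior mean f(.; w0) plus a kernel correction.
    Domain adaptation here is a linear solve, with no iterative fine-tuning."""
    k_star = J @ jac(x_star, w0)
    return f(x_star, w0) + k_star @ alpha

print(posterior_mean(0.8))   # approaches the target sin(0.8) as noise -> 0
```

A posterior variance follows from the same quantities, which is the source of the uncertainty estimates the paper highlights.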
Implications and Future Directions

The proposed methodology offers significant implications for the field of AI, particularly in the context of transfer learning. It provides a new lens through which to view network adaptability by leveraging the power of function space modeling and probabilistic inference. The insights on inductive biases offered by this approach could inspire more interpretable models and contribute to the development of more efficient learning algorithms.

From a practical standpoint, the approach reduces computational burden and improves adaptation efficiency, which is crucial for real-time and large-scale applications. The scalability solutions presented could further be generalized to other areas of machine learning where working with large feature sets is a bottleneck.
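
For intuition about the matrix-multiply speed-ups, note that a Gauss-Newton-style Fisher vector product never requires materializing the p x p Fisher matrix: it reduces to two matrix-vector products with the Jacobian. The sketch below uses a random stand-in Jacobian (the paper works with implicit Jacobian-vector products through the network instead).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 50                      # data points, parameters
J = rng.normal(size=(n, p))          # stand-in for the network's Jacobian

def fisher_vector_product(v):
    """F v = (1/n) J^T (J v): two matrix-vector products,
    never forming the p x p Fisher matrix explicitly."""
    return J.T @ (J @ v) / n

v = rng.normal(size=p)
F = J.T @ J / n                      # explicit Fisher, built only to check
print(np.allclose(fisher_vector_product(v), F @ v))  # True
```

For large p this drops the memory cost from O(p^2) to O(p), which is what makes Fisher-based computations feasible at network scale.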

Future developments could explore the extension of this framework to more complex neural architectures and different types of data beyond the typical regression and classification scenarios. The combination of linearized models with advanced variational inference techniques could also be a promising direction to improve the approximation of non-Gaussian likelihoods, thus broadening the applicability of the approach to various other domains such as reinforcement learning.

In summary, the paper offers a compelling case for the use of linearized neural networks in transfer learning scenarios, highlighted by strong empirical results and significant computational efficiencies.
