
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data

Published 17 Dec 2019 in cs.LG and stat.ML | (1912.07768v1)

Abstract: This paper investigates the intriguing question of whether we can create learning algorithms that automatically generate training data, learning environments, and curricula in order to help AI agents rapidly learn. We show that such algorithms are possible via Generative Teaching Networks (GTNs), a general approach that is, in theory, applicable to supervised, unsupervised, and reinforcement learning, although our experiments only focus on the supervised case. GTNs are deep neural networks that generate data and/or training environments that a learner (e.g. a freshly initialized neural network) trains on for a few SGD steps before being tested on a target task. We then differentiate through the entire learning process via meta-gradients to update the GTN parameters to improve performance on the target task. GTNs have the beneficial property that they can theoretically generate any type of data or training environment, making their potential impact large. This paper introduces GTNs, discusses their potential, and showcases that they can substantially accelerate learning. We also demonstrate a practical and exciting application of GTNs: accelerating the evaluation of candidate architectures for neural architecture search (NAS), which is rate-limited by such evaluations, enabling massive speed-ups in NAS. GTN-NAS improves the NAS state of the art, finding higher performing architectures when controlling for the search proposal mechanism. GTN-NAS also is competitive with the overall state of the art approaches, which achieve top performance while using orders of magnitude less computation than typical NAS methods. Speculating forward, GTNs may represent a first step toward the ambitious goal of algorithms that generate their own training data and, in doing so, open a variety of interesting new research questions and directions.

Citations (145)

Summary

  • The paper introduces GTNs that leverage a dual-loop meta-learning approach to generate synthetic training data, significantly accelerating neural architecture search.
  • The paper demonstrates that GTN-trained networks achieve higher few-step accuracy on benchmarks like MNIST and CIFAR10 compared to conventional methods.
  • The paper highlights how synthetic data generation can reduce computational overhead in NAS while maintaining competitive model performance.

The paper "Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data" presents a methodology for automatically generating synthetic training data that accelerates the learning of neural networks. While Generative Teaching Networks (GTNs) are in principle applicable to supervised, unsupervised, and reinforcement learning, the research focuses on supervised learning tasks and evaluates the application of GTNs to Neural Architecture Search (NAS).

Core Contributions

GTNs form a meta-learning framework with two nested loops: the inner loop optimizes a learner's parameters via standard techniques such as SGD on generator-produced data, while the outer loop differentiates through that entire inner training process, using meta-gradients to adjust the generator's parameters so that it produces more effective synthetic datasets. Because the generative model need not emulate the true data distribution, it can construct synthetic datasets on which learners improve far faster than they would on the original training data.
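The nested-loop idea can be illustrated with a deliberately tiny sketch: a one-parameter linear learner, a single synthetic example `(x_syn, y_syn)`, and a meta-gradient computed by finite differences rather than backpropagation through the inner loop. All names and the setup here are illustrative simplifications, not the paper's implementation.

```python
def inner_step(w, x_syn, y_syn, lr=0.1):
    # Inner loop: one SGD step of the learner on a synthetic example.
    grad = 2.0 * x_syn * (w * x_syn - y_syn)
    return w - lr * grad

def outer_loss(x_syn, y_syn, w0=0.0, x_real=1.0, y_real=2.0):
    # Train the learner on synthetic data, then evaluate on the target task.
    w1 = inner_step(w0, x_syn, y_syn)
    return (w1 * x_real - y_real) ** 2

def meta_grad(x_syn, y_syn, eps=1e-5):
    # Outer loop: differentiate the target-task loss w.r.t. the synthetic
    # data itself (finite differences stand in for meta-gradient backprop).
    gx = (outer_loss(x_syn + eps, y_syn) - outer_loss(x_syn - eps, y_syn)) / (2 * eps)
    gy = (outer_loss(x_syn, y_syn + eps) - outer_loss(x_syn, y_syn - eps)) / (2 * eps)
    return gx, gy

# Meta-optimization: gradient descent on the synthetic example, so the
# generated data "teaches" the learner to do well on the real example.
x_syn, y_syn = 0.5, 0.5
for _ in range(200):
    gx, gy = meta_grad(x_syn, y_syn)
    x_syn -= 0.05 * gx
    y_syn -= 0.05 * gy

print(outer_loss(x_syn, y_syn))  # target-task loss shrinks toward zero
```

In the paper the generator is a deep network and the meta-gradient is taken by backpropagating through many inner SGD steps, but the structure is the same: the synthetic data are parameters of the outer optimization.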

A notable contribution of the work is addressing the computational overhead of NAS, which is dominated by evaluating numerous candidate architectures. By training learners on compact synthetic datasets for only a few steps, GTNs provide a far cheaper evaluation proxy, promising substantial speed-ups without compromising the quality of the architecture search.
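To see why a few-step proxy cuts cost, consider the following toy sketch. The "candidates" here are just inner-loop learning rates standing in for real architectures, and the fixed synthetic dataset stands in for GTN-generated data; the names and setup are hypothetical, not the paper's code. Each candidate is scored by its loss after a handful of SGD steps rather than a full training run.

```python
# Fixed "synthetic" dataset standing in for GTN-generated data: y = 2x + 1.
SYNTHETIC = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

def few_step_score(lr, epochs=5):
    # Train a tiny linear model for only a few SGD passes, then report
    # its loss; cheaper than full training, but enough to rank candidates.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in SYNTHETIC:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return sum(((w * x + b) - y) ** 2 for x, y in SYNTHETIC)

# Hypothetical candidate pool; in GTN-NAS these would be architectures
# proposed by the search mechanism.
candidates = [0.001, 0.01, 0.1, 0.3]
ranked = sorted(candidates, key=few_step_score)
print(ranked[0])  # the candidate that learns fastest in a few steps
```

The point of GTN-generated data is that this few-step ranking correlates well with the ranking obtained from full training on real data, so far less compute is spent per candidate.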

Experimental Results

The paper's experiments demonstrate that GTNs substantially improve few-step learning accuracy on benchmarks such as MNIST and CIFAR10. On MNIST, networks trained on GTN-generated data reached higher accuracy after a fixed small number of SGD steps than networks trained on real data. The paper also reports compelling findings for NAS: GTN-NAS found higher-performing architectures faster and with less computation than conventional evaluation techniques.

Implications and Future Directions

The ability of GTNs to create learner-agnostic synthetic data sets a precedent for generating diverse training environments. This flexibility invites applications beyond supervised learning, such as reinforcement learning, where synthetic experiences could scaffold agent training. The implications for NAS are particularly significant, since faster evaluation of candidate architectures can transform NAS efficiency, enabling the discovery of state-of-the-art models in far less time.

The challenges the paper addresses concerning regularization of GTNs and the stability of meta-gradient training, for which weight normalization offers significant improvements, suggest avenues for methodological refinement that could enhance the robustness and applicability of GTNs across varied domains.
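For reference, weight normalization reparameterizes each weight vector w as w = g · v / ‖v‖, decoupling the vector's length (g) from its direction (v); a minimal sketch:

```python
import math

def weight_norm(v, g):
    # Weight normalization: w = g * v / ||v||. The direction comes from v,
    # the length from the scalar g. The paper reports this stabilizes the
    # meta-gradients that flow back through the inner training loop.
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [g * vi / norm for vi in v]

w = weight_norm([3.0, 4.0], g=2.0)
print(w)  # same direction as v, rescaled to length g = 2
```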

Additionally, the question of synthetic-data realism versus effectiveness raises an engaging direction for future research: artificially constructed data, while not visually realistic, can nonetheless improve model training efficacy, an observation that challenges the conventional focus on realism.

Conclusion

This study opens a promising pathway toward algorithms that autonomously generate surrogate datasets. The strategic use of GTNs in NAS presents a compelling avenue, linking the fields of generative models and architecture search. Future work will likely expand the applicability of GTNs to broader learning paradigms, optimize their efficiency, and dissect the balance between data realism and learning outcomes. This paper creates fertile ground for deeper inquiries into the role of synthetic data generation in complex algorithmic training and optimization.
