- The paper shows that joint-training EBMs achieve superior SSL performance across domains compared to the pre-training approach.
- It applies energy-based models to both image classification and natural language labeling, eliminating the need for domain-specific data augmentations.
- Empirical evaluations on CIFAR-10, SVHN, and various NLP tasks show efficient use of labeled data and a promising path toward state-of-the-art domain-agnostic SSL.
An Empirical Study of Domain-Agnostic Semi-Supervised Learning via Energy-Based Models: Joint-Training and Pre-Training
The paper investigates the effectiveness of energy-based models (EBMs) in domain-agnostic semi-supervised learning (SSL), comparing two methodologies: joint-training and pre-training. The study examines the performance of these approaches across different domains, focusing on image classification and natural language labeling tasks. The findings contribute to advancing SSL techniques that do not rely heavily on domain-specific data augmentations, promising broader applicability across domains.
Background and Methodologies
Semi-Supervised Learning Paradigms
SSL aims to improve model training by combining a small amount of labeled data with readily available unlabeled data, reducing dependence on costly annotation. The paper distinguishes between generative and discriminative SSL approaches. Discriminative SSL relies on domain-specific data augmentations and often achieves impressive results in image classification. However, its success is limited in domains where such augmentations are less effective, such as text and medical imaging.
In contrast, generative SSL incorporates unsupervised learning on unlabeled data through generative models, typically in one of two forms: joint-training or pre-training. Joint-training estimates the joint distribution of observations and labels, while pre-training models the observations alone before subsequent fine-tuning with labels. Because neither relies heavily on data augmentations, generative SSL is inherently more domain-agnostic.
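The relationship between the two objectives follows from the chain-rule factorization log p(x, y) = log p(x) + log p(y|x): joint-training targets both terms at once, while pre-training fits log p(x) first and log p(y|x) during fine-tuning. A minimal numpy sketch over a toy discrete joint distribution (all numbers illustrative, not from the paper):

```python
import numpy as np

# Toy joint distribution over 3 observations x and 2 labels y (illustrative values).
p_xy = np.array([[0.20, 0.10],
                 [0.15, 0.25],
                 [0.05, 0.25]])  # rows index x, columns index y; entries sum to 1

p_x = p_xy.sum(axis=1)             # marginal p(x): what pre-training models first
p_y_given_x = p_xy / p_x[:, None]  # conditional p(y|x): learned during fine-tuning

# Joint-training targets log p(x, y) directly; the chain rule ties the two views:
# log p(x, y) = log p(x) + log p(y|x)
lhs = np.log(p_xy)
rhs = np.log(p_x)[:, None] + np.log(p_y_given_x)
assert np.allclose(lhs, rhs)
```

In these terms, joint-training optimizes the sum of both log-terms simultaneously, whereas pre-training optimizes them in two separate stages.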
Energy-Based Models in SSL
The paper leverages EBMs, known for their robust generative modeling capabilities, to advance domain-agnostic SSL. EBMs represent probability distributions through energy functions, offering a unified framework adaptable to various data modalities.
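Concretely, an EBM assigns each configuration x a scalar energy E(x) and defines p(x) = exp(-E(x)) / Z, where Z normalizes over all configurations. Over a small discrete domain the normalizer can be computed exactly; a minimal sketch with arbitrary illustrative energies:

```python
import numpy as np

def ebm_probs(energies):
    """Boltzmann distribution over a discrete domain: p(x) = exp(-E(x)) / Z."""
    # Shift by the minimum energy before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    unnorm = np.exp(-(energies - energies.min()))
    return unnorm / unnorm.sum()

energies = np.array([1.2, 0.3, 2.5, 0.3])  # arbitrary energies; lower = more probable
p = ebm_probs(energies)
assert np.isclose(p.sum(), 1.0)
assert p.argmax() == 1  # a lowest-energy state gets the highest probability
```

In realistic, continuous settings Z is intractable and training instead relies on approximate methods (e.g., sampling-based gradient estimates), but the defining relation between energy and probability is the same.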
In joint-training, EBMs model the joint density of observations and corresponding labels, optimizing supervised and unsupervised objectives together. Pre-training, conversely, entails unsupervised learning on observations alone, followed by supervised fine-tuning on labeled data. The study outlines distinct implementation strategies for EBMs across image classification and natural language labeling tasks, illustrating the potential of these generative models in SSL applications.
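One common way to implement the joint density — used, in spirit, by classifier-as-EBM formulations such as JEM, and shown here as an illustrative sketch rather than the paper's exact method — reuses a classifier's logits f(x) as negative energies: E(x, y) = -f(x)[y] and E(x) = -logsumexp_y f(x)[y]. The conditional p(y|x) then falls out as the ordinary softmax, so a single network can serve both the supervised and unsupervised objectives:

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log-sum-exp."""
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

logits = np.array([2.0, -1.0, 0.5])  # f(x) for one input x over 3 classes (made-up values)

# Joint energy E(x, y) = -f(x)[y]; marginal energy E(x) = -logsumexp_y f(x)[y].
log_p_y_given_x = logits - logsumexp(logits)  # log-softmax: the supervised term
log_p_x_unnorm = logsumexp(logits)            # unnormalized log p(x): the unsupervised term

# The ordinary softmax classifier is recovered exactly from the joint EBM:
softmax = np.exp(logits) / np.exp(logits).sum()
assert np.allclose(np.exp(log_p_y_given_x), softmax)
```

Under this parameterization, joint-training adds a gradient signal from the unnormalized log p(x) term on unlabeled data, while the supervised term remains standard cross-entropy on labeled data.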
Experimental Evaluation
Image Classification
The study evaluates SSL performance on standard image classification datasets, notably CIFAR-10 and SVHN, with varying amounts of labeled data. The joint-training EBMs consistently outperform pre-training EBMs and other generative SSL methods, while remaining competitive with leading discriminative SSL approaches that rely heavily on domain-specific augmentations.
An experiment on CIFAR-10 with 4,000 labeled images shows that joint-training EBMs achieve substantially lower error rates than pre-training EBMs and many other generative SSL methods, demonstrating their strength in classification tasks without domain-specific augmentations.
Natural Language Labeling
Extensive experiments assess SSL methodologies in natural language labeling contexts, encompassing tasks such as POS tagging, chunking, and named entity recognition (NER). Across varying proportions of labeled data, joint-training EBMs consistently achieve higher accuracy than both pre-training EBMs and supervised baselines, indicating effective use of unlabeled data and superior representation learning.
The comprehensive experimental suite underscores joint-training EBMs' robust performance across diverse settings and data configurations, marking them as potent models for domain-agnostic SSL applications.
Conclusion
The paper advances the field of domain-agnostic SSL by rigorously comparing joint-training and pre-training approaches to EBMs. The findings indicate that joint-training EBMs consistently outperform pre-training EBMs, albeit sometimes by modest margins, suggesting promising directions for generative models unconstrained by domain-specific requirements. The potential for EBMs to deliver state-of-the-art SSL performance across varied domains underscores their value in building versatile SSL frameworks applicable to a wide range of data modalities.