
Strong Baselines for Neural Semi-supervised Learning under Domain Shift

Published 25 Apr 2018 in cs.CL, cs.LG, and stat.ML | arXiv:1804.09530v1

Abstract: Novel neural models have been proposed in recent years for learning under domain shift. Most models, however, only evaluate on a single task, on proprietary datasets, or compare to weak baselines, which makes comparison of models difficult. In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training. Extensive experiments on two benchmarks are negative: while our novel method establishes a new state-of-the-art for sentiment analysis, it does not fare consistently the best. More importantly, we arrive at the somewhat surprising conclusion that classic tri-training, with some additions, outperforms the state of the art. We conclude that classic approaches constitute an important and strong baseline.

Citations (165)

Summary

Analysis of Asymmetric Tri-training and Semi-supervised Learning Techniques

The paper presents a detailed investigation into semi-supervised learning under domain shift, re-evaluating classic bootstrapping strategies in the context of neural networks. Building on foundational work, it refines tri-training and examines its efficacy across varied experimental setups. The principal algorithms examined are self-training, co-training, tri-training, and asymmetric tri-training, alongside the paper's proposed multi-task tri-training (MT-Tri) and temporal ensembling.

Asymmetric tri-training operates by first training base classifiers on labeled data, then iteratively refining the models by pseudo-labeling the unlabeled dataset, accepting an example only when the models reach consensus on its prediction. The individual models are trained on the unlabeled data under orthogonality constraints and adversarial losses to enhance learning diversity and robustness. The work critically examines how disagreement among model predictions can exploit unlabeled data more effectively than conventional SSL practices.
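The consensus pseudo-labeling step at the heart of classic tri-training can be sketched as follows. This is a minimal illustration assuming scikit-learn-style classifiers with a `predict` method, not the paper's implementation:

```python
import numpy as np

def tri_train_round(models, X_unlabeled):
    """One pseudo-labeling round of classic tri-training (sketch).

    For each model m_i, an unlabeled example is pseudo-labeled for m_i
    when the other two models m_j and m_k agree on its label.
    Returns, per model, the (X, y) pairs to add to its training set.
    """
    preds = [m.predict(X_unlabeled) for m in models]  # 3 arrays of length N
    new_data = []
    for i in range(3):
        j, k = [idx for idx in range(3) if idx != i]
        agree = preds[j] == preds[k]  # consensus mask of the other two models
        new_data.append((X_unlabeled[agree], preds[j][agree]))
    return new_data
```

After each round, every model is retrained on its original labeled data plus its newly assigned pseudo-labeled examples, and the process repeats until predictions stabilize.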

Notably, the paper reports that tri-training holds a slight advantage on a 10,000-example subset, an advantage that dissipates in the full-data setup. Experiments with GloVe initialization showed significant improvements, establishing tri-training as superior to self-training in several contexts. Self-training still yielded measurable gains over configurations without GloVe embeddings, albeit without matching tri-training's effectiveness.

Another significant element of the research is the exploration of temporal ensembling, where the ensemble momentum, ramp-up length, and unsupervised loss weight are tuned for optimal performance. Empirical results indicate that the momentum rate and unsupervised weight substantially affect outcome quality, and suggested ranges for these hyperparameters are documented.
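The ensemble-momentum update central to temporal ensembling can be sketched as below; this is a minimal NumPy rendering of Laine & Aila's accumulation-and-bias-correction step, with illustrative variable names:

```python
import numpy as np

def temporal_ensemble_targets(Z, z_epoch, alpha, epoch):
    """One epoch of temporal ensembling's target update (sketch).

    Z       : running ensemble of predictions, shape (N, C)
    z_epoch : this epoch's model predictions, shape (N, C)
    alpha   : ensemble momentum hyperparameter in [0, 1)
    Returns the updated ensemble Z and bias-corrected targets z_hat.
    """
    Z = alpha * Z + (1.0 - alpha) * z_epoch   # exponential moving average
    z_hat = Z / (1.0 - alpha ** (epoch + 1))  # correct startup bias toward zero
    return Z, z_hat
```

The targets `z_hat` are then used in an unsupervised consistency loss, whose weight is typically ramped up over the first epochs of training.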

Through various experimental phases, the research integrates data selection, using Jensen-Shannon divergence and related domain-similarity metrics to choose which unlabeled data to train on. This aligns closely with objectives in active learning and domain adaptation, and exploratory steps toward online learning indicate potential for real-time adaptation to dynamic datasets.
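As an illustration, the Jensen-Shannon divergence between two discrete distributions (e.g. term distributions of a source and target domain) can be computed as follows. This is a generic sketch of the metric, not the paper's data-selection code:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in bits.

    Used as a domain-similarity measure: a lower JSD between a candidate
    example's term distribution and the target domain's suggests the
    example is more relevant for adaptation.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()   # normalize to probabilities
    m = 0.5 * (p + q)                 # mixture distribution

    def kl(a, b):
        mask = a > 0                  # 0 * log 0 is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the divergence is bounded in [0, 1], which makes it convenient for ranking candidate examples or domains.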

Overall, while the paper reports strong empirical results across multiple frameworks and configurations, it also identifies several areas where further research would be beneficial: the robustness of neural network confidence measures, techniques for learning the value of unlabeled examples, and extensions to streaming-data scenarios. The paper makes valuable contributions to the field of SSL by questioning existing baselines and experimenting with novel extensions. Future directions emphasize algorithmic refinement and deep learning environments that simulate domain shift and concept drift for more adaptive SSL methodologies.

In summary, this paper provides significant insight into enhancing semi-supervised learning through tri-training methodologies, with a pronounced focus on the practical implications of orthogonality constraints, ensemble techniques, and domain-specific adaptation strategies. It reflects an evolving landscape in which SSL techniques continue to shape the precision and capability of machine learning applications across diverse datasets.

Authors (2)
