
Identification of Negative Transfers in Multitask Learning Using Surrogate Models

Published 25 Mar 2023 in cs.LG, cs.AI, cs.CL, and stat.ML | (2303.14582v2)

Abstract: Multitask learning is widely used in practice to train a low-resource target task by augmenting it with multiple related source tasks. Yet, naively combining all the source tasks with a target task does not always improve the prediction performance for the target task due to negative transfers. Thus, a critical problem in multitask learning is identifying subsets of source tasks that would benefit the target task. This problem is computationally challenging since the number of subsets grows exponentially with the number of source tasks; efficient heuristics for subset selection do not always capture the relationship between task subsets and multitask learning performances. In this paper, we introduce an efficient procedure to address this problem via surrogate modeling. In surrogate modeling, we sample (random) subsets of source tasks and precompute their multitask learning performances. Then, we approximate the precomputed performances with a linear regression model that can also predict the multitask performance of unseen task subsets. We show theoretically and empirically that fitting this model only requires sampling linearly many subsets in the number of source tasks. The fitted model provides a relevance score between each source and target task. We use the relevance scores to perform subset selection for multitask learning by thresholding. Through extensive experiments, we show that our approach predicts negative transfers from multiple source tasks to target tasks much more accurately than existing task affinity measures. Additionally, we demonstrate that for several weak supervision datasets, our approach consistently improves upon existing optimization methods for multitask learning.


Summary

  • The paper introduces surrogate models to efficiently predict multitask performance and identify negative transfers.
  • It employs linear regression on sampled task subsets, achieving high prediction accuracy with an average F1-score of 0.80.
  • The method significantly reduces computational overhead and scales linearly, offering practical benefits for weak supervision datasets.

Identification of Negative Transfers in Multitask Learning Using Surrogate Models

This paper introduces a novel approach to tackle the problem of identifying negative transfers in Multitask Learning (MTL). By utilizing surrogate models, the authors propose an efficient procedure to predict and optimize multitask performances, addressing the computational challenges in subset selection of source tasks. The methodology provides significant improvements over existing task affinity measures and optimization methods, particularly for weak supervision datasets. This essay details the paper's methods, theoretical analyses, experimental validations, and the implications for MTL.

Methodology: Surrogate Modeling

The central proposition is the design of surrogate models to approximate MTL performances. This approach involves two key steps:

  1. Subset Sampling and MTL Evaluation: The methodology begins by sampling n random subsets of source tasks from a pool of k total tasks. For each subset S_i, an MTL model is trained on the data from S_i combined with the target task, and the model's loss, f(S_i), is then evaluated on a target-task dataset.

    Figure 1: Our approach involves sampling subsets and training MTL models to evaluate and predict performance.

  2. Linear Regression and Relevance Score Estimation: The second step fits a linear regression model to the precomputed performances to estimate relevance scores θ. The fitted model predicts MTL performance on unseen task subsets, and the relevance scores guide subset selection for multitask learning via thresholding.
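The two steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: `train_and_eval_mtl` is a hypothetical stand-in for training an MTL model on a subset plus the target task, and a synthetic linear ground truth is used so the sketch runs end to end.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8          # number of source tasks
n = 4 * k      # linearly many sampled subsets (assumed constant factor)

# Synthetic ground-truth relevance scores; in practice these are unknown.
true_theta = rng.normal(size=k)

def train_and_eval_mtl(subset_mask):
    # Stand-in for: train an MTL model on the tasks in the subset plus the
    # target task, then return the loss f(S) on a target-task dataset.
    return subset_mask @ true_theta + 0.01 * rng.normal()

# Step 1: sample random subsets (as 0/1 indicator vectors) and
# precompute their MTL performances.
X = rng.integers(0, 2, size=(n, k)).astype(float)
y = np.array([train_and_eval_mtl(x) for x in X])

# Step 2: fit the linear surrogate g_theta(S) = sum_{j in S} theta_j
# by least squares; theta_j acts as a relevance score for source task j.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Subset selection by thresholding: keep source tasks whose score
# indicates they reduce the target loss (negative theta_j here).
selected = np.flatnonzero(theta < 0.0)
print(selected)
```

Because the surrogate is linear in the subset indicators, predicting the performance of any unseen subset S reduces to summing the fitted scores θ_j over j ∈ S.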

Theoretical Analysis

The authors present rigorous theoretical analysis, ensuring that surrogate models can be fitted accurately with linear complexity relative to the number of source tasks. The key insights include:

  • Linear Surrogate Model: The linear model, g_θ(S) = ∑_{j ∈ S} θ_j, simplifies computational requirements and allows scalable task affinity measurement.
  • Sample Complexity: The analysis indicates that linearly many samples (in terms of k) suffice to fit the surrogate model with reliable accuracy. This theoretical foundation is supported by results utilizing Rademacher complexity arguments.

Figure 2: Illustration of mixed outcomes in multitask learning.
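As a toy numerical illustration of the sample-complexity claim (synthetic data, not the paper's experiments or proof), one can check that a linear surrogate fitted on n = c·k random subsets already predicts held-out subsets well for small constants c:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20                               # number of source tasks
theta_true = rng.normal(size=k)      # synthetic ground-truth scores

def sample_subsets(n):
    # Random subsets as 0/1 indicator vectors, with noisy linear targets
    # standing in for precomputed MTL losses f(S).
    X = rng.integers(0, 2, size=(n, k)).astype(float)
    y = X @ theta_true + 0.05 * rng.normal(size=n)
    return X, y

X_test, y_test = sample_subsets(500)  # held-out subsets
errors = {}
for c in (2, 4, 8):                   # n = c * k training subsets
    X_tr, y_tr = sample_subsets(c * k)
    theta_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    errors[c] = float(np.mean((X_test @ theta_hat - y_test) ** 2))
print(errors)
```

In this synthetic setting the held-out error is already near the noise floor once n is a small multiple of k, consistent with the linear-in-k sample-complexity result.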

Experimental Validation

Extensive experiments validate the approach across diverse settings, including weak supervision datasets, NLP tasks, and multi-group learning scenarios.

  • Prediction Accuracy: The approach distinguishes positive from negative task transfers with high prediction accuracy, achieving an average F1-score of 0.80 across multiple datasets.

    Figure 3: Our approach successfully predicts transfer effects and demonstrates convergence efficiency.

  • Computational Efficiency: The paper showcases the linear scalability of the approach as k increases, making it feasible for practical applications even when existing methods become computationally prohibitive.
  • Performance Enhancement: Compared to baselines, the algorithm significantly boosts the predictive accuracy of MTL models, especially in weak supervision contexts.

Implications and Future Directions

The methodology sets a new standard for efficient and accurate prediction of task-relatedness in MTL, with broad applicability in scenarios where robust, scalable model training is critical. Some implications and potential directions for future work include:

  • Wider Applicability: Exploration of surrogate modeling techniques in federated learning and reinforcement learning settings.
  • Refinement and Tuning: Further investigation into adaptive sampling techniques could yield even faster training cycles.
  • Understanding and Expanding: Extensions to more complex model architectures and tasks would help generalize the findings further.

Conclusion

This paper presents a scalable and efficient solution to identifying negative transfers in multitask learning, demonstrating notable improvements across various benchmarks. Through surrogate models, the proposed method reduces computational overhead while enhancing prediction accuracy. The theoretical and empirical findings solidify the approach as a valuable contribution to MTL research, offering new avenues for optimizing task performance in real-world applications.


Figure 4: The surrogate models adeptly handle complexities and predict relevant subsets for task optimization.
