Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
Abstract: The rise in internet usage has generated massive amounts of data, driving the adoption of supervised and semi-supervised machine learning algorithms that can effectively exploit this data for training models. However, before such models are deployed in the real world, they must be rigorously evaluated on performance measures such as worst-case recall and must satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable objectives, while theoretically principled techniques require training a new model from scratch for each objective. To bridge this gap, we propose SelMix, an inexpensive selective mixup-based fine-tuning technique for pre-trained models that optimizes the desired objective. The core idea of our framework is to determine a sampling distribution over class pairs and to mix up features between samples from the sampled classes in a way that optimizes the given objective. We comprehensively evaluate our technique against existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification, and find that SelMix fine-tuning significantly improves performance on a variety of practical non-decomposable objectives across benchmarks.
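To make the core idea concrete, below is a minimal, hypothetical PyTorch-style sketch of one SelMix-style fine-tuning step. The names (`selmix_step`, `feats_by_class`, `P`) and the fixed mixing coefficient are illustrative assumptions; in particular, the paper's actual rule for updating the class-pair distribution `P` (based on the estimated gain in the target non-decomposable objective) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def selmix_step(classifier, feats_by_class, P, lam=0.6):
    """One selective-mixup fine-tuning step (illustrative sketch only).

    `P` is a (K x K) distribution over ordered class pairs (i, j);
    `feats_by_class[c]` holds pre-extracted backbone features for class c.
    In SelMix, P is chosen so that mixing the sampled pairs improves the
    target non-decomposable objective; its update rule is omitted here.
    """
    K = P.shape[0]
    # Sample a class pair (i, j) from the current mixup distribution P.
    idx = torch.multinomial(P.flatten(), 1).item()
    i, j = idx // K, idx % K
    # Draw one feature vector from each of the two sampled classes.
    zi = feats_by_class[i][torch.randint(len(feats_by_class[i]), (1,))]
    zj = feats_by_class[j][torch.randint(len(feats_by_class[j]), (1,))]
    # Mix the features (fixed coefficient here; mixup variants typically
    # sample lam from a Beta distribution instead).
    z_mix = lam * zi + (1 - lam) * zj
    # Fine-tune the classifier head on the mixed feature, labeled as class i.
    logits = classifier(z_mix)
    return F.cross_entropy(logits, torch.tensor([i]))
```

Note that this sketch mixes features rather than raw inputs, reflecting the abstract's description of SelMix as a mixup of features in a pre-trained model; which class pairs get mixed, and how often, is entirely governed by `P`.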