Domain Adaptation: Learning Bounds and Algorithms
Abstract: This paper addresses the general problem of domain adaptation, which arises in a variety of applications where the distribution of the labeled training sample differs from that of the test data. Building on previous work by Ben-David et al. (2007), we introduce a novel distance between distributions, the discrepancy distance, tailored to adaptation problems with arbitrary loss functions. We give Rademacher complexity bounds for estimating the discrepancy distance from finite samples for different loss functions. Using this distance, we derive novel generalization bounds for domain adaptation for a wide family of loss functions. We also present a series of novel adaptation bounds, based on the empirical discrepancy, for large classes of regularization-based algorithms, including support vector machines and kernel ridge regression. This motivates our analysis of the problem of minimizing the empirical discrepancy for various loss functions, for which we also give novel algorithms. We report the results of preliminary experiments demonstrating the benefits of our discrepancy minimization algorithms for domain adaptation.
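To make the central quantity of the abstract concrete, the following is a minimal sketch of the discrepancy distance estimated from samples, specialized to the 0-1 loss and a small finite hypothesis set. The function name `empirical_discrepancy` and the threshold-classifier setup are my own illustration (the paper's actual minimization algorithms are far more efficient than this brute-force pairwise maximum):

```python
import itertools
import numpy as np

def empirical_discrepancy(S, T, hypotheses):
    """Brute-force empirical discrepancy between samples S and T under
    the 0-1 loss, for a finite hypothesis set:

        disc(S, T) = max over pairs (h, h') of
                     | avg disagreement of (h, h') on S
                       - avg disagreement of (h, h') on T |
    """
    best = 0.0
    for h, hp in itertools.combinations(hypotheses, 2):
        disagree_S = np.mean([h(x) != hp(x) for x in S])  # expected 0-1 loss of h vs h' on S
        disagree_T = np.mean([h(x) != hp(x) for x in T])  # same, on T
        best = max(best, abs(disagree_S - disagree_T))
    return best

# Illustration: threshold classifiers h_t(x) = 1[x >= t] on the real line.
thresholds = [0.25, 0.5, 0.75]
hypotheses = [lambda x, t=t: x >= t for t in thresholds]

S = [0.1, 0.3]   # "source" sample
T = [0.6, 0.9]   # "target" sample
print(empirical_discrepancy(S, T, hypotheses))  # → 0.5
```

Here the pair of thresholds (0.25, 0.5) disagrees on half of S but on none of T, so the empirical discrepancy is 0.5, flagging that a hypothesis selected on S may behave very differently on T.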
References
- Alizadeh, F. (1995). Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5, 13–51.
- Bartlett, P. L., & Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.
- Ben-David, S., Blitzer, J., Crammer, K., & Pereira, F. (2007). Analysis of representations for domain adaptation. Proceedings of NIPS 2006.
- Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Wortman, J. (2008). Learning bounds for domain adaptation. Proceedings of NIPS 2007.
- Blitzer, J., Dredze, M., & Pereira, F. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. Proceedings of ACL 2007.
- Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2, 499–526.
- Chazelle, B. (2000). The discrepancy method: randomness and complexity. New York: Cambridge University Press.
- Chelba, C., & Acero, A. (2006). Adaptation of maximum entropy capitalizer: Little data can help a lot. Computer Speech & Language, 20, 382–399.
- Cortes, C., Mohri, M., Riley, M., & Rostamizadeh, A. (2008). Sample selection bias correction theory. Proceedings of ALT 2008. Springer, Heidelberg, Germany.
- Cortes, C., & Vapnik, V. N. (1995). Support-vector networks. Machine Learning, 20, 273–297.
- Daumé III, H., & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26, 101–126.
- Devroye, L., Györfi, L., & Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer.
- Dredze, M., Blitzer, J., Talukdar, P. P., Ganchev, K., Graça, J., & Pereira, F. (2007). Frustratingly hard domain adaptation for parsing. Proceedings of CoNLL 2007.
- Elkan, C. (2001). The foundations of cost-sensitive learning. Proceedings of IJCAI (pp. 973–978).
- Fletcher, R. (1985). On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. Control and Optimization, 23, 493–513.
- Gauvain, J.-L., & Lee, C.-H. (1994). Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2, 291–298.
- Helmberg, C., & Oustry, F. (2000). Bundle methods to minimize the maximum eigenvalue function. In Handbook of semidefinite programming: Theory, algorithms, and applications. Kluwer Academic Publishers, Boston, MA.
- Jarre, F. (1993). An interior-point method for minimizing the maximum eigenvalue of a linear combination of matrices. SIAM J. Control Optim., 31, 1360–1377.
- Jelinek, F. (1998). Statistical methods for speech recognition. The MIT Press.
- Jiang, J., & Zhai, C. (2007). Instance weighting for domain adaptation in NLP. Proceedings of ACL 2007 (pp. 264–271). Association for Computational Linguistics.
- A min-max-sum resource allocation problem and its application. Operations Research, 49, 913–922.
- Kifer, D., Ben-David, S., & Gehrke, J. (2004). Detecting change in data streams. Proceedings of the 30th International Conference on Very Large Data Bases.
- Koltchinskii, V., & Panchenko, D. (2000). Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II (pp. 443–459).
- Ledoux, M., & Talagrand, M. (1991). Probability in Banach spaces: isoperimetry and processes. Springer.
- Leggetter, C. J., & Woodland, P. C. (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, 171–185.
- Mansour, Y., Mohri, M., & Rostamizadeh, A. (2008). Domain adaptation with multiple sources. Advances in Neural Information Processing Systems.
- Martínez, A. M. (2002). Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Trans. Pattern Anal. Mach. Intell., 24, 748–763.
- Nesterov, Y., & Nemirovskii, A. (1994). Interior point polynomial methods in convex programming: Theory and applications. SIAM.
- Overton, M. L. (1988). On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. Matrix Anal. Appl., 9, 256–268.
- Della Pietra, S., Della Pietra, V., Mercer, R. L., & Roukos, S. (1992). Adaptive language modeling using minimum discriminant estimation. HLT '91: Proceedings of the workshop on Speech and Natural Language (pp. 103–106).
- Roark, B., & Bacchiani, M. (2003). Supervised and unsupervised PCFG adaptation to novel domains. Proceedings of HLT-NAACL.
- Rosenfeld, R. (1996). A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 10, 187–228.
- Saunders, C., Gammerman, A., & Vovk, V. (1998). Ridge regression learning algorithm in dual variables. Proceedings of ICML (pp. 515–521).
- Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134–1142.
- Vapnik, V. N. (1998). Statistical learning theory. John Wiley & Sons.