On the Validation of Gibbs Algorithms: Training Datasets, Test Datasets and their Aggregation
Abstract: The dependence on training data of the Gibbs algorithm (GA) is analytically characterized. By adopting the expected empirical risk as the performance metric, the sensitivity of the GA is obtained in closed form. In this case, sensitivity is the performance difference with respect to an arbitrary alternative algorithm. This description enables the development of explicit expressions involving the training errors and test errors of GAs trained with different datasets. Using these tools, dataset aggregation is studied and different figures of merit to evaluate the generalization capabilities of GAs are introduced. For particular sizes of such datasets and parameters of the GAs, a connection between Jeffrey's divergence, training and test errors is established.
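The abstract's main objects (the Gibbs measure, training and test errors expressed as expected empirical risks, and the Jeffreys divergence between Gibbs measures induced by different datasets) can be sketched numerically. The Python snippet below is a minimal illustration under assumed choices: a finite model grid, a squared loss, synthetic linear-regression datasets, and a temperature parameter `lam`. It is not the paper's construction, only a hedged example of the quantities involved.

```python
# Minimal numerical sketch (illustrative assumptions, not from the paper):
# a Gibbs algorithm over a finite model set, evaluated via the expected
# empirical risk on training and test datasets, plus the Jeffreys divergence
# between the Gibbs measures induced by two different datasets.
import numpy as np

rng = np.random.default_rng(0)

# Assumed finite model set (scalar slopes) and a uniform reference measure.
models = np.linspace(-2.0, 2.0, 401)
prior = np.full(models.size, 1.0 / models.size)

def empirical_risk(theta, data):
    """Empirical risk of model theta on a dataset (x, y) under a squared loss."""
    x, y = data
    return np.mean((y - theta * x) ** 2)

def gibbs_measure(data, lam):
    """Gibbs measure: reference measure reweighted by exp(-empirical risk / lam)."""
    risks = np.array([empirical_risk(t, data) for t in models])
    log_w = np.log(prior) - risks / lam
    log_w -= log_w.max()  # numerical stabilization before exponentiating
    w = np.exp(log_w)
    return w / w.sum(), risks

def make_dataset(n):
    """Hypothetical dataset drawn from a linear model with slope 0.7."""
    x = rng.normal(size=n)
    return x, 0.7 * x + 0.1 * rng.normal(size=n)

data_1, data_2 = make_dataset(50), make_dataset(50)
lam = 0.1  # assumed regularization (temperature) parameter of the GA

p1, risks_on_1 = gibbs_measure(data_1, lam)
p2, risks_on_2 = gibbs_measure(data_2, lam)

# Training error: expected empirical risk of the GA on its own dataset.
# Test error: expected empirical risk of the GA on the other dataset.
train_error_1 = p1 @ risks_on_1
test_error_1 = p1 @ risks_on_2

# Jeffreys divergence J(P1, P2) = KL(P1 || P2) + KL(P2 || P1).
jeffreys = np.sum(p1 * np.log(p1 / p2)) + np.sum(p2 * np.log(p2 / p1))

print(f"training error of GA trained on dataset 1: {train_error_1:.4f}")
print(f"test error of that GA on dataset 2:        {test_error_1:.4f}")
print(f"Jeffreys divergence between the two GAs:   {jeffreys:.4f}")
```

In this toy setting, a smaller `lam` concentrates each Gibbs measure around the empirical risk minimizer of its own dataset, which typically lowers the training error while widening the gap to the test error and increasing the Jeffreys divergence between the two measures.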