SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data
Abstract: With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, robust tools for assessing the utility and potential privacy risks of such data have become crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate the fidelity and privacy-preserving integrity of synthetic data. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and it can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.
- Bhanot, K., Qi, M., Erickson, J.S., Guyon, I., Bennett, K.P.: The problem of fairness in synthetic healthcare data. Entropy 23(9), 1165 (2021) https://doi.org/10.3390/e23091165 Hernandez et al. [2022] Hernandez, M., Epelde, G., Alberdi, A., Cilla, R., Rankin, D.: Synthetic data generation for tabular health records: A systematic review. Neurocomputing 493, 28–45 (2022) https://doi.org/10.1016/J.NEUCOM.2022.04.053 Ping et al. [2017] Ping, H., Stoyanovich, J., Howe, B.: Datasynthesizer: Privacy-preserving synthetic datasets. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management. ACM, Chicago, IL, USA, June 27-29 (2017). https://doi.org/10.1145/3085504.3091117 Emam et al. [2020] Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. 
https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hernandez, M., Epelde, G., Alberdi, A., Cilla, R., Rankin, D.: Synthetic data generation for tabular health records: A systematic review. Neurocomputing 493, 28–45 (2022) https://doi.org/10.1016/J.NEUCOM.2022.04.053 Ping et al. [2017] Ping, H., Stoyanovich, J., Howe, B.: Datasynthesizer: Privacy-preserving synthetic datasets. 
In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management. ACM, Chicago, IL, USA, June 27-29 (2017). https://doi.org/10.1145/3085504.3091117 Emam et al. [2020] Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ping, H., Stoyanovich, J., Howe, B.: Datasynthesizer: Privacy-preserving synthetic datasets. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management. ACM, Chicago, IL, USA, June 27-29 (2017). https://doi.org/10.1145/3085504.3091117 Emam et al. [2020] Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. 
Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 
324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. 
Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the IdentificationRiskCalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. 
UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. 
H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. 
Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 
324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. 
Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow.
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Ping, H., Stoyanovich, J., Howe, B.: Datasynthesizer: Privacy-preserving synthetic datasets. In: Proceedings of the 29th International Conference on Scientific and Statistical Database Management. ACM, Chicago, IL, USA, June 27-29 (2017). https://doi.org/10.1145/3085504.3091117 Emam et al. [2020] Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139 Abouelmehdi et al. [2018] Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. 
[2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. 
IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. 
[2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. 
[2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr et al.
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke et al.
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis.
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. 
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
- Emam, K.E., Mosquera, L., Bass, J.: Evaluating identity disclosure risk in fully synthetic health data: Model development and validation. Journal of Medical Internet Research 22(11), 23139 (2020) https://doi.org/10.2196/23139
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Abouelmehdi, K., Beni-Hessane, A., Khaloufi, H.: Big healthcare data: preserving security and privacy. Journal of Big Data 5(1) (2018) https://doi.org/10.1186/s40537-017-0110-7 Nowok et al. [2016] Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. 
https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. 
[2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. 
Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. 
https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. 
IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. 
Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. 
JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al.
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks.
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Nowok, B., Raab, G.M., Dibben, C.: synthpop: Bespoke creation of synthetic data in r. Journal of Statistical Software 74(11) (2016) https://doi.org/10.18637/jss.v074.i11 Rankin et al. [2020] Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910 van Breugel et al. [2021] Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) 
Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Breugel, B., Kyono, T., Berrevoets, J., Schaar, M.: DECAF: generating fair synthetic data using causally-aware generative networks. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 22221–22233 (2021) Dankar et al. [2022] Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. 
[2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). 
https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 
16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. 
H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. 
Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. 
In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Rankin, D., Black, M., Bond, R., Wallace, J., Mulvenna, M., Epelde, G.: Reliability of supervised machine learning using synthetic data in health care: Model to preserve privacy for data sharing. JMIR Medical Informatics 8(7), 18910 (2020) https://doi.org/10.2196/18910
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. 
IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. 
Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. 
[2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. 
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Dankar, F.K., Ibrahim, M.K., Ismail, L.: A multi-dimensional evaluation of synthetic data generators. IEEE Access 10, 11147–11158 (2022) https://doi.org/10.1109/access.2022.3144765 Figueira and Vaz [2022] Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. 
Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. 
JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. 
[2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data.
In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Woo et al.
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Figueira, A., Vaz, B.: Survey on synthetic data generation, evaluation methods and GANs. Mathematics 10(15), 2733 (2022) https://doi.org/10.3390/math10152733 Dwork and Roth [2013] Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9(3-4), 211–487 (2013) https://doi.org/10.1561/0400000042 Yale et al. [2020] Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. 
[2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26 Lenz et al. [2021] Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6 Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734 Qian et al. [2023] Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al.
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Yale, A., Dash, S., Bhanot, K., Guyon, I., Erickson, J.S., Bennett, K.P.: Synthesizing quality open data assets from private health research studies. In: Abramowicz, W., Klein, G. (eds.) Business Information Systems Workshops - BIS 2020 International Workshops, Colorado Springs, CO, USA, June 8-10, 2020, Revised Selected Papers. Lecture Notes in Business Information Processing, vol. 394, pp. 324–335 (2020). https://doi.org/10.1007/978-3-030-61146-0_26
- Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6
- Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734
- Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
- DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
- Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
- Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the IdentificationRiskCalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573 DataCebo, Inc. [2023] DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc.. Version 0.9.3. https://docs.sdv.dev/sdmetrics/ Brenninkmeijer [2021] Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. 
[2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. 
[2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. 
Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport.
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Lenz, S., Hess, M., Binder, H.: Deep generative models in DataSHIELD. BMC Medical Research Methodology 21(1) (2021) https://doi.org/10.1186/s12874-021-01237-6
- Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734
- Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
- DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
- Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
- Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. 
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. 
[2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Emam, K.E., Mosquera, L., Fang, X., El-Hussuna, A.: Utility metrics for evaluating synthetic health data generation methods: Validation study. JMIR Medical Informatics 10(4), 35734 (2022) https://doi.org/10.2196/35734
- Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
- DataCebo, Inc.: Synthetic Data Metrics. DataCebo, Inc. (2023). Version 0.9.3. https://docs.sdv.dev/sdmetrics/
- Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
- Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
Emam et al.
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Qian, Z., Cebere, B., Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arXiv, preprint (2023). https://doi.org/10.48550/arXiv.2301.07573
- DataCebo, Inc.: Synthetic Data Metrics. (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
- Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
- Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the IdentificationRiskCalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Hesterberg et al.
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. 
[2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. 
arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. 
Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- DataCebo, Inc.: Synthetic Data Metrics (2023). DataCebo, Inc. Version 0.9.3. https://docs.sdv.dev/sdmetrics/
- Brenninkmeijer, B.: Table Evaluator. GitHub (2021)
- Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. 
Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 
7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
Karr et al.
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Brenninkmeijer, B.: Table Evaluator. GitHub (2021) Yale et al. [2019] Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019) Yan et al. [2022] Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1 Murtaza et al. [2023] Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546 [22] Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review] Gower [1971] Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 
7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al.
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P.: Privacy preserving synthetic health data. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019 (2019)
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Yan, C., Yan, Y., Wan, Z., Zhang, Z., Omberg, L., Guinney, J., Mooney, S.D., Malin, B.A.: A multifaceted benchmarking of synthetic electronic health record generation models. Nature Communications 13(1) (2022) https://doi.org/10.1038/s41467-022-35295-1
Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. [2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. 
[2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 
16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. 
Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. 
In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Murtaza, H., Ahmed, M., Khan, N.F., Murtaza, G., Zafar, S., Bano, A.: Synthetic data generation: State of the art in health care domain. Computer Science Review 48, 100546 (2023) https://doi.org/10.1016/j.cosrev.2023.100546
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823 Kamal et al. 
[2019] Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). 
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. 
In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). 
https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. 
In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. 
[2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). 
https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Lautrup, A.D., Hyrup, T., Zimek, A., Schneider-Kamp, P.: Systematised review of generative modelling tools and utility metrics for fully synthetic tabular data. [under review]
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the IdentificationRiskCalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. 
[2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. 
[2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. 
Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. 
In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. 
The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Raab et al.
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Gower, J.C.: A general coefficient of similarity and some of its properties. Biometrics 27(4), 857–871 (1971) https://doi.org/10.2307/2528823
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. 
Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. 
Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. 
IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Kamal, S., ElEleimy, M., Hegazy, D., Nasr, M.: Hepatitis C Virus (HCV) for Egyptian patients. UCI Machine Learning Repository, dataset (2019). https://doi.org/10.24432/c5989v
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). 
Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) https://doi.org/10.1613/jair.953 Yoon et al. [2020] Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262 Xu et al. [2019] Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019) Goodfellow et al. [2014] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. 
In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 
55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) 
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
- Yoon, J., Drumright, L.N., Schaar, M.: Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Informatics 24(8), 2378–2388 (2020) https://doi.org/10.1109/JBHI.2020.2980262
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. 
Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. 
[2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). 
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Xu, L., Skoularidou, M., Cuesta-Infante, A., Veeramachaneni, K.: Modeling tabular data using conditional GAN. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 7333–7343 (2019)
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014).
https://doi.org/10.48550/arXiv.1406.2661 Sun and Erath [2015] Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. 
Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Sun, L., Erath, A.: A bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010 Reiter [2005] Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Reiter, J.P.: Using cart to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005) Drechsler and Reiter [2011] Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). 
https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). 
IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. 
[2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. 
[2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. 
https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative Adversarial Networks. arXiv, preprint (2014). https://doi.org/10.48550/arXiv.1406.2661
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.)
IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011) European Medicines Agency [2018] European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. 
https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. 
arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018) Health Canada [2019] Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019) Karr et al. [2006] Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. 
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. 
H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
- Sun, L., Erath, A.: A Bayesian network approach for population synthesis. Transportation Research Part C: Emerging Technologies 61, 49–62 (2015) https://doi.org/10.1016/j.trc.2015.10.010
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the IdentificationRiskCalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640 Zhu et al. [2022] Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). 
Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639 Villani [2009] Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9 Hesterberg et al. [2009] Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. 
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Reiter, J.P.: Using CART to generate partially synthetic public use microdata. Journal of Official Statistics 21(3), 441–462 (2005)
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Drechsler, J., Reiter, J.P.: An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput. Stat. Data Anal. 55(12), 3232–3243 (2011)
- European Medicines Agency: External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use. https://www.ema.europa.eu/en/human-regulatory/marketing-authorisation/clinical-data-publication/support-industry/external-guidance-implementation-european-medicines-agency-policy-publication-clinical-data (2018)
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
- Karr, A.F., Kohnen, C.N., Oganian, A., Reiter, J.P., Sanil, A.P.: A framework for evaluating the utility of data altered to protect confidentiality. The American Statistician 60(3), 224–232 (2006) https://doi.org/10.1198/000313006x124640
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. 
[2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. 
[2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. 
Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. 
Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. 
In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Health Canada: Public Release of Clinical Information: guidance document. https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance/document.html (2019)
IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16 Lenatti et al. [2023] Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv.
14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722 Scott [1979] Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. 
[2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605 Woo et al. [2009] Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568 Raab et al. [2017] Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. 
[2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. 
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Zhu, Y., Zhao, Z., Birke, R., Chen, L.Y.: Permutation-invariant tabular data synthesis. In: Tsumoto, S., Ohsawa, Y., Chen, L., Poel, D.V., Hu, X., Motomura, Y., Takagi, T., Wu, L., Xie, Y., Abe, A., Raghavan, V. (eds.) IEEE International Conference on Big Data, Big Data 2022, 2022, pp. 5855–5864. IEEE, Osaka, Japan, December 17-20 (2022). https://doi.org/10.1109/BigData55660.2022.10020639
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078 Snoke et al. [2018] Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. 
[2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358 Hornby and Hu [2021] Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 
13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298 Ooko et al. [2021] Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. 
AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420 Fan et al. [2020] Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802 Zhao et al. [2021] Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. 
[2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369 Yan et al. [2020] Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020) Emam et al. [2022] Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Villani, C.: Optimal Transport. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Hesterberg, T., Moore, D.S., Monaghan, S., Clipson, A., Epstein, R.: Bootstrap methods and permutation tests. In: Moore, D.S., McCabe, G.P., Craig, B.A. (eds.) Introduction to the Practice of Statistics, 6th edn. W. H. Freeman and Company, New York, NY (2009). Chap. 16
- Lenatti, M., Paglialonga, A., Orani, V., Ferretti, M., Mongelli, M.: Characterization of synthetic health data using rule-based artificial intelligence models. IEEE Journal of Biomedical and Health Informatics PP, 1–9 (2023) https://doi.org/10.1109/jbhi.2023.3236722
JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083 Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083
- Scott, D.W.: On optimal and data-based histograms. Biometrika 66(3), 605–610 (1979) https://doi.org/10.1093/biomet/66.3.605
- Woo, M., Reiter, J.P., Oganian, A., Karr, A.F.: Global measures of data utility for microdata masked for disclosure limitation. J. Priv. Confidentiality 1(1) (2009) https://doi.org/10.29012/jpc.v1i1.568
- Raab, G.M., Nowok, B., Dibben, C.: Guidelines for Producing Useful Synthetic Data. arXiv, preprint (2017). https://doi.org/10.48550/arXiv.1712.04078
- Snoke, J., Raab, G.M., Nowok, B., Dibben, C., Slavkovic, A.: General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181(3), 663–688 (2018) https://doi.org/10.1111/rssa.12358
- Hornby, R., Hu, J.: Identification risks evaluation of partially synthetic data with the identificationriskcalculation R package. Trans. Data Priv. 14(1), 37–52 (2021) https://doi.org/10.48550/arXiv.2006.01298
- Ooko, S.O., Mukanyiligira, D., Munyampundu, J.P., Nsenga, J.: Synthetic exhaled breath data-based edge AI model for the prediction of chronic obstructive pulmonary disease. In: 2021 International Conference on Computing and Communications Applications and Technologies (I3CAT). IEEE, Ipswich, United Kingdom, September 15 (2021). https://doi.org/10.1109/i3cat53310.2021.9629420
- Fan, J., Liu, T., Li, G., Chen, J., Shen, Y., Du, X.: Relational data synthesis using generative adversarial networks: A design space exploration. Proc. VLDB Endow. 13(11), 1962–1975 (2020) https://doi.org/10.14778/3407790.3407802
- Zhao, Z., Kunar, A., Scheer, H.V., Birke, R., Chen, L.Y.: CTAB-GAN: Effective Table Data Synthesizing. arXiv, preprint (2021). https://doi.org/10.48550/arXiv.2102.08369
- Yan, C., Zhang, Z., Nyemba, S., Malin, B.A.: Generating electronic health records with multiple data types and constraints. In: AMIA 2020, American Medical Informatics Association Annual Symposium. AMIA, Virtual Event, USA, November 14-18 (2020)
- Emam, K.E., Mosquera, L., Fang, X.: Validating a membership disclosure metric for synthetic health data. JAMIA Open 5(4), 083 (2022) https://doi.org/10.1093/jamiaopen/ooac083