Generation and Simulation of Synthetic Datasets with Copulas

Published 30 Mar 2022 in cs.LG and cs.AI | (2203.17250v1)

Abstract: This paper proposes a new method for generating synthetic datasets based on copula models. Our goal is to produce surrogate data that resemble real data in terms of both marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic dataset comprising numeric or categorical variables. Applied to two datasets, our methodology performs better than other methods such as SMOTE and autoencoders.
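To illustrate the general copula idea of separating marginals from dependence, here is a minimal sketch that fits a Gaussian copula to numeric data and samples from it. This is not the authors' algorithm. The Gaussian copula family, the rank-based pseudo-observations, the empirical-quantile marginals, and the function names `fit_gaussian_copula` and `sample_gaussian_copula` are all assumptions made for this example. Categorical variables, which the abstract says the method also handles, are not covered.

```python
import numpy as np
from statistics import NormalDist

# Standard normal distribution used for the probit / inverse-probit maps.
_STD_NORMAL = NormalDist()
_probit = np.vectorize(_STD_NORMAL.inv_cdf)
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)


def fit_gaussian_copula(data):
    """Hypothetical helper: estimate a Gaussian-copula correlation matrix.

    `data` is a numeric array of shape (n_samples, n_features).
    """
    n = data.shape[0]
    # Rank-transform each column into pseudo-observations strictly inside (0, 1);
    # ties are broken arbitrarily by argsort.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)
    # Map the pseudo-observations to standard normal scores.
    z = _probit(u)
    # The correlation of the normal scores parameterizes the Gaussian copula.
    return np.atleast_2d(np.corrcoef(z, rowvar=False))


def sample_gaussian_copula(data, n_samples, rng=None):
    """Hypothetical helper: draw synthetic rows that mimic `data`.

    Dependence comes from the fitted Gaussian copula; each marginal is
    reproduced through the empirical quantile function of the original column.
    """
    rng = np.random.default_rng(rng)
    corr = fit_gaussian_copula(data)
    d = corr.shape[0]
    # Draw correlated standard normals, then push them through the normal CDF
    # to get uniforms carrying the copula's dependence structure.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = _norm_cdf(z)
    # Invert each empirical marginal at the sampled uniforms.
    synthetic = np.empty((n_samples, d))
    for j in range(d):
        synthetic[:, j] = np.quantile(data[:, j], u[:, j])
    return synthetic
```

For example, `sample_gaussian_copula(data, 500, rng=1)` returns 500 synthetic rows whose columns follow the empirical marginals of `data` and whose rank correlations approximate those of the original. Using empirical quantiles keeps the sketch distribution-free, at the cost of never producing values outside the observed range of each column.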