Papers
Topics
Authors
Recent
Search
2000 character limit reached

GLM Inference with AI-Generated Synthetic Data Using Misspecified Linear Regression

Published 27 Mar 2025 in stat.ME | (2503.21968v1)

Abstract: Privacy concerns in data analysis have led to the growing interest in synthetic data, which strives to preserve the statistical properties of the original dataset while ensuring privacy by excluding real records. Recent advances in deep neural networks and generative artificial intelligence have facilitated the generation of synthetic data. However, although prediction with synthetic data has been the focus of recent research, statistical inference with synthetic data remains underdeveloped. In particular, in many settings, including generalized linear models (GLMs), the estimator obtained using synthetic data converges much more slowly than in standard settings. To address these limitations, we propose a method that leverages summary statistics from the original data. Using a misspecified linear regression estimator, we then develop inference that greatly improves the convergence rate and restores the standard root-$n$ behavior for GLMs.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.