
A Discrepancy-Based Perspective on Dataset Condensation

Published 12 Sep 2025 in cs.LG (arXiv:2509.10367v1)

Abstract: Given a dataset of finitely many elements $\mathcal{T} = \{\mathbf{x}_i\}_{i=1}^{N}$, the goal of dataset condensation (DC) is to construct a synthetic dataset $\mathcal{S} = \{\tilde{\mathbf{x}}_j\}_{j=1}^{M}$ that is significantly smaller ($M \ll N$) such that a model trained from scratch on $\mathcal{S}$ achieves generalization performance comparable or even superior to a model trained on $\mathcal{T}$. Recent advances in DC reveal a close connection to the problem of approximating the data distribution represented by $\mathcal{T}$ with a reduced set of points. In this work, we present a unified framework that encompasses existing DC methods and extends the task-specific notion of DC to a more general and formal definition using notions of discrepancy, which quantify the distance between probability distributions in different regimes. Our framework broadens the objective of DC beyond generalization, accommodating additional objectives such as robustness, privacy, and other desirable properties.
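To make the discrepancy-based view of DC concrete, the sketch below instantiates one well-known choice of discrepancy, the squared Maximum Mean Discrepancy (MMD) with an RBF kernel, and shrinks it by plain gradient descent on the synthetic points $\mathcal{S}$. This is not the paper's framework, only a minimal illustration of the "approximate the data distribution with $M \ll N$ points" objective; the kernel, bandwidth, and optimizer are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimator of squared MMD between the empirical
    # distributions of X and Y; zero when X and Y coincide.
    return (rbf_kernel(X, X, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean())

def condense(T, M=10, sigma=1.0, lr=0.5, steps=300, seed=0):
    # Initialize S as a random subset of T, then minimize MMD^2(T, S)
    # with the analytic gradient of the RBF kernel w.r.t. each s_j.
    rng = np.random.default_rng(seed)
    N = len(T)
    S = T[rng.choice(N, M, replace=False)].copy()
    for _ in range(steps):
        K_ts = rbf_kernel(T, S, sigma)           # (N, M)
        K_ss = rbf_kernel(S, S, sigma)           # (M, M)
        diff_ts = T[:, None, :] - S[None, :, :]  # (N, M, d): x_i - s_j
        diff_ss = S[None, :, :] - S[:, None, :]  # (M, M, d): s_l - s_j
        # d/ds_j of -2/(NM) sum_i k(x_i, s_j)  +  1/M^2 sum_{j,l} k(s_j, s_l)
        grad = (-2.0 / (N * M * sigma ** 2) * (K_ts[:, :, None] * diff_ts).sum(0)
                + 2.0 / (M * M * sigma ** 2) * (K_ss[:, :, None] * diff_ss).sum(1))
        S -= lr * grad
    return S

if __name__ == "__main__":
    # Toy dataset: a two-cluster mixture in 2D.
    rng = np.random.default_rng(1)
    T = np.concatenate([rng.normal(-2, 0.5, (50, 2)),
                        rng.normal(2, 0.5, (50, 2))])
    S0 = T[np.random.default_rng(0).choice(len(T), 5, replace=False)]
    S = condense(T, M=5, seed=0)
    # Optimized synthetic set matches the data distribution more
    # closely (lower MMD) than the random subset it started from.
    print(mmd2(T, S0), mmd2(T, S))
```

Swapping `mmd2` for a different discrepancy (e.g. a Wasserstein estimate, or a task-aware distance on network features) changes which properties of $\mathcal{T}$ the condensed set preserves, which is exactly the degree of freedom the abstract's general definition exposes.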


Authors (2)
