
Distilling Dataset into Neural Field

Published 5 Mar 2025 in cs.CV, cs.AI, and cs.LG | (2503.04835v1)

Abstract: Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution by compressing the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and outputs the corresponding quantity, DDiF effectively preserves the information and easily generates various shapes of data. We theoretically confirm that DDiF exhibits greater expressiveness than some previous literature when the utilized budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel. We release the code at https://github.com/aailab-kaist/DDiF.

Summary

  • The paper presents DDiF, a dataset distillation method that uses neural fields with sine activations to achieve a larger feasible function space than previous approaches.
  • The methodology formulates dataset distillation as an optimization problem over synthetic neural fields, enabling cross-resolution representation and efficient compression.
  • Extensive experiments across images, video, audio, and 3D data validate its robust performance and adaptability in diverse distillation scenarios.

Overview

The proposed framework, Distilling Dataset into Neural Field (DDiF), represents an advanced parameterization strategy for dataset distillation. By leveraging neural fields as a compact representation of large-scale datasets, the approach compresses the original dataset into a set of small, synthetic instances while retaining the salient information required for training deep networks. The method is contrasted with previous parameterizations by demonstrating, both theoretically and empirically, a larger feasible function space when a fixed budget is enforced per synthetic instance.

Methodology

DDiF formulates dataset distillation as an optimization problem wherein each synthetic datum is represented by a distinct neural field. The core components of the framework are:

  • Coordinate Set (C):

The coordinate set establishes a fixed, grid-based input space corresponding to the output domain (e.g., pixels for images, voxels for 3D data). Importantly, the coordinates are predefined by the target resolution and do not entail any additional parameterization, making the method decoder-only.

  • Synthetic Neural Fields (F₍ψ₎):

Each synthetic instance is encoded via a neural field parameterized by ψ. These fields are instantiated as L-layer neural networks using sine activation functions, which enables them to generate outputs as a continuous function of the input coordinates. The optimization objective minimizes the distillation loss L(T, S), where T represents the original dataset and S is the collection of decoded synthetic instances:

min₍ψ₎ L(T, S),  where  S = { (F₍ψ₎(C), ỹ) }

A warm-up phase involving a simple representation of real data guides the network parameters toward representing distributional characteristics prior to optimizing the full distillation loss.
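The decode step described above can be sketched in numpy. This is a minimal illustration, not the paper's implementation: the layer sizes, ω₀ = 30, and the initialization follow common SIREN practice, and `coordinate_grid` and `SineField` are hypothetical names.

```python
import numpy as np

def coordinate_grid(*shape):
    """Normalized coordinates in [-1, 1] for each axis of the output grid,
    e.g. coordinate_grid(32, 32) -> array of shape (1024, 2)."""
    axes = [np.linspace(-1.0, 1.0, n) for n in shape]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=-1)

class SineField:
    """Minimal coordinate-to-value MLP with sine activations (forward only)."""
    def __init__(self, in_dim=2, hidden=32, depth=3, out_dim=1,
                 omega0=30.0, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + [hidden] * depth + [out_dim]
        self.W, self.b = [], []
        for i, (m, n) in enumerate(zip(dims[:-1], dims[1:])):
            # SIREN-style init: first layer U(-1/m, 1/m), deeper layers
            # U(-sqrt(6/m)/omega0, sqrt(6/m)/omega0).
            bound = 1.0 / m if i == 0 else np.sqrt(6.0 / m) / omega0
            self.W.append(rng.uniform(-bound, bound, size=(m, n)))
            self.b.append(np.zeros(n))
        self.omega0 = omega0

    def __call__(self, coords):
        h = coords
        for W, b in zip(self.W[:-1], self.b[:-1]):
            h = np.sin(self.omega0 * (h @ W + b))  # sine activation
        return h @ self.W[-1] + self.b[-1]         # linear output layer

field = SineField()                      # one field encodes one synthetic datum
image = field(coordinate_grid(32, 32)).reshape(32, 32)
print(image.shape)                       # (32, 32)
```

Note that the parameter count of `SineField` is fixed by `hidden` and `depth`, not by the resolution passed to `coordinate_grid`, which is the decoder-only property the text describes.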

Theoretical Underpinnings

The theoretical contribution focuses on establishing that the feasibility set—the set of functions representable by the neural field parameterization—directly influences the minimization of the distillation loss. In contrast to previous methods such as FreD, the use of sine activations affords DDiF a comprehensive Fourier basis representation. More specifically:

  • Larger Feasible Space:

The parameterization induced by sine functions can be equivalently expressed as a sum of cosines with adjustable amplitude, frequency, phase, and offset, ensuring higher expressiveness. The analysis shows that even with identical budgets (in terms of parameter count per synthetic instance), DDiF induces a strictly larger functional space.

  • Formal Propositions and Theorems:

Proposition 1 underscores that a broader feasible space yields a lower optimal distillation loss, while Theorem 1 explicitly compares the expressiveness of DDiF with that of FreD, highlighting its capability to encode richer structural information.
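The sine-to-cosine claim above follows from a standard identity (the notation here is chosen for illustration and may differ from the paper's):

$$\sin(\omega^\top c + b) = \cos\!\left(\omega^\top c + b - \tfrac{\pi}{2}\right),$$

so a one-hidden-layer field $F_\psi(c) = \sum_j a_j \sin(\omega_j^\top c + b_j) + d$ can be rewritten as $\sum_j a_j \cos(\omega_j^\top c + \phi_j) + d$ with $\phi_j = b_j - \pi/2$. Each hidden unit thus contributes a cosine with learnable amplitude $a_j$, frequency $\omega_j$, phase $\phi_j$, and a shared offset $d$, which is exactly the adjustable Fourier-like basis that the expressiveness argument relies on.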

Experimental Validation

DDiF has been extensively evaluated across multiple modalities including images, video, audio, and 3D voxels. Key aspects of the experimental results include:

  • Benchmark Comparisons:

On datasets such as ImageNet-Subset, DDiF demonstrates superior performance over a host of previously established methods (Vanilla, IDC, FreD, HaBa, SPEED, LatentDD, GLaD, H-GLaD) at both 128×128 and 256×256 resolutions, even at Instances Per Class (IPC) as low as 1 and 10.

  • Cross-Architecture Generalization:

The method achieves robust generalization when synthetic datasets distilled via DDiF are employed across differing network architectures (e.g., AlexNet, VGG11, ResNet18, ViT).

  • Universality Across Distillation Losses:

DDiF is shown to be agnostic to the choice of distillation loss, achieving improvements whether a gradient matching (DC) or distribution matching (DM) formulation is used.

  • Robustness Across Resolutions:

The continuous nature of the neural field ensures that instances can be sampled at arbitrary resolutions. This cross-resolution generalization was highlighted as a distinctive capability in comparative studies, providing a systematic advantage over interpolation-dependent methods.
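The cross-resolution property can be made concrete with a toy one-dimensional field. The weights below are random placeholders rather than a trained field; the point is that one parameter set yields consistent signals at any sampling density, unlike a fixed pixel array that must be interpolated.

```python
import numpy as np

# Toy 1-D sine field decoded at two resolutions from one parameter set.
rng = np.random.default_rng(0)
W1, b1 = rng.uniform(-1.0, 1.0, (1, 16)), np.zeros(16)
W2 = rng.uniform(-0.5, 0.5, (16, 1))

def decode(n):
    # n coordinates on [-1, 1); endpoint=False keeps the coarse grid nested
    # inside the fine one, so the check below compares identical coordinates.
    x = np.linspace(-1.0, 1.0, n, endpoint=False)[:, None]
    return (np.sin(30.0 * (x @ W1 + b1)) @ W2).ravel()

lo, hi = decode(64), decode(256)
print(np.allclose(lo, hi[::4]))  # True: same field evaluated at shared points
```

No resampling or interpolation step is involved: both outputs are direct evaluations of the same continuous function.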

Significance of Neural Fields in Dataset Distillation

The utilization of neural fields in DDiF introduces several technical merits:

  • Coding Efficiency:

Neural fields decouple the latent representation size from the dimensionality of the output data, thus affording significant compression capabilities without compromising the informational fidelity necessary for downstream training.

  • Resolution Adaptability:

Given that the output is a continuous function over the coordinate set, the same distilled representation can generate instances at multiple resolutions. This adaptability is crucial for practical applications where data might be consumed under varying spatial scales.

  • Enhanced Expressiveness:

The parameterization via sine activations imparts a richer expressivity compared to static, input-sized representations. The resulting larger feasible space allows for encoding more detailed structure, which is directly correlated with improved performance on downstream tasks.
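The decoupling of parameter count from output dimensionality can be illustrated with simple arithmetic. The sizes below are chosen for illustration and are not the paper's configuration:

```python
# Illustrative budget comparison: a raw 256x256 RGB instance stores
# 256*256*3 values, while a small 2 -> 64 -> 64 -> 3 sine-MLP field
# stores only its weights and biases, regardless of output resolution.
dims = [2, 64, 64, 3]
field_params = sum(m * n + n for m, n in zip(dims[:-1], dims[1:]))
raw_values = 256 * 256 * 3
print(field_params, raw_values, round(raw_values / field_params, 1))
# Roughly a 43x reduction for these particular sizes.
```

Whether such a budget suffices in practice depends on the signal's complexity; the experiments in the paper are what substantiate the fidelity claim.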

Data Compression and Expressiveness Tradeoffs

A salient aspect of DDiF is the trade-off between compression and expressiveness. While traditional dataset distillation methods optimize synthetic instances at an input-level, DDiF’s neural field-based method encapsulates complex dependencies using a parameter count that is independent of the grid size. This feature is especially beneficial in high-dimensional problems such as video or 3D voxel data. Despite the increased computational load during the forward pass of the neural field, the overall storage and memory overhead is significantly reduced since only the neural network parameters are required. This inherent trade-off is balanced by efficient neural architectures that are well-suited for GPU-accelerated computation.

Implementation Considerations

For practical implementation of DDiF, consider the following aspects:

  • Network Architecture:

A relatively shallow L-layer network with sine activations is employed. Custom initialization techniques tailored for periodic activation functions should be adopted to ensure stable training dynamics.

  • Optimization Strategies:

Standard optimization algorithms (SGD, Adam) are appropriate for the training of neural fields. However, careful scheduling of the warm-up phase prior to the distillation loss optimization is critical to allow the neural fields to converge toward useful initial representations.

  • Computational Resources:

Despite the reduction in data storage requirements, the computational complexity per forward pass is higher than direct input-based methods. Deployment on modern GPU clusters or specialized accelerators is recommended to manage training time effectively.

  • Hyperparameter Tuning:

Fine-tuning hyperparameters such as the width (dimension d) of the neural field reveals a pronounced impact on performance. Ablation studies suggest that exponential increases in d yield diminishing returns beyond a specific threshold, highlighting the importance of hyperparameter search.

  • Cross-Modal Extensions:

The method’s adaptability is evidenced by its successful applications in non-image domains (audio and 3D data), making it a versatile framework for experiments in multi-modal dataset distillation.
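The warm-up phase can be sketched as fitting a single field to one real signal by MSE before any distillation loss is applied. The following is a hand-rolled one-hidden-layer version with manually derived gradients; the target signal, `omega0`, learning rate, and step count are toy choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 64)[:, None]       # coordinates
y = (x ** 2).ravel()                          # stand-in "real" signal
omega0, lr = 3.0, 5e-3
W1, b1 = rng.uniform(-1.0, 1.0, (1, 16)), np.zeros(16)
W2 = rng.uniform(-0.5, 0.5, (16, 1))

losses = []
for _ in range(500):
    z = omega0 * (x @ W1 + b1)                # pre-activation
    h = np.sin(z)                             # hidden features
    pred = (h @ W2).ravel()
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backprop through both layers of the sine MLP.
    d_pred = (2.0 / len(y)) * err[:, None]    # dL/dpred
    dW2 = h.T @ d_pred
    dh = d_pred @ W2.T
    dz = dh * np.cos(z) * omega0
    dW1, db1 = x.T @ dz, dz.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2

print(losses[0], losses[-1])                  # loss should drop during warm-up
```

In practice an autograd framework and the paper's warm-up schedule would replace this manual loop; the sketch only shows the shape of the pre-distillation fitting step.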

Conclusion

In summary, DDiF presents a rigorously formulated, highly expressive method for dataset distillation by embedding high-dimensional data into synthetic neural fields. The approach capitalizes on the efficiency of coordinate-based representations to achieve both significant compression and robust performance across varying data modalities. Owing to theoretical guarantees concerning its expressiveness and extensive empirical validation, DDiF offers a compelling alternative to existing dataset distillation strategies, particularly in scenarios demanding high fidelity under strict resource constraints.
