SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation

Published 15 Jul 2025 in cs.CV and cs.LG | (2507.11579v2)

Abstract: We present SketchDNN, a generative model for synthesizing CAD sketches that jointly models both continuous parameters and discrete class labels through a unified continuous-discrete diffusion process. Our core innovation is Gaussian-Softmax diffusion, where logits perturbed with Gaussian noise are projected onto the probability simplex via a softmax transformation, facilitating blended class labels for discrete variables. This formulation addresses two key challenges, namely the heterogeneity of primitive parameterizations and the permutation invariance of primitives in CAD sketches. Our approach significantly improves generation quality, reducing Fréchet Inception Distance (FID) from 16.04 to 7.80 and negative log-likelihood (NLL) from 84.8 to 81.33, establishing a new state-of-the-art in CAD sketch generation on the SketchGraphs dataset.

Summary

The paper introduces a novel joint continuous-discrete diffusion framework using Gaussian-Softmax diffusion to enhance the fidelity and diversity of CAD sketch generation.
It employs a permutation-equivariant transformer architecture that effectively addresses heterogeneous primitive parameterization and permutation invariance.
The approach achieves state-of-the-art results on the SketchGraphs dataset by significantly reducing FID and Negative Log-Likelihood compared to previous methods.

The paper introduces SketchDNN, a generative model for computer-aided design (CAD) sketches built on a unified continuous-discrete diffusion process. Its Gaussian-Softmax diffusion mechanism represents discrete class labels as blended probability vectors rather than hard categories. According to the paper, this addresses two challenges intrinsic to CAD sketch generation and improves both the fidelity and the diversity of generated sketches on the SketchGraphs dataset.

Figure 1: The generation pipeline of SketchDNN. Starting from a pure noise seed $\mathcal{X}_T$, the denoiser network iteratively refines the sample, leading to the final generated sketch $\mathcal{X}_0$ through successive denoising steps.

Methodology

SketchDNN is a diffusion model in which the continuous parameters and the discrete labels of CAD primitives are modeled jointly. The discrete part uses Gaussian-Softmax diffusion, which perturbs logits with Gaussian noise and maps them onto the probability simplex through a softmax transformation. Discrete variables such as primitive class labels can therefore be noised and denoised gradually, which helps with two key challenges in CAD sketch generation: heterogeneity in primitive parameterizations and permutation invariance.

Continuous and Discrete Diffusion

Continuous diffusion uses a Markov chain to add Gaussian noise over timesteps, progressively destroying information about the input data until it resembles pure noise. Discrete diffusion, by contrast, has traditionally struggled to produce such a gradual transition, because a categorical variable must occupy exactly one class at a time. SketchDNN's discrete diffusion instead uses the Gaussian-Softmax distribution: passing Gaussian vectors through a softmax gives a continuous relaxation in which a noisy label is a blend of classes.
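The noising step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the clean logits are taken to be a scaled one-hot vector, the mixing follows the usual square-root weighting from continuous diffusion, and the names `gaussian_softmax_noise` and `logit_scale` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def gaussian_softmax_noise(labels, num_classes, alpha_bar, rng, logit_scale=4.0):
    """Blend integer class labels into probability vectors at signal level alpha_bar.

    Assumption (not from the paper): clean logits are logit_scale * one_hot(label),
    and noise is mixed in as sqrt(alpha_bar) * clean + sqrt(1 - alpha_bar) * eps.
    """
    clean_logits = logit_scale * np.eye(num_classes)[labels]
    eps = rng.standard_normal(clean_logits.shape)
    noisy_logits = np.sqrt(alpha_bar) * clean_logits + np.sqrt(1.0 - alpha_bar) * eps
    # Softmax projects the perturbed logits onto the probability simplex.
    return softmax(noisy_logits)

rng = np.random.default_rng(0)
labels = np.array([0, 2, 1, 3])
probs_low_noise = gaussian_softmax_noise(labels, 4, alpha_bar=1.0, rng=rng)
probs_high_noise = gaussian_softmax_noise(labels, 4, alpha_bar=0.0, rng=rng)
print(probs_low_noise.round(3))   # close to one-hot
print(probs_high_noise.round(3))  # blended, unrelated to the labels
```

Every row is a valid probability vector at any noise level, so the class label degrades smoothly instead of flipping abruptly between categories.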

Figure 2: Left: The orange curve represents the raw cosine variance schedule $\overline{a_t}$ , while the blue curve depicts the probability that the class label remains unchanged.
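The two curves in the caption can be approximated numerically. The sketch below assumes the widely used cosine schedule of Nichol and Dhariwal with offset `s = 0.008` (that work's default, not a value stated here) and estimates label retention by Monte Carlo under a hypothetical scaled one-hot logit model. The paper's exact schedule and logit parameterization may differ.

```python
import numpy as np

def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level at step t (0..T) under the cosine schedule."""
    def f(u):
        return np.cos((u / T + s) / (1.0 + s) * np.pi / 2.0) ** 2
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return np.clip(f(np.asarray(t, dtype=float)) / f(0.0), 0.0, 1.0)

def label_retention(alpha_bar, num_classes=5, logit_scale=4.0, n=20000, seed=0):
    """Monte Carlo estimate of P(argmax of noisy logits == original class)."""
    rng = np.random.default_rng(seed)
    clean = np.zeros(num_classes)
    clean[0] = logit_scale  # hypothetical clean logits: scaled one-hot
    eps = rng.standard_normal((n, num_classes))
    noisy = np.sqrt(alpha_bar) * clean + np.sqrt(1.0 - alpha_bar) * eps
    return float(np.mean(noisy.argmax(axis=1) == 0))

T = 1000
steps = np.arange(T + 1)
alpha_bars = cosine_alpha_bar(steps, T)
for t in (0, 250, 500, 750, 1000):
    print(t, round(float(alpha_bars[t]), 4), label_retention(alpha_bars[t]))
```

Under these assumptions the retention probability starts at 1 and decays toward chance level (one over the number of classes) as the signal level falls to zero.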

CAD Sketch Representation and Permutation Invariance

Each primitive in a CAD sketch is characterized by discrete and continuous attributes. SketchDNN models each attribute independently within the diffusion framework. The diffusion is permutation-equivariant: the denoising process does not depend on the order in which primitives are listed, so reordering the input primitives simply reorders the output in the same way. This matters for CAD sketches because many orderings of the same primitives describe geometrically identical sketches.
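Permutation equivariance can be checked directly on a toy layer. The sketch below uses a single self-attention layer with no positional encoding as a stand-in for a denoiser; the function `self_attention`, the weight shapes, and the sizes are illustrative choices, not details taken from the paper.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over the rows of x, with no positional encoding."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n_prims, d = 6, 8  # hypothetical sketch with 6 primitives, 8 features each
x = rng.standard_normal((n_prims, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

perm = rng.permutation(n_prims)
out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention(x[perm], w_q, w_k, w_v)

# Equivariance: permuting the inputs permutes the outputs the same way.
equivariant = np.allclose(out_perm, out[perm])
print(equivariant)
```

Adding a positional encoding to `x` before the attention call would break this property, since each row's output would then depend on its index.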

Implementation and Training

The denoiser network in SketchDNN is a transformer without positional encodings, which keeps it equivariant to permutations of the primitives. Training ran for many epochs with a large batch size distributed across multiple GPUs, minimizing a reconstruction loss: mean-squared error for continuous variables and cross-entropy for discrete variables. Training also masks irrelevant primitive parameters based on the predicted class probabilities, which the paper associates with higher-fidelity predictions.
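One plausible reading of the loss and masking described above is sketched here in NumPy. The class-to-parameter layout in `PARAM_MASK`, the function `sketch_loss`, and the choice to apply a soft mask to the predicted parameters are all assumptions made for illustration; the paper's actual formulation may differ.

```python
import numpy as np

# Hypothetical layout: three primitive classes sharing four parameter slots.
# Unused slots are assumed to be zero in the ground truth.
PARAM_MASK = np.array([
    [1, 1, 1, 1],  # line:   x1, y1, x2, y2
    [1, 1, 1, 0],  # circle: cx, cy, r
    [1, 1, 0, 0],  # point:  x, y
], dtype=float)

def sketch_loss(pred_params, pred_logits, true_params, true_labels):
    """Return (total, mse, cross_entropy) for a batch of primitives."""
    # Cross-entropy on class logits, via a stable log-softmax.
    z = pred_logits - pred_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    ce = -log_probs[np.arange(len(true_labels)), true_labels].mean()
    # Soft mask: expected relevance of each slot under predicted class probabilities.
    soft_mask = np.exp(log_probs) @ PARAM_MASK
    # Mean-squared error on the masked parameter predictions.
    mse = np.mean((soft_mask * pred_params - true_params) ** 2)
    return mse + ce, mse, ce

true_labels = np.array([0, 1, 2])
true_params = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.5, 0.2, 0.0],
    [0.7, 0.1, 0.0, 0.0],
])
confident_logits = 50.0 * np.eye(3)[true_labels]
total, mse, ce = sketch_loss(true_params, confident_logits, true_params, true_labels)

# Garbage in a slot the predicted class does not use is masked out.
garbage = true_params.copy()
garbage[2, 3] = 100.0
_, mse_garbage, _ = sketch_loss(garbage, confident_logits, true_params, true_labels)
print(total, mse_garbage)
```

With a confident, correct class prediction the soft mask is nearly binary, so values in slots that the predicted class does not use have no measurable effect on the loss.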

Results and Evaluation

SketchDNN significantly reduces the Fréchet Inception Distance (FID) and Negative Log-Likelihood (NLL) relative to prior methods such as Vitruvion and SketchGen. The abstract reports FID falling from 16.04 to 7.80 and NLL from 84.8 to 81.33 on the SketchGraphs dataset, a new state of the art. The generated sketches are reported to improve in both fidelity and diversity.

Figure 3: Comparison of CAD sketches from the SketchGraphs dataset (top), generations from SketchDNN (middle), and generations from Vitruvion (bottom).

Conclusion

SketchDNN generates CAD sketches by integrating continuous and discrete diffusion in a single framework. Its Gaussian-Softmax diffusion addresses a long-standing difficulty of diffusing discrete variables and may apply more broadly to other generative tasks involving mixed data types. Future work could extend it to conditional generation and optimize it for a wider range of CAD applications.