
BeamCKMDiff: Beam-Aware CKM Generation

Updated 22 January 2026
  • BeamCKMDiff is a generative framework that leverages a diffusion process in a VAE latent space to synthesize continuous, beam-aware channel knowledge maps.
  • It employs an advanced Diffusion Transformer backbone with adaptive layer normalization to integrate beam and environmental context into the generative process.
  • The method achieves state-of-the-art NMSE performance with sub-second inference, advancing environment-aware channel mapping for scalable 6G network planning.

BeamCKMDiff is a generative framework for constructing high-fidelity, beam-aware channel knowledge maps (CKMs) from environmental context and continuous beamforming vectors in wireless communication scenarios. It is designed to address the limitations of conventional CKM construction methods, which typically rely on sparse sampling measurements, omnidirectional map assumptions, or discrete codebook representations. BeamCKMDiff enables the generation of channel knowledge maps conditioned on arbitrary continuous beamforming vectors without requiring site-specific measurement data, which is critical for the realization of environment-aware 6G networks (Zhao et al., 15 Jan 2026).

1. Conditional Diffusion Process in VAE-Latent Space

BeamCKMDiff synthesizes channel knowledge maps by running a conditional diffusion process in the latent space of a variational autoencoder (VAE). The target variable is the VAE latent z, which encodes a normalized map of site-specific radio signal strengths for a given beam.

Forward (Noising) Process

The forward process q(z_t|z_0) progressively adds Gaussian noise to the clean latent z_0 over time steps t = 1 to T:

q(z_t|z_{t-1}) = \mathcal{N}\left(z_t; \sqrt{1-\beta_t}\, z_{t-1}, \beta_t I\right)

q(z_t|z_0) = \mathcal{N}\left(z_t; \sqrt{\bar{\alpha}_t}\, z_0, (1-\bar{\alpha}_t) I\right)

where \bar{\alpha}_t = \prod_{s=1}^t (1-\beta_s) and \alpha_t = 1-\beta_t.
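The closed-form noising step above can be sketched in a few lines. This is a minimal NumPy illustration, borrowing the linear schedule endpoints (T = 500, beta_1 = 4e-5, beta_T = 5e-3) and the 8×32×32 latent shape reported later in the article; those constants, not this code, come from the paper.

```python
import numpy as np

def forward_noise(z0, t, betas, rng=None):
    """Sample z_t ~ q(z_t | z_0) in closed form via alpha_bar_t."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)          # \bar{alpha}_t for t = 1..T
    eps = rng.standard_normal(z0.shape)          # the injected Gaussian noise
    zt = np.sqrt(alpha_bar[t - 1]) * z0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return zt, eps

# Linear beta schedule with the endpoints reported in the training section.
T = 500
betas = np.linspace(4e-5, 5e-3, T)
z0 = np.zeros((8, 32, 32))                       # clean VAE latent (paper's shape)
zT, eps = forward_noise(z0, T, betas)            # fully noised latent at t = T
```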

Reverse Process and Score Network

The reverse (denoising) process is parameterized by a neural network \epsilon_\theta that predicts the noise at each step, conditioned on the environment and the beamforming vector:

p_\theta(z_{t-1}|z_t, c) = \mathcal{N}\left(z_{t-1}; \mu_\theta(z_t, t, c), \Sigma_t I\right)

with mean

\mu_\theta(z_t, t, c) = \frac{1}{\sqrt{\alpha_t}}\left(z_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(z_t, t, c)\right).
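One reverse step computes this mean from a noise prediction and then adds sigma-scaled Gaussian noise. In the sketch below, `eps_pred` stands in for the network output \epsilon_\theta(z_t, t, c), and the choice sigma = sqrt(beta_t) is an assumption: the paper only states a fixed \Sigma_t I.

```python
import numpy as np

def reverse_step(zt, t, eps_pred, betas, sigma, rng=None):
    """One DDPM denoising step: mu_theta from predicted noise, then sample."""
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    a_t, ab_t = alphas[t - 1], alpha_bar[t - 1]
    mu = (zt - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    noise = rng.standard_normal(zt.shape) if t > 1 else 0.0   # no noise at t=1
    return mu + sigma * noise

# Toy invocation: a zero noise prediction on a random latent.
T = 500
betas = np.linspace(4e-5, 5e-3, T)
zt = np.random.default_rng(1).standard_normal((8, 32, 32))
z_prev = reverse_step(zt, T, np.zeros_like(zt), betas, sigma=np.sqrt(betas[-1]))
```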

Training Objective

BeamCKMDiff adopts the standard denoising score-matching loss of diffusion models, \mathcal{L}_\text{diff} = \mathbb{E}_{t, z_0, \epsilon, c} \left\| \epsilon - \epsilon_\theta(z_t, t, c) \right\|_2^2, with z_t sampled as

z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon.
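A minimal single-sample version of this objective can be written directly. The `eps_theta` callable is a placeholder for the conditioned denoiser; the zero predictor used here makes the loss land near the mean squared noise, about 1 per element.

```python
import numpy as np

def diffusion_loss(eps_theta, z0, c, t, betas, rng=None):
    """Noise z0 to z_t in closed form, then score the noise prediction by MSE."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t - 1]) * z0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return np.mean((eps - eps_theta(zt, t, c)) ** 2)

# A zero predictor: the loss reduces to the mean squared noise, close to 1.
betas = np.linspace(4e-5, 5e-3, 500)
loss = diffusion_loss(lambda zt, t, c: np.zeros_like(zt),
                      np.zeros((8, 32, 32)), None, 100, betas)
```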

2. Diffusion Transformer Backbone and Conditioning Mechanism

BeamCKMDiff utilizes a variant of the Diffusion Transformer (DiT) as its score network, specifically adapted for spatial-conditional map synthesis and continuous beam embedding.

Architecture Overview

  • Transformer blocks: 12 layers, input sequence length N = 256 (16×16 grid tokens), embedding dimension D = 512.
  • Attention: 8 heads per block.
  • Patch embedding: the input tensor [z_t; c_\text{env}] \in \mathbb{R}^{40\times32\times32} is processed by a Conv2D patch embedder into \mathbb{R}^{512\times16\times16}, then flattened.
  • Beam embedding: the continuous beamforming vector w \in \mathbb{C}^{N_t}, with N_t = 16, is split into real and imaginary parts (dimension 32) and projected by an MLP to w_\text{emb} \in \mathbb{R}^{512}.
  • Temporal embedding: the diffusion step t is similarly embedded by an MLP to t_\text{emb} \in \mathbb{R}^{512}.

The fused conditional embedding c_\text{emb} = t_\text{emb} + w_\text{emb} acts as a global control token.
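The two conditioning branches and their fusion can be sketched as follows. The single linear projections `W_beam` and `W_time` are hypothetical stand-ins for the learned MLPs, and the t/500 step normalization is an assumption; only the dimensions (N_t = 16, D = 512) come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Nt = 512, 16

# Stand-in weights for the learned beam and timestep MLPs (assumptions).
W_beam = rng.standard_normal((D, 2 * Nt)) * 0.02
W_time = rng.standard_normal((D, 1)) * 0.02

def beam_embedding(w):
    """Split a complex beamforming vector into [Re; Im] and project to R^512."""
    x = np.concatenate([w.real, w.imag])          # dimension 2*Nt = 32
    return W_beam @ x

def time_embedding(t):
    """Project the (normalized) diffusion step to R^512."""
    return W_time @ np.array([t / 500.0])

w = (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(Nt)
c_emb = time_embedding(120) + beam_embedding(w)   # fused global control token
```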

Adaptive Layer Normalization (adaLN)

Conditioning is injected into every DiT block via adaLN, which modulates the scale (\gamma) and shift (\beta) of standard LayerNorm through an affine transformation of c_\text{emb}:

[\gamma, \beta] = A_\text{mod}\,\mathrm{SiLU}(c_\text{emb}) + b_\text{mod}

\mathrm{adaLN}(f_\text{in}, c_\text{emb}) = (1+\gamma) \odot \mathrm{LN}(f_\text{in}) + \beta.

All Multi-Head Attention and MLP sublayers in the DiT blocks use adaLN, enabling global steering of the generative process by the beamforming condition.
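The modulation mechanism above can be sketched directly. `A_mod` and `b_mod` are hypothetical stand-ins for the learned affine parameters (small random weights, zero bias, so the modulation starts near identity); the token count (256) and width (512) follow the architecture overview.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512

# Stand-in modulation parameters producing the concatenated [gamma, beta].
A_mod = rng.standard_normal((2 * D, D)) * 0.02
b_mod = np.zeros(2 * D)

def silu(x):
    return x / (1.0 + np.exp(-x))

def ada_ln(f_in, c_emb, eps=1e-6):
    """adaLN: LayerNorm whose scale/shift are affine functions of SiLU(c_emb)."""
    gamma, beta = np.split(A_mod @ silu(c_emb) + b_mod, 2)
    ln = (f_in - f_in.mean(-1, keepdims=True)) / (f_in.std(-1, keepdims=True) + eps)
    return (1.0 + gamma) * ln + beta

tokens = rng.standard_normal((256, D))            # 16x16 grid tokens
out = ada_ln(tokens, rng.standard_normal(D))      # globally modulated features
```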

3. Data Representations and Preprocessing

BeamCKMDiff operates on spatial, beam, and environmental representations.

  • Ground-truth CKM \Psi_w(x,y): a 256×256 real-valued map (in dB), computed as

\Psi_w(x,y) = 10\log_{10} \left| h^H(x,y)\, w \right|^2.

CKMs are normalized to match the VAE's sigmoid output range and encoded into z_0 \in \mathbb{R}^{8\times32\times32}.

  • Environment context: building-height maps B \in \mathbb{R}^{256\times256} and transmitter masks T \in \{0,1\}^{256\times256}, stacked and processed by a ResNet encoder to yield c_\text{env} \in \mathbb{R}^{32\times32\times32}.
  • Tokenization: the spatial tensor [z_t; c_\text{env}] \in \mathbb{R}^{40\times32\times32} is patch-embedded and flattened into a sequence of 256 tokens for the Transformer input.
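The ground-truth map definition above reduces to a per-pixel inner product between the channel vector and the beam. This NumPy sketch uses a reduced 64×64 grid and randomly drawn channel vectors: the shapes (N_t = 16 antennas, complex h and w) follow the article, the values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, H, W = 16, 64, 64   # reduced grid for illustration (paper uses 256x256)

# Hypothetical per-pixel channel vectors h(x, y) in C^{Nt} and a random beam w.
h = (rng.standard_normal((H, W, Nt)) + 1j * rng.standard_normal((H, W, Nt))) / np.sqrt(2 * Nt)
w = (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(Nt)

# Psi_w(x, y) = 10 log10 |h^H(x, y) w|^2: the beam-conditioned gain map in dB.
gain = np.abs(np.einsum('hwn,n->hw', h.conj(), w)) ** 2
ckm_db = 10.0 * np.log10(gain + 1e-12)   # small epsilon guards against log(0)
```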

4. Training Protocol and Optimization

Dataset Generation

  • 30 geo-referenced urban scenes (OpenStreetMap), each 512 m × 512 m at 2 m pixel resolution.
  • For each scene: 10 random GBS locations (16×1 ULA at 1.5 m height), each paired with 10 random continuous beamforming vectors.
  • Ground-truth maps computed with NVIDIA Sionna ray tracing using 10^9 rays, up to 3 reflections/diffractions, at 2.4 GHz.

VAE Pretraining

The encoder q(z|\Psi_w) and decoder D(z) are trained to minimize

\mathcal{L}_{\rm VAE} = \left\| \Psi_w - D(z) \right\|_2^2 + \lambda_\text{KL}\, D_\text{KL}\left(q(z|\Psi_w) \,\|\, \mathcal{N}(0,I)\right).
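For a diagonal-Gaussian encoder this objective has a closed-form KL term, transcribed below. The summed (rather than averaged) reduction and the λ_KL value are assumptions not stated in the article.

```python
import numpy as np

def vae_loss(psi, recon, mu, logvar, lam_kl=1e-4):
    """Reconstruction MSE plus KL(q(z|Psi) || N(0, I)) for a diagonal Gaussian."""
    recon_term = np.sum((psi - recon) ** 2)
    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return recon_term + lam_kl * kl
```

With a perfect reconstruction and a standard-normal posterior (mu = 0, logvar = 0), both terms vanish and the loss is exactly zero.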

Diffusion Model Training

  • T = 500 noise steps, \beta_1 = 4\times10^{-5}, \beta_T = 5\times10^{-3} (linear schedule).
  • The VAE is frozen; the condition encoder and DiT denoiser are jointly optimized with Adam (learning rate 1\times10^{-4}) for several hundred epochs.
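The schedule above can be written out directly. As a side observation (a property of short linear schedules, not a claim from the paper), the end-of-schedule signal coefficient sqrt(alpha_bar_T) stays well above zero with these endpoints, so z_T retains a residual signal component.

```python
import numpy as np

# Linear beta schedule with the paper's endpoints and T = 500 steps.
T = 500
betas = np.linspace(4e-5, 5e-3, T)
alphas_bar = np.cumprod(1.0 - betas)

# sqrt(alpha_bar_T) is roughly 0.53 here: z_T is noisy but not pure noise.
signal_coeff = np.sqrt(alphas_bar[-1])
```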

5. Evaluation, Baselines, and Quantitative Results

BeamCKMDiff is benchmarked against state-of-the-art approaches for CKM construction:

  • RadioUNet: deterministic U-Net CKM regressor
  • TransUNet: Transformer backbone with discrete beam-index embedding
  • RadioDiff-UNet: U-Net diffusion model without adaptive normalization

Two evaluation protocols are used:

  1. Unseen beams: test on new beamforming vectors at previously seen GBS locations.
  2. Unseen locations: test on new transmitter locations and new beams.

The main metric is NMSE (dB), computed as

NMSE(dB)=10log10(qDBΨ^(q)Ψ(q)2qDBΨ(q)2)\mathrm{NMSE(dB)} = 10\log_{10}\left( \frac{ \sum_{q\in\mathcal{D}\setminus\mathcal{B}} \left|\widehat{\Psi}(q)-\Psi(q)\right|^2 }{ \sum_{q\in\mathcal{D}\setminus\mathcal{B}} |\Psi(q)|^2 } \right)

where \widehat{\Psi} is the predicted CKM, \Psi the ground truth, \mathcal{D} the set of map pixels, and \mathcal{B} the building mask.
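The metric is a direct ratio of masked squared errors. This sketch implements it over non-building pixels (the map domain minus the building mask) and checks it on a hypothetical flat map with a uniform prediction error.

```python
import numpy as np

def nmse_db(psi_hat, psi, building_mask):
    """NMSE in dB over non-building pixels (map domain minus building mask)."""
    valid = ~building_mask
    err = np.sum(np.abs(psi_hat[valid] - psi[valid]) ** 2)
    ref = np.sum(np.abs(psi[valid]) ** 2)
    return 10.0 * np.log10(err / ref)

# Toy check: a flat -60 dB map with a uniform 0.6 dB prediction offset,
# giving a per-pixel error ratio of (0.6/60)^2 = 1e-4, i.e. about -40 dB.
psi = np.full((4, 4), -60.0)
mask = np.zeros((4, 4), dtype=bool)
score = nmse_db(psi + 0.6, psi, mask)
```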

Method           NMSE, unseen beams   NMSE, unseen locations   Inference time (s)
RadioUNet        −16.35 dB            −16.34 dB                0.231
TransUNet        −18.91 dB            −17.34 dB                0.234
RadioDiff-UNet   −19.49 dB            −19.13 dB                0.373
BeamCKMDiff      −21.24 dB            −20.68 dB                0.688

BeamCKMDiff achieves the lowest NMSE (highest accuracy) across both settings. Visual inspection confirms that it accurately reconstructs both main-lobe and side-lobe structures under arbitrary continuous beam queries, outperforming baselines that either blur or misalign these salient features (Zhao et al., 15 Jan 2026).

6. Architectural and Methodological Significance

BeamCKMDiff introduces architectural and methodological advances:

  • Continuous beam generalization: Rather than restricting to a pre-defined codebook, BeamCKMDiff is conditioned on arbitrary beamforming vectors, increasing flexibility for real-world deployments.
  • adaLN-based global control: The adaptive layer normalization mechanism allows fine-grained, block-level modulation of the Transformer’s activations by the beam embedding, which is critical for capturing the non-trivial coupling between directionality and site-specific channel propagation.
  • Diffusion process in learned latent space: Operating in a VAE-derived latent space both regularizes the generative process and enables use of compact, information-dense features.
  • Sample and runtime efficiency: BeamCKMDiff achieves sub-second inference time for full map prediction, and, due to its generative efficiency, can synthesize CKMs for new beams or sites with no additional sampling.

A plausible implication is that these methodological elements are necessary for reliable beam-conditioned map generation at high fidelity and scale.

7. Context, Applications, and Integration

BeamCKMDiff represents a foundational tool for 6G network planning, environment-aware beam management, and site-specific channel database construction. It is directly applicable to settings where dense measurement-driven CKMs are infeasible to acquire, or when rapid adaptation to new beamforming vectors is required. By explicitly incorporating the environmental context and continuous beam control, BeamCKMDiff enables more granular, flexible, and accurate channel state information, which is critical for optimizing spatial reuse, beam selection, and CSI feedback compression in future wireless networks (Zhao et al., 15 Jan 2026).

In related simulation environments such as those for precision interferometry and diffraction (e.g., as discussed for beam decomposition and propagation in (Zhao et al., 2022)), analogous principles of combining continuous parameter control with generative map construction may inform broader diffraction toolkits. However, BeamCKMDiff is distinct in its focus on environmental radio propagation conditioned by arbitrary continuous beam directions, achieved via modern generative diffusion modeling.
