
BeamCKMDiff: Beam-Aware CKM Generation

Updated 22 January 2026
  • BeamCKMDiff is a generative framework that leverages a diffusion process in a VAE latent space to synthesize continuous, beam-aware channel knowledge maps.
  • It employs an advanced Diffusion Transformer backbone with adaptive layer normalization to integrate beam and environmental context into the generative process.
  • The method achieves state-of-the-art NMSE performance with sub-second inference, advancing environment-aware channel mapping for scalable 6G network planning.

BeamCKMDiff is a generative framework for constructing high-fidelity, beam-aware channel knowledge maps (CKMs) from environmental context and continuous beamforming vectors in wireless communication scenarios. It is designed to address the limitations of conventional CKM construction methods, which typically rely on sparse sampling measurements, omnidirectional map assumptions, or discrete codebook representations. BeamCKMDiff enables the generation of channel knowledge maps conditioned on arbitrary continuous beamforming vectors without requiring site-specific measurement data, which is critical for the realization of environment-aware 6G networks (Zhao et al., 15 Jan 2026).

1. Conditional Diffusion Process in VAE-Latent Space

BeamCKMDiff synthesizes channel knowledge maps by running a conditional diffusion process in the latent space of a variational autoencoder (VAE). The target variable is the VAE latent z, which encodes a normalized map of site-specific radio signal strengths for a given beam.

Forward (Noising) Process

The forward process q(z_t|z_0) progressively adds Gaussian noise to the clean latent z_0 over time steps t = 1 to T:

q(z_t|z_{t-1}) = \mathcal{N}\left(z_t; \sqrt{1-\beta_t}\, z_{t-1}, \beta_t I\right)

q(z_t|z_0) = \mathcal{N}\left(z_t; \sqrt{\bar{\alpha}_t}\, z_0, (1-\bar{\alpha}_t) I\right)

where \bar{\alpha}_t = \prod_{s=1}^t (1-\beta_s) and \alpha_t = 1-\beta_t.
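The closed-form noising step above can be sketched in a few lines. This is a minimal NumPy illustration, borrowing the linear schedule endpoints (T = 500, beta_1 = 4e-5, beta_T = 5e-3) and the 8×32×32 latent shape reported later in the article; those constants, not this code, come from the paper.

```python
import numpy as np

def forward_noise(z0, t, betas, rng=None):
    """Sample z_t ~ q(z_t | z_0) in closed form via alpha_bar_t."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)          # \bar{alpha}_t for t = 1..T
    eps = rng.standard_normal(z0.shape)          # the injected Gaussian noise
    zt = np.sqrt(alpha_bar[t - 1]) * z0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return zt, eps

# Linear beta schedule with the endpoints reported in the training section.
T = 500
betas = np.linspace(4e-5, 5e-3, T)
z0 = np.zeros((8, 32, 32))                       # clean VAE latent (paper's shape)
zT, eps = forward_noise(z0, T, betas)            # fully noised latent at t = T
```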

Reverse Process and Score Network

The reverse (denoising) process is parameterized by a neural network \epsilon_\theta that predicts the noise at each step, conditioned on the environment and the beamforming vector:

p_\theta(z_{t-1}|z_t, c) = \mathcal{N}\left(z_{t-1}; \mu_\theta(z_t, t, c), \Sigma_t I\right)

with mean

\mu_\theta(z_t, t, c) = \frac{1}{\sqrt{\alpha_t}}\left(z_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(z_t, t, c)\right).
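One reverse step computes this mean from a noise prediction and then adds sigma-scaled Gaussian noise. In the sketch below, `eps_pred` stands in for the network output \epsilon_\theta(z_t, t, c), and the choice sigma = sqrt(beta_t) is an assumption: the paper only states a fixed \Sigma_t I.

```python
import numpy as np

def reverse_step(zt, t, eps_pred, betas, sigma, rng=None):
    """One DDPM denoising step: mu_theta from predicted noise, then sample."""
    rng = rng or np.random.default_rng(0)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    a_t, ab_t = alphas[t - 1], alpha_bar[t - 1]
    mu = (zt - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    noise = rng.standard_normal(zt.shape) if t > 1 else 0.0   # no noise at t=1
    return mu + sigma * noise

# Toy invocation: a zero noise prediction on a random latent.
T = 500
betas = np.linspace(4e-5, 5e-3, T)
zt = np.random.default_rng(1).standard_normal((8, 32, 32))
z_prev = reverse_step(zt, T, np.zeros_like(zt), betas, sigma=np.sqrt(betas[-1]))
```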

Training Objective

BeamCKMDiff adopts the standard denoising score-matching loss of diffusion models, \mathcal{L}_\text{diff} = \mathbb{E}_{t, z_0, \epsilon, c} \left\| \epsilon - \epsilon_\theta(z_t, t, c) \right\|_2^2, with z_t sampled as

z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon.
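A minimal single-sample version of this objective can be written directly. The `eps_theta` callable is a placeholder for the conditioned denoiser; the zero predictor used here makes the loss land near the mean squared noise, about 1 per element.

```python
import numpy as np

def diffusion_loss(eps_theta, z0, c, t, betas, rng=None):
    """Noise z0 to z_t in closed form, then score the noise prediction by MSE."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t - 1]) * z0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return np.mean((eps - eps_theta(zt, t, c)) ** 2)

# A zero predictor: the loss reduces to the mean squared noise, close to 1.
betas = np.linspace(4e-5, 5e-3, 500)
loss = diffusion_loss(lambda zt, t, c: np.zeros_like(zt),
                      np.zeros((8, 32, 32)), None, 100, betas)
```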

2. Diffusion Transformer Backbone and Conditioning Mechanism

BeamCKMDiff utilizes a variant of the Diffusion Transformer (DiT) as its score network, specifically adapted for spatial-conditional map synthesis and continuous beam embedding.

Architecture Overview

  • Transformer blocks: 12 layers, input sequence length N = 256 (16×16 grid tokens), embedding dimension D = 512.
  • Attention: 8 heads per block.
  • Patch embedding: the input tensor [z_t; c_\text{env}] \in \mathbb{R}^{40\times32\times32} is processed by a Conv2D patch embedder into \mathbb{R}^{512\times16\times16}, then flattened.
  • Beam embedding: the continuous beamforming vector w \in \mathbb{C}^{N_t}, with N_t = 16, is split into real and imaginary parts (dimension 32) and projected by an MLP to w_\text{emb} \in \mathbb{R}^{512}.
  • Temporal embedding: the diffusion step t is similarly embedded by an MLP to t_\text{emb} \in \mathbb{R}^{512}.

The fused conditional embedding c_\text{emb} = t_\text{emb} + w_\text{emb} acts as a global control token.
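The two conditioning branches and their fusion can be sketched as follows. The single linear projections `W_beam` and `W_time` are hypothetical stand-ins for the learned MLPs, and the t/500 step normalization is an assumption; only the dimensions (N_t = 16, D = 512) come from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Nt = 512, 16

# Stand-in weights for the learned beam and timestep MLPs (assumptions).
W_beam = rng.standard_normal((D, 2 * Nt)) * 0.02
W_time = rng.standard_normal((D, 1)) * 0.02

def beam_embedding(w):
    """Split a complex beamforming vector into [Re; Im] and project to R^512."""
    x = np.concatenate([w.real, w.imag])          # dimension 2*Nt = 32
    return W_beam @ x

def time_embedding(t):
    """Project the (normalized) diffusion step to R^512."""
    return W_time @ np.array([t / 500.0])

w = (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(Nt)
c_emb = time_embedding(120) + beam_embedding(w)   # fused global control token
```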

Adaptive Layer Normalization (adaLN)

Conditioning is injected into every DiT block via adaLN, which modulates the scale (\gamma) and shift (\beta) of standard LayerNorm through an affine transformation of c_\text{emb}:

[\gamma, \beta] = A_\text{mod}\,\mathrm{SiLU}(c_\text{emb}) + b_\text{mod}

\mathrm{adaLN}(f_\text{in}, c_\text{emb}) = (1+\gamma) \odot \mathrm{LN}(f_\text{in}) + \beta.

All Multi-Head Attention and MLP sublayers in the DiT blocks use adaLN, enabling global steering of the generative process by the beamforming condition.
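The modulation mechanism above can be sketched directly. `A_mod` and `b_mod` are hypothetical stand-ins for the learned affine parameters (small random weights, zero bias, so the modulation starts near identity); the token count (256) and width (512) follow the architecture overview.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512

# Stand-in modulation parameters producing the concatenated [gamma, beta].
A_mod = rng.standard_normal((2 * D, D)) * 0.02
b_mod = np.zeros(2 * D)

def silu(x):
    return x / (1.0 + np.exp(-x))

def ada_ln(f_in, c_emb, eps=1e-6):
    """adaLN: LayerNorm whose scale/shift are affine functions of SiLU(c_emb)."""
    gamma, beta = np.split(A_mod @ silu(c_emb) + b_mod, 2)
    ln = (f_in - f_in.mean(-1, keepdims=True)) / (f_in.std(-1, keepdims=True) + eps)
    return (1.0 + gamma) * ln + beta

tokens = rng.standard_normal((256, D))            # 16x16 grid tokens
out = ada_ln(tokens, rng.standard_normal(D))      # globally modulated features
```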

3. Data Representations and Preprocessing

BeamCKMDiff operates on spatial, beam, and environmental representations.

  • Ground-truth CKM \Psi_w(x,y): a 256×256 real-valued map (in dB), computed as

\Psi_w(x,y) = 10\log_{10} \left| h^H(x,y)\, w \right|^2.

CKMs are normalized to match the VAE's sigmoid output range and encoded into z_0 \in \mathbb{R}^{8\times32\times32}.

  • Environment context: building-height maps B \in \mathbb{R}^{256\times256} and transmitter masks T \in \{0,1\}^{256\times256}, stacked and processed by a ResNet encoder to yield c_\text{env} \in \mathbb{R}^{32\times32\times32}.
  • Tokenization: the spatial tensor [z_t; c_\text{env}] \in \mathbb{R}^{40\times32\times32} is patch-embedded and flattened into a sequence of 256 tokens for the Transformer input.
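The ground-truth map definition above reduces to a per-pixel inner product between the channel vector and the beam. This NumPy sketch uses a reduced 64×64 grid and randomly drawn channel vectors: the shapes (N_t = 16 antennas, complex h and w) follow the article, the values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, H, W = 16, 64, 64   # reduced grid for illustration (paper uses 256x256)

# Hypothetical per-pixel channel vectors h(x, y) in C^{Nt} and a random beam w.
h = (rng.standard_normal((H, W, Nt)) + 1j * rng.standard_normal((H, W, Nt))) / np.sqrt(2 * Nt)
w = (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt)) / np.sqrt(Nt)

# Psi_w(x, y) = 10 log10 |h^H(x, y) w|^2: the beam-conditioned gain map in dB.
gain = np.abs(np.einsum('hwn,n->hw', h.conj(), w)) ** 2
ckm_db = 10.0 * np.log10(gain + 1e-12)   # small epsilon guards against log(0)
```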

4. Training Protocol and Optimization

Dataset Generation

  • 30 geo-referenced urban scenes (OpenStreetMap), each 512 m × 512 m at 2 m pixel resolution.
  • For each scene: 10 random GBS locations (16×1 ULA at 1.5 m height), each paired with 10 random continuous beamforming vectors.
  • Ground-truth maps computed with NVIDIA Sionna ray tracing using 10^9 rays, up to 3 reflections/diffractions, at 2.4 GHz.

VAE Pretraining

The encoder q(z|\Psi_w) and decoder D(z) are trained to minimize

\mathcal{L}_{\rm VAE} = \left\| \Psi_w - D(z) \right\|_2^2 + \lambda_\text{KL}\, D_\text{KL}\left(q(z|\Psi_w) \,\|\, \mathcal{N}(0,I)\right).
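For a diagonal-Gaussian encoder this objective has a closed-form KL term, transcribed below. The summed (rather than averaged) reduction and the λ_KL value are assumptions not stated in the article.

```python
import numpy as np

def vae_loss(psi, recon, mu, logvar, lam_kl=1e-4):
    """Reconstruction MSE plus KL(q(z|Psi) || N(0, I)) for a diagonal Gaussian."""
    recon_term = np.sum((psi - recon) ** 2)
    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    return recon_term + lam_kl * kl
```

With a perfect reconstruction and a standard-normal posterior (mu = 0, logvar = 0), both terms vanish and the loss is exactly zero.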

Diffusion Model Training

  • T = 500 noise steps, \beta_1 = 4\times10^{-5}, \beta_T = 5\times10^{-3} (linear schedule).
  • The VAE is frozen; the condition encoder and DiT denoiser are jointly optimized with Adam (learning rate 1\times10^{-4}) for several hundred epochs.
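The schedule above can be written out directly. As a side observation (a property of short linear schedules, not a claim from the paper), the end-of-schedule signal coefficient sqrt(alpha_bar_T) stays well above zero with these endpoints, so z_T retains a residual signal component.

```python
import numpy as np

# Linear beta schedule with the paper's endpoints and T = 500 steps.
T = 500
betas = np.linspace(4e-5, 5e-3, T)
alphas_bar = np.cumprod(1.0 - betas)

# sqrt(alpha_bar_T) is roughly 0.53 here: z_T is noisy but not pure noise.
signal_coeff = np.sqrt(alphas_bar[-1])
```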

5. Evaluation, Baselines, and Quantitative Results

BeamCKMDiff is benchmarked against state-of-the-art approaches for CKM construction:

  • RadioUNet: deterministic U-Net CKM regressor
  • TransUNet: Transformer backbone with discrete beam-index embedding
  • RadioDiff-UNet: U-Net diffusion model without adaptive normalization

Two evaluation protocols are used:

  1. Unseen beams: test on new beamforming vectors at previously seen GBS locations.
  2. Unseen locations: test on new transmitter locations and new beams.

The main metric is NMSE (dB), computed as

NMSE(dB)=10log10(qDBΨ^(q)Ψ(q)2qDBΨ(q)2)\mathrm{NMSE(dB)} = 10\log_{10}\left( \frac{ \sum_{q\in\mathcal{D}\setminus\mathcal{B}} \left|\widehat{\Psi}(q)-\Psi(q)\right|^2 }{ \sum_{q\in\mathcal{D}\setminus\mathcal{B}} |\Psi(q)|^2 } \right)

where \widehat{\Psi} is the predicted CKM, \Psi the ground truth, \mathcal{D} the set of map pixels, and \mathcal{B} the building mask.
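The metric is a direct ratio of masked squared errors. This sketch implements it over non-building pixels (the map domain minus the building mask) and checks it on a hypothetical flat map with a uniform prediction error.

```python
import numpy as np

def nmse_db(psi_hat, psi, building_mask):
    """NMSE in dB over non-building pixels (map domain minus building mask)."""
    valid = ~building_mask
    err = np.sum(np.abs(psi_hat[valid] - psi[valid]) ** 2)
    ref = np.sum(np.abs(psi[valid]) ** 2)
    return 10.0 * np.log10(err / ref)

# Toy check: a flat -60 dB map with a uniform 0.6 dB prediction offset,
# giving a per-pixel error ratio of (0.6/60)^2 = 1e-4, i.e. about -40 dB.
psi = np.full((4, 4), -60.0)
mask = np.zeros((4, 4), dtype=bool)
score = nmse_db(psi + 0.6, psi, mask)
```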

Method           NMSE, unseen beams   NMSE, unseen locations   Inference time (s)
RadioUNet        −16.35 dB            −16.34 dB                0.231
TransUNet        −18.91 dB            −17.34 dB                0.234
RadioDiff-UNet   −19.49 dB            −19.13 dB                0.373
BeamCKMDiff      −21.24 dB            −20.68 dB                0.688

BeamCKMDiff achieves the lowest NMSE (highest accuracy) across both settings. Visual inspection confirms that it accurately reconstructs both main-lobe and side-lobe structures under arbitrary continuous beam queries, outperforming baselines that either blur or misalign these salient features (Zhao et al., 15 Jan 2026).

6. Architectural and Methodological Significance

BeamCKMDiff introduces architectural and methodological advances:

  • Continuous beam generalization: Rather than restricting to a pre-defined codebook, BeamCKMDiff is conditioned on arbitrary beamforming vectors, increasing flexibility for real-world deployments.
  • adaLN-based global control: The adaptive layer normalization mechanism allows fine-grained, block-level modulation of the Transformer’s activations by the beam embedding, which is critical for capturing the non-trivial coupling between directionality and site-specific channel propagation.
  • Diffusion process in learned latent space: Operating in a VAE-derived latent space both regularizes the generative process and enables use of compact, information-dense features.
  • Sample and runtime efficiency: BeamCKMDiff achieves sub-second inference time for full map prediction, and, due to its generative efficiency, can synthesize CKMs for new beams or sites with no additional sampling.

A plausible implication is that these methodological elements are necessary for reliable beam-conditioned map generation at high fidelity and scale.

7. Context, Applications, and Integration

BeamCKMDiff represents a foundational tool for 6G network planning, environment-aware beam management, and site-specific channel database construction. It is directly applicable to settings where dense measurement-driven CKMs are infeasible to acquire, or when rapid adaptation to new beamforming vectors is required. By explicitly incorporating the environmental context and continuous beam control, BeamCKMDiff enables more granular, flexible, and accurate channel state information, which is critical for optimizing spatial reuse, beam selection, and CSI feedback compression in future wireless networks (Zhao et al., 15 Jan 2026).

In related simulation environments such as those for precision interferometry and diffraction (e.g., as discussed for beam decomposition and propagation in (Zhao et al., 2022)), analogous principles of combining continuous parameter control with generative map construction may inform broader diffraction toolkits. However, BeamCKMDiff is distinct in its focus on environmental radio propagation conditioned by arbitrary continuous beam directions, achieved via modern generative diffusion modeling.
