Inverse Design of Optical Multilayer Thin Films using Robust Masked Diffusion Models

Published 1 Apr 2026 in physics.optics and cs.LG | (2604.01106v1)

Abstract: Inverse design of optical multilayer stacks seeks to infer layer materials, thicknesses, and ordering from a desired target spectrum. It is a long-standing challenge due to the large design space and non-unique solutions. We introduce OptoLlama, a masked diffusion LLM for inverse thin-film design from optical spectra. Representing multilayer stacks as sequences of material-thickness tokens, OptoLlama conditions generation on reflectance, absorptance, and transmittance spectra and learns a probabilistic mapping from optical response to structure. Evaluated on a representative test set of 3,000 targets, OptoLlama reduces the mean absolute spectral error by 2.9-fold relative to a nearest-neighbor template baseline and by 3.45-fold relative to the state-of-the-art data-driven baseline, called OptoGPT. Case studies on designed and expert-defined targets show that the model reproduces characteristic spectral features and recovers physically meaningful stack motifs, including distributed Bragg reflectors. These results establish diffusion-based sequence modeling as a powerful framework for inverse photonic design.


Authors (8)

Summary

The paper introduces a masked diffusion approach (OptoLlama) that overcomes limitations of autoregressive models by enabling parallel design of optical thin films.
It leverages spectral conditioning and Monte Carlo sampling to reduce the mean absolute spectral error 2.9-fold relative to a nearest-neighbor template baseline and 3.45-fold relative to OptoGPT.
The framework facilitates robust, diverse, and physically interpretable multilayer configurations for advanced photonic applications.

Inverse Design of Optical Multilayer Thin Films via Masked Diffusion LLMs

Introduction

The design of optical multilayer thin films plays a critical role in a diverse range of photonic applications, from bandpass and bandstop filters to photovoltaic coatings and radiative cooling structures. The inverse design task, which infers material choices, stacking order, and layer thicknesses from prescribed reflectance, absorptance, and transmittance (RAT) spectra, is difficult because the design space is vast and highly non-unique. Traditional design methodologies, including iterative gradient descent, evolutionary algorithms, and forward simulation-based tuning, are computationally prohibitive and ill-suited to systematic exploration of alternative design strategies.

Recent advances in data-driven inverse photonic design leverage machine learning to learn the mapping from optical response to structure. However, dominant autoregressive transformer-based approaches such as OptoGPT exhibit inherent limitations in predictive fidelity, stack representation flexibility, and design diversity, often due to error accumulation and model overconfidence.

This work introduces OptoLlama, a robust masked diffusion LLM framework for the inverse design of optical multilayer stacks. It represents stacks as sequences of material-thickness tokens and conditions generation on target optical spectra. This approach establishes masked diffusion transformers as a superior paradigm in this application context, both in predictive performance and in representing physically meaningful, diverse stack architectures (2604.01106).

Methodology

Sequence-Based Representation and Model Design

OptoLlama encodes optical multilayer stacks as sequences of discrete material-thickness tokens—analogous to textual tokens in LLMs. The model is conditioned on target RAT spectra and initiates from a fully masked token sequence. It then progressively resolves each mask to a concrete material-thickness pair through an iterative denoising diffusion process, producing full stack designs in parallel rather than autoregressively.
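The iterative unmasking described above can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `MASK`, `toy_denoiser`, and `sample_stack`, the confidence-based reveal rule, and the even reveal schedule are all hypothetical, and the stand-in denoiser ignores the spectrum and returns uniform probabilities.

```python
import random

MASK = -1  # hypothetical sentinel id for a position that is still masked


def toy_denoiser(tokens, spectrum, vocab_size):
    """Stand-in for the trained network (assumption): returns one
    probability vector per position. Uniform here, spectrum is ignored."""
    return [[1.0 / vocab_size] * vocab_size for _ in tokens]


def sample_stack(spectrum, seq_len, vocab_size, n_steps, rng,
                 denoiser=toy_denoiser):
    """Start fully masked and resolve a share of the masks at every step."""
    tokens = [MASK] * seq_len
    for step in range(n_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        probs = denoiser(tokens, spectrum, vocab_size)
        # Reveal enough positions that none are left after the final step.
        remaining_steps = n_steps - step
        n_reveal = -(-len(masked) // remaining_steps)  # ceiling division
        # Monte Carlo step: sample one candidate token per masked position.
        proposals = {}
        for i in masked:
            tok = rng.choices(range(vocab_size), weights=probs[i])[0]
            proposals[i] = (tok, probs[i][tok])
        # Assumed rule: commit the most confident proposals first.
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:n_reveal]:
            tokens[i] = proposals[i][0]
    return tokens


rng = random.Random(0)
stack_tokens = sample_stack(spectrum=None, seq_len=20, vocab_size=950,
                            n_steps=5, rng=rng)
```

All positions are visible to the denoiser at every step, which is what distinguishes this loop from left-to-right autoregressive decoding.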

The core architecture comprises a transformer with alternating self-attention over stack tokens and cross-attention for spectral conditioning. Training involves minimizing token-level categorical cross-entropy between ground-truth and predicted sequences, given the same spectral prompt.
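The training objective can be sketched as a token-level cross-entropy. The sketch below assumes, as is common for masked diffusion models but not confirmed by the text, that the loss is averaged only over positions that were masked in the noised input; `masked_cross_entropy` and its arguments are hypothetical names.

```python
import math


def masked_cross_entropy(pred_probs, target_tokens, was_masked):
    """Mean negative log-likelihood of the ground-truth token,
    averaged over positions that were masked (assumption)."""
    losses = [
        -math.log(probs[target])
        for probs, target, masked in zip(pred_probs, target_tokens, was_masked)
        if masked
    ]
    if not losses:
        raise ValueError("at least one position must be masked")
    return sum(losses) / len(losses)


# A model that predicts a uniform distribution over 4 tokens
# has a loss of log(4) regardless of the targets.
uniform = [[0.25] * 4 for _ in range(3)]
loss = masked_cross_entropy(uniform, [0, 2, 3], [True, False, True])
```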

Diffusion Formulation

Masked diffusion enables Monte Carlo sampling at each denoising step, directly modeling the multimodal conditional distribution over feasible material-thickness sequences. This formulation increases design diversity, enables explicit uncertainty quantification, and empirically avoids the error accumulation characteristic of autoregressive inverse design.

The framework operates over a vocabulary comprising 19 commonly used materials and discrete thickness increments (10–500 nm, step size 10 nm), yielding a combinatorial design space exceeding $10^{59}$ configurations for up to 20 layers. For all simulations, RAT spectra are computed using a transfer-matrix method (TMM) under normal incidence and planar geometry. The model is trained on 10 million simulated stack-spectra pairs with an additional 1 million reserved for testing.
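The size of the design space follows from the numbers above. The short calculation below is a sketch that assumes every material can be paired with every thickness and that any token may appear in any layer (no special tokens, no adjacency restrictions); the variable names are illustrative.

```python
N_MATERIALS = 19
thicknesses_nm = list(range(10, 501, 10))  # 10, 20, ..., 500 nm
MAX_LAYERS = 20

# One token per (material, thickness) pair.
tokens_per_layer = N_MATERIALS * len(thicknesses_nm)

# Stacks with exactly MAX_LAYERS layers, and with 1..MAX_LAYERS layers.
n_exactly_max = tokens_per_layer ** MAX_LAYERS
n_up_to_max = sum(tokens_per_layer ** k for k in range(1, MAX_LAYERS + 1))

exceeds_1e59 = n_up_to_max > 10 ** 59
```

With 50 thickness values there are 950 tokens per layer, and 950^20 is roughly 3.6 × 10^59, consistent with the stated bound.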

Results

Quantitative Performance

On a dataset of 3,000 representative test spectra, OptoLlama achieves an average mean absolute error (MAE) of 0.014—constituting a 2.9-fold reduction relative to the nearest-neighbor baseline (template MAE = 0.041) and a 3.45-fold reduction relative to the state-of-the-art OptoGPT (MAE = 0.048). Notably, these improvements are not attributable to model size; the OptoLlama architecture has only 2.9% more parameters than the OptoGPT baseline, indicating the fidelity gains stem from the diffusion-based sequence denoising and conditioning methodology rather than simple scaling.
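The fold reductions can be checked directly from the rounded MAE values quoted above. The sketch below also shows one plausible definition of the spectral MAE, a plain mean of absolute differences over all sampled spectral values; how the paper averages over R, A, T and wavelength is an assumption here, and `spectral_mae` is a hypothetical helper.

```python
def spectral_mae(target, predicted):
    """Mean absolute difference between two equally sampled spectra."""
    if len(target) != len(predicted):
        raise ValueError("spectra must have the same number of samples")
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)


# Rounded MAE values as quoted in the text.
reported = {"OptoLlama": 0.014, "template": 0.041, "OptoGPT": 0.048}

fold_vs_template = reported["template"] / reported["OptoLlama"]
fold_vs_optogpt = reported["OptoGPT"] / reported["OptoLlama"]
```

The rounded values give ratios of about 2.93 and 3.43. The quoted 3.45-fold figure was presumably computed from unrounded errors.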

The iterative denoising trajectory reveals characteristic behavior: OptoLlama initially underperforms the template and autoregressive baselines in early steps but consistently converges to lower error in late-stage reconstruction. This progression suggests that global consistency across the full stack is critical, and that masked diffusion can accommodate it.

Qualitative Analysis

OptoLlama’s generated stacks preserve signature spectral features (e.g., reflection peaks and stop bands) and frequently assemble physically interpretable motifs. For synthetic and realistic targets—including experimentally constructed MorphoColor-inspired color filters and artificial bandstop filters—the model generates stacks that accurately reproduce the desired optical response. For example, in the color filter case, OptoLlama’s output matches the reflectance peak near 550 nm with high fidelity and lower residuals relative to both the template and OptoGPT.

Analysis of the predicted stacks uncovers frequent emergence of design motifs such as distributed Bragg reflector (DBR)-like sub-sequences, constructed from alternating low- and high-index dielectric layers in a quarter-wave configuration. The model over-represents these motifs in outputs for relevant spectral targets compared to the training data, demonstrating conditional assembly rather than pure memorization.
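To make the quarter-wave DBR motif concrete, the sketch below builds a five-pair high/low-index stack and evaluates it with a textbook normal-incidence transfer-matrix calculation. It is an illustration under assumptions, not the paper's simulator: refractive indices are constant, real, and only roughly TiO2-like (2.4) and MgF2-like (1.38), the substrate is a semi-infinite lossless medium with index 1.5, and the function names are hypothetical.

```python
import cmath
import math


def layer_matrix(n, d_nm, wl_nm):
    """Characteristic matrix of one homogeneous layer at normal incidence."""
    delta = 2 * math.pi * n * d_nm / wl_nm  # phase thickness
    c, s = cmath.cos(delta), cmath.sin(delta)
    return ((c, 1j * s / n), (1j * n * s, c))


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def rat(layers, wl_nm, n_in=1.0, n_sub=1.5):
    """Reflectance, absorptance, transmittance of a lossless stack.
    layers: list of (real index, thickness in nm), entrance side first."""
    m = ((1, 0), (0, 1))
    for n, d_nm in layers:
        m = matmul2(m, layer_matrix(n, d_nm, wl_nm))
    b = m[0][0] + m[0][1] * n_sub
    c = m[1][0] + m[1][1] * n_sub
    r = (n_in * b - c) / (n_in * b + c)
    refl = abs(r) ** 2
    trans = 4 * n_in * n_sub / abs(n_in * b + c) ** 2  # valid for real n_sub
    return refl, 1.0 - refl - trans, trans


wl0 = 550.0                  # design wavelength in nm
n_high, n_low = 2.4, 1.38    # assumed dispersion-free indices
pair = [(n_high, wl0 / (4 * n_high)), (n_low, wl0 / (4 * n_low))]
dbr = pair * 5               # five quarter-wave pairs, high index first
R, A, T = rat(dbr, wl0)
```

At the design wavelength the reflectance of this stack is roughly 0.99, matching the closed-form quarter-wave result. The ideal thicknesses (about 57 nm and 100 nm) do not sit exactly on a 10 nm grid, so a tokenized model could only approximate them.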

Monte Carlo sampling demonstrates that OptoLlama explores a nontrivial space of viable stack architectures for each target, with variability clustered around physically interchangeable materials. For example, low-index MgF2 is selected as the entrance layer in >84% of samples for most test targets, followed by other low-index dielectrics, in line with established optical design principles.

Practical Aspects

OptoLlama incorporates explicit constraints (maximum layer depth, allowed materials/positions) via masking, enabling direct translation of fabrication or application requirements into the inverse design pipeline. All code, datasets, and model weights are released to enable reproducibility and further extension.
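One common way to impose such constraints is to mask the logits of disallowed tokens before sampling. Whether OptoLlama does exactly this is an assumption; the sketch below only illustrates the general idea, and `constrained_probs` is a hypothetical helper.

```python
import math


def constrained_probs(logits, allowed):
    """Softmax over logits with disallowed tokens forced to probability 0."""
    if not any(allowed):
        raise ValueError("at least one token must be allowed")
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Example: four candidate tokens, the second and fourth are forbidden
# at this position (for instance, materials unavailable in a given fab).
probs = constrained_probs([2.0, 5.0, 1.0, 3.0], [True, False, True, False])
```

Because the forbidden tokens receive exactly zero probability, a sampler drawing from `probs` can never place them at that position.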

The energy cost analysis reports roughly 340 kWh per training run (in California) and indicates that inference costs are negligible on modern hardware. The model therefore offers practical utility for high-throughput or real-time inverse design applications.

Implications and Future Directions

The introduction of probabilistic masked diffusion as a generative modeling approach for inverse optical stack design represents a tangible advance over deterministic or autoregressive architectures. Beyond increased prediction accuracy and solution diversity, the methodology facilitates physical interpretability, uncertainty quantification, and principled Monte Carlo-based winner selection—key for robust deployment in high-value engineering contexts such as solar, display, and sensing systems.
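Monte Carlo-based winner selection can be sketched as a best-of-N loop: draw several candidates for one target, simulate each, and keep the one with the lowest spectral error. The callables `generate` and `simulate` stand in for the generative model and the forward simulator; they, the toy versions below, and the use of MAE as the selection score are assumptions made for illustration.

```python
import random


def mae(a, b):
    """Mean absolute difference between two equally sampled spectra."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_winner(target, generate, simulate, n_samples, rng):
    """Return (best_error, best_candidate, all_errors) over n_samples draws."""
    scored = []
    for _ in range(n_samples):
        candidate = generate(target, rng)
        scored.append((mae(target, simulate(candidate)), candidate))
    best_error, best_candidate = min(scored, key=lambda pair: pair[0])
    return best_error, best_candidate, [err for err, _ in scored]


# Toy stand-ins: a "design" is just a perturbed copy of the target spectrum
# and the "simulator" returns it unchanged.
def toy_generate(target, rng):
    return [min(1.0, max(0.0, t + rng.uniform(-0.1, 0.1))) for t in target]


def toy_simulate(candidate):
    return candidate


target = [0.1, 0.4, 0.9, 0.4, 0.1]
best_error, best_candidate, errors = select_winner(
    target, toy_generate, toy_simulate, n_samples=16, rng=random.Random(0)
)
```

The spread of `errors` across samples also gives a simple, if crude, indication of how uncertain the model is for a given target.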

From a theoretical viewpoint, the conditional diffusion process effectively samples from the multimodal inverse map inherent to many real-world inverse problems, allowing for population-based search and identification of novel, nontrivial design motifs not directly present in the training data. The findings further support that the masked diffusion approach can induce nontrivial task-specific priors and compositional logic, evidenced by the assembly of DBR and graded-index motifs in bandstop and broadband targets.

Current limitations, such as the discretized thickness and wavelength representation and the fixed material palette, are fundamentally dataset-dependent and could be addressed by augmenting the token vocabulary at the cost of increasing training set requirements. Extension to nonplanar, polarization-dependent, or fabrication-aware settings is straightforward given suitable data and physics-driven differentiable simulators.

Of particular note is the misalignment between categorical cross-entropy loss and the inherently multimodal solution space of inverse design; improvements in loss function formulation—such as those capturing structural diversity or robust spectral performance—are warranted for future work.

Conclusion

OptoLlama establishes masked diffusion LLMs as an effective and flexible foundation for inverse design in optical multilayer systems. By unifying parallel global sequence generation, spectral conditioning, and physically grounded simulation, the framework yields superior predictive fidelity, robust diversity of solutions, and interpretable stack architectures relative to prior approaches. These advances enable new workflows for photonic component discovery and optimization, and the general methodology suggests broad applicability to other ill-posed inverse problems in materials science and engineering.

Citation: "Inverse Design of Optical Multilayer Thin Films using Robust Masked Diffusion Models" (2604.01106)
