RADR: Relation-Aware Design Reconstruction
- The paper introduces RADR, a framework that leverages relation graphs and multi-modal models to achieve structure-preserving design layout editing.
- It formulates layout modifications as a self-supervised reconstruction problem using serialized edge sequences and standardized operations.
- Compared with existing methods, the approach demonstrates superior accuracy and efficiency by integrating geometric relation awareness into design editing.
Relation-Aware Design Reconstruction (RADR) is a framework for autonomous design layout editing that achieves structure-preserving modifications in graphical designs. Conceived as the core architectural element of ReLayout, RADR addresses the challenges of ambiguous natural language instructions and limited annotation data, formulating layout editing as a self-supervised reconstruction problem informed by explicit element relations and standardized editing operations. The approach unifies multiple editing actions (add, delete, move, resize) within a multi-modal LLM (MLLM) backbone, enabling versatile, data-efficient, and accurate design editing while robustly maintaining the geometric structure of unedited regions (Lin et al., 1 Feb 2026).
1. Formalization of Design Elements and Relation Graph
In RADR, a design is a set of elements $E = \{e_1, \dots, e_N\}$, where each $e_i$ is characterized by its content $c_i$ (image or text) and geometric attributes $a_i$. For images, $a_i = (x, y, w, h)$ specifies position and size; for text, $a_i = (x, y, w, h, s, \theta, \mathrm{align})$, with $s$ as font size, $\theta$ as angle, and $\mathrm{align}$ as alignment.
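The element formalization above can be sketched as a small data structure; the field names and the `kind` discriminator are illustrative choices, not the paper's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    """One design element: content plus geometric attributes."""
    content: str                       # image path or text string
    kind: str                          # "image" or "text"
    x: float                           # top-left position
    y: float
    w: float                           # size
    h: float
    font_size: Optional[float] = None  # text only (s)
    angle: Optional[float] = None      # text only (theta, degrees)
    align: Optional[str] = None        # text only: "left" | "center" | "right"

# Hypothetical elements for illustration
title = Element(content="Sale!", kind="text", x=40, y=20, w=200, h=60,
                font_size=32, angle=0.0, align="center")
logo = Element(content="logo.png", kind="image", x=10, y=10, w=48, h=48)
```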
RADR introduces a relation graph $G = (V, R)$ encoding layout structure. Nodes $V$ comprise the design elements and a canvas node $v_0$. Directed edges describe pairwise relations, partitioned into size relations $R_{\mathrm{size}}$ and position relations $R_{\mathrm{pos}}$:
- Size relations: the area ratio $\mathrm{area}(e_s)/\mathrm{area}(e_t)$ between elements is classified as "small", "equal", or "large" with tolerance $\tau$: "equal" if the ratio lies within $[1-\tau,\, 1+\tau]$, "small" below this band, and "large" above it.
No size edges involve the canvas.
- Position relations: each reference (target) bounding box induces a 3×3 grid; the relation takes a value in {TL, T, TR, L, C, R, BL, B, BR} according to which cell contains the source element's center.
In practice, rather than dense adjacency tensors, RADR serializes relations into edge sequences for input.
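The two relation classifiers can be sketched as follows; the tolerance value and the handling of centers that fall exactly on a grid boundary are assumptions, not specified by the paper:

```python
def size_relation(src, tgt, tol=0.1):
    """Classify the area ratio of src over tgt with tolerance tol.
    Boxes are (x, y, w, h) tuples; canvas pairs are excluded by the caller."""
    ratio = (src[2] * src[3]) / (tgt[2] * tgt[3])
    if ratio < 1 - tol:
        return "small"
    if ratio > 1 + tol:
        return "large"
    return "equal"

def position_relation(src, tgt):
    """Locate src's center in the 3x3 grid induced by tgt's bounding box."""
    cx = src[0] + src[2] / 2
    cy = src[1] + src[3] / 2
    tx, ty, tw, th = tgt
    col = "L" if cx < tx else ("R" if cx > tx + tw else "")
    row = "T" if cy < ty else ("B" if cy > ty + th else "")
    return (row + col) or "C"
```

For example, a source box far up and to the left of the target yields "TL", while one whose center falls inside the target yields "C".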
2. Self-Supervised Reconstruction Objective
The training process emulates layout editing as a conditional attribute reconstruction problem. The model predicts the edited attributes $\hat{a}$ given the design contents $c$, the (possibly pruned) relation graph $G'$, and a synthesized editing operation $o$. The objective is the negative log-likelihood (NLL):

$$\mathcal{L} = -\sum_{t} \log p_\theta\big(y_t \mid y_{<t},\, c,\, G',\, o\big),$$

where $y_t$ are attribute tokens in an autoregressive factorization.
Weight-decay regularization is applied to the LoRA-adapted LLM parameters. To synthesize supervision, a random operation $o$ is sampled and the edges incident to the target element are removed from $G$, compelling the model to infer the new attributes from the remaining structure and $o$. This bypasses the need for explicit (original, operation, edited) triplets.
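The NLL objective is computed only over the attribute tokens, with the conditioning prompt masked out of the loss. A minimal sketch, assuming the standard label-masking convention (sentinel value -100 for ignored positions):

```python
import math

IGNORE = -100  # label value for prompt tokens excluded from the loss

def masked_nll(token_logprobs, labels):
    """Mean negative log-likelihood over supervised (attribute) tokens only.
    token_logprobs[t] is log p(labels[t] | y_<t, c, G', o) from the model;
    positions where labels[t] == IGNORE (the conditioning) are skipped."""
    losses = [-lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE]
    return sum(losses) / len(losses)

# One prompt token (masked) followed by three attribute tokens (hypothetical)
lps = [math.log(0.9), math.log(0.5), math.log(0.25), math.log(0.5)]
labs = [IGNORE, 7, 7, 7]
loss = masked_nll(lps, labs)  # mean over the last three positions only
```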
3. Standardized Editing Operations and Data Synthesis
Every operation is represented in tuple format $o = (\mathrm{action}, \mathrm{target}, \mathrm{params})$, with $\mathrm{action} \in \{\mathrm{add}, \mathrm{delete}, \mathrm{move}, \mathrm{resize}\}$. Actions are defined as:
| Action | Target | Parameters |
|---|---|---|
| add | element index | none |
| delete | element index | none |
| move | element index | new position $(x, y)$ |
| resize | element index | new size $(w, h)$ |
For self-supervised samples, a design is selected, the relation graph $G$ is built, an operation $o$ is sampled, the affected edges are removed to yield $G'$, and the model learns to reconstruct the target attributes from $(c, G', o)$.
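The synthesis pipeline can be sketched as below; the edge encoding (source, target, relation) and dictionary layout are illustrative, and the real pipeline would treat "add" specially since the added element is absent from the input:

```python
import random

ACTIONS = ("add", "delete", "move", "resize")

def synthesize_sample(elements, edges, rng=None):
    """Build one self-supervised training sample: sample an operation,
    prune every relation edge incident to its target, and return the
    (contents, pruned graph, operation) input plus the target element's
    original attributes as the reconstruction label."""
    rng = rng or random.Random(0)
    target = rng.randrange(len(elements))
    action = rng.choice(ACTIONS)
    pruned = [e for e in edges if target not in (e[0], e[1])]
    op = {"action": action, "target": target}
    label = elements[target]["attrs"]
    sample = {"contents": [e["content"] for e in elements],
              "graph": pruned, "op": op}
    return sample, label
```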
4. Multi-Modal Model Architecture
RADR leverages an MLLM composed of:
- Vision encoder (e.g., CLIP ViT-L/14, frozen): produces visual tokens per image element.
- Projector: a 2-layer MLP with GELU that maps vision embeddings into the LLM token space.
- LLM backbone (e.g., Llama-3.1-8B): receives all tokenized inputs and autoregressively emits attribute predictions as JSON.
Inputs are concatenated from projected image tokens, text tokens, serialized relation tokens, and operation tokens (e.g., "MOVE element 3 TO (120, 450)"). This token sequence is processed with positional embeddings, and the output is parsed to extract the new attributes for rendering.
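The textual part of this serialization can be sketched as follows; the delimiter tokens and exact formatting are assumptions, with only the operation phrasing ("MOVE element 3 TO (120, 450)") taken from the paper's example:

```python
def serialize_input(edges, op):
    """Render relation edges and the editing operation as token-ready text.
    edges are (source, target, relation) triples; op is a dict with
    'action', 'target', and (for move/resize) 'params'."""
    rel = " ; ".join(f"{s} {r} {t}" for s, t, r in edges)
    action = op["action"].upper()
    if action in ("MOVE", "RESIZE"):
        px, py = op["params"]
        op_str = f"{action} element {op['target']} TO ({px}, {py})"
    else:
        op_str = f"{action} element {op['target']}"
    return f"<relations> {rel} </relations> <op> {op_str} </op>"
```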
5. Training and Inference Process
5.1 Self-Supervised Fine-Tuning
- Dataset: Crello v4, approximately 23k designs, filtered to exclude designs with over 25 elements.
- Sampling: For each design , build , synthesize , remove edges for the operation target, yielding the input for one training sample.
- Optimization: Llama-3.1-8B backbone (LoRA-adapted, AdamW optimizer, lr = 2e-4, weight decay 0.01); frozen vision encoder; LoRA rank 32; batch size 64; trained on 8 GPUs for approximately 50k steps.
5.2 Inference Flow
Given a new design and a user-specified editing operation:
- Extract the current element contents and attributes.
- Build the relation graph, removing the edges affected by the operation.
- Tokenize the contents, serialized relations, and operation.
- Forward pass through the fine-tuned MLLM.
- Parse the JSON output to obtain the edited attributes.
- Render the final design from the predicted attributes.
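The JSON-parsing step can be sketched with the standard library; the flat `{"x": ..., "y": ..., "w": ..., "h": ...}` schema is an assumption, since the paper does not specify the output format beyond "JSON":

```python
import json

def parse_prediction(raw):
    """Extract predicted geometric attributes from the model's raw output,
    tolerating surrounding non-JSON text."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in model output")
    attrs = json.loads(raw[start:end + 1])
    return {k: float(v) for k, v in attrs.items() if k in ("x", "y", "w", "h")}

pred = parse_prediction('Edited: {"x": 120, "y": 450, "w": 64, "h": 32}')
```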
6. Experimental Results and Comparative Analysis
RADR was benchmarked against GPT-4o (multi-modal assistant), FlexDM (masked-field layout generation), LaDeCo (layered design composer), and PosterLLaVA (relation-conditioned layout). Key evaluation metrics include design/layout, content, typography/color, graphics/images, and innovation scores; overlap (Ove, lower is better); alignment (Ali, lower is better); size- and position-relation preservation; and operation accuracy.
| Model | Design/Layout–Innovation | Ove | Ali | Size-Rel Pres. | Pos-Rel Pres. | Op-Acc |
|---|---|---|---|---|---|---|
| RADR (Ours) | 8.25–7.10 | 0.0996 | 0.0013 | 0.9150 | 0.8684 | 0.9991 |
| GPT-4o | 7.41–6.37 | 0.1942 | 0.0011 | 0.8386 | 0.5544 | 0.9983 |
| FlexDM | 5.34–4.54 | 0.3242 | 0.0016 | – | – | – |
| LaDeCo | 8.08–6.98 | 0.0865 | 0.0013 | – | – | – |
| PosterLLaVA | – | – | – | 0.8822 | 0.8458 | – |
In the generalization setting, RADR further improves size and position relation preservation (0.9475 and 0.9157, respectively), outperforming all baselines, including GPT-4o (0.8447 and 0.5444). Human preference surveys (200 samples) indicate significantly higher preference for RADR output over GPT-4o, particularly regarding visual quality and structural preservation (up to 86.0% preferred structure retention in both reconstruction and generalization).
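A relation-preservation score of this kind can be sketched as the fraction of pre-edit relation edges that still hold after editing; the exact protocol (e.g., whether edges touching the edited element are excluded) is an assumption:

```python
def relation_preservation(before, after):
    """Fraction of pre-edit relation edges (source, target, relation)
    that recur unchanged in the post-edit graph; 1.0 for an empty graph."""
    before, after = set(before), set(after)
    return len(before & after) / len(before) if before else 1.0
```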
7. Ablation Studies and Structure Preservation
Ablation studies demonstrate the critical contributions of RADR and the explicit relation graph. Removing RADR or the relation graph substantially degrades size and position relation preservation (to 0.8008/0.3750 and 0.7943/0.3751, respectively), in contrast to the full model (0.9150 and 0.8684). Using a dense adjacency matrix is less effective than a serialized edge sequence.
| Setting | Design/Layout–Innovation | Ove | Ali | Size Rel | Pos Rel | Op |
|---|---|---|---|---|---|---|
| w/o RADR | 8.12–7.01 | 0.1007 | 0.0011 | 0.8008 | 0.3750 | 0.9978 |
| w/o relation graph | 8.21–7.07 | 0.0980 | 0.0013 | 0.7943 | 0.3751 | 0.9987 |
| w/ matrix | 8.17–7.04 | 0.0954 | 0.0012 | 0.8892 | 0.8529 | 0.9909 |
| Ours (serialized) | 8.25–7.10 | 0.0996 | 0.0013 | 0.9150 | 0.8684 | 0.9991 |
These results confirm the necessity of both relation awareness and the reconstruction formulation for reliable structure preservation in automatic design layout editing (Lin et al., 1 Feb 2026).