RADR: Relation-Aware Design Reconstruction

Updated 8 February 2026
  • The paper introduces RADR, a framework that leverages relation graphs and multi-modal models to achieve structure-preserving design layout editing.
  • It formulates layout modifications as a self-supervised reconstruction problem using serialized edge sequences and standardized operations.
  • The approach demonstrates superior accuracy and efficiency by integrating geometric relation awareness into design editing compared to existing methods.

Relation-Aware Design Reconstruction (RADR) is a framework for autonomous design layout editing that achieves structure-preserving modifications in graphical designs. Conceived as the core architectural element of ReLayout, RADR addresses the challenges of ambiguous natural language instructions and limited annotation data, formulating layout editing as a self-supervised reconstruction problem informed by explicit element relations and standardized editing operations. The approach unifies multiple editing actions (add, delete, move, resize) within a multi-modal LLM (MLLM) backbone, enabling versatile, data-efficient, and accurate design editing while robustly maintaining the geometric structure of unedited regions (Lin et al., 1 Feb 2026).

1. Formalization of Design Elements and Relation Graph

In RADR, a design $D$ is a set of $N$ elements $D = \{E_i\}_{i=1}^N$, where each $E_i$ is characterized by its content $C_i$ (image or text) and geometric attributes $A_i$. For image elements, $A_i = (x_i, y_i, w_i, h_i)$ specifies position and size; for text elements, $A_i = (x_i, y_i, w_i, h_i, \phi_i, \alpha_i, r_i)$, with $\phi_i$ the font size, $\alpha_i$ the rotation angle, and $r_i$ the alignment.

RADR introduces a relation graph $G = (V, R)$ encoding the layout structure. The nodes $V = \{E_0\} \cup \{E_i\}_{i=1}^N$ comprise the design elements plus a canvas node $E_0$. Directed edges $R$ describe pairwise relations, partitioned into size relations $R^{\text{size}}$ and position relations $R^{\text{pos}}$:

  • Size relations: The area ratio $\mathrm{AR}_{ij} = \frac{w_i h_i}{w_j h_j}$ between elements $E_i, E_j$ is classified as "small", "equal", or "large" with tolerance $\alpha$:

$$R_{ij}^{\text{size}} = \begin{cases} \text{small} & \mathrm{AR}_{ij} < 1-\alpha \\ \text{equal} & 1-\alpha \leq \mathrm{AR}_{ij} \leq 1+\alpha \\ \text{large} & \mathrm{AR}_{ij} > 1+\alpha \end{cases}$$

No size edges involve the canvas.

  • Position relations: Each reference bounding box defines a $3 \times 3$ grid; relations take values in {TL, T, TR, L, C, R, BL, B, BR} according to where the source element's center falls relative to the target.

In practice, rather than dense adjacency tensors, RADR serializes relations into edge sequences for input.
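The two relation types above can be sketched directly from their definitions. This is an illustrative reimplementation, not the paper's code; the function names, the `(x, y, w, h)` top-left convention, and the default tolerance $\alpha = 0.1$ are assumptions.

```python
# Illustrative sketch of RADR's relation classification; names and the
# default tolerance alpha=0.1 are assumptions, not taken from the paper.

def size_relation(wi, hi, wj, hj, alpha=0.1):
    """Classify the area ratio of element i vs. element j with tolerance alpha."""
    ar = (wi * hi) / (wj * hj)
    if ar < 1 - alpha:
        return "small"
    if ar > 1 + alpha:
        return "large"
    return "equal"

POS_LABELS = [["TL", "T", "TR"],
              ["L",  "C", "R"],
              ["BL", "B", "BR"]]

def position_relation(src, ref):
    """Locate the source element's center in the 3x3 grid of the reference box.

    Each box is (x, y, w, h) with (x, y) the top-left corner (an assumption).
    """
    cx = src[0] + src[2] / 2
    cy = src[1] + src[3] / 2
    col = min(2, max(0, int((cx - ref[0]) / ref[2] * 3)))
    row = min(2, max(0, int((cy - ref[1]) / ref[3] * 3)))
    return POS_LABELS[row][col]
```

Clamping `row`/`col` to `[0, 2]` lets the same grid labels cover source centers that lie outside the reference box.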

2. Self-Supervised Reconstruction Objective

The training process casts layout editing as a conditional attribute reconstruction problem. The model $f_\theta: (C, G, O) \mapsto \hat{A}$ predicts edited attributes $\hat{A}$ given the design contents $C$, the (possibly pruned) relation graph $G$, and a synthesized editing operation $O$. The objective is the negative log-likelihood (NLL):

$$\mathcal{L}_{\text{rec}} = -\mathbb{E}_{D,O}\left[ \log P_\theta (A \mid C, G, O) \right] = -\mathbb{E}_{D,O}\left[\sum_{t=1}^{T} \log P_\theta (a_t \mid a_{<t}, C, G, O)\right]$$

where $a_t$ are the attribute tokens in an autoregressive factorization.

Weight-decay regularization $\lambda \|\theta\|_2^2$ is applied to the LoRA-adapted LLM parameters. To synthesize supervision, a random operation $O$ is sampled and the edges touching the target element are removed from $G$, compelling the model to infer the new attributes from the remaining structure and $O$. This bypasses the need for explicit (original, operation, edited) triplets.
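The inner sum of the objective is just a token-level NLL. A toy illustration, with made-up per-token probabilities rather than real model outputs:

```python
import math

# Toy illustration of the token-level NLL L_rec; the per-token probabilities
# passed in are made up, not real model outputs.

def reconstruction_nll(token_probs):
    """-sum_t log P(a_t | a_<t, C, G, O) over the attribute tokens."""
    return -sum(math.log(p) for p in token_probs)
```

A perfectly confident prediction (`[1.0, 1.0, ...]`) gives a loss of zero; less confident tokens each add $-\log p$ to the total.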

3. Standardized Editing Operations and Data Synthesis

Every operation is represented as a tuple $O = (\mathrm{action}, i_{\mathrm{target}}, \mathrm{params})$, with $\mathrm{action} \in \{\text{add}, \text{delete}, \text{move}, \text{resize}\}$. The actions are defined as:

| Action | Target | Parameters |
| --- | --- | --- |
| add | index in validation | none |
| delete | index in training | none |
| move | index in training | new $(x, y)$ |
| resize | index in training | new $(w, h)$ |

For each self-supervised sample, a design $D$ is selected, $G$ is built, an operation $O$ is sampled, the affected edges are removed to yield $G'$, and the model learns the mapping $(C, G', O) \mapsto A$.
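The synthesis step above can be sketched as follows. This is a hypothetical reimplementation: the dict keys, edge format, and coordinate ranges are illustrative assumptions, not the paper's code.

```python
import random

# Hypothetical sketch of RADR's self-supervised sample synthesis; dict keys,
# the edge format, and normalized coordinates are illustrative assumptions.

def prune_edges(edges, target):
    """Drop every relation edge that touches the target element (G -> G')."""
    return [e for e in edges if target not in (e["src"], e["dst"])]

def synthesize_sample(design, edges, rng=None):
    """Sample an operation O, prune G, and emit one (C, G', O) -> A pair."""
    rng = rng or random.Random()
    target = rng.randrange(len(design["elements"]))
    action = rng.choice(["add", "delete", "move", "resize"])
    op = {"action": action, "target": target, "params": None}
    if action == "move":
        op["params"] = (rng.uniform(0, 1), rng.uniform(0, 1))  # new (x, y)
    elif action == "resize":
        op["params"] = (rng.uniform(0, 1), rng.uniform(0, 1))  # new (w, h)
    return {"contents": design["contents"],
            "graph": prune_edges(edges, target),   # G' with target edges removed
            "operation": op,
            "label": design["attributes"]}         # ground-truth attributes A
```

Because the label is simply the design's own attributes, no human-annotated (original, operation, edited) triplets are needed.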

4. Multi-Modal Model Architecture

RADR leverages an MLLM composed of:

  • Vision encoder: $\phi_\mathrm{vis}$ (e.g., CLIP ViT-L/14, frozen), producing visual tokens for each image element.
  • Projector: A 2-layer MLP with GELU that maps vision embeddings into the LLM token space.
  • LLM backbone: (e.g., Llama-3.1-8B), which receives all tokenized inputs and autoregressively emits attribute predictions as JSON.

Inputs are concatenated from projected image tokens, text tokens, serialized relation tokens, and operation tokens (e.g., "MOVE element 3 TO (120, 450)"). This token sequence is processed with positional embeddings, and the output is parsed to extract the new attributes for rendering.
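A minimal sketch of how the relation edges and operation tuple might be serialized to text before tokenization. The exact textual format is an assumption; only the "MOVE element 3 TO (120, 450)" phrasing comes from the example above.

```python
# Hypothetical serialization of the pruned relation graph and the editing
# operation into plain text for the LLM; the exact format is an assumption.

def serialize(edges, op):
    rel = "; ".join(f"E{e['src']} {e['rel']} E{e['dst']}" for e in edges)
    if op["action"] in ("move", "resize"):
        op_txt = f"{op['action'].upper()} element {op['target']} TO {op['params']}"
    else:
        op_txt = f"{op['action'].upper()} element {op['target']}"
    return f"RELATIONS: {rel}\nOP: {op_txt}"
```

For example, a single position edge plus a move operation serializes to `"RELATIONS: E3 TL E0\nOP: MOVE element 3 TO (120, 450)"`.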

5. Training and Inference Process

5.1 Self-Supervised Fine-Tuning

  • Dataset: Crello v4, approximately 23k designs, filtered to exclude designs with over 25 elements.
  • Sampling: For each design DD, build GG, synthesize OO, remove edges for the operation target, yielding the input for one training sample.
  • Optimization: Llama-3.1-8B backbone (LoRA-adapted, rank 32) with the AdamW optimizer, learning rate 2e-4, weight decay 0.01; frozen vision encoder; batch size 64; trained on 8 GPUs for approximately 50k steps.
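For reference, the reported hyperparameters collected into one place; the dictionary layout is ours, but the values are the ones stated above.

```python
# The fine-tuning hyperparameters reported for RADR, gathered into a single
# config dict; the key names are ours, the values are from the paper summary.

TRAIN_CONFIG = {
    "backbone": "Llama-3.1-8B",
    "vision_encoder": "CLIP ViT-L/14 (frozen)",
    "adapter": "LoRA",
    "lora_rank": 32,
    "optimizer": "AdamW",
    "learning_rate": 2e-4,
    "weight_decay": 0.01,
    "batch_size": 64,
    "num_gpus": 8,
    "train_steps": 50_000,
    "max_elements": 25,  # Crello designs with more elements are filtered out
}
```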

5.2 Inference Flow

Given a new input $(D_{\mathrm{in}}, O)$:

  1. Extract the current element contents $C$ and attributes $A_{\mathrm{in}}$.
  2. Build the relation graph $G$, removing the affected edges.
  3. Tokenize $(C, G, O)$.
  4. Run a forward pass through the fine-tuned MLLM.
  5. Parse the JSON output to obtain $A_{\mathrm{out}}$.
  6. Render final design from predicted attributes.
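The six steps above can be sketched end to end. Here `model` is a stand-in for the fine-tuned MLLM (any callable from a serialized prompt to a JSON attribute string); the prompt format and dict keys are illustrative assumptions.

```python
import json

# End-to-end inference sketch; `model` stands in for the fine-tuned MLLM and
# the prompt format / dict keys are illustrative assumptions.

def edit_layout(design, op, model):
    target = op["target"]
    # Steps 1-2: gather contents and build the pruned relation graph.
    pruned = [e for e in design["edges"]
              if target not in (e["src"], e["dst"])]
    # Step 3: serialize (C, G, O) -- here as one JSON prompt for simplicity.
    prompt = json.dumps({"contents": design["contents"],
                         "graph": pruned,
                         "op": op})
    # Steps 4-5: forward pass, then parse the JSON attribute predictions.
    attrs = json.loads(model(prompt))
    return attrs  # step 6 would render the design from these attributes

# A dummy "model" that just moves the target to the requested coordinates:
def dummy_model(prompt):
    op = json.loads(prompt)["op"]
    x, y = op["params"]
    return json.dumps({str(op["target"]): {"x": x, "y": y}})
```

Swapping `dummy_model` for the real fine-tuned network is the only change needed to run the actual pipeline (plus a renderer for step 6).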

6. Experimental Results and Comparative Analysis

RADR was benchmarked against GPT-4o (multi-modal assistant), FlexDM (masked-field layout generation), LaDeCo (layered design composer), and PosterLLaVA (relation-conditioned layout). Key evaluation metrics include design/layout, content, typography/color, graphics/images, innovation, overlap (Ove, lower is better), alignment (Ali, lower is better), size-relation preservation, position-relation preservation, and operation accuracy.

| Model | Design/Layout–Innovation | Ove ↓ | Ali ↓ | Size-Rel Pres. | Pos-Rel Pres. | Op-Acc |
| --- | --- | --- | --- | --- | --- | --- |
| RADR (Ours) | 8.25–7.10 | 0.0996 | 0.0013 | 0.9150 | 0.8684 | 0.9991 |
| GPT-4o | 7.41–6.37 | 0.1942 | 0.0011 | 0.8386 | 0.5544 | 0.9983 |
| FlexDM | 5.34–4.54 | 0.3242 | 0.0016 | | | |
| LaDeCo | 8.08–6.98 | 0.0865 | 0.0013 | | | |
| PosterLLaVA | | | | 0.8822 | 0.8458 | |

In the generalization setting, RADR further improves size and position relation preservation (0.9475 and 0.9157, respectively), outperforming all baselines, including GPT-4o (0.8447 and 0.5444). Human preference surveys (200 samples) indicate significantly higher preference for RADR output over GPT-4o, particularly regarding visual quality and structural preservation (up to 86.0% preferred structure retention in both reconstruction and generalization).

7. Ablation Studies and Structure Preservation

Ablation studies demonstrate the critical contributions of RADR's reconstruction formulation and the explicit relation graph. Removing either substantially degrades size and position relation preservation (to $\sim 0.80$ and $\sim 0.37$), in contrast to the full model (0.9150 and 0.8684). A dense adjacency matrix is also less effective than the serialized edge sequence.

| Setting | Design/Layout–Innovation | Ove ↓ | Ali ↓ | Size-Rel | Pos-Rel | Op-Acc |
| --- | --- | --- | --- | --- | --- | --- |
| w/o RADR | 8.12–7.01 | 0.1007 | 0.0011 | 0.8008 | 0.3750 | 0.9978 |
| w/o relation graph | 8.21–7.07 | 0.0980 | 0.0013 | 0.7943 | 0.3751 | 0.9987 |
| w/ matrix | 8.17–7.04 | 0.0954 | 0.0012 | 0.8892 | 0.8529 | 0.9909 |
| Ours (serialized) | 8.25–7.10 | 0.0996 | 0.0013 | 0.9150 | 0.8684 | 0.9991 |

These results confirm the necessity of both relation awareness and the reconstruction formulation for reliable structure preservation in automatic design layout editing (Lin et al., 1 Feb 2026).
