Multi-Level SSCG in GRACE
- The paper presents a unified SSCG architecture that merges five distinct semantic graphs to encode code dependencies and boost completion accuracy.
- The methodology involves sequential graph extraction, cross-level edge integration, and graph-aware re-ranking using GNNs and attention mechanisms.
- Results indicate significant performance gains with +8.19% EM and +7.51% ES improvements over baseline models in repository-aware code completion.
Multi-level Semantic-Structural Code Graphs (SSCGs) in GRACE are hierarchically structured graph representations that encode the semantic and structural dependencies present in large software repositories. In the repository-aware code completion system GRACE (Wang et al., 7 Sep 2025), the SSCG underpins retrieval, context fusion, and prompt generation, enabling completions that are sensitive to both the local context and the broader repository structure. The SSCG's multi-level architecture covers five semantic categories (file structure, syntax, call graphs, class hierarchies, and data flow), each represented and integrated systematically, then leveraged for retrieval and fusion with the query context.
1. Definition and Organization of Multi-Level SSCG
The SSCG in GRACE is a heterogeneous, multi-granular graph built by merging five distinct semantic graphs, each corresponding to a critical aspect of codebase structure:
- File/Directory-Structure Graph: Nodes represent folders and files, with edges denoting “contains,” “imports,” and “references” relationships. Node features aggregate file-system path embeddings and type encodings.
- Abstract Syntax Tree Graph: Nodes represent AST elements (statements, expressions, identifiers); edges denote parent-child syntactic relations.
- Function-Call Graph: Nodes are fully qualified functions; edges are caller-callee relations. Node features include embeddings of function signatures and documentation.
- Class-Hierarchy Graph: Nodes represent classes/interfaces; edges include inheritance, implementation, and composition.
- Data-Flow Graph: Nodes correspond to variable definitions and uses; edges link definitions to uses (bidirectionally).
Each subgraph is represented by an adjacency matrix and a dense node-feature matrix. The unified SSCG merges all node and edge sets and adds cross-level edges (for example “file → AST-root of function,” “function → AST-root,” and type-usage alignments), while tracking node and edge types for heterogeneous processing.
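The typed node and edge bookkeeping described above can be sketched with a small container. This is a minimal illustration and not GRACE's implementation: the class names (`Node`, `Edge`, `HeteroGraph`), the level labels, the identifier scheme, and the edge-type strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str  # unique across all levels (hypothetical naming scheme)
    level: str    # "file", "ast", "call", "class", or "dataflow"
    label: str    # payload such as a path, signature, or identifier


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    edge_type: str
    cross_level: bool = False


@dataclass
class HeteroGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)

    def add_node(self, node_id, level, label):
        self.nodes[node_id] = Node(node_id, level, label)

    def add_edge(self, src, dst, edge_type):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        # An edge is cross-level when its endpoints come from different levels.
        cross = self.nodes[src].level != self.nodes[dst].level
        self.edges.append(Edge(src, dst, edge_type, cross))


g = HeteroGraph()
g.add_node("file:utils.py", "file", "src/utils.py")
g.add_node("call:utils.parse", "call", "parse(text)")
g.add_node("ast:utils.parse:root", "ast", "FunctionDef")
g.add_edge("file:utils.py", "ast:utils.parse:root", "file_to_ast_root")
g.add_edge("call:utils.parse", "ast:utils.parse:root", "function_to_ast_root")
```

Storing the level on every node makes the cross-level flag derivable rather than hand-maintained, which is one simple way to keep node and edge types available for heterogeneous processing.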
2. SSCG Construction and Merging Strategies
The process for SSCG construction involves three stages:
- Per-Level Graph Extraction: Each semantic/structural facet is parsed independently from the codebase, yielding five disjoint (but internally rich) graphs.
- Introduction of Cross-Level Edges: Inter-level dependencies are encoded by connecting nodes across graphs—e.g., connecting file nodes to each function's AST, variable type nodes to class/interface definitions, and linking AST with control/data-flow components within functions.
- Block-Matrix Assembly: The union of all node/edge sets creates a block-structured adjacency matrix with cross-block off-diagonal submatrices representing cross-level connections. All feature matrices are concatenated vertically to form the node-feature matrix for the unified graph.
This results in an SSCG that preserves both granularity and heterogeneous linkage—allowing for the simultaneous encoding of high-level (file, class) and low-level (syntax, data flow) dependencies.
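The block-matrix assembly step can be illustrated in a few lines of plain Python. The sketch below assumes each level provides a square 0/1 adjacency matrix and that all levels share one feature dimension; the function names and the `(level, local_index)` convention for cross-level edges are illustrative choices, not taken from the paper.

```python
def assemble_block_adjacency(level_adjs, cross_edges):
    """Place per-level adjacency matrices on the diagonal of one matrix
    and write cross-level edges into the off-diagonal blocks.

    level_adjs:  list of square matrices (lists of lists), one per level.
    cross_edges: list of ((level_i, local_u), (level_j, local_v)) pairs.
    """
    sizes = [len(adj) for adj in level_adjs]
    offsets = [0]
    for size in sizes[:-1]:
        offsets.append(offsets[-1] + size)
    n = sum(sizes)
    unified = [[0] * n for _ in range(n)]
    # Diagonal blocks: copy each level's own edges.
    for adj, off in zip(level_adjs, offsets):
        for r, row in enumerate(adj):
            for c, value in enumerate(row):
                unified[off + r][off + c] = value
    # Off-diagonal blocks: cross-level edges.
    for (li, u), (lj, v) in cross_edges:
        unified[offsets[li] + u][offsets[lj] + v] = 1
    return unified, offsets


def stack_features(level_feats):
    """Vertically concatenate per-level feature rows."""
    stacked = []
    for feats in level_feats:
        stacked.extend(list(row) for row in feats)
    return stacked


# Toy example: a folder containing one file, and a three-node AST.
file_adj = [[0, 1], [0, 0]]
ast_adj = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
cross = [((0, 1), (1, 0))]  # file node -> AST root
unified, offsets = assemble_block_adjacency([file_adj, ast_adj], cross)
features = stack_features([[[1.0, 0.0]] * 2, [[0.0, 1.0]] * 3])
```

In this toy case the unified matrix is 5×5, and the single cross-level edge lands at row 1, column 2, outside both diagonal blocks.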
3. Hybrid Graph Retrieval and Re-ranking
Given a query code snippet, GRACE employs a dual-path retrieval strategy:
- Semantic Retrieval: Embeds the query (via CodeT5p) and queries a vector index for the top-k code snippets by cosine similarity.
- Structural Retrieval: Encodes each graph node using a concatenation of code and spectral (Laplacian positional) embeddings; aggregates node embeddings into subgraph embeddings via a GNN with sum readout, and retrieves the top-k structurally similar subgraphs.
The union of semantic and structural candidates undergoes a graph-aware re-ranking procedure. A learned parameter blends the semantic and structural similarities for each candidate. Optionally, Maximal Marginal Relevance (MMR) is applied to encourage diversity among final candidates. Subsequently, a two-layer Graph Attention Network (GAT) operates on a bipartite graph between query and retrieved nodes, refining candidate rankings according to attention scores.
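The score blending and optional MMR step can be sketched as follows. This is an assumption-laden illustration: the convex-combination form of the blend, the fixed `alpha` (learned in GRACE), the MMR trade-off `lam`, and the candidate dictionary layout are all hypothetical, and the GAT refinement stage is not covered.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def blend(sem, struct, alpha):
    # Assumed convex combination of the two similarity scores.
    return alpha * sem + (1 - alpha) * struct


def mmr_rerank(candidates, alpha=0.5, lam=0.7, top_n=2):
    """Greedy Maximal Marginal Relevance over blended relevance scores."""
    relevance = {c["id"]: blend(c["sem"], c["struct"], alpha) for c in candidates}
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < top_n:
        def mmr_score(c):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(c["emb"], s["emb"]) for s in selected), default=0.0
            )
            return lam * relevance[c["id"]] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [c["id"] for c in selected]


candidates = [
    {"id": "a", "sem": 0.90, "struct": 0.80, "emb": [1.0, 0.0]},
    {"id": "b", "sem": 0.88, "struct": 0.80, "emb": [1.0, 0.05]},  # near-duplicate of "a"
    {"id": "c", "sem": 0.60, "struct": 0.70, "emb": [0.0, 1.0]},
]
ranked = mmr_rerank(candidates)
```

With these toy values, candidate "b" has the second-highest blended score but is almost identical to "a", so the diversity penalty lets the less similar "c" take the second slot.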
4. Fusion of Retrieved Subgraphs and Query Graph
GRACE performs deep integration of repository context by fusing retrieved subgraphs with the query graph both at the node-feature and graph-structure levels:
- Node-Feature Fusion: All graphs are encoded via a shared GNN; retrieved subgraph embeddings are aggregated with weights proportional to their scores. Cross-attention between the query and retrieved node sets yields an attention matrix.
- Graph-Structure Fusion: For every query-retrieved node pair with high cross-attention and matching node types, a cross-edge is added, integrating retrieval context into the query graph. The fused graph incorporates all original and added cross-level edges.
Unified adjacency and feature matrices for the fused graph are constructed, preserving the original multi-level and cross-level relationships while augmenting the local query context with repository-wide context.
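The structure-fusion rule (add a cross-edge where attention is high and node types match) can be sketched as below. The scaled dot-product form of the attention, the row-wise softmax, and the `threshold` value are assumptions made for illustration; the source specifies only that cross-attention and a type match gate the new edges.

```python
import math


def cross_attention(query_vecs, retrieved_vecs):
    """Row-wise softmax over scaled dot products (assumed attention form)."""
    d = len(query_vecs[0])
    attn = []
    for q in query_vecs:
        scores = [
            sum(a * b for a, b in zip(q, r)) / math.sqrt(d) for r in retrieved_vecs
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


def add_cross_edges(attn, query_types, retrieved_types, threshold=0.6):
    """Return (query_index, retrieved_index) pairs that pass both gates."""
    edges = []
    for i, row in enumerate(attn):
        for j, weight in enumerate(row):
            if weight >= threshold and query_types[i] == retrieved_types[j]:
                edges.append((i, j))
    return edges


query_vecs = [[4.0, 0.0], [0.0, 4.0]]
retrieved_vecs = [[4.0, 0.0], [0.0, 4.0]]
attn = cross_attention(query_vecs, retrieved_vecs)
edges = add_cross_edges(
    attn,
    query_types=["function", "class"],
    retrieved_types=["function", "function"],
)
```

Here the second query node attends strongly to the second retrieved node, but no edge is added because their types differ; only the function-to-function pair is linked.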
5. Serialization and LLM Integration
The fused SSCG is serialized into a compact textual representation suitable for LLM contextualization. The graph is linearized as a sequence of typed node and edge triples:
- Node Representation: “node〈id〉:〈type〉:〈property〉”
- Edge Representation: “edge〈u〉→〈v〉:〈edgeType〉”
The serialized SSCG is concatenated with the user’s “code-before-cursor” to form the final prompt for the LLM backbone. The LLM then predicts the code completion conditioned on this prompt, maximizing the likelihood of the completion given the serialized graph and the preceding code.
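The linearization into typed node and edge lines, followed by prompt assembly, might look like the sketch below. The exact separators, the one-record-per-line layout, and the placement of the graph before the code are assumptions; the source gives only the two record templates and states that the serialized graph is concatenated with the code before the cursor.

```python
def serialize_graph(nodes, edges):
    """Linearize a graph into typed node and edge records.

    nodes: iterable of (node_id, node_type, property) tuples.
    edges: iterable of (src_id, dst_id, edge_type) tuples.
    """
    lines = [f"node{nid}:{ntype}:{prop}" for nid, ntype, prop in nodes]
    lines += [f"edge{u}→{v}:{etype}" for u, v, etype in edges]
    return "\n".join(lines)


def build_prompt(serialized_graph, code_before_cursor):
    # Assumed ordering: graph context first, then the code to be completed.
    return f"{serialized_graph}\n{code_before_cursor}"


nodes = [(0, "function", "parse"), (1, "class", "Parser")]
edges = [(1, 0, "contains")]
serialized = serialize_graph(nodes, edges)
prompt = build_prompt(serialized, "def parse(text):\n    ")
```

Keeping each record on its own line keeps the context compact and makes it easy to truncate whole records if the prompt exceeds the context window.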
6. Impact and Performance
The multi-level SSCG architecture enables GRACE to outperform baseline repository-level code completion systems on public benchmarks, with reported improvements of +8.19% in exact match (EM) and +7.51% in edit similarity (ES) over the strongest graph-based RAG baselines when using DeepSeek-V3 as the backbone LLM (Wang et al., 7 Sep 2025). The approach preserves and exploits multi-granular, cross-cutting code dependencies that are otherwise lost in retrieval or in context-window-constrained completion. The pipeline lets the model reason over program invariants, inheritance, and usage relationships, improving completion faithfulness and context relevance.
Table: Semantic Levels and Graph Construction in GRACE SSCG
| Level | Nodes | Edge Types |
|---|---|---|
| 1 (File/Dir) | Folders, Files | Contains, Imports, References |
| 2 (AST) | Syntax Elements (Statements, Expressions, Identifiers) | Parent → Child |
| 3 (Call Graph) | Functions (FQNs) | Caller → Callee |
| 4 (Class Hier.) | Classes, Interfaces | Inheritance, Implementation, Composition |
| 5 (Data-Flow) | Variable definitions/uses | Def → Use, Use → Def |
The five-level architecture, cross-level connectivity, and attention-based subgraph integration are the core innovations underlying SSCG in GRACE, collectively providing repository-aware, structurally faithful code completion (Wang et al., 7 Sep 2025).