Edge-Based γ-Quasi-Clique Model
- Edge-based γ-quasi-clique is a dense subgraph model defined by a density threshold γ that generalizes cliques by requiring a minimum fraction of potential edges.
- The model supports rigorous extremal, probabilistic, and algorithmic analyses, with NP-completeness results and innovative MILP formulations enhancing its study.
- Recent advances leverage energy diffusion, convex relaxations, and multiobjective strategies to efficiently detect, recover, and optimize dense structures in large-scale graphs.
An edge-based $\gamma$-quasi-clique is a fundamental dense subgraph model in graph theory and network science. Given a simple undirected graph $G = (V, E)$ and a density threshold $\gamma \in (0, 1]$, a subset $S \subseteq V$ is a $\gamma$-quasi-clique if its induced subgraph $G[S]$ satisfies $|E(S)| \ge \gamma \binom{|S|}{2}$. The model generalizes the classical clique ($\gamma = 1$) and supports rigorous extremal, probabilistic, and algorithmic analysis. The largest cardinality of such a subset is called the $\gamma$-quasi-clique number of $G$ and denoted $\omega_\gamma(G)$. This model has driven progress in extremal combinatorics, random graph theory, planted subgraph detection, and large-scale graph mining (Balister et al., 2018, Bogerd, 2020, Xia et al., 21 Jan 2026, Zhang et al., 6 Aug 2025).
1. Mathematical Definition and Extremal Parameters
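Given $G = (V, E)$ and $S \subseteq V$, let $E(S)$ be the set of edges induced by $S$. $S$ is an edge-based $\gamma$-quasi-clique if
$$|E(S)| \ge \gamma \binom{|S|}{2},$$
and the $\gamma$-quasi-clique number is
$$\omega_\gamma(G) = \max\left\{ |S| : S \subseteq V,\ |E(S)| \ge \gamma \binom{|S|}{2} \right\}.$$
When $\gamma = 1$, this coincides with the clique number.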
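The definition can be checked directly on small graphs. The following is a minimal brute-force sketch, not an algorithm from any of the cited papers; the function names and the example graph are illustrative assumptions, and the exhaustive search is exponential in the number of vertices.

```python
from itertools import combinations
from math import comb


def induced_edge_count(edges, subset):
    """Count the edges with both endpoints inside `subset`."""
    s = set(subset)
    return sum(1 for u, v in edges if u in s and v in s)


def is_quasi_clique(edges, subset, gamma):
    """Check |E(S)| >= gamma * C(|S|, 2) for the vertex set S = subset."""
    k = len(set(subset))
    return induced_edge_count(edges, subset) >= gamma * comb(k, 2)


def quasi_clique_number(vertices, edges, gamma):
    """Largest size of a gamma-quasi-clique, found by exhaustive search."""
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if is_quasi_clique(edges, subset, gamma):
                return k
    return 0


# Hypothetical example: a 5-cycle with one chord (0, 2).
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]

print(quasi_clique_number(vertices, edges, 0.5))  # whole graph: 6 of 10 pairs
print(quasi_clique_number(vertices, edges, 1.0))  # ordinary clique number
```

With $\gamma = 1$ the search returns the clique number (the triangle $\{0, 1, 2\}$ here), while lowering $\gamma$ to $0.5$ admits the whole vertex set.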
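In Erdős–Rényi random graphs $G(n, p)$, the parameter $\omega_\gamma$ exhibits sharp concentration. Let
$$D(\gamma \,\|\, p) = \gamma \log\frac{\gamma}{p} + (1 - \gamma) \log\frac{1 - \gamma}{1 - p}$$
be the Kullback–Leibler divergence between Bernoulli($\gamma$) and Bernoulli($p$) distributions. The two-point concentration theorem states that, for fixed $0 < p < \gamma < 1$, $\omega_\gamma(G(n, p))$ is one of the two integers closest to
$$\frac{2}{D(\gamma \,\|\, p)} \left( \log n - \log\log n + \log\frac{e\, D(\gamma \,\|\, p)}{2} \right) + \frac{1}{2}$$
with high probability as $n \to \infty$ (Balister et al., 2018, Bogerd, 2020). In inhomogeneous random graphs, where edge probabilities are specified by a kernel, the size of the largest $\gamma$-quasi-clique remains concentrated on a small range of values,
and its leading-order behavior depends primarily on the largest edge probability (Bogerd, 2020).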
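To get a feel for the scale involved, the sketch below evaluates the Bernoulli Kullback–Leibler divergence $D(\gamma \,\|\, p) = \gamma \log(\gamma/p) + (1-\gamma)\log((1-\gamma)/(1-p))$ and the centre of the two-point window, assuming the form $\frac{2}{D}(\log n - \log\log n + \log\frac{eD}{2}) + \frac{1}{2}$ given by Balister et al. (2018) for $0 < p < \gamma < 1$. The function names are illustrative assumptions, and the asymptotic formula is only a rough guide for finite $n$.

```python
from math import e, log


def bernoulli_kl(gamma, p):
    """Kullback-Leibler divergence between Bernoulli(gamma) and Bernoulli(p)."""
    if not (0.0 < p < 1.0 and 0.0 <= gamma <= 1.0):
        raise ValueError("need 0 < p < 1 and 0 <= gamma <= 1")
    total = 0.0
    if gamma > 0.0:
        total += gamma * log(gamma / p)
    if gamma < 1.0:
        total += (1.0 - gamma) * log((1.0 - gamma) / (1.0 - p))
    return total


def concentration_center(n, gamma, p):
    """Centre of the two-point window (assumed form, valid for p < gamma)."""
    d = bernoulli_kl(gamma, p)
    return (2.0 / d) * (log(n) - log(log(n)) + log(e * d / 2.0)) + 0.5


# A lower threshold gamma (closer to p) gives a smaller divergence
# and therefore a much larger predicted quasi-clique size.
for gamma in (0.9, 0.7):
    print(gamma, bernoulli_kl(gamma, 0.5), concentration_center(10**6, gamma, 0.5))
```

The leading term $2 \log n / D(\gamma \,\|\, p)$ reduces to the familiar $2 \log n / \log(1/p)$ for cliques, since $D(1 \,\|\, p) = \log(1/p)$.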
2. Algorithmic Complexity and Exact Algorithms
Determining whether $G$ contains a $\gamma$-quasi-clique of size at least $k$ is NP-complete for any fixed threshold $\gamma$ (Xia et al., 21 Jan 2026). The $\gamma$-quasi-clique property is not hereditary: an induced subgraph of a $\gamma$-quasi-clique need not itself be a $\gamma$-quasi-clique. This rules out the classical pruning strategies used for cliques and other hereditary properties.
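The failure of heredity is easy to see on a concrete instance. The sketch below (an illustrative example, with hypothetical function names) takes $K_4$ minus one edge, which is a $0.8$-quasi-clique, and lists the subsets that violate the same threshold.

```python
from itertools import combinations
from math import comb


def is_quasi_clique(edges, subset, gamma):
    """Check |E(S)| >= gamma * C(|S|, 2) for S = subset."""
    s = set(subset)
    inside = sum(1 for u, v in edges if u in s and v in s)
    return inside >= gamma * comb(len(s), 2)


def violating_subsets(edges, subset, gamma):
    """Proper non-empty subsets of `subset` that fail the gamma threshold."""
    members = sorted(subset)
    return [
        sub
        for k in range(1, len(members))
        for sub in combinations(members, k)
        if not is_quasi_clique(edges, sub, gamma)
    ]


# K4 on {0, 1, 2, 3} with the edge (2, 3) removed: 5 of 6 possible edges.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
whole = [0, 1, 2, 3]

print(is_quasi_clique(edges, whole, 0.8))    # the full set meets the threshold
print(violating_subsets(edges, whole, 0.8))  # but some of its subsets do not
```

The non-adjacent pair $\{2, 3\}$ and the two triples containing it fall below the threshold, so a search cannot discard a candidate set merely because one of its subsets fails.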
Recent advances iteratively reduce the maximum $\gamma$-quasi-clique problem to $k$-defective clique computations, where a $k$-defective clique is a vertex set missing at most $k$ edges; unlike the quasi-clique property, this property is hereditary. The EQC-Pro algorithm (Xia et al., 21 Jan 2026) uses a bottom-up doubling and binary search approach, improves the worst-case running time over previous approaches, leverages dynamic degeneracy-based heuristics, and outperforms QClique/FPCE by up to four orders of magnitude on large real-world graphs.
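The link between the two models can be made concrete: for a fixed size $s$, a vertex set is a $\gamma$-quasi-clique exactly when it misses at most $\binom{s}{2} - \lceil \gamma \binom{s}{2} \rceil$ edges. The sketch below illustrates only this size-wise equivalence, not EQC-Pro itself; the function names are assumptions, and $\gamma$ is passed as a `Fraction` to avoid floating-point rounding at the threshold.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil, comb


def induced_edges(edges, subset):
    """Count the edges with both endpoints inside `subset`."""
    s = set(subset)
    return sum(1 for u, v in edges if u in s and v in s)


def missing_edge_budget(size, gamma):
    """Largest number of missing edges a gamma-quasi-clique of this size allows."""
    pairs = comb(size, 2)
    return pairs - ceil(gamma * pairs)


def is_quasi_clique(edges, subset, gamma):
    """Check |E(S)| >= gamma * C(|S|, 2)."""
    return induced_edges(edges, subset) >= gamma * comb(len(set(subset)), 2)


def is_defective_clique(edges, subset, k):
    """Check that at most k of the C(|S|, 2) possible edges are missing."""
    return comb(len(set(subset)), 2) - induced_edges(edges, subset) <= k


def equivalence_holds(vertices, edges, gamma):
    """Verify the size-wise equivalence on every non-empty vertex subset."""
    return all(
        is_quasi_clique(edges, sub, gamma)
        == is_defective_clique(edges, sub, missing_edge_budget(len(sub), gamma))
        for k in range(1, len(vertices) + 1)
        for sub in combinations(vertices, k)
    )


gamma = Fraction(4, 5)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (3, 4)]
print([missing_edge_budget(s, gamma) for s in (3, 4, 5)])
print(equivalence_holds(range(5), edges, gamma))
```

Because the budget grows with the size, a solver that fixes a target size can work with a hereditary $k$-defective clique subproblem and then adjust the target, which is the general idea behind searching over sizes.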
3. Mathematical Programming and Multiobjective Formulations
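The edge-based $\gamma$-quasi-clique problem is naturally formulated as a Mixed Integer Linear Program (MILP). For the maximum quasi-clique problem (MQC), the formulations in (Santos et al., 2024, Santos et al., 2024) use binary variables: a vertex indicator equal to 1 iff the vertex belongs to $S$, together with auxiliary binary variables that linearize the edge-density constraint.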
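For orientation, one standard way to linearize the density constraint is sketched below. It is a generic textbook-style linearization and not necessarily the formulation used by Santos et al.; the variable names $x_i$ (vertex $i$ selected), $y_{ij}$ (edge $\{i,j\}$ counted) and $z_k$ (selected set has size $k$) are assumptions made for this illustration, with $n = |V|$.

```latex
\begin{aligned}
\max \quad & \sum_{i \in V} x_i \\
\text{s.t.} \quad
  & \sum_{\{i,j\} \in E} y_{ij} \;\ge\; \gamma \sum_{k=1}^{n} \binom{k}{2} z_k, \\
  & y_{ij} \le x_i, \qquad y_{ij} \le x_j \qquad \forall \{i,j\} \in E, \\
  & \sum_{i \in V} x_i = \sum_{k=1}^{n} k \, z_k, \qquad \sum_{k=1}^{n} z_k = 1, \\
  & x_i,\; y_{ij},\; z_k \in \{0, 1\}.
\end{aligned}
```

The size indicators $z_k$ turn the quadratic right-hand side $\gamma \binom{|S|}{2}$ into a linear expression, and the constraints $y_{ij} \le x_i$, $y_{ij} \le x_j$ ensure that only edges induced by the selected vertices are counted.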
Multiobjective formulations simultaneously maximize both density and cardinality (the Multiobjective Quasi-Clique Problem, MOQC). Scalarization approaches such as ε-constraint and weighted-sum are efficient due to the total unimodularity of the LP relaxations, and a three-phase strategy combining dichotomic search, local search exploiting quasi-heredity, and ε-constraint fill-in provides strong empirical performance on real-world networks (Santos et al., 2024).
4. Heuristic Algorithms and Scalable Approaches
On large-scale graphs, exact methods remain challenging; thus, heuristics such as diffusion-based clustering and degeneracy ordering are used. The EDQC algorithm (Zhang et al., 6 Aug 2025) introduces an energy-diffusion approach: energy is propagated stochastically from seed nodes, with high-energy vertices indicating structural cohesion. EDQC sidesteps explicit candidate enumeration, providing competitive speed and higher solution quality than previous metaheuristics while maintaining low solution variance.
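The intuition that diffused energy concentrates in cohesive regions can be illustrated with a much simpler scheme than EDQC. The sketch below is not the EDQC algorithm: it replaces stochastic propagation with a deterministic lazy diffusion, and the function names, the `keep` fraction and the number of steps are all illustrative assumptions.

```python
def diffuse_energy(adj, seed, steps=3, keep=0.5):
    """Spread one unit of energy from `seed`; each step keeps a fraction
    `keep` in place and splits the rest evenly among the neighbours."""
    energy = {v: 0.0 for v in adj}
    energy[seed] = 1.0
    for _ in range(steps):
        nxt = {v: keep * energy[v] for v in adj}
        for v, nbrs in adj.items():
            outgoing = (1.0 - keep) * energy[v]
            if nbrs:
                share = outgoing / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
            else:
                nxt[v] += outgoing  # isolated vertex keeps all of its energy
        energy = nxt
    return energy


def top_energy_set(adj, seed, size, steps=3, keep=0.5):
    """Return the `size` vertices holding the most energy after diffusion."""
    energy = diffuse_energy(adj, seed, steps, keep)
    ranked = sorted(adj, key=lambda v: (-energy[v], v))
    return set(ranked[:size])


# Hypothetical graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5-6 attached.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5], 5: [4, 6], 6: [5],
}
print(top_energy_set(adj, seed=0, size=4))
```

Seeding inside the clique leaves most of the energy there after a few steps, so the top-ranked vertices form the dense block; a practical heuristic would then verify the density threshold on the candidate set and refine it.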
Dynamic heuristics in the EQC-Pro framework further raise lower bounds within the search and can iteratively expand solutions via degeneracy and neighborhood search. These procedures exploit quasi-hereditary properties and local extension rules to efficiently traverse the search landscape (Xia et al., 21 Jan 2026).
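Degeneracy orderings underlie many of these heuristics. The sketch below shows the standard minimum-degree peeling procedure in a simple quadratic-time form; it is a generic illustration with assumed function names, not the dynamic heuristic of EQC-Pro, and it assumes vertex labels are comparable so that ties are broken deterministically.

```python
def degeneracy_ordering(adj):
    """Repeatedly remove a vertex of minimum remaining degree.

    Returns the removal order and the degeneracy, i.e. the largest
    degree observed at the moment a vertex is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    order = []
    degeneracy = 0
    while remaining:
        v = min(remaining, key=lambda u: (degree[u], u))
        degeneracy = max(degeneracy, degree[v])
        order.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return order, degeneracy


# Hypothetical graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5-6 attached.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5], 5: [4, 6], 6: [5],
}
order, degeneracy = degeneracy_ordering(adj)
print(order, degeneracy)
```

Sparse peripheral vertices are peeled first, so the dense core appears at the end of the ordering; suffixes of the ordering are natural starting candidates for a lower bound.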
5. Planted Quasi-Clique Recovery and Convex Relaxations
For the detection and recovery of planted $\gamma$-quasi-cliques in noisy environments, convex optimization approaches grounded in robust PCA and matrix decomposition have been proposed. The rank-sparsity (low-rank plus sparse) matrix decomposition attempts to extract the quasi-clique adjacency structure via nuclear-norm and $\ell_1$ penalties, with explicit guarantees (Abdulsalaam et al., 2022). When the planted subgraph satisfies certain incoherence and sampling conditions, exact recovery is possible with high probability, as certified by a dual variable constructed using a golfing scheme (Abdulsalaam et al., 2022).
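For reference, the generic rank-sparsity (robust PCA) template that such approaches build on is shown below. This is the standard low-rank plus sparse program, not necessarily the exact formulation of Abdulsalaam et al.; the symbols $A$ (observed adjacency matrix), $L$ (low-rank part), $S$ (sparse part) and $\lambda$ (trade-off weight) are assumptions made for this illustration.

```latex
\min_{L,\, S} \;\; \|L\|_{*} + \lambda \, \|S\|_{1}
\qquad \text{subject to} \qquad L + S = A
```

Here $\|L\|_{*}$ is the nuclear norm (sum of singular values), a convex surrogate for rank, and $\|S\|_{1}$ is the entrywise $\ell_1$ norm, a convex surrogate for sparsity. The dense planted block is expected to be captured by the low-rank term, with noise edges and missing edges absorbed by the sparse term.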
6. Structural, Probabilistic, and Query-Theoretic Insights
The edge-based -quasi-clique model possesses rich structural and probabilistic properties:
- In random graphs, $\omega_\gamma$ exhibits two-point concentration, governed by the large-deviation rate $D(\gamma \,\|\, p)$ or its inhomogeneous analogue (Balister et al., 2018, Bogerd, 2020).
- As $\gamma \downarrow p$, the rate $D(\gamma \,\|\, p)$ approaches zero and quasi-cliques grow larger; for small $\gamma - p$, expansions exist to characterize the thresholds (Balister et al., 2018).
- In query-limited models, the size of the largest discoverable $\gamma$-quasi-clique (or dense subgraph) in a random graph, under a limited number of adjacency queries and a bounded number of adaptive rounds, is bounded above by combinatorial anti-matching arguments. The bound involves a universal constant determined by edge-label/matching combinatorics (Csóka et al., 2023). These bounds are tight in regimes of high query complexity or adaptivity.
7. Extensions: Connectivity, Biobjective Models, and Further Directions
Classical MQC and DKS formulations can yield disconnected subgraphs, which are often undesirable. Flow-based constraints (C-STree, C-Flow) have been incorporated into MILPs to enforce connectedness of the returned subgraphs. This ensures practical relevance in domains where connectivity is essential and delivers near-optimal solution rates and reduced runtimes on sparse real networks (Santos et al., 2024).
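The need for such constraints is easy to demonstrate: a set can satisfy the density threshold while its induced subgraph falls apart into components. The sketch below is a plain breadth-first connectivity check on an induced subgraph, given as an illustration with assumed names; it is not the C-STree or C-Flow constraint system.

```python
from collections import deque
from math import comb


def is_connected_induced(adj, subset):
    """Breadth-first search restricted to `subset`; True if one component."""
    nodes = set(subset)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in nodes and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == nodes


def induced_density(adj, subset):
    """Fraction of the C(|S|, 2) vertex pairs in `subset` joined by an edge."""
    nodes = set(subset)
    twice_edges = sum(1 for v in nodes for u in adj[v] if u in nodes)
    return (twice_edges // 2) / comb(len(nodes), 2)


# Hypothetical graph: two disjoint triangles.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
everything = list(adj)

print(induced_density(adj, everything))       # 6 of 15 pairs
print(is_connected_induced(adj, everything))  # dense enough for gamma = 0.4, yet disconnected
```

The union of the two triangles is a valid $0.4$-quasi-clique by the density definition alone, which is the kind of solution that connectivity constraints are meant to exclude.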
The multiobjective view (size vs. density) allows the computed Pareto front to capture efficiency tradeoffs between small, dense and large, sparser quasi-cliques. Algorithms recover supported Pareto points via dichotomic LP search, filling in non-supported points using ε-constraint approaches, and exploit structural properties such as quasi-heredity and degree-extension (Santos et al., 2024).
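The size versus density trade-off can be made tangible on a toy instance. The sketch below enumerates all vertex subsets, records the best density for each size, and filters the non-dominated points. It is a brute-force illustration with assumed function names, not the dichotomic search or ε-constraint procedure described above, and it is only feasible for very small graphs.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def best_density_by_size(vertices, edges):
    """For each size k >= 2, the maximum edge density over all k-subsets."""
    best = {}
    for k in range(2, len(vertices) + 1):
        pairs = comb(k, 2)
        best[k] = max(
            Fraction(sum(1 for u, v in edges if u in s and v in s), pairs)
            for s in map(set, combinations(vertices, k))
        )
    return best


def pareto_front(points):
    """Keep the (size, density) points not dominated in both objectives."""
    front = []
    for size, dens in points:
        dominated = any(
            s2 >= size and d2 >= dens and (s2 > size or d2 > dens)
            for s2, d2 in points
        )
        if not dominated:
            front.append((size, dens))
    return sorted(front)


# Hypothetical graph: a 4-clique {0, 1, 2, 3} with a path 3-4-5 attached.
vertices = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

front = pareto_front(list(best_density_by_size(vertices, edges).items()))
print(front)
```

Sizes 2 and 3 are dominated by the 4-clique, which has the same density at a larger size; beyond size 4, each additional vertex is paid for with a drop in density, and those points form the rest of the front.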
Further open directions include tightening query-complexity lower bounds for subpolynomial adaptivity, improving worst-case fixed-parameter complexity in exact algorithms, and developing robust methodologies for noisy, multi-layer, or attributed network settings.
References
- "Dense Subgraphs in Random Graphs," (Balister et al., 2018)
- "Quasi-cliques in inhomogeneous random graphs," (Bogerd, 2020)
- "Maximum Edge-based Quasi-Clique: Novel Iterative Frameworks," (Xia et al., 21 Jan 2026)
- "Quasi-Clique Discovery via Energy Diffusion," (Zhang et al., 6 Aug 2025)
- "Solving the Multiobjective Quasi-Clique Problem," (Santos et al., 2024)
- "Ensuring connectedness for the Maximum Quasi-clique and Densest $k$-subgraph problems," (Santos et al., 2024)
- "Rank-sparsity decomposition for planted quasi clique recovery," (Abdulsalaam et al., 2022)
- "Finding cliques and dense subgraphs using edge queries," (Csóka et al., 2023)