Edge-Based γ-Quasi-Clique Model

Updated 28 January 2026
  • The edge-based γ-quasi-clique is a dense subgraph model defined by a density threshold γ; it generalizes cliques by requiring only a minimum fraction of the possible edges.
  • The model supports rigorous extremal, probabilistic, and algorithmic analyses, including NP-completeness results and MILP formulations.
  • Recent advances leverage energy diffusion, convex relaxations, and multiobjective strategies to efficiently detect, recover, and optimize dense structures in large-scale graphs.

An edge-based $\gamma$-quasi-clique is a fundamental dense subgraph model in graph theory and network science. Given a simple undirected graph $G = (V, E)$ and a density threshold $\gamma \in (0,1]$, a subset $S \subseteq V$ is a $\gamma$-quasi-clique if its induced subgraph satisfies $|E(S)| \geq \gamma \cdot \binom{|S|}{2}$. The model generalizes the classical clique ($\gamma=1$) and supports rigorous extremal, probabilistic, and algorithmic analysis. The largest cardinality of such a subset is called the $\gamma$-quasi-clique number of $G$ and is denoted $\omega^{(\gamma)}(G)$. This model has driven progress in extremal combinatorics, random graph theory, planted subgraph detection, and large-scale graph mining (Balister et al., 2018, Bogerd, 2020, Xia et al., 21 Jan 2026, Zhang et al., 6 Aug 2025).

1. Mathematical Definition and Extremal Parameters

Given $G=(V,E)$ and $S \subseteq V$, let $E(S)$ be the set of edges induced by $S$. $S$ is an edge-based $\gamma$-quasi-clique if

$$|E(S)| \geq \gamma \cdot \binom{|S|}{2}$$

and the $\gamma$-quasi-clique number is

$$\omega^{(\gamma)}(G) = \max \left\{\,|S|: S \subseteq V,\, |E(S)| \geq \gamma \binom{|S|}{2}\, \right\}.$$

When $\gamma=1$, this coincides with the clique number.
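The definition translates directly into a membership test. The following is a minimal sketch, not taken from the cited papers: the function names and the edge-list representation are illustrative assumptions, and the brute-force search is exponential, so it is only meant for tiny graphs. Passing $\gamma$ as a `Fraction` avoids floating-point error at the threshold.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def induced_edge_count(edges, subset):
    """Count the edges whose two endpoints both lie in `subset`."""
    s = set(subset)
    return sum(1 for u, v in edges if u in s and v in s)


def is_quasi_clique(edges, subset, gamma):
    """Check |E(S)| >= gamma * C(|S|, 2) with exact rational arithmetic."""
    size = len(set(subset))
    return induced_edge_count(edges, subset) >= gamma * comb(size, 2)


def quasi_clique_number(vertices, edges, gamma):
    """Brute-force omega^(gamma)(G); exponential time, tiny graphs only."""
    for size in range(len(vertices), 0, -1):
        if any(is_quasi_clique(edges, s, gamma)
               for s in combinations(vertices, size)):
            return size
    return 0


# Hypothetical example: a 4-cycle with one chord (5 of 6 possible edges).
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(quasi_clique_number(vertices, edges, Fraction(4, 5)))
print(quasi_clique_number(vertices, edges, Fraction(1)))
```

In this example the whole vertex set qualifies for $\gamma = 4/5$ but not for $\gamma = 1$, where the largest qualifying set is a triangle.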

In Erdős–Rényi random graphs $G_{n,p}$, the parameter $\omega^{(\gamma)}(G_{n,p})$ exhibits sharp concentration. Let

$$\alpha(\gamma,p) = \gamma \log \left(\frac{\gamma}{p}\right) + (1-\gamma) \log \left(\frac{1-\gamma}{1-p}\right),$$

the Kullback–Leibler divergence $D(\mathrm{Ber}(\gamma)\,\|\,\mathrm{Ber}(p))$. The two-point concentration theorem states

$$\omega^{(\gamma)}(G_{n,p}) \in \left\{\,\left\lfloor \frac{2}{\alpha(\gamma,p)} \left(\log n - \log \log n + \log \frac{e\,\alpha(\gamma,p)}{2} \right) + \frac12\right\rfloor, \left\lceil \ldots \right\rceil \,\right\}$$

with high probability as $n \rightarrow \infty$ (Balister et al., 2018, Bogerd, 2020). In inhomogeneous random graphs with kernel $\kappa: [0,1]^2 \to (0,1)$ and maximum edge probability $p_{\max} = \max_{x,y} \kappa(x,y)$, the largest $\gamma$-quasi-clique satisfies

$$\omega^{(\gamma)}(G) = \frac{2 \log n}{D(\gamma, p_{\max})} (1+o(1)),$$

so the leading-order behavior depends only on $p_{\max}$ (Bogerd, 2020). Here $D(\gamma, p)$ is shorthand for the divergence $D(\mathrm{Ber}(\gamma)\,\|\,\mathrm{Ber}(p))$ defined above.
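To get a feel for these quantities, the divergence and the resulting size estimates can be evaluated numerically. This is a rough sketch under stated assumptions: logarithms are taken to be natural, the function names are made up for illustration, and only the floor expression of the two-point set is computed because the ceiling term is elided above.

```python
from math import e, floor, log


def kl_bernoulli(gamma, p):
    """alpha(gamma, p) = D(Ber(gamma) || Ber(p)) in nats, for 0 < p < 1."""
    total = 0.0
    if gamma > 0:
        total += gamma * log(gamma / p)
    if gamma < 1:
        total += (1 - gamma) * log((1 - gamma) / (1 - p))
    return total


def leading_order_size(n, gamma, p):
    """Leading-order estimate 2 log n / alpha(gamma, p)."""
    return 2 * log(n) / kl_bernoulli(gamma, p)


def floor_point(n, gamma, p):
    """The floor expression from the two-point concentration set."""
    a = kl_bernoulli(gamma, p)
    return floor(2 / a * (log(n) - log(log(n)) + log(e * a / 2)) + 0.5)


# Illustrative parameters (not from the cited papers).
n = 10**6
for gamma in (1.0, 0.9, 0.7):
    print(gamma, leading_order_size(n, gamma, 0.5), floor_point(n, gamma, 0.5))
```

For $\gamma = 1$ and $p = 1/2$ the divergence equals $\log 2$, so the leading-order estimate reduces to the classical $2 \log_2 n$ for the clique number. Lowering $\gamma$ toward $p$ shrinks the divergence and enlarges the estimate.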

2. Algorithmic Complexity and Exact Algorithms

Determining whether $G$ contains a $\gamma$-quasi-clique of size at least $k$ is NP-complete for any fixed $\gamma \in (0,1)$, with $k$ given as part of the input (Xia et al., 21 Jan 2026). The model is not hereditary: an induced subgraph of a $\gamma$-quasi-clique need not be a $\gamma$-quasi-clique itself, which limits classical pruning strategies such as those used for cliques and other hereditary properties.
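A small counterexample makes the missing hereditary property concrete. The sketch below uses a star with three leaves and $\gamma = 1/2$; the graph and the helper name are illustrative choices, not taken from the cited work.

```python
from fractions import Fraction
from math import comb


def is_quasi_clique(edges, subset, gamma):
    """Check |E(S)| >= gamma * C(|S|, 2) for the induced subgraph on `subset`."""
    s = set(subset)
    m = sum(1 for u, v in edges if u in s and v in s)
    return m >= gamma * comb(len(s), 2)


# Star K_{1,3}: center 0 joined to leaves 1, 2, 3.
star = [(0, 1), (0, 2), (0, 3)]
gamma = Fraction(1, 2)

whole = is_quasi_clique(star, {0, 1, 2, 3}, gamma)  # 3 edges vs. threshold 3
leaves = is_quasi_clique(star, {1, 2, 3}, gamma)    # 0 edges vs. threshold 3/2
print(whole, leaves)
```

The full star meets the threshold, but dropping the center leaves an edgeless set that does not, so a search cannot discard a candidate set merely because one of its subsets fails.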

Recent major advances include iteratively reducing the maximum $\gamma$-quasi-clique problem to $k$-defective clique computations, where a $k$-defective clique is a vertex set missing at most $k$ edges, which is a hereditary property. The EQC-Pro algorithm (Xia et al., 21 Jan 2026) uses a bottom-up doubling and binary search approach, runs in $O^*(\beta_\kappa^n)$ time with $\beta_\kappa<2$ (improving on previous $O^*(2^n)$ approaches), leverages dynamic degeneracy-based heuristics, and is reported to outperform QClique/FPCE by up to four orders of magnitude on large real-world graphs.
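The link between the two models follows from the definition: a set of size $s$ is a $\gamma$-quasi-clique exactly when it misses at most $\lfloor (1-\gamma)\binom{s}{2} \rfloor$ edges. The sketch below illustrates only this size-dependent defect budget; it is not the EQC-Pro algorithm, and the function names are assumptions made for the example.

```python
from fractions import Fraction
from math import comb, floor


def defect_budget(size, gamma):
    """Largest k such that missing <= k edges implies density >= gamma."""
    return floor((1 - gamma) * comb(size, 2))


def missing_edges(edges, subset):
    """Number of vertex pairs in `subset` that are not joined by an edge."""
    s = set(subset)
    present = sum(1 for u, v in edges if u in s and v in s)
    return comb(len(s), 2) - present


def is_k_defective_clique(edges, subset, k):
    """A k-defective clique misses at most k edges (a hereditary property)."""
    return missing_edges(edges, subset) <= k


def is_quasi_clique_via_defects(edges, subset, gamma):
    """Quasi-clique test expressed through the defect budget for |S|."""
    k = defect_budget(len(set(subset)), gamma)
    return is_k_defective_clique(edges, subset, k)


# Hypothetical example: 4-clique {0,1,2,3} with a path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(defect_budget(5, Fraction(7, 10)))
print(is_quasi_clique_via_defects(edges, [0, 1, 2, 3, 4], Fraction(7, 10)))
```

Because the budget grows with the set size, a search over sizes has to solve a different $k$-defective clique instance for each candidate size, which is consistent with the doubling and binary search over sizes described above.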

3. Mathematical Programming and Multiobjective Formulations

The edge-based $\gamma$-quasi-clique problem is naturally formulated as a Mixed Integer Linear Program (MILP). For the maximum quasi-clique problem (MQC), the following formulation holds (Santos et al., 2024, Santos et al., 2024):

$$\begin{aligned} \max \quad & \sum_{i\in V} x_i \\ \text{s.t.} \quad & \sum_{\{i,j\}\in E} y_{ij} \geq \gamma \cdot \frac{|S|(|S|-1)}{2} \\ & y_{ij} \leq x_i,\;\; y_{ij} \leq x_j, \quad \forall\{i,j\}\in E \\ & x_i \in \{0,1\},\;\; y_{ij} \geq 0 \end{aligned}$$

with $x_i=1$ iff vertex $i$ belongs to $S$, and $y_{ij}=1$ iff $\{i,j\} \in E(S)$. Here $|S| = \sum_{i \in V} x_i$, so the right-hand side of the density constraint is quadratic in $x$ as written and has to be linearized before the model is a MILP.
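The role of the variables can be checked without a solver. The sketch below does not solve the MILP; it only evaluates the constraints for a fixed 0/1 vector $x$, using $|S| = \sum_i x_i$ and the observation that the largest feasible $y_{ij}$ is $\min(x_i, x_j)$. The dictionary encoding of $x$ and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def best_y(edges, x):
    """For fixed x, y_ij <= x_i and y_ij <= x_j allow at most min(x_i, x_j)."""
    return {(i, j): min(x[i], x[j]) for i, j in edges}


def density_constraint_holds(edges, x, gamma):
    """Evaluate sum(y_ij) >= gamma * |S|(|S|-1)/2 with |S| = sum(x_i)."""
    size = sum(x.values())
    y = best_y(edges, x)
    return sum(y.values()) >= gamma * Fraction(size * (size - 1), 2)


def objective(x):
    """The MILP objective: number of selected vertices."""
    return sum(x.values())


# Hypothetical example: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
all_in = {0: 1, 1: 1, 2: 1, 3: 1}
triangle = {0: 1, 1: 1, 2: 1, 3: 0}

print(density_constraint_holds(edges, all_in, Fraction(4, 5)), objective(all_in))
print(density_constraint_holds(edges, all_in, Fraction(1)), objective(all_in))
print(density_constraint_holds(edges, triangle, Fraction(1)), objective(triangle))
```

With $y$ set this way, the left-hand side counts exactly the edges induced by the selected vertices, so feasibility of $x$ coincides with the quasi-clique condition on $S$.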

Multiobjective formulations simultaneously maximize both density and cardinality (the Multiobjective Quasi-Clique Problem, MOQC). Scalarization approaches such as ε-constraint and weighted-sum are efficient due to the total unimodularity of the LP relaxations, and a three-phase strategy combining dichotomic search, local search exploiting quasi-heredity, and ε-constraint fill-in provides strong empirical performance on real-world networks (Santos et al., 2024).

4. Heuristic Algorithms and Scalable Approaches

On large-scale graphs, exact methods remain challenging; thus, heuristics such as diffusion-based clustering and degeneracy ordering are used. The EDQC algorithm (Zhang et al., 6 Aug 2025) introduces an energy-diffusion approach: energy is propagated stochastically from seed nodes, with high-energy vertices indicating structural cohesion. EDQC sidesteps explicit candidate enumeration, providing competitive speed and higher solution quality than previous metaheuristics while maintaining low solution variance.
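The general idea of ranking vertices by diffused energy can be illustrated with a toy example. The sketch below is not the EDQC algorithm: it swaps the stochastic propagation for a deterministic averaging step, and the parameters `rounds` and `keep`, the prefix sweep, and all function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def diffuse(adj, seed, rounds=3, keep=0.5):
    """Spread unit energy from `seed`; each round a vertex keeps a share
    `keep` and splits the rest evenly among its neighbors."""
    energy = {v: 0.0 for v in adj}
    energy[seed] = 1.0
    for _ in range(rounds):
        nxt = {v: 0.0 for v in adj}
        for v, e_v in energy.items():
            if not adj[v]:
                nxt[v] += e_v  # isolated vertex keeps everything
                continue
            nxt[v] += keep * e_v
            share = (1 - keep) * e_v / len(adj[v])
            for u in adj[v]:
                nxt[u] += share
        energy = nxt
    return energy


def best_prefix(adj, energy, gamma):
    """Sweep vertices by decreasing energy and return the largest prefix
    that is a gamma-quasi-clique."""
    order = sorted(adj, key=lambda v: (-energy[v], v))
    best, chosen, edges_inside = [], set(), 0
    for v in order:
        edges_inside += sum(1 for u in adj[v] if u in chosen)
        chosen.add(v)
        if edges_inside >= gamma * comb(len(chosen), 2):
            best = sorted(chosen)
    return best


# Hypothetical example: 4-clique {0,1,2,3} with a path 3-4-5 attached.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
energy = diffuse(adj, seed=0)
print(best_prefix(adj, energy, Fraction(9, 10)))
```

Energy started inside the dense part stays concentrated there for a few rounds, so the sweep examines only $|V|$ prefixes instead of enumerating candidate subsets.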

Dynamic heuristics in the EQC-Pro framework further raise lower bounds within the search and can iteratively expand solutions via degeneracy and neighborhood search. These procedures exploit quasi-hereditary properties and local extension rules to efficiently traverse the search landscape (Xia et al., 21 Jan 2026).
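A standard way to obtain a quick lower bound from degeneracy-style peeling is to delete a minimum-degree vertex repeatedly until the remaining set meets the density threshold. The sketch below shows that generic heuristic only; it is not the dynamic heuristic of EQC-Pro, and the function name and adjacency-dictionary input are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def peel_to_quasi_clique(adj, gamma):
    """Remove minimum-degree vertices until the rest is a gamma-quasi-clique.

    The removal order is a degeneracy ordering; the returned set gives a
    lower bound on the quasi-clique number, not necessarily the optimum.
    """
    live = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    m = sum(len(nb) for nb in live.values()) // 2  # current edge count
    while live and m < gamma * comb(len(live), 2):
        v = min(live, key=lambda u: (len(live[u]), u))
        for u in live[v]:
            live[u].discard(v)
        m -= len(live[v])
        del live[v]
    return sorted(live)


# Hypothetical example: 4-clique {0,1,2,3} with a path 3-4-5 attached.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
print(peel_to_quasi_clique(adj, Fraction(9, 10)))
print(peel_to_quasi_clique(adj, Fraction(1, 2)))
```

Any set returned this way can seed an exact search, since a larger lower bound lets the search discard more candidates early.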

5. Planted Quasi-Clique Recovery and Convex Relaxations

For the detection and recovery of planted $\gamma$-quasi-cliques in noisy environments, convex optimization approaches grounded in robust PCA and matrix decomposition have been proposed. The rank-sparsity (low-rank plus sparse) matrix decomposition attempts to extract the quasi-clique adjacency via nuclear norm and $\ell_1$ penalties with explicit guarantees (Abdulsalaam et al., 2022):

$$\min_{0 \leq B \leq 1} \|B\|_* + \lambda \|A - B\|_1 \quad \text{s.t.}\; \sum_{i,j} B_{ij} \geq \gamma n_c^2$$

When the planted subgraph satisfies certain incoherence and sampling conditions ($p, \gamma \gtrsim r \log n / n$), exact recovery is possible with high probability, as certified by a dual variable constructed using a golfing scheme (Abdulsalaam et al., 2022).

6. Structural, Probabilistic, and Query-Theoretic Insights

The edge-based $\gamma$-quasi-clique model possesses rich structural and probabilistic properties:

  • In random graphs, $\omega^{(\gamma)}(G_{n,p})$ exhibits two-point concentration near its mean, governed by large deviation rates $\alpha(\gamma,p)$ or $D(\gamma,p)$ (Balister et al., 2018, Bogerd, 2020).
  • As $p \to \gamma$, the rate approaches zero and quasi-cliques grow larger; for small $p$ and $\gamma$, expansions exist that characterize the thresholds (Balister et al., 2018).
  • In query-limited models, the largest discoverable $\gamma$-quasi-clique (or dense subgraph) in $G_{n,1/2}$, under $n^\delta$ adjacency queries and $\ell$ adaptive rounds, is sharply bounded by combinatorial anti-matching arguments:

$$\alpha_\star(\delta, \ell, \eta) \leq 1 + \sqrt{1 - \frac{(2 - \delta)^2}{4\gamma(\ell)}}$$

where $\gamma(\ell)$ is a universal constant determined by edge-label/matching combinatorics (Csóka et al., 2023). These bounds are tight in regimes of high query-complexity or adaptivity.

7. Extensions: Connectivity, Biobjective Models, and Further Directions

Classical maximum quasi-clique (MQC) and densest $k$-subgraph (DKS) formulations can yield disconnected subgraphs, which are often undesirable. Flow-based constraints (C-STree, C-Flow) have been incorporated into MILPs to enforce connectedness of the returned subgraphs. This matters in domains where connectivity is essential, and the constrained models are reported to deliver near-optimal solution rates and reduced runtimes on sparse real networks (Santos et al., 2024).
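A tiny instance shows why the density condition alone does not imply connectivity. The sketch below checks connectivity of a candidate set after the fact with a breadth-first search; it does not model the C-STree or C-Flow constraints, and the example graph and function names are illustrative assumptions.

```python
from collections import deque
from fractions import Fraction
from math import comb


def is_connected_subset(adj, subset):
    """Breadth-first search restricted to the induced subgraph on `subset`."""
    s = set(subset)
    if not s:
        return True
    start = next(iter(s))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in s and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == s


def is_quasi_clique(adj, subset, gamma):
    """Check |E(S)| >= gamma * C(|S|, 2) from an adjacency dictionary."""
    s = set(subset)
    m = sum(1 for v in s for u in adj[v] if u in s) // 2
    return m >= gamma * comb(len(s), 2)


# Two disjoint triangles: 6 edges out of C(6, 2) = 15 possible.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
       3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
everything = set(adj)
print(is_quasi_clique(adj, everything, Fraction(2, 5)))
print(is_connected_subset(adj, everything))
```

The six vertices together satisfy the threshold $\gamma = 2/5$ even though no edge joins the two triangles, which is exactly the kind of solution that connectivity constraints rule out.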

The multiobjective view (size vs. density) allows the computed Pareto front to capture efficiency tradeoffs between small, dense and large, sparser quasi-cliques. Algorithms recover supported Pareto points via dichotomic LP search, filling in non-supported points using ε-constraint approaches, and exploit structural properties such as quasi-heredity and degree-extension (Santos et al., 2024).
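On a very small graph, the size versus density tradeoff can be enumerated directly. The sketch below is a brute-force illustration only: it does not implement dichotomic search or the ε-constraint method, density is taken as $|E(S)|/\binom{|S|}{2}$ for $|S| \geq 2$, and the function name is an assumption.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def pareto_front(vertices, edges):
    """Nondominated (size, best density) pairs, found by exhaustive search."""
    best = {}
    for size in range(2, len(vertices) + 1):
        most_edges = max(
            sum(1 for u, v in edges if u in s and v in s)
            for s in map(set, combinations(vertices, size))
        )
        best[size] = Fraction(most_edges, comb(size, 2))
    front = []
    # Scan from large to small: keep a size only if it is strictly denser
    # than every larger size seen so far.
    for size in sorted(best, reverse=True):
        if not front or best[size] > front[-1][1]:
            front.append((size, best[size]))
    return front[::-1]


# Hypothetical example: 4-clique {0,1,2,3} with a path 3-4-5 attached.
vertices = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
for size, density in pareto_front(vertices, edges):
    print(size, density)
```

Sizes 2 and 3 are dominated here because the 4-clique is both larger and equally dense, so the front starts at size 4 and trades density for size from there.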

Further open directions include tightening query-complexity lower bounds for subpolynomial adaptivity, improving worst-case fixed-parameter complexity in exact algorithms, and developing robust methodologies for noisy, multi-layer, or attributed network settings.

