Sample compression schemes for balls in structurally sparse graphs

Published 3 Apr 2026 in cs.DM and math.CO | (2604.02949v1)

Abstract: Sample compression schemes were defined by Littlestone and Warmuth (1986) as an abstraction of the structure underlying many learning algorithms. In a sample compression scheme, we are given a large sample of vertices of a fixed hypergraph with labels indicating the containment in some hyperedge. The task is to compress the sample in such a way that we can retrieve the labels of the original sample. The size of a sample compression scheme is the amount of information that is kept in the compression. Every hypergraph with a sample compression scheme of bounded size must have bounded VC-dimension. Conversely, Moran and Yehudayoff (J. ACM, 2016) showed that every hypergraph of bounded VC-dimension admits a sample compression scheme of bounded size. We study a specific class of hypergraphs emerging from balls in graphs. The schemes that we construct (contrary to the ones constructed by Moran and Yehudayoff) are \textit{proper}, meaning that we retrieve not only the labeling of the original sample but also a hyperedge (ball) consistent with the original labeling. First, we prove that for every graph $G$ of treewidth at most $t$, the hypergraph of balls in $G$ has a proper sample compression scheme of size $\mathcal{O}(t\log t)$; this is tight up to the logarithmic factor and improves the quadratic (improper) bound that follows from the result of Moran and Yehudayoff. Second, we prove an analogous result for graphs of cliquewidth at most $t$.


Authors (4)

Summary

The paper presents nearly tight proper sample compression schemes for balls in structurally sparse graphs, achieving O(t log t) bounds for treewidth and cliquewidth.
It introduces novel techniques using tree and NLC decompositions to efficiently compress the sample while preserving ball structure.
The findings enhance our understanding of learnability in geometric graph classes and set the stage for further exploration of linear compression bounds.

Summary of "Sample Compression Schemes for Balls in Structurally Sparse Graphs" (2604.02949)

Introduction and Motivation

The framework of sample compression schemes, introduced by Littlestone and Warmuth, serves as an abstraction for learning algorithms where a finite set of examples (a labeled sample) can be represented by a compact subset (the compression), from which the labels of the original sample can be reconstructed. The minimal size of such a scheme is intrinsically related to the VC-dimension of the underlying concept class: compression schemes of bounded size exist only for classes with bounded VC-dimension, and conversely, all classes with finite VC-dimension possess bounded-size compression schemes (Moran and Yehudayoff).
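In symbols, one common way to state these notions is sketched below. The names $\kappa$ (compressor), $\rho$ (reconstructor), and $k$ are notation chosen here for illustration; the paper may measure size differently, for example by also counting side information.

```latex
Let $H = (V, \mathcal{E})$ be a hypergraph. A labeled sample
$S \subseteq V \times \{0,1\}$ is \emph{realizable} if some $E \in \mathcal{E}$
satisfies $y = \mathbf{1}[v \in E]$ for all $(v, y) \in S$.
A sample compression scheme of size $k$ is a pair $(\kappa, \rho)$ such that,
for every realizable $S$,
\[
  \kappa(S) \subseteq S, \qquad |\kappa(S)| \le k, \qquad
  \rho(\kappa(S))(v) = y \quad \text{for all } (v, y) \in S,
\]
where $\rho(\kappa(S))$ is a function $V \to \{0,1\}$.
The scheme is \emph{proper} if, in addition,
$\rho(\kappa(S)) = \mathbf{1}_E$ for some hyperedge $E \in \mathcal{E}$.
```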

This work considers hypergraphs arising from geometric objects—specifically, balls of arbitrary radius in finite graphs—and focuses on classes of structurally sparse graphs, characterized by bounded treewidth, cliquewidth, and related parameters. Prior to this work, the best-known general compression bounds for such classes were quadratic (improper), or exponential in VC-dimension for the general case. The schemes constructed in this paper are proper, i.e., the reconstruction yields a set from the same class as the original concept (a ball in the graph).
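To make the hypergraph of balls concrete, here is a minimal brute-force sketch for small unweighted graphs. The function names and the adjacency-dictionary representation are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def all_balls(adj):
    """Enumerate the distinct balls B(c, r) = {v : d(c, v) <= r} as frozensets."""
    balls = set()
    for c in adj:
        dist = bfs_distances(adj, c)
        for r in range(max(dist.values()) + 1):
            balls.add(frozenset(v for v, d in dist.items() if d <= r))
    return balls

def consistent_ball(adj, sample):
    """Return some ball agreeing with the labeled sample {vertex: bool}, or None."""
    for ball in all_balls(adj):
        if all((v in ball) == label for v, label in sample.items()):
            return ball
    return None

# Hypothetical example graph: the path 0 - 1 - 2 - 3 - 4.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
```

On this path, the sample that labels vertices 1 and 3 positive and vertex 2 negative is not realizable, since every ball in a path is a contiguous interval.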

Main Results

The main contribution is the demonstration of nearly tight, efficient sample compression schemes for hypergraphs of balls in graphs with structural sparsity.

Proper Compression for Graphs of Bounded Treewidth

For any graph $G$ of treewidth at most $t$, the paper shows that the hypergraph of balls in $G$ admits a proper sample compression scheme of size $O(t \log t)$. This matches the known lower bound up to a logarithmic factor, and significantly improves upon the previously known $O(t^2 \log t)$ bound derived from generic VC-dimension arguments and their duals [Moran & Yehudayoff, JACM 2016].

Proper Compression for Graphs of Bounded Cliquewidth

Analogous sample compression results are established for graphs of cliquewidth at most $t$: the hypergraph of balls in such graphs admits a proper scheme of size $O(t \log t)$. The construction extends to graphs with bounded NLC-width and is essentially optimal up to logarithmic factors, given known lower bounds on VC-dimension in these classes.

Special Cases and Enhanced Bounds

For graphs with a vertex cover of size $t$, a proper compression scheme of size $t+4$ is constructed, fully removing the logarithmic factor.
For planar graphs and, more generally, graphs of bounded local treewidth, proper sample compression schemes are given for balls of radius at most $r$, with explicit size bounds that are nearly linear in $r$.
For the hypergraph of closed neighborhoods in graphs of degeneracy […], a sample compression scheme of size […] is given, with tightness shown via combinatorial examples.

Contrasts and Lower Bounds

The bounds for treewidth and cliquewidth are tight up to the logarithmic factor: bipartite graphs that realize large VC-dimension give the corresponding lower bounds. Whether the logarithmic factor can be removed is left open.
For graphs of bounded twin-width, the approach fails: hypergraphs of balls can have arbitrarily high VC-dimension even when twin-width is bounded, as demonstrated via subdivision arguments.
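The VC-dimension statements above can be checked by exhaustive search on very small graphs. The following is a sketch under stated assumptions: the function names are hypothetical, the search is exponential in the number of vertices, and it is not the argument used in the paper.

```python
from collections import deque
from itertools import combinations

def ball_family(adj):
    """All distinct balls of an unweighted graph given as {vertex: neighbors}."""
    balls = set()
    for c in adj:
        dist = {c: 0}
        queue = deque([c])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for r in range(max(dist.values()) + 1):
            balls.add(frozenset(v for v, d in dist.items() if d <= r))
    return balls

def is_shattered(points, family):
    """True if every subset of points arises as (hyperedge & points)."""
    traces = {edge & points for edge in family}
    return len(traces) == 2 ** len(points)

def vc_dimension(adj):
    """Largest size of a vertex set shattered by balls (exponential time)."""
    family = ball_family(adj)
    best = 0
    for k in range(1, len(adj) + 1):
        if any(is_shattered(frozenset(s), family) for s in combinations(adj, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return best

# Hypothetical examples: a path on 5 vertices and a cycle on 4 vertices.
path5 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
cycle4 = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}
```

The early exit relies on the fact that shattering is closed under taking subsets, so once no set of size k is shattered, no larger set can be.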

Techniques

The authors develop a compression framework tailored to the graph structure:

Tree Decomposition Separation: By leveraging properties of tree decompositions, the sample can be efficiently represented using labels of vertices at small separators, with distances encoded via "witnesses" rather than explicitly storing all metric information.
NLC-decompositions and Types: For cliquewidth/NLC-width, the compression tracks classes of neighborhoods ("types") and makes use of the limited diversity in connections across separators to bound the required information.
Properness: The constructed reconstructor always outputs a ball (not just a consistent labeling), achieving a proper scheme, which is frequently harder than improper compression.
Array Compression Format: The analysis uses and justifies array compression, which is essentially equivalent to standard compression up to logarithmic factors.
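To illustrate what properness asks for, here is a toy proper scheme for balls in a path graph, which keeps at most three labeled points. It is a hand-made example for the simplest case only and is not the construction from the paper; the names `path_ball`, `compress`, and `reconstruct` are hypothetical. The input sample is assumed to be realizable by some ball.

```python
def path_ball(n, center, radius):
    """Ball B(center, radius) in the path 0 - 1 - ... - (n-1), as a set."""
    return set(range(max(0, center - radius), min(n - 1, center + radius) + 1))

def compress(sample):
    """Keep at most 3 labeled points of a realizable sample {vertex: bool}."""
    positives = sorted(v for v, label in sample.items() if label)
    if not positives:
        # Find the smallest vertex that is not a negative example and keep
        # its left neighbor (a negative example) as a pointer to it, if any.
        free = 0
        while free in sample:
            free += 1
        return {} if free == 0 else {free - 1: False}
    a, b = positives[0], positives[-1]
    kept = {a: True, b: True}
    if (b - a) % 2 == 1 and sample.get(a - 1) is False:
        kept[a - 1] = False  # signals: extend to the right, not to the left
    return kept

def reconstruct(kept):
    """Return (center, radius) of a ball consistent with the original sample."""
    positives = sorted(v for v, label in kept.items() if label)
    if not positives:
        negatives = list(kept)
        return (negatives[0] + 1, 0) if negatives else (0, 0)
    a, b = positives[0], positives[-1]
    if (b - a) % 2 == 0:
        return ((a + b) // 2, (b - a) // 2)          # ball is exactly [a, b]
    if len(kept) == 3:
        return ((a + b + 1) // 2, (b - a + 1) // 2)  # ball covers [a, b + 1]
    return ((a + b - 1) // 2, (b - a + 1) // 2)      # ball covers [a - 1, b]
```

The reconstructor always returns a center and a radius, so its output is a ball by construction. The only subtle case is an odd gap between the extreme positive points: an untruncated ball in a path has an odd number of vertices, so the vertices just left and just right of the positives cannot both be negative examples, and the optional third point tells the reconstructor which side is safe to extend.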

Implications and Discussion

The significance of these results is both theoretical and algorithmic. Proper sample compression with size nearly linear in treewidth or cliquewidth gives insights into the learnability and PAC sample complexity of such geometric concept classes. The findings solve open problems concerning proper compression in chordal and planar graphs, and resolve (up to log factors) sample compression bounds for wide classes of structurally sparse graphs.

Table: Compression Scheme Size Bounds

Graph Structural Parameter	Proper Compression Scheme Size
Treewidth $t$	$O(t \log t)$
Cliquewidth $t$	$O(t \log t)$
Vertex cover $t$	$t+4$
Chordal (clique number […])	[…]
Planar, balls of radius $r$	[…]
Degeneracy […] (closed nbhd)	[…]

The approach also clarifies which classes of sparse graphs admit efficient compression and which do not, drawing new boundaries (e.g., bounded twin-width is insufficient). The paper establishes stronger bounds than those previously available for the general case and connects these to fundamental conjectures about structural and metric VC-dimension in graphs.

Open Problems and Future Directions

The work concludes with several open problems:

Removing the logarithmic factor: Is a linear-in-$t$ proper sample compression scheme possible for treewidth or cliquewidth?
Minor-closed graph classes: Is there a proper compression scheme of size […] for […]-minor-free graphs?
Extension to more general parameters: Can similar compression bounds be realized under other sparsity notions, e.g., bounded treedepth or maximum degree?
The status of proper sample compression for closed neighborhoods in graphs of bounded degeneracy and balls in minor-closed classes remains unresolved.

Solving these would yield further progress on the sample compression conjecture and the precise relationship between structural sparsity measures in graphs and learnability properties.

Conclusion

This work provides a comprehensive, nearly tight analysis of proper sample compression schemes for balls in structurally sparse graph classes, unifying and sharpening a variety of previous results. The developments have substantial impact on both learning theory and structural graph theory, clarifying the geometric and combinatorial underpinnings of compression in classes characterized by bounded decomposability parameters.

Reference: "Sample compression schemes for balls in structurally sparse graphs" (2604.02949)
