Repetitiveness Measure χ in String Combinatorics

Updated 30 December 2025

The repetitiveness measure χ is a combinatorial invariant defined as the size of a smallest suffixient set, a set of positions that captures all irreducible right-extensions of a string.
It connects theoretical string metrics with practical applications: its value is related to the number of runs in the Burrows–Wheeler Transform, and it underlies compressed indexing.
χ can be computed in linear time, but its sensitivity differs across operations such as appending, rotation, and reversal.

The repetitiveness measure $χ$ is a combinatorial invariant quantifying the essential repetitive structure of a string or infinite word. Defined through the minimal size of a suffixient set—which characterizes the placement of all irreducible right-extensions— $χ$ acts as a central metric in the hierarchy of repetitiveness measures, especially in the context of compressed string indexes and combinatorial word theory. Its relationship to the number of runs $r$ in the Burrows–Wheeler Transform (BWT) and to other classical measures underpins both structural analysis and algorithmic applications in stringology, symbolic dynamics, and Diophantine approximation.

1. Formal Definitions: Suffixient Sets and Repetitiveness Measure $χ$

Let $\Sigma$ be a finite ordered alphabet of size $\sigma$, and let $w \in \Sigma^*$ be a finite string, terminated by an endmarker $\$$ such that $\$ < a$ for every $a \in \Sigma$. Then $χ(w)$ is defined via the following constructs (Date et al., 23 Dec 2025, Navarro et al., 5 Jun 2025):

Right-maximal substrings: A substring $x$ of $w$ is right-maximal if there exist characters $a, b$ with $a \neq b$ such that both $xa$ and $xb$ occur in $w$.

Right-extensions: Set $E_r(w) = \{ xa : x \text{ is right-maximal in } w \text{ and } xa \text{ occurs in } w \}$.

Super-maximal extensions: $S_r(w)$ is the set of those $xa \in E_r(w)$ not a proper suffix of another member of $E_r(w)$; let $\mathrm{sre}(w) = |S_r(w)|$.

Suffixient set: A position set $S \subseteq \{1, \dots, |w|\}$ is suffixient if every $xa \in E_r(w)$ appears as a suffix of some prefix $w[1..j]$ ($j \in S$).

Repetitiveness measure: $χ(w) = \min\{ |S| : S \text{ is suffixient for } w \}$, and, equivalently, $χ(w) = |S_r(w)|$.
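The definitions above can be turned into a brute-force computation for small strings. The following is a minimal sketch, not the linear-time method of the cited papers: it enumerates all substrings, so it is only meant for inputs of a few dozen characters. The function names `right_extensions`, `super_maximal_extensions`, and `chi` are illustrative, and the sketch assumes the caller passes a string that already ends in the endmarker `$`.

```python
def right_extensions(w):
    """Return all strings xa such that x is right-maximal in w and xa occurs in w."""
    followers = {}
    n = len(w)
    for i in range(n):
        for j in range(i, n):
            # w[i:j] is a (possibly empty) substring followed by the character w[j]
            followers.setdefault(w[i:j], set()).add(w[j])
    # x is right-maximal when it is followed by at least two distinct characters
    return {x + a for x, chars in followers.items() if len(chars) >= 2 for a in chars}


def super_maximal_extensions(w):
    """Keep the right-extensions that are not a proper suffix of another one."""
    ext = right_extensions(w)
    return {e for e in ext if not any(f != e and f.endswith(e) for f in ext)}


def chi(w):
    """Size of a smallest suffixient set, computed as the number of super-maximal extensions."""
    return len(super_maximal_extensions(w))


if __name__ == "__main__":
    for s in ["aaa$", "abab$", "abba$"]:
        print(s, sorted(super_maximal_extensions(s)), chi(s))
```

For `abab$`, the right-maximal substrings are the empty string, `b`, and `ab`; the super-maximal extensions are `b`, `aba`, and `ab$`, so the sketch reports a value of 3.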

For infinite words, $χ$ is often mirrored by the exponent of repetition $\mathrm{rep}(\mathbf{x})$, defined as $\mathrm{rep}(\mathbf{x}) = \liminf_{n \to \infty} r(n, \mathbf{x})/n$, where $r(n, \mathbf{x})$ is the minimal prefix length capturing all subwords of length $n$ (Sim, 2021, Watanabe, 2022).

2. Comparative Theory: $χ$ within the Hierarchy of Repetitiveness

$χ$ captures the number of fundamentally irreducible right-extensions, tightly reflecting the core repetitive complexity rather than mere substring diversity. Key relations include (Navarro et al., 5 Jun 2025, Date et al., 23 Dec 2025):

$\delta \le \gamma \le χ \le 2\bar{r}$
- $\delta$: Substring-complexity lower bound.
- $\gamma$: Size of smallest string attractor.
- $r$: Count of runs in BWT($w$).
- $\bar{r}$: Run count in BWT($w^R$), the reversal.
Separations: $χ$ is strictly below $\Sigma$ 3 (certain families have $\Sigma$ 4), and $χ$ is incomparable with measures based on copy-paste schemes (e.g., LZ77 $z$, lexicographic parse size $v$).
For ultra-repetitive episturmian strings, $\Sigma$ 8 for all $\Sigma$ 9 (Navarro et al., 5 Jun 2025).

This positioning of $χ$ suggests it balances between capturing minimal decomposability and sharply reflecting the essential uniqueness of repeat patterns.
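The relation between $χ$ and the BWT of the reversed string can be probed empirically on short inputs. The sketch below is an illustration under stated assumptions rather than a reproduction of any cited experiment: it computes $χ$ by brute force on `w + "$"`, builds the BWT of the reversed string `w[::-1] + "$"` by naively sorting suffixes, and checks whether $χ \le 2\bar{r}$ holds for every binary string up to a given length. The helper names `chi`, `bwt_runs`, and `check_bound` are hypothetical, as is the choice of where to place the endmarker.

```python
from itertools import product


def chi(w):
    """Brute-force chi: number of super-maximal right-extensions of w."""
    followers = {}
    for i in range(len(w)):
        for j in range(i, len(w)):
            followers.setdefault(w[i:j], set()).add(w[j])
    ext = {x + a for x, cs in followers.items() if len(cs) >= 2 for a in cs}
    return sum(1 for e in ext if not any(f != e and f.endswith(e) for f in ext))


def bwt_runs(t):
    """Number of equal-letter runs in the BWT of t (t must end with '$')."""
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive suffix array
    bwt = [t[i - 1] for i in sa]  # t[-1] is '$' for the suffix starting at 0
    return 1 + sum(bwt[k] != bwt[k - 1] for k in range(1, len(bwt)))


def check_bound(max_len, alphabet="ab"):
    """Return True if chi(w$) <= 2 * runs(BWT(reverse(w)$)) for all short words w."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if chi(w + "$") > 2 * bwt_runs(w[::-1] + "$"):
                return False
    return True


if __name__ == "__main__":
    print(check_bound(8))
```

A naive suffix sort is used here only to keep the example short; any standard suffix-array construction could be substituted.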

3. Bounds, Constructions, and Asymptotics: Tightness of $χ \le 2\bar{r}$

The key universal upper bound $χ \le 2\bar{r}$ was established by Navarro, Romana & Urbina (Date et al., 23 Dec 2025), with empirical results showing the bound is loose for small $\sigma$:

General $\sigma$-ary construction:
- Clustered-family $\sigma$ 5 for $\sigma$ 6 symbols gives $\sigma$ 7, $\sigma$ 8, hence $\sigma$ 9 as $w \in \Sigma^*$ 0.
Binary alphabet and de Bruijn sequences:
- Certain linear-feedback shift register (LFSR)-generated de Bruijn strings achieve $w \in \Sigma^*$ 1, $w \in \Sigma^*$ 2, so $w \in \Sigma^*$ 3 as $w \in \Sigma^*$ 4.
- Explicit examples for $w \in \Sigma^*$ 5 yield $w \in \Sigma^*$ 6.
General $\sigma$-ary de Bruijn:
- For $w \in \Sigma^*$ 8, no de Bruijn-based construction can exceed $w \in \Sigma^*$ 9, so the $\$0 bound becomes unattainable.
Empirical real-data observations:
- For $\$1, genome datasets yield $\$2 (Date et al., 23 Dec 2025).

A plausible implication is that purely combinatorial constructions capture the worst-case extremal behavior, while in practical data $χ$ tends to be notably lower than the worst-case bound.

4. Sensitivity and Stability under String Operations

The sensitivity of $χ$ to edit operations is crucial for indexing and pattern matching robustness (Navarro et al., 5 Jun 2025):

Additive sensitivity:
- Appending/prepending one character: $χ(wa) \le χ(w) + 2$, $χ(aw) \le χ(w) + 2$.
- Non-monotonicity: $χ$ may decrease after an append (example: $\$9 but $such that$ 0).
Multiplicative sensitivity:
- General insertions/substitutions/deletions: $such that$ 1 worst-case blow-up in binary de Bruijn sequences.
- Rotation: $such that$ 2 increase.
- Reversal: Can change $χ$ by $such that$ 4 for some families.

Operation	Additive Sensitivity	Multiplicative Sensitivity
append/prepend	$such that$ 5 +2	constant
ins/sub/del (mid)	$such that$ 6	$such that$ 7
rotation	$such that$ 8	$such that$ 9
reversal	$%%%%9%%%%a \in \Sigma%%%%10%%%%χ(w)$ 0	$%%%%9%%%%a \in \Sigma%%%%10%%%%χ(w)$ 1

This suggests that while $χ$ is stable under simple edits, it is not monotone and can change sharply under complex manipulations.
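The behavior under appends can be explored by exhaustive search over short strings. The sketch below is a hedged illustration: it assumes that appending a character means inserting it just before the endmarker, and it uses a brute-force `chi` rather than a linear-time algorithm. The helpers `append_deltas` and `summarize` are hypothetical names. Under these conventions, a hand computation gives a value of 4 for `abba$` and 3 for `abbab$`, so appending `b` to `abba` lowers the measure.

```python
from itertools import product


def chi(w):
    """Brute-force chi: number of super-maximal right-extensions of w."""
    followers = {}
    for i in range(len(w)):
        for j in range(i, len(w)):
            followers.setdefault(w[i:j], set()).add(w[j])
    ext = {x + a for x, cs in followers.items() if len(cs) >= 2 for a in cs}
    return sum(1 for e in ext if not any(f != e and f.endswith(e) for f in ext))


def append_deltas(max_len, alphabet="ab"):
    """Yield (w, c, chi(wc$) - chi(w$)) for every word w of length 1..max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            base = chi(w + "$")
            for c in alphabet:
                yield w, c, chi(w + c + "$") - base


def summarize(max_len, alphabet="ab"):
    """Return (largest increase, largest decrease, one decreasing pair or None)."""
    deltas = list(append_deltas(max_len, alphabet))
    largest = max(d for _, _, d in deltas)
    smallest = min(d for _, _, d in deltas)
    decreasing = next(((w, c) for w, c, d in deltas if d < 0), None)
    return largest, smallest, decreasing


if __name__ == "__main__":
    print(summarize(6))
```

The same loop can be adapted to prepends, rotations, or reversals by changing the edited string inside `append_deltas`.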

5. Algorithmic Computation and Indexing Applications

Efficient calculation of $χ$ and construction of minimal suffixient sets are possible using linear-time procedures (Navarro et al., 5 Jun 2025):

Algorithms:
- Suffix tree/automaton augmented with right-extension counts.
- Suffix array + LCP + BWT scan.
- Output: List all super-maximal extensions, record their endpoint positions to derive a minimal suffixient set.
Applications:
- One-occurrence pattern search in $%%%%9%%%%a \in \Sigma%%%%10%%%%χ(w)$ 4 time.
- Maximal-exact-match queries in $%%%%9%%%%a \in \Sigma%%%%10%%%%χ(w)$ 5 time.
- Random-access compressed indexing in $%%%%9%%%%a \in \Sigma%%%%10%%%%χ(w)$ 6 space, provided an efficient access mechanism.

A plausible implication is that $χ$ serves as a compact summary for building efficient compressed indexes with specific pattern matching guarantees.
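The output step listed under Algorithms (list the super-maximal extensions, then record an endpoint for each) can be sketched as follows. This is a quadratic-or-worse brute-force illustration, not the suffix-tree or suffix-array procedures of the cited work; the function names are hypothetical, positions are 1-based, and the first occurrence of each extension is chosen arbitrarily. Two distinct super-maximal extensions cannot end at the same position, since one would then be a suffix of the other, so the resulting set has exactly one position per extension.

```python
def right_extensions(w):
    """All strings xa with x right-maximal in w and xa occurring in w."""
    followers = {}
    for i in range(len(w)):
        for j in range(i, len(w)):
            followers.setdefault(w[i:j], set()).add(w[j])
    return {x + a for x, cs in followers.items() if len(cs) >= 2 for a in cs}


def super_maximal_extensions(w):
    """Right-extensions that are not a proper suffix of another right-extension."""
    ext = right_extensions(w)
    return {e for e in ext if not any(f != e and f.endswith(e) for f in ext)}


def suffixient_set(w):
    """One 1-based end position per super-maximal extension (first occurrence)."""
    return sorted(w.find(e) + len(e) for e in super_maximal_extensions(w))


def is_suffixient(w, positions):
    """Check that every right-extension is a suffix of some prefix w[:j], j in positions."""
    prefixes = [w[:j] for j in positions]
    return all(any(p.endswith(e) for p in prefixes) for e in right_extensions(w))


if __name__ == "__main__":
    s = "abab$"
    positions = suffixient_set(s)
    print(positions, is_suffixient(s, positions))
```

The `is_suffixient` check is included so that a candidate set produced by a faster algorithm can be validated against the definition on small inputs.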

6. Infinite Words: The Exponent of Repetition and Dynamical Generalizations

For infinite words, the exponent of repetition $\mathrm{rep}(\mathbf{x})$ plays an analogous role to $χ$ (Sim, 2021, Watanabe, 2022):

Definition: $\mathrm{rep}(\mathbf{x}) = \liminf_{n \to \infty} r(n, \mathbf{x})/n$, with $r(n, \mathbf{x})$ as the length of the shortest prefix containing all $n$-length factors.
Sturmian words:
- $χ$ 03.
- Values depend on continued fraction expansion properties of rotation slope $χ$ 04.
- For the Fibonacci word, $χ$ 05.
Spectral gaps: The spectrum $χ$ 06 has maximal gaps and accumulation points precisely determined.
Quadratic irrationals: $χ$ 07 for characteristic Sturmian word $χ$ 08.
Invariance: $χ$ 09 for any suffix $χ$ 10 of $χ$ 11, and $χ$ 12 remains invariant across equivalent quadratic irrationals.

These results underscore the role of $\mathrm{rep}$ (and thus $χ$) as a bridge between symbolic recurrence and Diophantine phenomena.

7. Open Problems and Research Directions

Current challenges and conjectures include (Navarro et al., 5 Jun 2025, Date et al., 23 Dec 2025):

Reachability: Is $χ$ "reachable" for random access in $O(χ)$ space? The prevailing conjecture is negative.
Tighter relations: Whether $χ$ 17 universally, and if $χ$ can achieve constant-factor sensitivity to all edits.
Algorithmic improvements: Whether linear-time computation of $χ$ can be reduced to $χ$ 20 for highly repetitive inputs.
Extensions: Behavior of $χ$ under complement, string splices, concatenations, and more sophisticated word operations.

This suggests an active research frontier concerning the algorithmic and combinatorial tractability of $χ$, especially its interplay with other repetitiveness measures and its potential generalizations beyond presently characterized families.