Repetitiveness Measure χ in String Combinatorics

Updated 30 December 2025
  • Repetitiveness Measure χ is a combinatorial invariant defined via minimal suffixient sets that capture all irreducible right-extensions of a string.
  • It bridges theoretical string metrics and practical applications, linking its value to the number of runs in the Burrows–Wheeler Transform and compressed indexing.
  • Efficient linear-time algorithms compute χ, although its sensitivity varies with operations like appending, rotations, and reversals.

The repetitiveness measure $\chi$ is a combinatorial invariant quantifying the essential repetitive structure of a string or infinite word. Defined through the minimal size of a suffixient set, which characterizes the placement of all irreducible right-extensions, $\chi$ acts as a central metric in the hierarchy of repetitiveness measures, especially in the context of compressed string indexes and combinatorial word theory. Its relationship to the number of runs $r$ in the Burrows–Wheeler Transform (BWT) and to other classical measures underpins both structural analysis and algorithmic applications in stringology, symbolic dynamics, and Diophantine approximation.

1. Formal Definitions: Suffixient Sets and Repetitiveness Measure $\chi$

Let $\Sigma$ be a finite ordered alphabet of size $\sigma$, and let $w \in \Sigma^*$ be a finite string, terminated by an endmarker $\$$ such that $\$ < a$ for every $a \in \Sigma$. The measure $\chi(w)$ is defined via the following constructs (Date et al., 23 Dec 2025, Navarro et al., 5 Jun 2025):

  • Right-maximal substrings: A substring $x$ of $w$ is right-maximal if there exist symbols $a, b$ with $a \neq b$ such that both $xa$ and $xb$ occur in $w$.

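  • Right-extensions: Set $E_r(w) = \{xa : x \text{ is right-maximal and } xa \text{ occurs in } w\}$.
  • Super-maximal extensions: $S_r(w)$ is the set of those $xa \in E_r(w)$ not a proper suffix of another member of $E_r(w)$; let $sre(w) = |S_r(w)|$.
  • Suffixient set: A position set $S \subseteq [1..n]$, where $n = |w|$, is suffixient if every $x \in E_r(w)$ appears as a suffix of some $w[1..j]$ ($j \in S$).
  • Repetitiveness measure: $\chi(w) = \min\{|S| : S \text{ is suffixient for } w\}$, and, equivalently, $\chi(w) = |S_r(w)|$.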
  • For infinite words, $\chi$ is often mirrored by the exponent of repetition $\mathrm{rep}(x)$, defined as $\liminf_{n \to \infty} r(n, x)/n$, where $r(n, x)$ is the length of the shortest prefix of $x$ in which some subword of length $n$ occurs twice (Sim, 2021, Watanabe, 2022).
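The finite-string definitions above can be checked by brute force on small inputs. The following is a minimal sketch, not an algorithm from the cited papers: it enumerates all substrings, so it is cubic-time at best and only suitable for short strings. The function names are illustrative, and it assumes that the endmarker `$` does not occur in `w` and that `$` counts as an extension character like any other symbol.

```python
def right_extensions(text: str) -> set[str]:
    """All strings xa such that x is right-maximal in text and xa occurs in text."""
    followers: dict[str, set[str]] = {}
    for i in range(len(text)):
        for j in range(i, len(text)):
            # x = text[i:j] (possibly empty) is followed by the character text[j]
            followers.setdefault(text[i:j], set()).add(text[j])
    # x is right-maximal when at least two distinct characters follow it
    return {x + a for x, chars in followers.items() if len(chars) >= 2 for a in chars}


def super_maximal_extensions(text: str) -> set[str]:
    """Right-extensions that are not a proper suffix of another right-extension."""
    exts = right_extensions(text)
    return {x for x in exts if not any(y != x and y.endswith(x) for y in exts)}


def chi(w: str) -> int:
    """Brute-force chi(w); appends the endmarker '$' (assumed absent from w)."""
    return len(super_maximal_extensions(w + "$"))


if __name__ == "__main__":
    for w in ["a", "aaa", "abab", "abracadabra"]:
        print(w, chi(w), sorted(super_maximal_extensions(w + "$")))
```

For `abab$`, the right-maximal substrings are the empty string, `b`, and `ab`, and the super-maximal extensions are `b`, `aba`, and `ab$`, so the sketch reports a value of 3.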

2. Comparative Theory: $\chi$ within the Hierarchy of Repetitiveness

$\chi$ captures the number of fundamentally irreducible right-extensions, reflecting the core repetitive complexity rather than mere substring diversity. Key relations include (Navarro et al., 5 Jun 2025, Date et al., 23 Dec 2025):

  • […], where:
    • $\delta$: Substring-complexity lower bound.
    • $\gamma$: Size of smallest string-attractor.
    • $r$: Count of runs in BWT($w$).
    • $\bar{r}$: Run count in BWT($w^R$), the reversal.
  • Separations: $\chi$ is strictly below […] (certain families have […]), and $\chi$ is incomparable with measures based on copy-paste schemes (e.g., LZ77 $z$, lexicographic parse size $v$).
  • For ultra-repetitive episturmian strings, […] for all […] (Navarro et al., 5 Jun 2025).

This positioning of $\chi$ suggests it balances between capturing minimal decomposability and sharply reflecting the essential uniqueness of repeat patterns.
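To see how $\chi$ sits next to the BWT-based measures on concrete inputs, the sketch below computes $\chi$ by brute force together with the run counts of BWT($w$) and BWT($w^R$). It is an illustration under assumptions: the BWT is built by sorting suffixes of `w + "$"` (quadratic or worse, fine only for short strings), `$` is assumed to be smaller than every symbol of `w`, and the function names are hypothetical. No particular inequality is asserted by the code; it only prints the three values side by side.

```python
from itertools import groupby


def chi(w: str) -> int:
    """Brute-force number of super-maximal right-extensions of w + '$'."""
    t = w + "$"
    followers: dict[str, set[str]] = {}
    for i in range(len(t)):
        for j in range(i, len(t)):
            followers.setdefault(t[i:j], set()).add(t[j])
    exts = {x + a for x, cs in followers.items() if len(cs) > 1 for a in cs}
    return sum(1 for x in exts if not any(y != x and y.endswith(x) for y in exts))


def bwt(w: str) -> str:
    """BWT of w + '$' via naive suffix sorting ('$' must sort before all symbols)."""
    t = w + "$"
    order = sorted(range(len(t)), key=lambda i: t[i:])
    # t[i - 1] with i == 0 wraps around to the endmarker, as required
    return "".join(t[i - 1] for i in order)


def bwt_runs(w: str) -> int:
    """Number of maximal equal-letter runs in BWT(w)."""
    return sum(1 for _ in groupby(bwt(w)))


if __name__ == "__main__":
    for w in ["abab", "banana", "abracadabra", "aaaaaaaa"]:
        print(f"{w:12s} chi={chi(w)} r={bwt_runs(w)} r_rev={bwt_runs(w[::-1])}")
```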

3. Bounds, Constructions, and Asymptotics: Tightness of the Upper Bound

The key universal upper bound […] was established by Navarro, Romana & Urbina (Date et al., 23 Dec 2025), with empirical results showing the bound is loose for small […]:

  • General $\sigma$-ary construction:
    • Clustered-family […] for […] symbols gives […], […], hence […] as […].
  • Binary alphabet and de Bruijn sequences:
    • Certain linear-feedback shift register (LFSR)-generated de Bruijn strings achieve […], […], so […] as […].
    • Explicit examples for […] yield […].
  • General $\sigma$-ary de Bruijn:
    • For […], no de Bruijn-based construction can exceed […], so the […] bound becomes unattainable.
  • Empirical real-data observations: […]

A plausible implication is that purely combinatorial constructions capture the worst-case extremal behavior, while in practical data $\chi$ tends to fall notably below the upper bound.
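For readers who want to experiment with de Bruijn inputs, here is a small sketch that generates binary de Bruijn sequences and prints $\chi$ next to the BWT run count. Note the substitution: it uses the standard Lyndon-word (FKM) construction, not the LFSR-based sequences discussed above, so its output should not be read as reproducing the cited extremal values. The helper names are hypothetical, and the brute-force `chi` limits this to small orders.

```python
from itertools import groupby


def de_bruijn(sigma: int, order: int) -> str:
    """De Bruijn sequence over digits 0..sigma-1 (sigma <= 10), FKM construction."""
    a = [0] * (sigma * order)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > order:
            if order % p == 0:
                seq.extend(a[1:p + 1])  # emit a Lyndon word whose length divides order
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, sigma):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, seq))


def chi(w: str) -> int:
    """Brute-force number of super-maximal right-extensions of w + '$'."""
    t = w + "$"
    followers: dict[str, set[str]] = {}
    for i in range(len(t)):
        for j in range(i, len(t)):
            followers.setdefault(t[i:j], set()).add(t[j])
    exts = {x + a for x, cs in followers.items() if len(cs) > 1 for a in cs}
    return sum(1 for x in exts if not any(y != x and y.endswith(x) for y in exts))


def bwt_runs(w: str) -> int:
    """Run count of the BWT of w + '$', built by naive suffix sorting."""
    t = w + "$"
    order = sorted(range(len(t)), key=lambda i: t[i:])
    return sum(1 for _ in groupby(t[i - 1] for i in order))


if __name__ == "__main__":
    for k in range(2, 6):
        s = de_bruijn(2, k)
        c, r = chi(s), bwt_runs(s)
        print(f"order={k} n={len(s)} chi={c} r={r} chi/r={c / r:.3f}")
```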

4. Sensitivity and Stability under String Operations

The sensitivity of $\chi$ to edit operations is crucial for indexing and pattern matching robustness (Navarro et al., 5 Jun 2025):

  • Additive sensitivity:
    • Appending/prepending one character: $\chi(wa) \le \chi(w) + 2$, $\chi(aw) \le \chi(w) + 2$.
    • Non-monotonicity: $\chi$ may decrease after an append (example: […] but […]).
  • Multiplicative sensitivity:
    • General insertions/substitutions/deletions: […] worst-case blow-up in binary de Bruijn sequences.
    • Rotation: […] increase.
    • Reversal: Can change $\chi$ by […] for some families.
| Operation | Additive Sensitivity | Multiplicative Sensitivity |
| --- | --- | --- |
| append/prepend | at most +2 | constant |
| ins/sub/del (mid) | […] | […] |
| rotation | […] | […] |
| reversal | […] | […] |

This suggests that while $\chi$ is stable under simple edits, it is not monotone and can change sharply under complex manipulations.
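The append behavior can be explored exhaustively on short binary strings. The sketch below is an experiment, not a proof: it uses a brute-force `chi` (with `$` appended after the edited string, an assumption about the convention), enumerates all strings up to a hypothetical length cap, and reports the largest observed increase and the first observed decrease, if any.

```python
from itertools import product


def chi(w: str) -> int:
    """Brute-force number of super-maximal right-extensions of w + '$'."""
    t = w + "$"
    followers: dict[str, set[str]] = {}
    for i in range(len(t)):
        for j in range(i, len(t)):
            followers.setdefault(t[i:j], set()).add(t[j])
    exts = {x + a for x, cs in followers.items() if len(cs) > 1 for a in cs}
    return sum(1 for x in exts if not any(y != x and y.endswith(x) for y in exts))


def append_deltas(alphabet: str, max_len: int):
    """Yield (w, c, chi(wc) - chi(w)) for every non-empty w up to max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            base = chi(w)
            for c in alphabet:
                yield w, c, chi(w + c) - base


if __name__ == "__main__":
    deltas = list(append_deltas("ab", 8))
    print("largest observed increase:", max(d for _, _, d in deltas))
    print("first observed decrease:", next((x for x in deltas if x[2] < 0), None))
```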

5. Algorithmic Computation and Indexing Applications

Efficient calculation of $\chi$ and construction of minimal suffixient sets are possible using linear-time procedures (Navarro et al., 5 Jun 2025):

  • Algorithms:
    • Suffix tree/automaton augmented with right-extension counts.
    • Suffix array + LCP + BWT scan.
    • Output: List all super-maximal extensions, record their endpoint positions to derive a minimal suffixient set.
  • Applications:
    • One-occurrence pattern search in […] time.
    • Maximal-exact-match queries in […] time.
    • Random-access compressed indexing in […] space, provided an efficient access mechanism.

A plausible implication is that $\chi$ serves as a compact summary for building efficient compressed indexes with specific pattern matching guarantees.
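The "output" step above (list the super-maximal extensions and record one endpoint for each) can be sketched directly from the definitions. This is a brute-force stand-in, not the linear-time suffix-tree or suffix-array procedures mentioned in the text; function names are illustrative, positions are 1-based, and the input is assumed to already end with the endmarker `$`.

```python
def right_extensions(text: str) -> set[str]:
    """All strings xa with x right-maximal in text and xa occurring in text."""
    followers: dict[str, set[str]] = {}
    for i in range(len(text)):
        for j in range(i, len(text)):
            followers.setdefault(text[i:j], set()).add(text[j])
    return {x + a for x, cs in followers.items() if len(cs) > 1 for a in cs}


def smallest_suffixient_set(text: str) -> list[int]:
    """One end position (1-based) per super-maximal extension, using first occurrences."""
    exts = right_extensions(text)
    supermax = {x for x in exts if not any(y != x and y.endswith(x) for y in exts)}
    return sorted(text.find(x) + len(x) for x in supermax)


def is_suffixient(text: str, positions: list[int]) -> bool:
    """Check the definition: every right-extension is a suffix of some text[:j]."""
    return all(
        any(text[:j].endswith(x) for j in positions)
        for x in right_extensions(text)
    )


if __name__ == "__main__":
    t = "abab$"
    s = smallest_suffixient_set(t)
    print(s, is_suffixient(t, s))
```

Two distinct super-maximal extensions cannot end at the same position, since one would then be a suffix of the other; this is why the set built here has exactly one position per super-maximal extension.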

6. Infinite Words: The Exponent of Repetition and Dynamical Generalizations

For infinite words, the exponent of repetition $\mathrm{rep}(x)$ plays an analogous role to $\chi$ (Sim, 2021, Watanabe, 2022):

  • Definition: $\mathrm{rep}(x) = \liminf_{n \to \infty} r(n, x)/n$, with $r(n, x)$ the length of the shortest prefix of $x$ in which some length-$n$ factor occurs twice.
  • Sturmian words:
    • […].
    • Values depend on continued fraction expansion properties of the rotation slope.
    • For the Fibonacci word, […].
  • Spectral gaps: The spectrum of attainable exponents has maximal gaps and accumulation points precisely determined.
  • Quadratic irrationals: […] for characteristic Sturmian word […].
  • Invariance: […] for any suffix of the word, and the exponent remains invariant across equivalent quadratic irrationals.

These results underscore the role of $\mathrm{rep}$ (and thus $\chi$) as a bridge between symbolic recurrence and Diophantine phenomena.
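The quantity $r(n, x)$ is easy to evaluate on a long prefix of the Fibonacci word. The sketch below assumes the convention that $r(n, x)$ is the length of the shortest prefix in which some length-$n$ block occurs twice (overlaps allowed). For a Sturmian word there are only $n + 1$ distinct factors of length $n$, so a repeat must appear within the first $2n + 1$ letters, and a prefix of that length suffices. The minimum over a finite range of $n$ is only a rough numerical proxy for the liminf, not its value.

```python
def fibonacci_word(min_len: int) -> str:
    """Prefix of the Fibonacci word (s0 = '0', s1 = '01', s_k = s_{k-1} + s_{k-2})."""
    a, b = "0", "01"
    while len(b) < min_len:
        a, b = b, b + a
    return b


def r(n: int, x: str):
    """Length of the shortest prefix of x containing two occurrences of some length-n block.

    Returns None if the given finite prefix x is too short to contain a repeat.
    """
    seen: set[str] = set()
    for i in range(len(x) - n + 1):
        block = x[i:i + n]
        if block in seen:
            return i + n
        seen.add(block)
    return None


if __name__ == "__main__":
    x = fibonacci_word(5000)
    ratios = [(n, r(n, x) / n) for n in range(100, 1001)]
    print("smallest r(n, x)/n for 100 <= n <= 1000:", min(ratios, key=lambda t: t[1]))
```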

7. Open Problems and Research Directions

Current challenges and conjectures include (Navarro et al., 5 Jun 2025, Date et al., 23 Dec 2025):

  • Reachability: Is $\chi$ "reachable" for random access in $O(\chi)$ space? The prevailing conjecture is negative.
  • Tighter relations: Whether […] universally, and if $\chi$ can achieve constant-factor sensitivity to all edits.
  • Algorithmic improvements: Whether linear-time computation of $\chi$ can be reduced to […] for highly repetitive inputs.
  • Extensions: Behavior of $\chi$ under complement, string splices, concatenations, and more sophisticated word operations.

This suggests an active research frontier concerning the algorithmic and combinatorial tractability of $\chi$, especially its interplay with other repetitiveness measures and its potential generalizations beyond presently characterized families.
