WebSplatter: Linear-Time Composite Sorting

Updated 4 February 2026
  • WebSplatter is a linear-time sorting algorithm for finite-width tree-structured orders, converting composite keys into lexicographic byte strings.
  • It employs 'nextification' to transform nested and variable-length keys into uniformly comparable formats, facilitating efficient MSD radix-sort.
  • Applications include hierarchical SQL keys, arbitrary-precision numbers, and custom tuple structures, achieving significant performance gains over traditional sorts.

WebSplatter is a class of sorting algorithms that achieve linear time complexity for a broad family of hierarchically defined orders, especially orders common in database, string, and numerical applications. The core contribution is an efficient reduction—termed "nextification"—of any finite-width tree-structured order to lexicographic order on byte strings, enabling the use of well-understood linear-time radix sort primitives. The method handles deeply nested or variable-length keys arising from lexicographic, hierarchic (length-then-lex), sum (union), and inversion constructs. This universality results in a single, stable, linear-time sorting algorithm for virtually any practical composite key encountered in software contexts involving ORDER BY clauses, arbitrary-precision numbers, or custom tuple structures (Lyaudet, 2018).

1. Finite-Width Tree-Structured Orders

The foundation of WebSplatter is the class of finite-width tree-structured orders ("TSOs"). TSOs are built from finite total orders by a finite sequence of operations: inversion, lexicographic (dictionary) product, hierarchic (length-then-lex) product (also known as "shortlex" or "radix"), and finite generalized sum (union). Given any finite total order $O$, its inverse $\Inv(O)$, and sequences of tree-structured orders $\mathcal{O} = (O_0, O_1, \dots, O_{d-1})$ that are finite or ultimately periodic, one constructs:

  • $\Lex(i, j, \mathcal{O})$: Lexicographic order on sequences.
  • $\Hierar(i, j, \mathcal{O})$: Hierarchic or shortlex order, comparing first by sequence length then lexicographically.
  • Generalized sums $\sum_{x \in O_m} f(x)$, where $f : O_m \to \{\text{tree orders}\}$, impose master/suborder branching.

The finite-width restriction requires that all such sequences $\mathcal{O}$ are finitely described (eventually periodic), and that any master sum index $O_m$ is finite. These constraints ensure that hierarchical database keys, multi-field tuples, variable-length integer and string encodings, and SQL ORDER BY constructs are all encompassed by the model.
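The constructors above can be pictured as a small tree of order nodes. The following is a minimal reference sketch, not the paper's implementation: the class names `Leaf`, `Inv`, `Lex`, `Hierar` and the function `compare` are hypothetical, children are assumed to repeat purely periodically, the index arguments of $\Lex(i, j, \mathcal{O})$ are ignored, and generalized sums are omitted. It defines the order by direct comparison, which is the behavior nextification is meant to reproduce on byte strings.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """Finite total order, given by listing its elements in increasing order."""
    elements: tuple


@dataclass(frozen=True)
class Inv:
    """Inverse of the child order."""
    child: "Order"


@dataclass(frozen=True)
class Lex:
    """Lexicographic order on sequences; child orders repeat cyclically (assumption)."""
    children: tuple


@dataclass(frozen=True)
class Hierar:
    """Hierarchic (shortlex) order: compare by length first, then lexicographically."""
    children: tuple


Order = Union[Leaf, Inv, Lex, Hierar]


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def compare(order: Order, a, b) -> int:
    """Return -1, 0 or 1 according to how a and b compare under `order`."""
    if isinstance(order, Leaf):
        return sign(order.elements.index(a) - order.elements.index(b))
    if isinstance(order, Inv):
        return -compare(order.child, a, b)
    if isinstance(order, Hierar) and len(a) != len(b):
        return sign(len(a) - len(b))
    # Lex node, or Hierar node with equal lengths: first differing position decides.
    for i, (x, y) in enumerate(zip(a, b)):
        c = compare(order.children[i % len(order.children)], x, y)
        if c:
            return c
    return sign(len(a) - len(b))  # a proper prefix sorts first


# Decimal strings without leading zeros, ordered as natural numbers.
digit = Leaf(tuple("0123456789"))
naturals = Hierar((digit,))
by_value = cmp_to_key(lambda a, b: compare(naturals, a, b))
print(sorted(["10", "9", "2"], key=by_value))  # ['2', '9', '10']
```

Wrapping a node in `Inv` reverses the result, so `Inv(naturals)` sorts the same strings in descending numeric order.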

2. Nextification: Transformation to Lexicographic Order

Nextification is the transformation that enables TSOs to be sorted uniformly and efficiently. The process converts any finite-width TSO instance into a byte string such that the original order is preserved under lexicographic comparison. Each leaf (finite order) is assigned a fixed-length binary code of $\lceil \log_2 k \rceil$ bits for $k$ elements. Internal nodes operate as follows:

  • Lexicographic nodes concatenate their children's codes.
  • Contre-lex nodes concatenate and adjust a small padding counter.
  • Hierarchic nodes prefix a unary-encoded child count and concatenate.
  • Inversion nodes flip bits to invert order.

This encoding requires time proportional to the sum of children's code lengths. The total code length for any datum is bounded by the number of leaves plus a constant per internal node; overall, nextification requires $O(n)$ time and space for $n$ records, with practical constant factors ($\approx 3$ for real-world orders).
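The node rules listed above can be illustrated with a toy encoder. This is a simplified sketch under stated assumptions, not the encoding from the paper: codes are Python strings of `'0'`/`'1'` characters instead of packed bytes, lexicographic nodes are assumed to have a fixed number of children, the contre-lex padding counter is left out, and the helper names are hypothetical.

```python
from math import ceil, log2


def leaf_code(rank: int, k: int) -> str:
    """Fixed-length code of ceil(log2 k) bits for the element of rank `rank` among k."""
    width = max(1, ceil(log2(k)))  # at least one bit, even when k == 1
    return format(rank, f"0{width}b")


def lex_code(child_codes: list) -> str:
    """Lexicographic node: concatenate the children's codes."""
    return "".join(child_codes)


def hierar_code(child_codes: list) -> str:
    """Hierarchic node: unary-encoded child count, then the concatenated codes."""
    return "1" * len(child_codes) + "0" + "".join(child_codes)


def inv_code(code: str) -> str:
    """Inversion node: flip every bit so that the order is reversed."""
    return code.translate(str.maketrans("01", "10"))


def natural_code(decimal: str) -> str:
    """Code for a decimal string without leading zeros, ordered by numeric value."""
    return hierar_code([leaf_code(int(d), 10) for d in decimal])


values = ["10", "9", "2", "123"]
print(sorted(values, key=natural_code))                          # ['2', '9', '10', '123']
print(sorted(values, key=lambda v: inv_code(natural_code(v))))   # ['123', '10', '9', '2']
```

The unary prefix makes a longer sequence compare greater before any digit is inspected, which is what the length-then-lex rule requires. Bit flipping reverses the order here because no code produced by `natural_code` is a proper prefix of another.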

3. Linear-Time Sorting via Hierarchical Radix Sort

After nextification, sorting reduces to lexicographic ordering of the resulting byte strings. The algorithm uses MSD radix-sort (most-significant-digit first), which partitions the dataset at each byte position and recursively sorts only the relevant buckets. The process is stable due to the construction of prefix sums for bucket boundaries. The total work is $O(n + w)$ for $n$ keys of maximal "nextified" length $w$, with $T_{\rm radix}(n) = O(n)$ and $\mathrm{Space}_{\rm radix}(n) = O(n)$ when $w$ is $O(1)$ or $O(\log n)$ for variable-length keys.
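A generic MSD radix sort over byte strings can be sketched as follows. This is a textbook-style illustration rather than the paper's code: the function name `msd_radix_sort` is hypothetical, an explicit "key ends here" bucket is used instead of a padded end marker, and plain recursion is used, so very long keys would hit Python's recursion limit.

```python
def msd_radix_sort(keys: list, pos: int = 0) -> list:
    """Stable MSD radix sort of `bytes` keys, starting at byte position `pos`."""
    if len(keys) <= 1:
        return keys

    def bucket_of(key: bytes) -> int:
        # Bucket 0 holds keys that end at `pos`; byte value v goes to bucket v + 1.
        return key[pos] + 1 if pos < len(key) else 0

    # Count bucket sizes, then turn the counts into start offsets via prefix sums.
    offsets = [0] * 258
    for key in keys:
        offsets[bucket_of(key) + 1] += 1
    for i in range(1, 258):
        offsets[i] += offsets[i - 1]
    starts = offsets[:]  # starts[b] .. starts[b + 1] delimits bucket b

    # Distribute keys in input order, which keeps the pass stable.
    placed = [b""] * len(keys)
    for key in keys:
        b = bucket_of(key)
        placed[offsets[b]] = key
        offsets[b] += 1

    # Keys that ended are already in final position; recurse on the other buckets.
    result = placed[: starts[1]]
    for b in range(1, 257):
        lo, hi = starts[b], starts[b + 1]
        if hi > lo:
            result.extend(msd_radix_sort(placed[lo:hi], pos + 1))
    return result


data = [b"banana", b"apple", b"", b"app", b"cherry", b"apple"]
print(msd_radix_sort(data))
# [b'', b'app', b'apple', b'apple', b'banana', b'cherry']
```

Each recursive call touches only the keys that share the prefix seen so far, so work is spent only on the bytes needed to tell keys apart.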

4. Complexity Analysis

Nextification and MSD radix-sort compose to achieve $T_{\rm total}(n) = O(n)$ total time and $O(n)$ space in the RAM model, where pointer arithmetic and byte operations are $O(1)$. Typical constant factors are $c_1 \approx 3$ (nextification overhead) and $c_2 \approx 1$ (radix passes), so the overall cost is $(c_1 + c_2) n$. Space overhead includes a constant-factor increase in key representation and $O(n)$ for auxiliary storage.

| Step | Time Complexity | Space Complexity |
|---|---|---|
| Nextification | $O(n)$ | $O(n)$ |
| Radix sort | $O(n)$ | $O(n)$ |
| Total | $O(n)$ | $O(\text{key size} \cdot n)$ |
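Substituting the constants quoted in this section gives a rough figure for the combined cost. The symbol $T_{\rm next}$ for the nextification time is introduced here only for illustration, and the unit of work behind the constants is not specified in the summary above, so the result should be read as an order-of-magnitude estimate.

```latex
\begin{align*}
T_{\rm total}(n) &= T_{\rm next}(n) + T_{\rm radix}(n) \\
                 &\le c_1 n + c_2 n = (c_1 + c_2)\, n \\
                 &\approx (3 + 1)\, n = 4n .
\end{align*}
```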

5. Representative Applications

a) Unbounded Integer Sort:

Arbitrary-precision integers are encoded by digit sequences; primary comparison is by length (hierarchic node), followed by digit-wise comparison. Nextification converts each integer to a shortlex-padded string, leading to $O(n)$ total sorting time.
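One way to realize length-then-digits comparison as a byte-string key is sketched below. The layout is an assumption made for illustration (a fixed 4-byte big-endian length prefix followed by the minimal big-endian magnitude, so magnitudes must be shorter than $2^{32}$ bytes), the name `shortlex_key` is hypothetical, and Python's built-in `sorted` stands in for the radix sort.

```python
def shortlex_key(x: int) -> bytes:
    """Byte key whose lexicographic order matches numeric order for x >= 0."""
    if x < 0:
        raise ValueError("this sketch only handles non-negative integers")
    nbytes = (x.bit_length() + 7) // 8  # minimal length, no leading zero byte
    # The length prefix plays the role of the hierarchic node, the magnitude
    # bytes are then compared digit by digit (base 256).
    return nbytes.to_bytes(4, "big") + x.to_bytes(nbytes, "big")


values = [2**200, 7, 0, 256, 255, 2**64 + 1, 10**30]
ordered = sorted(values, key=shortlex_key)
print(ordered == sorted(values))  # True
```

Because each magnitude is stored without leading zero bytes, two keys with the same length prefix differ only in their digits, and the first differing byte decides the comparison.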

b) Hierarchical SQL Keys:

For SQL queries like ORDER BY country ASC, city DESC, street ASC, lexicographic and inverted nodes model the key order: $\Lex(0, 3, \{O^{country}, \Inv(O^{city}), O^{street}\})$. Nextification inverts the city field, concatenates segments, and adjusts padding; a single MSD radix-sort then suffices.
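The ORDER BY example can be mimicked with fixed-width fields. The sketch below is illustrative only: the 16-byte field width, the zero-byte padding, byte-wise (UTF-8) collation, and the helper names `pad`, `invert`, and `order_by_key` are all assumptions, and the built-in `sorted` stands in for the radix sort.

```python
def pad(field: str, width: int) -> bytes:
    """Encode a field as UTF-8 and right-pad it with zero bytes to a fixed width."""
    raw = field.encode("utf-8")
    if len(raw) > width or b"\x00" in raw:
        raise ValueError("field does not fit the assumed fixed-width layout")
    return raw.ljust(width, b"\x00")


def invert(code: bytes) -> bytes:
    """Complement every byte, which reverses the order of equal-length codes."""
    return bytes(255 - b for b in code)


def order_by_key(country: str, city: str, street: str, width: int = 16) -> bytes:
    """Key for ORDER BY country ASC, city DESC, street ASC (byte-wise collation)."""
    return pad(country, width) + invert(pad(city, width)) + pad(street, width)


rows = [
    ("FR", "Paris", "Rue A"),
    ("FR", "Lyon", "Rue B"),
    ("DE", "Berlin", "Weg 1"),
    ("FR", "Paris", "Allee"),
    ("DE", "Aachen", "Weg 2"),
]
for row in sorted(rows, key=lambda r: order_by_key(*r)):
    print(row)
# ('DE', 'Berlin', 'Weg 1')
# ('DE', 'Aachen', 'Weg 2')
# ('FR', 'Paris', 'Allee')
# ('FR', 'Paris', 'Rue A')
# ('FR', 'Lyon', 'Rue B')
```

Padding before inverting matters: all city codes have the same length, so complementing their bytes reverses their order without disturbing the fields on either side.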

c) Sorting Rationals by Continued Fraction:

Any nonnegative rational $p/q$ has a finite continued fraction $[n_0; n_1, \dots, n_k]$. Alternating lex/contre-lex nodes replicate rational order, and the nextification plus radix-sort mechanism applies. However, continued-fraction expansion may not be linear without fast multi-precision arithmetic.
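The alternating direction of comparison can be checked with a small experiment. The sketch below is an illustration, not the paper's construction: it builds tuple keys instead of nextified byte strings, the names `continued_fraction` and `cf_key` are hypothetical, and a finished expansion is treated as having an infinite next term. Larger terms at even positions mean a larger rational, larger terms at odd positions mean a smaller one, so odd-position terms are negated.

```python
import math
from fractions import Fraction


def continued_fraction(p: int, q: int) -> list:
    """Terms [n0, n1, ..., nk] of p/q (p >= 0, q > 0) via the Euclidean algorithm."""
    if p < 0 or q <= 0:
        raise ValueError("expected a non-negative rational p/q with q > 0")
    terms = []
    while q:
        a, r = divmod(p, q)
        terms.append(a)
        p, q = q, r
    return terms


def cf_key(p: int, q: int) -> tuple:
    """Tuple key whose ordinary tuple order matches the order of the rationals."""
    terms = continued_fraction(p, q)
    # Negate odd-position terms to flip the direction of comparison there.
    signed = [t if i % 2 == 0 else -t for i, t in enumerate(terms)]
    # A finished expansion behaves like an infinite next term, with its sign
    # flipped when that next position is odd.
    end = math.inf if len(terms) % 2 == 0 else -math.inf
    return tuple(signed) + (end,)


pairs = [(1, 2), (1, 3), (3, 7), (1, 1), (3, 2), (7, 3), (2, 1), (0, 1), (355, 113), (22, 7)]
by_cf = sorted(pairs, key=lambda pq: cf_key(*pq))
by_value = sorted(pairs, key=lambda pq: Fraction(*pq))
print(by_cf == by_value)  # True
```

For example, 355/113 expands to [3; 7, 16] and 22/7 to [3; 7]; the keys first differ at position 2, where 16 is compared with the infinite end marker, so 355/113 sorts before 22/7.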

6. Comparative Evaluation

Standard comparison-based sorts (mergesort, quicksort) require $O(n \log n)$ comparisons, with each comparison potentially traversing all tuple fields, for a total of $O(n \log n \cdot w)$. LSD radix sort achieves $O(n)$ only for uniform, fixed-length keys. In contrast, WebSplatter handles variable-length, mixed-key structures uniformly in $O(n w)$, which drops the $\log n$ factor of the comparison-based bound.

MSD radix-sort alone copes with variable lengths but requires explicit stack control; by leveraging padded nextified strings, end-of-string handling is implicit (zero-byte as end marker). Benchmarks show that this method can outperform standard comparators (such as qsort) by factors of 2–10 for large $n$, even with nextification overhead (Lyaudet, 2018).

7. Broader Implications

Hierarchical radix sort via nextification provides an algorithmic unification for sorting any composite or structured key expressible as a finite-width TSO. No custom comparator is required; ORDER BY or tuple-comparison expressions are compiled into a tree structure, then nextified and sorted by one stable, linear-time algorithm. This suggests significant simplification for implementations in database engines, arbitrary-precision arithmetic, and applications involving complex key definition, with uniformly strong asymptotics and practically efficient constants.
