Metaorder modelling and identification from public data
Published 23 Feb 2026 in q-fin.TR, cs.CE, q-fin.ST, and stat.CO | (2602.19590v1)
Abstract: Market-order flow in financial markets exhibits long-range correlations. This is a widely known stylised fact of financial markets. A popular hypothesis for this stylised fact comes from the Lillo-Mike-Farmer (LMF) order-splitting theory. However, quantitative tests of this theory have historically relied on proprietary datasets with trader identifiers, limiting reproducibility and cross-market validation. We show that the LMF theory can be validated using publicly available Johannesburg Stock Exchange (JSE) data by leveraging recently developed methods for reconstructing synthetic metaorders. We demonstrate the validation using 3 years of Transaction and Quote Data (TAQ) for the largest 100 stocks on the JSE when assuming that there are either N=50 or N=150 effective traders managing metaorders in the market.
The paper introduces a reconstruction algorithm that generates synthetic metaorders from public trade and quote data, robustly recovering the square-root price impact law.
The methodology confirms time-independence and reveals distinct in-execution concave impact and post-execution convex decay profiles consistent with LMF theory.
Empirical validation of the γ = α - 1 relation across different trader participation parameters underscores the utility of public data for market microstructure analysis.
Metaorder Modelling and Identification from Public Data: An Expert Review
Introduction
This paper addresses the long-standing challenge of empirically validating the Lillo-Mike-Farmer (LMF) order-splitting theory of market order flow using exclusively public data from limit order markets. Traditionally, validation of LMF theory has depended on proprietary datasets that contain trader identifiers, restricting reproducibility and market coverage. The authors leverage algorithmic reconstruction of synthetic metaorders—using only trade and quote data—to achieve robust recovery of stylized facts associated with metaorder-driven order flow. Empirical analysis is conducted using Johannesburg Stock Exchange (JSE) data from 2023 to 2025, encompassing the top 100 stocks by volume.
Synthetic Metaorder Generation from Public Data
The central methodological contribution is the systematic reconstruction of metaorders from public tick-level trades. Metaorders are constructed by mapping observed trades to synthetic traders in a way that preserves temporal order, under either a homogeneous or a power-law distributed trader participation assumption. The mapping algorithm takes as inputs the number of synthetic traders (N) and the trader participation distribution, chosen to reflect empirical regularities in real markets. The sequence of trades attributed to each trader is then segmented into consecutive runs of the same sign, yielding metaorders suitable for further analysis.
The approach critically depends on the assumption that real-world trader activity is approximately power-law distributed, consistent with prior findings, though N and the exponent δ are not known ex ante. The authors assess the robustness of stylized facts across multiple parameterizations.
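The reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' algorithm: the random assignment rule, the participation weights proportional to k^(-δ), and the names reconstruct_metaorders, n_traders, delta and seed are all assumptions made for the example.

```python
import numpy as np

def reconstruct_metaorders(signs, volumes, n_traders=50, delta=1.0, seed=0):
    """Assign each trade to a synthetic trader, then split every trader's
    trade sequence into runs of equal sign (the synthetic metaorders)."""
    rng = np.random.default_rng(seed)
    signs = np.asarray(signs)
    volumes = np.asarray(volumes, dtype=float)

    # Assumed participation rule: trader k is picked with probability
    # proportional to k^(-delta); delta = 0 gives the homogeneous case.
    weights = np.arange(1, n_traders + 1, dtype=float) ** (-delta)
    probs = weights / weights.sum()
    traders = rng.choice(n_traders, size=len(signs), p=probs)

    metaorders = []
    for k in range(n_traders):
        # Trade indices of trader k, already in time order.
        idx = np.flatnonzero(traders == k)
        if idx.size == 0:
            continue
        s = signs[idx]
        # Positions where the sign flips inside this trader's sequence.
        breaks = np.flatnonzero(s[1:] != s[:-1]) + 1
        for run in np.split(idx, breaks):
            metaorders.append({
                "trader": k,
                "sign": int(signs[run[0]]),
                "start": int(run[0]),      # index of first child trade
                "end": int(run[-1]),       # index of last child trade
                "n_child": int(run.size),  # metaorder length in trades
                "volume": float(volumes[run].sum()),
            })
    return metaorders
```

With n_traders=1 the function reduces to splitting the raw trade-sign sequence into same-sign runs, which is a convenient sanity check before varying N and δ.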
Figure 1: Log-log SQL impact curves for GRT and GFI (left) and for the aggregated top 100 JSE stocks (right) validating the square-root law with theoretical fits in red.
Validation of Stylized Facts
Square-Root Law
A fundamental stylized fact is the square-root law (SQL): metaorder price impact scales as the square root of normalized order size, I(Q) ∝ √(Q/V_D), where Q is the metaorder volume and V_D the daily traded volume. This scaling is recovered from synthetic metaorders at both the individual-stock and the aggregate level, validating the SQL even in the absence of proprietary trader identifiers. The deviation from perfect scaling at extreme quantiles is attributable to noise, as in comparable studies using real metaorder data.
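A common way to check a power-law impact relation of this kind is a straight-line fit in log-log space, where a slope near 0.5 indicates square-root scaling. The sketch below assumes binned, positive mean-impact values and uses ordinary least squares; the function name fit_impact_exponent and the synthetic inputs are illustrative and are not taken from the paper.

```python
import numpy as np

def fit_impact_exponent(q_over_vd, impact):
    """Least-squares fit of log(impact) = log(prefactor) + exponent * log(Q/V_D).

    Both inputs must be strictly positive (e.g. bin-averaged impact).
    """
    x = np.log(np.asarray(q_over_vd, dtype=float))
    y = np.log(np.asarray(impact, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    exponent, log_prefactor = np.polyfit(x, y, 1)
    return exponent, np.exp(log_prefactor)

# Synthetic check: noiseless data that obeys the square-root law exactly.
q = np.logspace(-4, -1, 30)       # hypothetical normalized sizes Q/V_D
impact = 0.8 * np.sqrt(q)         # hypothetical prefactor of 0.8
exponent, prefactor = fit_impact_exponent(q, impact)
print(exponent, prefactor)
```

On real data the raw per-metaorder impacts are noisy and can be negative, so they would typically be averaged within bins of Q/V_D before taking logarithms.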
Time Independence
The authors corroborate that impact is broadly time-independent with respect to metaorder length and execution duration, once again matching documented empirical regularities.
Execution and Decay Profiles
The dynamic price impact profile during metaorder execution is shown to be concave, with the scaling exponent γ2 typically less than unity. However, marginal discrepancies from the theoretical expectation (γ2=0.5 for a perfect square-root law) are detected. They appear to depend on the trader participation parameterization and are most visible for highly liquid stocks such as GFI.
Figure 2: Concave in-execution impact profiles for GFI and GRT—red lines indicate best-fit curves—demonstrating stylized nonlinearity of intra-metaorder effects.
After execution, impact decays along a convex profile that is well described by a function parametrized by the decay exponent β. The empirical values of β align well with prior results obtained from both synthetic and real metaorders.
Figure 3: Convex post-execution impact decay for GRT as a function of rescaled time, closely conforming to established theoretical forms.
Metaorder Realism
The synthetic metaorders reconstructed using the published method robustly exhibit all key stylized facts: SQL, time-independence, concave execution, and convex decay. The statistical fidelity to ground-truth characteristics supports their validity for downstream testing of the LMF theory.
Empirical Validation of the LMF Theory
The LMF theory posits that long-memory (power-law autocorrelations) in market order flow emerges from stochastic order-splitting by traders. Specifically, it predicts the universal microscopic-macroscopic relation γ=α−1, where α is the power-law exponent for metaorder length and γ characterizes the decay of autocorrelations in trade sign sequences.
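Written out in the usual LMF conventions, the relation connects two power laws. The symbols L (metaorder length in child orders), τ (lag in trades) and C(τ) (trade-sign autocorrelation) are notation assumed here, since this review does not define them explicitly:

```latex
P(L) \sim L^{-(\alpha + 1)}, \qquad
C(\tau) \sim \tau^{-\gamma}, \qquad
\gamma = \alpha - 1 .
```

For example, a metaorder-length exponent of α = 1.5 would imply an autocorrelation decay exponent of γ = 0.5.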
By reconstructing metaorders on JSE data for different N (number of effective traders), the authors fit the empirical distributions and recover the predicted γ=α−1 relationship for N=50 and N=150 under power-law participation. Box plots of γ against α−1 show the data points clustering near the theoretical line, which supports the LMF mechanism as an explanation for long memory in order flow even in reconstructed settings without trader identifiers.
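The two exponents in this comparison can be estimated with standard tools. The following is a rough sketch rather than the authors' estimation procedure: it uses a Hill-type maximum-likelihood estimator (continuous approximation) for α and a log-log regression on the sample autocorrelation for γ. The function names, the cutoff l_min and the synthetic inputs are assumptions made for illustration.

```python
import numpy as np

def hill_tail_exponent(lengths, l_min=1.0):
    """Hill (continuous-approximation MLE) estimate of alpha,
    assuming P(L > l) ~ l^(-alpha) for l >= l_min."""
    x = np.asarray(lengths, dtype=float)
    x = x[x >= l_min]
    return x.size / np.log(x / l_min).sum()

def sign_acf(signs, max_lag):
    """Sample autocorrelation of the trade-sign series at lags 1..max_lag."""
    s = np.asarray(signs, dtype=float)
    s = s - s.mean()
    var = np.dot(s, s) / s.size
    return np.array([np.dot(s[:-k], s[k:]) / (s.size * var)
                     for k in range(1, max_lag + 1)])

def acf_decay_exponent(acf, lag_min=1):
    """Estimate gamma from C(tau) ~ tau^(-gamma) by log-log least squares,
    using only lags >= lag_min with positive autocorrelation."""
    lags = np.arange(1, acf.size + 1)
    mask = (lags >= lag_min) & (acf > 0)
    slope, _ = np.polyfit(np.log(lags[mask]), np.log(acf[mask]), 1)
    return -slope

# Synthetic inputs with known exponents (alpha = 1.5, gamma = 0.5).
rng = np.random.default_rng(0)
lengths = (1.0 - rng.random(100_000)) ** (-1.0 / 1.5)   # Pareto sample, l_min = 1
alpha_hat = hill_tail_exponent(lengths)
gamma_hat = acf_decay_exponent(0.3 * np.arange(1, 201) ** -0.5)
print(alpha_hat - 1.0, gamma_hat)   # the LMF relation predicts these agree
```

Metaorder lengths are integers, so the continuous approximation is biased for small l_min; in practice a discrete power-law fit with a data-driven cutoff would be a more careful choice.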
Figure 4: Empirical recovery of γ=α−1 on JSE data for N=50 effective traders—orange lines denote bin medians underlying the LMF relation.
Discussion and Theoretical Implications
Demonstrating that the stylized facts and the core prediction of LMF theory remain intact under synthetic metaorder reconstruction significantly broadens the scope of financial microstructure analysis. The result suggests that access to proprietary trader identifiers, long a prerequisite for this kind of research, may not be necessary for robust statistical validation of order-splitting-induced long-memory phenomena, provided structurally sound reconstruction algorithms are used.
These findings also have important implications for agent-based modeling. The authors note congruence with recent agent-based models in which metaorders correspond to strategic TWAP/participation tactics by reinforcement learning agents. Furthermore, the robustness of price impact functional forms and autocorrelation decay to synthetic metaorder construction validates the generalizability of these models and underlines the centrality of metaorder flow to emergent market properties.
Future Directions
Future work should focus on more sophisticated parameter inference for the metaorder reconstruction process, e.g., joint estimation of effective N and the exponent δ via maximum-likelihood. Additionally, cross-market validation beyond the JSE—potentially extending to fragmented or highly electronic markets—would further elucidate universality claims. From a practical perspective, these results enable rigorous regulatory and infrastructure analytics using only public (Level 1 or better) data, with implications for transaction cost analysis, transparency calibration, and surveillance.
Conclusion
This paper provides empirical evidence that LMF-type microstructure phenomena, namely long memory in order flow linked directly to metaorder size distributions, are recoverable from reconstructed metaorders using only public trade and quote data. The synthetic metaorders reproduce the major microstructure stylized facts and allow the key theoretical parameters to be estimated, so that universal relations can be tested without proprietary trader identification. The research advances both methodology and theory in market microstructure analysis and lowers the barrier for reproducible, cross-market statistical studies.
Reference: "Metaorder modelling and identification from public data" (2602.19590)