Kalshi-based Prediction Market Episodes

Updated 3 February 2026

Kalshi-based episodes are a suite of trading environment reconstructions derived from real Kalshi market data, offering deterministic, event-driven simulations.
They integrate orderbook, trade, and lifecycle data using precise timestamp ordering to emulate realistic market microstructure and fee structures.
This platform enables researchers to rigorously backtest trading strategies and assess performance under transaction costs, settlement constraints, and volatile conditions.

Kalshi-based episodes are a standardized suite of trading environment reconstructions derived from Kalshi, a CFTC-regulated U.S. prediction market, as implemented in the PredictionMarketBench framework. These episodes offer a deterministic, event-driven platform for evaluating algorithmic and LLM-based trading agents using replayed historical limit-order-book and trade data. Each episode encapsulates a distinct prediction market context (cryptocurrency, weather, or sports) for the systematic backtesting of agent behaviors under realistic market microstructure, transaction cost, and settlement constraints (Arora et al., 28 Jan 2026).

1. Overview of Kalshi-based Episodes

The four Kalshi-based episodes are constructed directly from raw Kalshi market data streams, with sampling concentrated in January 2026. Each episode is characterized by its prediction domain, ticker structure, and event window:

Episode ID	Domain	Tickers	Duration	Orderbook snapshots	Trades
KXBTCD-26JAN2017	Crypto	23	37.4 h	311,998	6,283
KXHIGHNY-26JAN20	Weather	6	37.4 h	50,231	8,044
KXNCAAF-26	Sports (CFB)	2	37.4 h	8,320	171,786
KXNFLGAME-26JAN11BUFJAC	Sports (NFL)	2	67.4 h	8,047	111,160

<ul>
<li><strong>KXBTCD-26JAN2017</strong>: Bitcoin daily high threshold prediction with 23 YES/NO contracts (“Did BTC close above $X$?”) in a ≈37.4 hour window.</li>
<li><strong>KXHIGHNY-26JAN20</strong>: NYC weather episode with 6 discrete high-temperature thresholds (“Will NYC high exceed T°?”), spanning the same interval.</li>
<li><strong>KXNCAAF-26</strong>: College Football season-long futures with 2 championship outcome tickers; same time span, but by far the highest trade count (171,786 trades).</li>
<li><strong>KXNFLGAME-26JAN11BUFJAC</strong>: NFL single-game spread bet (Buffalo vs. Jacksonville) over ≈67.4 hours.</li>
</ul>
<p>Each episode encompasses raw orderbook updates, trade prints, and settlement streams for its associated tickers. All content is pre-sharded by event identifier and organized under an episode directory structure (metadata.json, orderbook.parquet, trades.parquet, settlement.json).</p>
<h2 class='paper-heading' id='data-extraction-and-state-construction'>2. Data Extraction and State Construction</h2>
<p>Episodes are built from three market data streams (orderbook updates, trades, and lifecycle events) aligned to a global UTC timestamp and further disambiguated by sequence numbers. The episode’s state at each agent decision time $t$ (with agent cadence $\Delta t$, e.g., 5 minutes) comprises, for each ticker $i$:</p>
<ul>
<li>

$\mathrm{best\_bid}_i(t)$, $\mathrm{best\_ask}_i(t)$</li>
<li>Mid-price: $m_i(t) = \frac{\mathrm{best\_bid}_i(t) + \mathrm{best\_ask}_i(t)}{2}$</li>
<li>Depths of the top orderbook levels: volumes at adjacent ticks around the best bid and offer (BBO)</li>
<li>History vector: last $M$ mid-price values $[m_i(t - j\,\Delta t)]_{j=1}^{M}$</li>
<li>Agent’s open positions $\mathrm{pos}_i(t)$, cash balance $\mathrm{cash}(t)$

</li>
<li>List of active, unfilled orders (ID, side, price, size, time-in-force)</li>
</ul>
<p>Feature engineering supports derived metrics such as:</p>
<ul>
<li>Simple moving average (SMA) of the mid-price over a window of $W$ decision steps: $\mathrm{SMA}_i(t) = \frac{1}{W}\sum_{j=0}^{W-1} m_i(t - j\,\Delta t)$</li>
<li>Rolling standard deviation $\sigma_i(t)$ over the same window.</li>
</ul>
<p>Raw data is stored as Parquet and JSON files. Timestamps and sequence IDs ensure unambiguous global event ordering.</p>
<h2 class='paper-heading' id='agent-actions-execution-and-reward-structure'>3. Agent Actions, Execution, and Reward Structure</h2>
<p>The action interface (via the AgentContext API) exposes:</p>
<ul>
<li><code>submit_limit_order(ticker, side∈{BUY,SELL}, price, size, tif∈{IOC, GTC, POST_ONLY})</code></li>
<li><code>submit_market_order(ticker, side, size)</code></li>
<li><code>cancel_order(order_id)</code></li>
</ul>
<p>Execution uses maker/taker fee modeling:</p>
<ul>
<li>Taker fee

at price $p$ (in cents): $f_{\mathrm{taker}}(p) = 0.07 \times p \times (1 - p/100)$</li>
<li>Maker fee at price $p$: $f_{\mathrm{maker}}(p) = 0.0175 \times p \times (1 - p/100)$

</li>
<li>Transaction cost for a fill of size $Q$ at price $p$, where $f_{\mathrm{role}}(p)$ is already the per-contract fee in cents:</li>
</ul>
<p>$\mathrm{Cost} = f_{\mathrm{role}}(p) \times Q$</p>
<p>with $\mathrm{role} \in \{\text{maker}, \text{taker}\}$

</p>
<p>Reward at each timestep is:</p>
<p>$r_t = \sum_i \bigl(\mathrm{pos}_i(t) - \mathrm{pos}_i(t-\Delta t)\bigr)\, m_i(t) - \mathrm{fees}_t$</p>
<p>At terminal settlement time $T$, all open positions are settled at outcome $o_i \in \{0,1\}$ with:</p>
<p>$\mathrm{SettlementPL} = \sum_i \mathrm{pos}_i(T)\, (o_i - m_i(T^-))$</p>
<p>Total episodic reward is $\sum_t r_t + \mathrm{SettlementPL}$ and incorporates mark-to-market P&L, transaction costs, and settlement corrections.</p>
<h2 class='paper-heading' id='deterministic-replay-and-simulation-pipeline'>4. Deterministic Replay and Simulation Pipeline</h2>
<p>The environment employs a deterministic, event-driven simulator to ensure reproducibility and fair comparison across trading agents. The canonical replay pipeline executes as follows:</p>
<p>

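# load, stream, Simulator, and agent stand for the framework's loaders,
# its simulator, and the agent under test.
for episode_dir in episodes:
    meta = load(f"{episode_dir}/metadata.json")
    OB_events = stream(f"{episode_dir}/orderbook.parquet")
    Trade_events = stream(f"{episode_dir}/trades.parquet")
    Life_events = stream(f"{episode_dir}/settlement.json")  # settlement + lifecycle info
    # Total order: timestamp first, sequence number breaks ties.
    E = sorted(
        [*OB_events, *Trade_events, *Life_events],
        key=lambda e: (e.timestamp, e.sequence_number),
    )
    sim = Simulator(meta.fee_model, meta.execution_mode)
    current_time = meta.start_time
    event_ptr = 0

    # Alternate between replaying events and asking the agent to act.
    while not sim.settlement_processed() and event_ptr < len(E):
        next_decision = current_time + meta.cadence
        # Apply every event up to and including the next decision time.
        while event_ptr < len(E) and E[event_ptr].timestamp <= next_decision:
            sim.process_event(E[event_ptr])
            event_ptr += 1
        obs = sim.get_observation(current_time=next_decision)
        actions = agent.step(obs)
        sim.apply_actions(actions)
        current_time = next_decision

    # Drain any remaining events, then close positions that are still open.
    while event_ptr < len(E):
        sim.process_event(E[event_ptr])
        event_ptr += 1
    sim.close_all_positions()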

</p>
<p>All episodic data is strictly partitioned by event identifier. Sequence numbers resolve any tie in event timestamps. This design enables precise event ordering and strict replay determinism, and it supports both classical and tool-calling LLM agents with reproducible trajectories.</p>
<h2 class='paper-heading' id='usage-api-access-and-key-statistics'>5. Usage, API Access, and Key Statistics</h2>
<p>Researchers interact with Kalshi-based episodes programmatically via the PredictionMarketBench Python API. The core workflow is:</p>
<p>

from predictionmarketbench import BenchmarkHarness, AgentContext, Simulator

episodes = BenchmarkHarness.list_episodes()  # ['KXBTCD-...', ...]
epi = BenchmarkHarness.load_episode('KXBTCD-26JAN2017')  # loads all data
sim = Simulator(fee_model=epi.metadata.fee_model, execution_mode=epi.metadata.execution_mode)
ctx = AgentContext(sim)

# Step the replay forward one agent cadence at a time until settlement.
t = epi.metadata.start_time
while not sim.settlement_processed():
    t_next = t + epi.metadata.agent_cadence
    sim.replay_until(t_next)
    obs = ctx.get_observation(time=t_next)  # dict: market/bbo/depth/pos/cash
    actions = my_agent.policy(obs)  # my_agent: the user's agent; returns a list of action dicts
    for a in actions:
        ctx.place_order(**a)
    t = t_next

# Collect results for offline analysis.
pnl, trade_log, equity_curve = sim.get_results()

</p>
<p>Observations are Python dicts mapping tickers to current quotes, depth arrays, positions, and cash. Actions are lists of dicts specifying ticker, side, order type, price, size, and TIF. The simulator outputs deterministic logs, timestamped fills, transaction fees, and detailed P&L records for reproduction and offline analysis.</p>
<p>Key statistics for each episode (duration, orderbook snapshots, trade volume, and ticker count) are summarized in the table in Section 1. The number of decision steps per episode is the duration divided by the agent cadence (e.g., 37.4 h at 5 min gives 448.8, i.e. ≈448 full steps). Aggregate volatility can be computed offline as $\sigma_{\mathrm{episode}} = \mathrm{std}(\Delta m_i(t))$, where $\Delta m_i(t) = m_i(t) - m_i(t-\Delta t)$ is the per-step mid-price change.</p>
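As a rough illustration of the two quantities in the preceding paragraph, the sketch below counts decision steps from a duration and a cadence, and computes the standard deviation of per-step mid-price changes. The function names and the sample mid-price series are hypothetical and not part of PredictionMarketBench. Using the population standard deviation is also an assumption, since the text does not say which estimator is intended.

```python
import math
import statistics


def decision_steps(duration_hours: float, cadence_minutes: float) -> int:
    """Number of complete decision steps that fit in the episode window."""
    return math.floor(duration_hours * 60 / cadence_minutes)


def episode_volatility(mids: list[float]) -> float:
    """Population std of consecutive mid-price changes (one mid per decision step)."""
    deltas = [later - earlier for earlier, later in zip(mids, mids[1:])]
    return statistics.pstdev(deltas)


steps = decision_steps(37.4, 5)          # 37.4 h * 60 / 5 = 448.8, floored to 448
sample_mids = [50.0, 51.0, 49.0, 50.0]   # made-up mid-prices in cents
vol = episode_volatility(sample_mids)    # std of the changes [1, -2, 1]
```

For a multi-ticker episode, the same function could be applied per ticker and then averaged or pooled; which aggregation the benchmark uses is not stated in the text.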

6. Research Implications and Observed Dynamics

The standardized Kalshi-based episodes offer a backtesting corpus with fee and settlement mechanisms characteristic of real prediction markets. Baseline analyses show that naive trading agents can underperform due to cumulative transaction costs and adverse settlement effects, while algorithmic, fee-aware agents remain robust in volatile regimes (Arora et al., 28 Jan 2026). This result highlights the critical influence of microstructure and execution modeling in algorithmic market design and validation. A plausible implication is that agents relying solely on directional signals without transaction cost modeling will systematically underperform relative to microstructure-sensitive strategies.
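To make the fee effect concrete, the following sketch applies the taker and maker fee schedule from Section 3 to a single trade and compares it with the expected edge. It assumes the fee formula gives a per-contract fee in cents and that only the entry fee applies when a position is held to settlement. The helper names and the 55-cent fair-value estimate are illustrative, not taken from the benchmark.

```python
def fee_cents(price_cents: float, role: str = "taker") -> float:
    """Per-contract fee in cents: rate * p * (1 - p/100), with p in cents."""
    rate = {"taker": 0.07, "maker": 0.0175}[role]
    return rate * price_cents * (1 - price_cents / 100)


def net_edge_cents(fair_value_cents: float, price_cents: float, role: str = "taker") -> float:
    """Expected profit per YES contract held to settlement, after the entry fee."""
    return fair_value_cents - price_cents - fee_cents(price_cents, role)


# Buying YES at 54 cents when the agent's fair-value estimate is 55 cents.
taker_edge = net_edge_cents(55, 54, "taker")
maker_edge = net_edge_cents(55, 54, "maker")
```

Under these assumptions the one-cent edge is wiped out by a taker fee of about 1.74 cents, while it survives a maker fee of about 0.43 cents. A purely directional agent that always crosses the spread would therefore lose money on this trade even though its signal is correct.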

The tool supports studies into agent adaptivity, liquidity provision, settlement risk management, and the development of testable, reproducible results across artificial and learned agent classes. The strict replay determinism and event-partitioned design of Kalshi-based episodes represent a methodological advance aligning with best practices in empirical market microstructure and reinforcement learning benchmark design.
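The ordering rule behind this determinism, timestamp first with sequence numbers breaking ties, can be sketched with the Python standard library. The `Event` record and its fields are hypothetical stand-ins for the benchmark's orderbook, trade, and lifecycle events. The sketch also assumes that each input stream is already sorted and that sequence numbers are unique across streams.

```python
import heapq
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int        # e.g. UTC epoch milliseconds
    sequence_number: int  # tie-breaker for equal timestamps
    kind: str             # "orderbook", "trade", or "lifecycle"


def merge_streams(*streams):
    """Merge pre-sorted event streams into one globally ordered list."""
    return list(heapq.merge(*streams, key=lambda e: (e.timestamp, e.sequence_number)))


orderbook = [Event(1000, 1, "orderbook"), Event(1000, 4, "orderbook")]
trades = [Event(1000, 2, "trade"), Event(1005, 5, "trade")]
lifecycle = [Event(1000, 3, "lifecycle")]

merged = merge_streams(orderbook, trades, lifecycle)
# The result does not depend on the order in which the streams are passed.
assert merged == merge_streams(lifecycle, trades, orderbook)
```

Because the sort key is unique per event, every replay sees the same sequence regardless of how the files are read, which is what makes agent trajectories reproducible.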

References

Arora et al. (2026). PredictionMarketBench: A SWE-bench-Style Framework for Backtesting Trading Agents on Prediction Markets.
