
Kalshi-based Prediction Market Episodes

Updated 3 February 2026
  • Kalshi-based episodes are a suite of trading environment reconstructions derived from real Kalshi market data, offering deterministic, event-driven simulations.
  • They integrate orderbook, trade, and lifecycle data using precise timestamp ordering to emulate realistic market microstructure and fee structures.
  • This platform enables researchers to rigorously backtest trading strategies and assess performance under transaction costs, settlement constraints, and volatile conditions.

Kalshi-based episodes are a standardized suite of trading environment reconstructions derived from the Kalshi CFTC-regulated U.S. prediction market, as implemented in the PredictionMarketBench framework. These episodes offer a deterministic, event-driven platform for evaluating algorithmic and LLM-based trading agents using replayed historical limit-order-book and trade data. Each episode encapsulates a distinct prediction market context—spanning cryptocurrency, weather, and sports—for the systematic backtesting of agent behaviors under realistic market microstructure, transaction cost, and settlement constraints (Arora et al., 28 Jan 2026).

1. Overview of Kalshi-based Episodes

The four Kalshi-based episodes are constructed directly from raw Kalshi market data streams, with sampling concentrated in January 2026. Each episode is characterized by its prediction domain, ticker structure, and event window:

| Episode ID | Domain | Tickers | Duration | OB snaps | Trades |
|---|---|---|---|---|---|
| KXBTCD-26JAN2017 | Crypto | 23 | 37.4 h | 311,998 | 6,283 |
| KXHIGHNY-26JAN20 | Weather | 6 | 37.4 h | 50,231 | 8,044 |
| KXNCAAF-26 | Sports (CFB) | 2 | 37.4 h | 8,320 | 171,786 |
| KXNFLGAME-26JAN11BUFJAC | Sports (NFL) | 2 | 67.4 h | 8,047 | 111,160 |
  • KXBTCD-26JAN2017: Bitcoin daily high threshold prediction with 23 YES/NO contracts (“Did BTC close above X?”) in a ≈37.4-hour window.
  • KXHIGHNY-26JAN20: NYC weather episode with 6 discrete high-temperature thresholds (“Will NYC high exceed T°?”), spanning the same interval.
  • KXNCAAF-26: College Football season-long futures with 2 championship-outcome tickers; same time span, but an extremely high trade count.
  • KXNFLGAME-26JAN11BUFJAC: NFL single-game spread bet (Buffalo vs. Jacksonville) over ≈67.4 hours.

Each episode encompasses raw orderbook updates, trade prints, and settlement streams for its associated tickers. All content is pre-sharded by event identifier and organized under an episode directory structure (metadata.json, orderbook.parquet, trades.parquet, settlement.json).

2. Data Extraction and State Construction

Episodes are built from three market data streams—orderbook updates, trades, and lifecycle events—aligned to a global UTC timestamp and further disambiguated by sequence numbers. The episode’s state at each agent decision time $t$ (with agent cadence $\Delta t$, e.g., 5 minutes) comprises, for each ticker $i$:

  • $\mathrm{best\_bid}_i(t)$, $\mathrm{best\_ask}_i(t)$
  • Mid-price: $m_i(t) = \frac{\mathrm{best\_bid}_i(t) + \mathrm{best\_ask}_i(t)}{2}$
  • Top-$N$ orderbook level depths: volumes at adjacent ticks around the BBO
  • History vector: last $M$ mid-price values $[m_i(t-\Delta t\cdot j)]_{j=1}^M$
  • Agent’s open positions $\mathrm{pos}_i(t)$ and cash balance $\mathrm{cash}(t)$
  • List of active, unfilled orders (ID, side, price, size, time-in-force)

Feature engineering supports derived metrics such as the simple moving average (SMA) of the mid-price over window $W$,

$$\mathrm{SMA}_i(t) = \frac{1}{W}\sum_{j=0}^{W-1} m_i(t-j\Delta t),$$

and the rolling standard deviation $\sigma_i(t)$ over the same window.

Raw data is presented as Parquet and JSON files. Timestamps and sequence IDs ensure unambiguous global event ordering.

3. Agent Actions, Execution, and Reward Structure

The action interface (via the AgentContext API) exposes:

  • submit_limit_order(ticker, side ∈ {BUY, SELL}, price, size, tif ∈ {IOC, GTC, POST_ONLY})
  • submit_market_order(ticker, side, size)
  • cancel_order(order_id)

Execution uses maker/taker fee modeling:

  • Taker fee at price $p$ (cents): $f_{\mathrm{taker}}(p) = 0.07 \times p \times (1-p/100)$
  • Maker fee: $f_{\mathrm{maker}}(p) = 0.0175 \times p \times (1-p/100)$
  • Transaction cost for size $Q$ at price $p$:

$$\mathrm{Cost} = f_{\mathrm{role}}(p) \times p \times Q, \qquad \mathrm{role} \in \{\text{maker}, \text{taker}\}$$

The reward at each timestep is

$$r_t = \sum_i \left(\mathrm{pos}_i(t) - \mathrm{pos}_i(t-\Delta t)\right) m_i(t) - \mathrm{fees}_t$$

At terminal settlement, all open positions are settled at the outcome $o_i \in \{0,1\}$:

$$\mathrm{SettlementPL} = \sum_i \mathrm{pos}_i(T)\,(o_i - m_i(T^-))$$

Total episodic reward is $\sum_t r_t + \mathrm{SettlementPL}$, incorporating mark-to-market P&L, transaction costs, and settlement corrections.

4. Deterministic Replay and Simulation Pipeline

The environment employs a deterministic, event-driven simulator to ensure reproducibility and fair comparison across trading agents. The canonical replay pipeline executes as follows:
    for each episode_dir in episodes:
        meta = load(metadata.json)
        OB_events = stream(orderbook.parquet)
        Trade_events = stream(trades.parquet)
        Life_events = stream(settlement.json + lifecycle info)
        E = merge_and_sort([OB_events, Trade_events, Life_events], key=(timestamp, sequence_number))
        sim = Simulator(meta.fee_model, meta.execution_mode)
        current_time = meta.start_time
        event_ptr = 0
    
        while not sim.settlement_processed():
            next_decision = current_time + meta.cadence
            while event_ptr < len(E) and E[event_ptr].timestamp <= next_decision:
                sim.process_event(E[event_ptr])
                event_ptr += 1
            obs = sim.get_observation(current_time=next_decision)
            actions = agent.step(obs)
            sim.apply_actions(actions)
            current_time = next_decision
    
        while event_ptr < len(E):
            sim.process_event(E[event_ptr])
            event_ptr += 1
        sim.close_all_positions()
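The merge-and-sort step in the pipeline above can be sketched with the standard library alone (a minimal illustration; the Event fields mirror the pseudocode, not the actual PredictionMarketBench internals):

```python
import heapq
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float      # global UTC timestamp (seconds)
    sequence_number: int  # tie-breaker for events sharing a timestamp
    kind: str = ""

def merge_and_sort(streams):
    """Merge time-sorted event streams into one deterministic global
    ordering, breaking timestamp ties by sequence number."""
    key = lambda e: (e.timestamp, e.sequence_number)
    return list(heapq.merge(*streams, key=key))

# Two pre-sorted streams with a timestamp tie at t=2.0:
ob_events = [Event(1.0, 1, "orderbook"), Event(2.0, 3, "orderbook")]
trade_events = [Event(2.0, 2, "trade"), Event(3.0, 4, "trade")]
merged = merge_and_sort([ob_events, trade_events])
order = [(e.timestamp, e.sequence_number) for e in merged]
# order == [(1.0, 1), (2.0, 2), (2.0, 3), (3.0, 4)]
```

Because heapq.merge only requires each input stream to be individually sorted, the per-stream Parquet/JSON readers can be merged lazily without materializing the full event list.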
All episodic data is strictly partitioned by event identifier. Sequence numbers resolve any tie in event timestamps. This design enables precise event ordering, strict replay determinism, and supports both classical and tool-calling LLM agents with reproducible trajectories.

5. Usage, API Access, and Key Statistics

Researchers interact with Kalshi-based episodes programmatically via the PredictionMarketBench Python API. Core workflow:
    from predictionmarketbench import BenchmarkHarness, AgentContext, Simulator
    
    episodes = BenchmarkHarness.list_episodes()  # ['KXBTCD-...', ...]
    epi = BenchmarkHarness.load_episode('KXBTCD-26JAN2017')  # loads all data
    sim = Simulator(fee_model=epi.metadata.fee_model, execution_mode=epi.metadata.execution_mode)
    ctx = AgentContext(sim)
    
    t = epi.metadata.start_time
    while not sim.settlement_processed():
        t_next = t + epi.metadata.agent_cadence
        sim.replay_until(t_next)
        obs = ctx.get_observation(time=t_next)  # dict: market/bbo/depth/pos/cash
        actions = my_agent.policy(obs)
        for a in actions:
            ctx.place_order(**a)
        t = t_next
    
    pnl, trade_log, equity_curve = sim.get_results()
Observations are Python dicts mapping tickers to current quotes, depth arrays, positions, and cash. Actions are lists of dicts specifying ticker, side, order type, price, size, and TIF. The simulator outputs deterministic logs, timestamped fills, transaction fees, and detailed P&L records for reproduction and offline analysis.

Key statistics for each episode—duration, orderbook snapshots, trade volume, and ticker count—are summarized above. Decision steps per episode are proportional to duration and agent cadence (e.g., 37.4 h at a 5-minute cadence → ≈448 steps). Aggregate volatility can be computed via $\sigma_{\mathrm{episode}} = \mathrm{std}(\Delta m_i(t))$ as an offline metric.
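The step-count arithmetic and the volatility metric above can be checked in a few lines (a standalone sketch; the mid-price series is synthetic):

```python
import statistics

# Decision steps: episode duration divided by agent cadence.
duration_s = 37.4 * 3600   # 37.4 h episode window, in seconds
cadence_s = 5 * 60         # 5-minute agent cadence
steps = int(duration_s // cadence_s)
# steps == 448, matching the ≈448 figure quoted above

# Aggregate volatility: std of one-step mid-price changes.
mids = [50.0, 51.0, 49.5, 50.5, 52.0]              # synthetic mid-prices (cents)
deltas = [b - a for a, b in zip(mids, mids[1:])]   # Δm_i(t)
sigma_episode = statistics.pstdev(deltas)
```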

6. Research Implications and Observed Dynamics

The standardized Kalshi-based episodes offer a unique backtesting corpus with fee and settlement mechanisms characteristic of real prediction markets. Baseline analyses demonstrate that naive trading agents can underperform due to cumulative transaction costs and adverse settlement effects, while algorithmic, fee-aware agents display robustness in volatile regimes (Arora et al., 28 Jan 2026). This highlights the critical influence of microstructure and execution modeling in algorithmic market design and validation. A plausible implication is that agents relying solely on directional signal without transaction cost modeling will systematically underperform relative to microstructure-sensitive strategies.
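The fee drag behind this underperformance can be made concrete with the maker/taker fee formulas given earlier (a minimal sketch; whether the benchmark rounds fees per order, as the live exchange does, is not specified here):

```python
def taker_fee(p: float) -> float:
    """Per-contract taker fee in cents at price p (in cents)."""
    return 0.07 * p * (1 - p / 100)

def maker_fee(p: float) -> float:
    """Per-contract maker fee in cents at price p (in cents)."""
    return 0.0175 * p * (1 - p / 100)

# A round trip at the 50-cent midpoint, crossing the spread both ways:
p = 50.0
round_trip = 2 * taker_fee(p)  # 2 * 1.75 = 3.5 cents per contract
# A purely directional agent therefore needs >3.5 cents of edge per
# round trip at mid just to break even; resting maker orders cut this
# to 2 * 0.4375 = 0.875 cents.
```

Fees peak at p = 50 (the p(1 − p/100) term is maximized there), so cost drag is largest exactly where uncertainty, and naive trading activity, tends to be highest.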

The tool supports studies into agent adaptivity, liquidity provision, settlement risk management, and the development of testable, reproducible results across artificial and learned agent classes. The strict replay determinism and event-partitioned design of Kalshi-based episodes represent a methodological advance aligning with best practices in empirical market microstructure and reinforcement learning benchmark design.

