CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization

Published 3 Sep 2025 in cs.DB | (2509.03102v2)

Abstract: Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-Of-Distribution (OOD) detection with a top-k fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54% on the TPC-H benchmark compared to Lero's 3.63%, and reducing the total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.

Abstract PDF Upgrade to Chat

Summary

The paper presents CARPO, a novel listwise learning-to-rank approach that improves query execution plan selection beyond traditional cost-based optimizers.
It utilizes a Transformer-based model with components such as a Plan Embedder and Hybrid Decision Maker for robust, context-aware ranking of candidate plans.
Experimental results demonstrate significant speed-ups on benchmarks like TPC-H, showcasing a Top-1 Rate increase to 74.54% and reduced execution times.

CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization

Introduction

The paper "CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization" introduces the CARPO framework, a novel approach to improving the query optimization in database management systems (DBMS). Traditional cost-based optimizers (CBOs) often fall short due to their reliance on heuristic-based cost models, which may not accurately predict execution costs, resulting in suboptimal query execution plans. CARPO addresses these limitations by employing a listwise learning-to-rank approach, leveraging machine learning techniques to enhance the selection of execution plans beyond what CBOs achieve.

Figure 1: Illustration of the generic framework of a learned query optimizer (LQO). Given a query, the LQO generates multiple candidate execution plans, ranks them based on predicted performance, and then executes the plan deemed optimal by the learned model within the query execution engine (e.g., MySQL, Oracle).

System Overview

CARPO utilizes a Transformer-based model to holistically evaluate candidate plans, integrating both the structural information of plans and context-dependent performance metrics. Its architecture is comprised of four main components: a Data Collector, a Plan Embedder, a Listwise Ranking Predictor, and a Hybrid Decision Maker.

Data Collector

The Data Collector generates candidate execution plans and collects execution metrics from actual query runs, which serve as ground truth data for training. The exploration strategy includes manipulating internal statistics to diversify candidate plan generation.

Plan Embedder

A flexible Plan Embedder, adaptable to various embedding techniques such as TreeCNN or TreeLSTM, processes each execution plan into a vector representation. This enables the downstream listwise ranking model to incorporate detailed structural and statistical information.

Listwise Ranking Predictor

CARPO's core Listwise Ranking Predictor employs a Transformer encoder for modelling the entire set of candidate plans simultaneously. This approach captures inter-plan context and dependencies, allowing for global ranking optimizations, unlike pairwise models that handle only local comparisons.

Figure 2: Illustration of CARPO's system overview. During training, CARPO learns to rank candidate plans generated for a query using a Transformer-based model, leveraging execution costs from the Query Execution Engine.

Hybrid Decision Maker

To manage uncertainties in the model predictions, especially for out-of-distribution queries, CARPO uses a Hybrid Decision Maker. This module assesses the confidence of model predictions against a threshold and defaults to the CBO's decision if predicted reliability is low. This safety mechanism combines top-k fallback with OOD detection for robust performance assurance.

Model Design

CARPO's model architecture is specifically optimized for listwise ranking tasks. It includes a Plan Embedding Module and employs a contextual ranking approach using self-attention mechanisms inherent to Transformers. The architecture directly optimizes ranking quality using a position-aware cross-entropy loss function, allowing accurate representation and ranking of execution plans based on learned data characteristics.

Figure 3: Illustration of CARPO's model architecture within the context-aware training framework. Candidate plans are embedded using a flexible Plan Embedder (TreeLSTM or TreeCNN in our case), and these embeddings are processed by a Transformer encoder.

Experimental Evaluation

CARPO was tested against traditional and learning-based optimizers, demonstrating significant performance enhancements on standard benchmarks:

Performance Metrics: CARPO reduced cumulative execution times on the TPC-H benchmark substantially, from PostgreSQL's 22,577.9ms to CARPO's 3,719.2ms.
Top-k Accuracy: On the TPC-H test set, CARPO achieved a Top-1 Rate of 74.54% compared to Lero's 3.63%, indicating superior ability to predict efficient execution plans.
Robustness and Adaptability: Evaluations showed CARPO's adaptability to various plan embedders, consistently improving performance metrics across diverse datasets.
Figure 4: Cumulative execution time on the TPC-H test set (left) and STATS test set (right), comparing CARPO against Lero, PostgreSQL (Pg), and the optimal performance (Best).

Conclusion

CARPO advances the field of learned query optimization by effectively integrating listwise learning-to-rank frameworks with robust fallback strategies. This approach addresses the inconsistencies and limitations of existing optimizers, providing a scalable and adaptable solution for modern data environments. Through enhanced performance and adaptability, CARPO sets a new standard in optimizing query execution plans in DBMS. As data and workload complexities increase, CARPO's robust framework offers a significant advantage, paving the way for further innovations in query optimization research and practice.

Markdown