
Can Uncertainty Quantification Enable Better Learning-based Index Tuning?

Published 23 Oct 2024 in cs.DB and cs.LG | (2410.17748v1)

Abstract: Index tuning is crucial for optimizing database performance by selecting optimal indexes based on the workload. The key to this process lies in an accurate and efficient benefit estimator. Traditional methods relying on what-if tools often suffer from inefficiency and inaccuracy. In contrast, learning-based models provide a promising alternative but face challenges such as instability, lack of interpretability, and complex management. To overcome these limitations, we adopt a novel approach: quantifying the uncertainty in learning-based models' results, thereby combining the strengths of both traditional and learning-based methods for reliable index tuning. We propose Beauty, the first uncertainty-aware framework that enhances learning-based models with uncertainty quantification and uses what-if tools as a complementary mechanism to improve reliability and reduce management complexity. Specifically, we introduce a novel method that combines AutoEncoder and Monte Carlo Dropout to jointly quantify uncertainty, tailored to the characteristics of benefit estimation tasks. In experiments involving sixteen models, our approach outperformed existing uncertainty quantification methods in the majority of cases. We also conducted index tuning tests on six datasets. By applying the Beauty framework, we eliminated worst-case scenarios and more than tripled the occurrence of best-case scenarios.
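The abstract describes combining Monte Carlo Dropout with an AutoEncoder to quantify the uncertainty of a learned benefit estimator. As a rough illustration of the Monte Carlo Dropout half of that idea (not the authors' actual model: the network, weights, dimensions, and thresholding policy below are all hypothetical), one can keep dropout active at inference time, run many stochastic forward passes, and treat the spread of the predictions as an uncertainty signal that decides when to fall back to a what-if tool:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP standing in for a learned benefit estimator (illustrative weights only).
W1 = rng.normal(size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def forward(x, p_drop=0.2):
    """One stochastic forward pass; dropout stays ON at inference (MC Dropout)."""
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop      # Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)             # inverted-dropout rescaling
    return h @ W2 + b2

def mc_dropout_estimate(x, T=100):
    """Run T stochastic passes; mean = prediction, std = epistemic uncertainty."""
    preds = np.stack([forward(x) for _ in range(T)])
    return preds.mean(axis=0), preds.std(axis=0)

x = rng.normal(size=(1, 8))                   # one encoded (query, index) pair
mean, std = mc_dropout_estimate(x)
# If std exceeds some calibrated threshold, distrust the learned estimate and
# query the what-if tool instead -- the complementary mechanism the paper proposes.
```

In the paper's framework the AutoEncoder contributes a second signal (reconstruction error flags inputs far from the training distribution); the sketch above covers only the dropout-based component.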

Authors (3)
