
NeurDB: An AI-powered Autonomous Data System

Published 7 May 2024 in cs.DB, cs.AI, and cs.LG | (2405.03924v2)

Abstract: In the wake of rapid advancements in AI, we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics and self-driving capabilities for improved system performance. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component and provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plans.

