
DataCI: A Platform for Data-Centric AI on Streaming Data

Published 27 Jun 2023 in cs.DC and cs.LG | (2306.15538v2)

Abstract: We introduce DataCI, a comprehensive open-source platform designed specifically for data-centric AI in dynamic streaming data settings. DataCI provides 1) an infrastructure with rich APIs for seamless streaming dataset management and for developing and evaluating data-centric pipelines in streaming scenarios, 2) a carefully designed version control function that tracks pipeline lineage, and 3) an intuitive graphical interface for a better interactive user experience. Preliminary studies and demonstrations attest to the ease of use and effectiveness of DataCI, highlighting its potential to revolutionize the practice of data-centric AI in streaming data contexts.
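The abstract does not show DataCI's interface. As a rough illustration of what tracking pipeline lineage over streaming data can involve, here is a minimal sketch under stated assumptions: `LineageStore`, `append_batch`, `run_pipeline`, and `lineage` are hypothetical names, not DataCI's API, and each dataset version is assumed to be a content hash of its records plus its parent version. Every new streaming batch and every pipeline output becomes a child version, so any output can be traced back to the batches and pipeline configuration that produced it.

```python
import hashlib
import json
from dataclasses import dataclass, field


def content_hash(obj) -> str:
    """Deterministic short SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class LineageStore:
    """Hypothetical in-memory lineage store (an assumption, not DataCI's API)."""
    datasets: dict = field(default_factory=dict)  # version -> (parent version, records)
    runs: list = field(default_factory=list)      # (pipeline version, input version, output version)

    def _store(self, parent, records):
        # A version id depends on both the content and the parent version.
        version = content_hash({"parent": parent, "records": records})
        self.datasets[version] = (parent, records)
        return version

    def append_batch(self, parent, batch):
        """Register a newly arrived streaming batch as a child of `parent`."""
        base = self.datasets[parent][1] if parent is not None else []
        return self._store(parent, base + list(batch))

    def run_pipeline(self, name, config, fn, input_version):
        """Apply a record-wise pipeline step and record which input it consumed."""
        pipeline_version = content_hash({"name": name, "config": config})
        records = [fn(r, **config) for r in self.datasets[input_version][1]]
        output_version = self._store(input_version, records)
        self.runs.append((pipeline_version, input_version, output_version))
        return output_version

    def lineage(self, version):
        """Walk parent links from `version` back to the first batch."""
        chain = []
        while version is not None:
            chain.append(version)
            version = self.datasets[version][0]
        return chain


def clip(x, lo, hi):
    """Hypothetical data-cleaning step: clamp a value into [lo, hi]."""
    return max(lo, min(hi, x))


store = LineageStore()
v1 = store.append_batch(None, [3, -1, 7])  # first streaming batch
v2 = store.append_batch(v1, [12])          # a later batch extends the stream
v3 = store.run_pipeline("clip", {"lo": 0, "hi": 10}, clip, v2)
print(store.datasets[v3][1])  # [3, 0, 7, 10]
print(len(store.lineage(v3)))  # 3
```

Because version ids are content hashes, rerunning the same pipeline configuration on the same input yields the same output version, which is one simple way to make lineage reproducible; a real platform would also persist versions and handle storage for unbounded streams.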

