CREW: Facilitating Human-AI Teaming Research

Published 31 Jul 2024 in cs.HC, cs.AI, and cs.LG | (2408.00170v3)

Abstract: With the increasing deployment of AI technologies, the potential for humans to work with AI agents has been growing rapidly. Human-AI teaming is an important paradigm for studying the many aspects of humans and AI agents working together. The unique challenge of Human-AI teaming research is the need to study humans and AI agents jointly, demanding multidisciplinary efforts spanning machine learning, human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited: they often support only oversimplified scenarios or a single task, or focus exclusively on either human-teaming research or multi-agent AI algorithms. We introduce CREW, a platform that facilitates Human-AI teaming research in real-time decision-making scenarios and engages collaborations across multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming, with room for expansion thanks to its modular design. Following conventional cognitive neuroscience practice, CREW also supports multimodal recording of human physiological signals for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we were able to conduct 50 human-subject studies within a week to verify the effectiveness of our benchmark.


Summary

  • The paper introduces the CREW platform that enables multidisciplinary, scalable human-AI teaming with integrated real-time communication and extensive data collection.
  • It details a modular design supporting role differentiation, human feedback mechanisms, and reproducible experimental settings.
  • Benchmarking results reveal that individual cognitive differences significantly impact training outcomes, underscoring the need for personalized AI-human collaboration strategies.

Facilitating Human-AI Teaming Research with "CREW"

Introduction

The paper introduces "CREW: Facilitating Human-AI Teaming Research" (2408.00170), focusing on developing a comprehensive platform to enhance the study and application of Human-AI collaboration. As AI systems become increasingly integrated into daily life, the necessity for effective collaboration between humans and AI agents grows. Traditional AI research centers on isolated algorithm development, whereas Human-AI teaming requires a multidisciplinary approach, incorporating insights from fields such as cognitive science, neuroscience, and complex systems. The paper addresses limitations in existing platforms and proposes a solution with CREW, emphasizing the involvement of various scientific domains.

Platform Vision and Design

CREW is designed to support a wide range of tasks and multi-task environments, addressing critical challenges such as real-time communication and large-scale human data collection. Its extensible environments let researchers develop and modify experimental settings with ease while supporting real-time interaction between humans and AI agents.

Figure 1: CREW supports multiple tasks from single-agent tasks to multi-agent competitive settings and offers various camera views for perceptual-motor research.


Figure 2: Environment generation in CREW with randomized mazes and procedurally generated terrains.
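Randomized maze generation of the kind shown in Figure 2 is commonly implemented with depth-first backtracking. The snippet below is a minimal sketch of that standard technique as an illustration only; it is not CREW's actual generator, and the cell/wall representation is an assumption.

```python
import random

def generate_maze(w, h, seed=None):
    """Carve a w x h 'perfect' maze with iterative depth-first backtracking.

    Returns a dict mapping each cell (x, y) to the set of neighbouring
    cells it has an open passage to.
    """
    rng = random.Random(seed)
    passages = {(x, y): set() for x in range(w) for y in range(h)}
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited orthogonal neighbours of the current cell.
        frontier = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (x + dx, y + dy) in passages and (x + dx, y + dy) not in visited
        ]
        if frontier:
            nxt = rng.choice(frontier)
            passages[(x, y)].add(nxt)   # knock down the wall both ways
            passages[nxt].add((x, y))
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()                 # dead end: backtrack
    return passages
```

Because the carved passages form a spanning tree of the grid, every cell is reachable and there is exactly one path between any two cells, which makes such mazes convenient for reproducible navigation tasks.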

CREW enables hybrid Human-AI teaming modes, allowing both collaborative and competitive settings across multiple environment instances. By supporting parallel sessions, CREW significantly improves the scalability and efficiency of large-scale experiments, which is particularly beneficial when collecting data from multiple human subjects simultaneously.
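As a rough sketch of how parallel sessions might be orchestrated, the snippet below runs several simulated sessions concurrently and collects their logs. The `run_session` function and its log fields are hypothetical placeholders, not CREW's API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_session(session_id, num_steps=5):
    """Simulate one human-AI teaming session and return its event log."""
    log = []
    for step in range(num_steps):
        # In a real platform each step would exchange observations,
        # actions, and human feedback with a live environment instance.
        log.append({"session": session_id, "step": step})
    return log

def run_parallel_sessions(num_sessions=4):
    """Run several subject sessions concurrently and gather their logs in order."""
    with ThreadPoolExecutor(max_workers=num_sessions) as pool:
        return list(pool.map(run_session, range(num_sessions)))

logs = run_parallel_sessions(4)
```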

Human and Agent Role Assignment

The platform differentiates between roles such as "Player," "Viewer," "Server," and "AI Agent." The "Player" role provides direct control over an agent, whereas "Viewer" allows users to observe and provide feedback. CREW's human feedback mechanisms are particularly refined, supporting both continuous and discrete scalar feedback and thereby enabling detailed guidance for AI agents.

Figure 3: Simple connectivity setup in CREW, enabling participation across various tasks through IP selection.
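The two feedback modes described above can be sketched as simple signal mappings: discrete feedback maps button presses to fixed values, while continuous feedback clamps a dial or slider reading into a bounded range. The function names, key labels, and the [-1, 1] range below are illustrative assumptions, not CREW's actual interface.

```python
def discrete_feedback(key):
    """Map hypothetical 'good'/'bad' button presses to a scalar signal;
    anything else counts as no feedback."""
    return {"good": 1.0, "bad": -1.0}.get(key, 0.0)

def continuous_feedback(reading, lo=-1.0, hi=1.0):
    """Clamp a raw slider/dial reading into the feedback range [lo, hi]."""
    return max(lo, min(hi, reading))
```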

Data Collection and Analysis

CREW's data collection covers both agent data and human physiological data, with synchronized streaming via the Lab Streaming Layer and integration with tools such as Weights & Biases for detailed monitoring. The platform's modular design enables algorithm customization and deployment, which is crucial for integrating real-time human feedback into RL algorithms.

Figure 4: CREW's comprehensive data collection from game states to physiological signals, streamed through Lab Streaming Layer.
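The Lab Streaming Layer delivers each stream with timestamps, which lets heterogeneous recordings (game states, physiological signals) be aligned after the fact. Below is a minimal pure-Python sketch of that kind of nearest-timestamp alignment; it illustrates the idea, not CREW's or LSL's implementation, and the data layout is an assumption.

```python
import bisect

def align_streams(game_events, physio_samples):
    """For each game event, pick the physiological sample closest in time.

    Both inputs are lists of (timestamp, value) pairs sorted by timestamp.
    Returns a list of (event_value, physio_value) pairs.
    """
    physio_ts = [t for t, _ in physio_samples]
    aligned = []
    for t, event in game_events:
        i = bisect.bisect_left(physio_ts, t)
        # Consider the samples just before and just after the event time,
        # and keep whichever is nearer.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(physio_ts)]
        j = min(candidates, key=lambda j: abs(physio_ts[j] - t))
        aligned.append((event, physio_samples[j][1]))
    return aligned
```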


Figure 5: Human subjects with the highest cognitive test scores guided agents to better performance, highlighting the impact of individual cognitive differences.

Benchmarking and Results

CREW was used to benchmark the c-Deep TAMER algorithm against RL baselines across diverse scenarios involving 50 human subjects, yielding significant findings about human-guided training. The correlation between cognitive test scores and agent-guiding performance underscores the importance of individual human differences in training outcomes.

Figure 6: c-Deep TAMER training examples showcase human feedback integration.
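TAMER-style agents learn a model of the human's feedback signal and act greedily with respect to it. The toy tabular sketch below illustrates that core loop, with a crude sliding-window stand-in for credit assignment over the delay between behavior and feedback; it is a simplification for exposition, not the paper's c-Deep TAMER implementation, and all names here are made up.

```python
from collections import defaultdict, deque

class TabularTamer:
    """Toy TAMER-style learner: fit H(s, a), an estimate of the human's
    feedback, and act greedily with respect to it."""

    def __init__(self, actions, lr=0.1, window=3):
        self.H = defaultdict(float)          # estimated human feedback per (s, a)
        self.actions = actions
        self.lr = lr
        self.recent = deque(maxlen=window)   # recent (s, a) pairs awaiting credit

    def act(self, state):
        # Greedy action under the current feedback model.
        a = max(self.actions, key=lambda a: self.H[(state, a)])
        self.recent.append((state, a))
        return a

    def give_feedback(self, signal):
        # Spread the scalar human signal over the recent (s, a) window,
        # nudging each estimate toward the observed signal.
        for s, a in self.recent:
            self.H[(s, a)] += self.lr * (signal - self.H[(s, a)])
```

In use, a human watches the agent act and presses a feedback key; after a few positive signals for the preferred action, the greedy policy locks onto it.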


Figure 7: Linear regression plots detailing cognitive test scores' correlation with training performance.

Conclusion

CREW represents a significant advancement in Human-AI teaming research, providing an infrastructure for multidisciplinary collaboration. Future developments will focus on expanding environment diversity and supporting advanced physiological data analysis techniques. CREW's modular and open design sets a new standard for scalable, reproducible research in human-AI teaming.

Looking ahead, the authors intend CREW to foster interdisciplinary collaboration, serve as a benchmark for Human-AI interaction, and support diverse research directions in both AI capability and human cognition.
