
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts

Published 10 May 2022 in cs.RO and cs.AI | arXiv:2205.04952v3

Abstract: How should a robot speak in a formal, quiet, and dark environment, or in a bright, lively, and noisy one? By designing robots that speak in a more social and ambient-appropriate manner, we can improve their perceived awareness and intelligence. We describe a process and results for selecting robot voice styles for perceived social appropriateness and ambiance awareness. Understanding how humans adapt their voices in different acoustic settings is challenging because voice capture in the wild is difficult. Our approach comprises three steps: (a) collecting and validating voice interaction data in virtual Zoom ambiances, (b) exploring and clustering human vocal utterances to identify primary voice styles, and (c) testing robot voice styles in recreated ambiances using projections, lighting, and sound. We focus on food-service scenarios as a proof-of-concept setting. We present results using the Pepper robot's voice with different styles, working toward robots that speak in a contextually appropriate and adaptive manner. Our results with N=120 participants provide evidence that the choice of voice style in different ambiances affected the robot's perceived intelligence across several factors, including social appropriateness, comfort, awareness, human-likeness, and competency.
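The abstract does not specify how step (b), clustering human utterances into voice styles, is implemented. As a minimal sketch only: one plausible approach is k-means over per-utterance prosodic features. The feature set (mean F0, energy, speech rate) and the synthetic "quiet" vs. "lively" style values below are hypothetical assumptions for illustration, not the paper's actual data or method.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-utterance prosodic features: [mean F0 (Hz), energy (dB), speech rate (syll/s)].
# Two synthetic styles: a soft "formal/quiet" style and a loud "lively/noisy" (Lombard-like) style.
quiet_style = rng.normal([180.0, 55.0, 3.5], [10.0, 2.0, 0.3], size=(20, 3))
loud_style = rng.normal([240.0, 72.0, 5.0], [10.0, 2.0, 0.3], size=(20, 3))
X = np.vstack([quiet_style, loud_style])

# Standardize so F0 (large Hz values) does not dominate the distance metric.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    # Deterministic init: first and last samples (one from each synthetic style here).
    centroids = X[[0, len(X) - 1]][:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Distance from every point to every centroid, then assign to nearest.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

labels, centroids = kmeans(X, k=2)
```

On well-separated styles like these, the two synthetic groups recover as two distinct cluster labels; each cluster centroid can then be read as a candidate "primary voice style" to synthesize on the robot.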
