Summary: People tend to think AI voice-user interfaces such as Siri or Alexa are more competent and emotionally engaging if they exhibit social cues.
A family gathers around their kitchen island to unbox the digital assistant they just purchased. They will be more likely to trust this new voice-user interface, which might be a smart speaker like Amazon’s Alexa or a social robot like Jibo, if it exhibits some humanlike social behaviors, according to a new study by researchers in MIT’s Media Lab.
The researchers found that family members tend to think a device is more competent and emotionally engaging if it can exhibit social cues, like moving to orient its gaze at a speaking person. In addition, their study revealed that branding — specifically, whether the manufacturer’s name is associated with the device — has a significant effect on how members of a family perceive and interact with different voice-user interfaces.
When a device has a higher level of social embodiment, such as the ability to give verbal and nonverbal social cues through motion or expression, family members also interacted with one another more frequently while engaging with the device as a group, the researchers found.
Their results could help designers create voice-user interfaces that are more engaging and more likely to be used by members of a family in the home, while also improving the transparency of these devices. The researchers also outline ethical concerns that could come from certain personality and embodiment designs.
“These devices are new technology coming into the home and they are still very under-explored,” says Anastasia Ostrowski, a research assistant in the Personal Robotics Group in the Media Lab, and lead author of the paper. “Families are in the home, so we were very interested in looking at this from a generational approach, including children and grandparents. It was super interesting for us to understand how people are perceiving these, and how families interact with these devices together.”
Coauthors include Vasiliki Zygouras, a recent Wellesley College graduate working in the Personal Robotics Group at the time of this research; Research Scientist Hae Won Park; Cornell University graduate student Jenny Fu; and senior author Cynthia Breazeal, professor of media arts and sciences, director of MIT RAISE, and director of the Personal Robotics Group, as well as a developer of the Jibo robot.
The paper is published in Frontiers in Robotics and AI.
This work grew out of an earlier study in which the researchers explored how people use voice-user interfaces at home. At the start of that study, users familiarized themselves with three devices before taking one home for a month. The researchers noticed that people spent more time interacting with a Jibo social robot than with the smart speakers, Amazon Alexa and Google Home. They wondered why people engaged more with the social robot.
To get to the bottom of this, they designed three experiments that involved family members interacting as a group with different voice-user interfaces. Thirty-four families, comprising 92 people between the ages of 4 and 69, participated in the studies.
The experiments were designed to mimic a family’s first encounter with a voice-user interface. Families were video recorded as they interacted with three devices, working through a list of 24 actions (like “ask about the weather” or “try to learn the agent’s opinions”). Then they answered questions about their perception of the devices and categorized the voice-user interfaces’ personalities.
In the first experiment, participants interacted with a Jibo robot, Amazon Echo, and Google Home, with no modifications. Most found the Jibo to be far more outgoing, dependable, and sympathetic. Because the users perceived that Jibo had a more humanlike personality, they were more likely to interact with it, Ostrowski explains.
An unexpected result
In the second experiment, researchers set out to understand how branding affected participants’ perspectives. They changed the “wake word” (the word the user says aloud to engage the device) of the Amazon Echo to “Hey, Amazon!” instead of “Hey, Alexa!,” but kept the “wake word” the same for the Google Home (“Hey, Google!”) and the Jibo robot (“Hey, Jibo!”). They also provided participants with information about each manufacturer. When branding was taken into account, users viewed Google as more trustworthy than Amazon, despite the fact that the devices were very similar in design and functionality.
“It also drastically changed how much people thought the Amazon device was competent or like a companion,” Ostrowski says. “I was not expecting it to have that big of a difference between the first and second study. We didn’t change any of the abilities, how they function, or how they respond. Just the fact that they were aware the device is made by Amazon made a huge difference in their perceptions.”
Changing the “wake word” of a device can have ethical implications. A personified name, which can make a device seem more social, could mislead users by masking the connection between the device and the company that made it, which is also the company that now has access to the user’s data, she says.
In the third experiment, the team wanted to see how interpersonal movement affected the interactions. For instance, the Jibo robot turns its gaze to the individual who is speaking. For this study, the researchers used the Jibo along with an Amazon Echo Show (a rectangular screen) with the modified wake word "Hey, Computer!," and an Amazon Echo Spot (a sphere with a circular screen) that had a rotating flag on top, which sped up when someone said its wake word, "Hey, Alexa!"
Users found the modified Amazon Echo Spot to be no more engaging than the Amazon Echo Show, suggesting that repetitive movement without social embodiment may not be an effective way to increase user engagement, Ostrowski says.
Fostering deeper relationships
Deeper analysis of the third study also revealed that users interacted more among themselves, like glancing at each other, laughing together, or having side conversations, when the device they were engaging with had more social abilities.
“In the home, we have been wondering how these systems promote engagement between users. That is always a big concern for people: How are these devices going to shape people’s relationships? We want to design systems that can promote a more flourishing relationship between people,” Ostrowski says.
The researchers used their insights to lay out several voice-user interface design considerations, including the importance of developing warm, outgoing, and thoughtful personalities; understanding how the wake word influences user acceptance; and conveying nonverbal social cues through movement.
With these results in hand, the researchers want to continue exploring how families engage with voice-user interfaces that have varying levels of functionality. For instance, they might conduct a study with three different social robots. They would also like to replicate these studies in a real-world environment and explore which design features are best suited for specific interactions.
This research was funded by the Media Lab Consortia.
Speed Dating with Voice User Interfaces (VUIs): Understanding How Families Interact and Perceive VUIs in a Group Setting
As voice-user interfaces (VUIs), such as smart speakers like Amazon Alexa or social robots like Jibo, enter multi-user environments like our homes, it is critical to understand how group members perceive and interact with these devices. VUIs engage socially with users, leveraging multi-modal cues including speech, graphics, expressive sounds, and movement.
The combination of these cues can affect how users perceive and interact with these devices.
Through a set of three elicitation studies, we explore family interactions (N = 34 families, 92 participants, ages 4–69) with three commercially available VUIs with varying levels of social embodiment.
The motivation for these three studies arose when researchers noticed that families interacted differently with three agents while familiarizing themselves with them; we therefore investigated this trend further in three subsequent studies designed as a conceptual replication.
Each study included three activities to examine participants' interactions with and perceptions of the three VUIs in that study: an agent exploration activity, a perceived personality activity, and a user experience ranking activity.
Consistently across the studies, participants interacted significantly more with an agent with a higher degree of social embodiment, i.e., a social robot such as Jibo, and perceived that agent as more trustworthy, more emotionally engaging, and offering greater companionship. There were some nuances between the studies in interaction and perception across different brands and types of smart speakers, i.e., Google Home versus Amazon Echo, or Amazon Echo Show versus Amazon Echo Spot.
In the last study, a behavioral analysis was conducted to investigate interactions among family members and with the VUIs, revealing that participants interacted more with the social robot and also interacted more with one another during their interactions with the social robot.
This paper explores these findings and elaborates upon how these findings can direct future VUI development for group settings, especially in familial settings.