Using Speech to Reduce Loss of Trust in Humanoid Social Robots*

Amandus Krantz^{1}, Christian Balkenius^{1}, and Birger Johansson^{1}

* This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program - Humanities and Society (WASP-HS) funded by the Marianne and Marcus Wallenberg Foundation and the Marcus and Amalia Wallenberg Foundation.

^{1} Lund University Cognitive Science, Department of Philosophy, Lund University, Lund, Sweden {amandus.krantz | christian.balkenius | birger.johansson}@lucs.lu.se

Abstract

We present data from two online human-robot interaction experiments where 227 participants viewed videos of a humanoid robot exhibiting faulty or non-faulty behaviours while either remaining mute or speaking. The participants were asked to evaluate their perception of the robot’s trustworthiness, as well as its likeability, animacy, and perceived intelligence. The results show that, while a non-faulty robot achieves the highest trust, an apparently faulty robot that can speak manages to almost completely mitigate the loss of trust that is otherwise seen with faulty behaviour. We theorize that this mitigation is correlated with the increase in perceived intelligence that is also seen when speech is present.

I Introduction

As robots become integrated into our society and begin taking on a more social role by entering our homes and workplaces, understanding what it is that makes us trust those robots becomes increasingly important. Even more so, it is important to understand how trust in robots is lost and how this loss can be mitigated.

Trust and loss-of-trust mitigation in human-robot interaction (HRI) are often approached from a performance perspective [1]. However, many theories of trust point out that trust also has a social component (see, e.g., [2, 3, 4]). This social component is based more on a feeling of safety and comfort than on purely rational reasoning about the system’s past performance. Understanding how this social component of trust behaves in HRI scenarios, and how the behaviour of a robot may affect it, is still a relatively new endeavour compared to the study of more traditional performance-based trust.

Previous HRI studies have found that loss of trust or affection towards a robot that has made a mistake can be mitigated by making the robot appear more social by giving a verbal explanation of why the error happened [5]. However, unlike with most neurotypical humans, the ability to speak is not a given for robots, which more often than not are completely mute. How trust in robots is affected by the ability to speak, without necessarily providing explanations for errors, is to our knowledge still an unexplored area of research.

We designed two online independent-measures experiments in which participants were asked to view a video of a robot exhibiting one of two different behaviours and afterwards to evaluate their perceptions of the robot. Experiment 1 aimed to investigate how faulty and non-faulty gaze behaviours impact trust in HRI. The results of the experiment were ultimately inconclusive, showing no difference in trust between the two conditions. As faulty behaviour has been shown to affect perceptions of robots [6] and negatively impact trust in HRI [7], we theorized that the cause was that a portion of the experiment involved the robot “speaking”. A follow-up experiment, Experiment 2, was thus performed, recreating Experiment 1 as closely as possible but without the speech portion. This time the results were conclusive.

This paper thus presents data from these two HRI experiments that together may shed some light on how the ability to speak may impact the perception of a robot in HRI.

I-A A note on online HRI experiments

While it may be difficult to convey some subtler elements of HRI using online studies, it is still a commonly used approach and gives access to a much larger and more diverse group of potential experiment participants compared to live-HRI experiments. At the very least, the results from such studies can be used as guidance for experiments that may be worth replicating in live HRI studies [5].

II Methodology

II-A Participants

The experiments involved a total of 227 participants: 110 in Experiment 1 and 117 in Experiment 2. They were recruited through the online participant recruitment platform Prolific (prolific.co). Participants were required to be fluent in English and naïve to the purpose of the experiment (i.e., participants from Experiment 1 could not participate in Experiment 2), but otherwise no pre-screening was done. The mean age of the participants was 27 years in Experiment 1 (SD 7.73; range 18 to 53) and 39 years in Experiment 2 (SD 15.84; range 18 to 75). In Experiment 1, 49.1% of participants identified as male, 50% as female, and 0.9% preferred not to say. In Experiment 2, 53.3% identified as male, 46.7% as female, and 0% preferred not to say.

All participants were required to give their consent to participating in the experiment before beginning.

II-B Robot

Figure 1: Epi, the humanoid robotics platform used in the experiment.

The experiments were done using the humanoid robot platform Epi (see Figure 1), developed at Lund University [8]. The robot’s head is capable of playing pre-recorded smooth and fluid movements with 2 degrees of freedom (yaw and pitch), and has a speaker built into its “mouth”. The eyes of the robot have 1 degree of freedom (yaw), adjustable pupil size, and adjustable intensity of the illuminated pupils. Only the head movement and the speaker were used for the experiments.

II-C Experiment set-up

Figure 2: (a) Gaze positions of non-faulty gaze behaviour.

Both experiments had a between-group design, where each participant was assigned to one of two conditions. Each condition had an associated video that the participants were told to base their evaluation of trust on. The videos showed the robot exhibiting either faulty or non-faulty gaze behaviours.

In the non-faulty gaze behaviour (see Figure 2(a)), the robot starts by looking into the camera. When an object is presented to the robot, the head moves until it appears to look at the object, holds the position for roughly 1 second, and moves back to its starting position, looking into the camera.

In the faulty gaze behaviour (see Figure 2(b)), the robot again starts by looking into the camera. When the object is presented, the head moves in a random direction rather than in the direction of the object. We chose this behaviour over having the robot remain static, as it was important that the robot appeared to have the same capabilities in all conditions. All other behaviour in the conditions with faulty gaze behaviour is identical to that in the non-faulty conditions.

In Experiment 1, once the gaze behaviour had been displayed, the robot played a pre-recorded audio file of a computerized voice presenting a number of facts about the object that had been displayed. The speech makes no reference to whether the robot displays faulty or non-faulty behaviour.

Care was taken to ensure that the robot’s speech never overlapped with the movement of the head. All behaviours exhibited by the robot were pre-recorded and no autonomous behaviours were implemented.

II-D Experiment scenario

To avoid any observer effects, it was necessary to give the participants a scenario in which to judge the trustworthiness of the robot. As the actual purpose of Experiment 1 was to examine the effect of different gaze behaviours, the participants were given a cover scenario: that the robot was being developed for a classroom setting, that its purpose was to answer children’s questions, and that we were comparing different voices.

Experiment 2 had no speech component, so the participants were instead told that the robot was reporting which objects it was seeing to an unseen operator.

All participants were debriefed and told the real purpose of the experiment after completion.

II-E Measures

Due to the dynamic nature of trust [9, 10], we measured the amount of trust the participants felt towards the robot twice: before and after the interaction. For the pre-interaction measurement, the participants evaluated their trust based on a static image of the robot (see Figure 1). For the post-interaction measurement, they based their evaluation on one of the previously described videos. Trust was measured using the 14-item sub-scale of the TPS-HRI scale, developed by Schaefer [11]. The scale outputs a value between 0 and 100, where 0 is complete lack of trust and 100 is complete trust.
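The pre/post trust measurement can be illustrated with a minimal sketch. It assumes that each of the 14 items is rated from 0 to 100 and that the scale value is the item mean after reverse-coding negatively worded items. The function names, the reverse-coded positions, and the ratings are hypothetical and serve only as illustration; they are not the published scoring key or data from the experiments.

```python
from statistics import mean


def trust_score(ratings, reverse_items=()):
    """Return the mean of 0-100 item ratings after reverse-coding.

    reverse_items holds 0-based indices of negatively worded items,
    whose ratings are flipped to 100 - rating before averaging.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 0 or r > 100 for r in ratings):
        raise ValueError("ratings must lie between 0 and 100")
    adjusted = [100 - r if i in reverse_items else r
                for i, r in enumerate(ratings)]
    return mean(adjusted)


def trust_change(pre, post, reverse_items=()):
    """Post-interaction score minus pre-interaction score.

    A negative value means that trust was lost during the interaction.
    """
    return trust_score(post, reverse_items) - trust_score(pre, reverse_items)


# Hypothetical ratings for one participant (14 items, 0-100 each).
REVERSE = (8, 10, 13)  # assumed positions of negatively worded items
pre = [70, 60, 80, 70, 60, 70, 80, 70, 30, 60, 20, 70, 60, 30]
post = [50, 40, 60, 50, 40, 50, 60, 50, 50, 40, 40, 50, 40, 50]

print(round(trust_change(pre, post, REVERSE), 1))
```

Computing one difference per participant in this way yields the kind of per-condition distribution of trust changes that can then be compared between conditions.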

The Godspeed scale [12], specifically the Perceived Intelligence, Likeability, and Animacy sub-scales, was used to measure the participants’ impressions of the robot after the interaction.

To control for any negative feelings the participants may have harboured towards robots before the experiment, the Negative Attitudes Towards Robots Scale (NARS) was used [13]. NARS gives an overall measure of general negative feelings towards robots, as well as three sub-scales covering negative feelings towards interaction with robots, the social influence of robots, and emotions in robots.

Since robot experience has been shown to affect feelings of trust towards robots [14], we also asked the participants how often they interact with robots and autonomous systems on a 5-point scale, where 1 was daily interaction and 5 was rare or no interaction.

III Results

III-1 Trust

Figure 3: Box plot of differences in trust before and after interaction.

In Experiment 1, no significant difference in trust was found between the faulty and non-faulty conditions (Mann-Whitney U, $p = 0.179$). However, once the speech of the robot was removed in Experiment 2, a significant difference was found (Mann-Whitney U, $p < 0.05$). Looking at the box plot of differences in trust in Figure 3, this difference seems to be due to the faulty behaviour reducing the trust, rather than the non-faulty behaviour increasing it. No significant difference was found between the non-faulty conditions in Experiment 1 and Experiment 2 (Mann-Whitney U, $p = 0.230$).
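For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Mann-Whitney U test using the normal approximation with tie correction and no continuity correction. It is not the analysis code used for the experiments, and the software used for the reported $p$-values is not stated here. Statistical packages may apply exact methods or a continuity correction and can therefore give slightly different values. The sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value) via normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0  # all values identical, no evidence of a difference
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p


# Hypothetical trust changes (post minus pre) in two conditions.
faulty = [-22, -15, -30, -8, -12, -25, -18, -5]
non_faulty = [-3, 2, -6, 4, 0, -1, 5, -4]
u, p = mann_whitney_u(faulty, non_faulty)
print(f"U = {u}, p = {p:.4f}")
```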

III-2 Perceived characteristics

Experiment	Condition	Animacy	Likeability	Perceived Intelligence
Experiment 1	Non-faulty	2.903	4.011	4.135
Experiment 1	Faulty	2.815	3.764	3.927
Experiment 2	Non-faulty	2.464	3.331	3.311
Experiment 2	Faulty	2.244	2.971	3.036

Table I: Mean scores from the Godspeed questionnaires for Animacy, Likeability, and Perceived Intelligence.

Mean scores from the Godspeed questionnaires for Animacy, Likeability, and Perceived Intelligence can be found in Table I. Cronbach’s alpha (95% confidence interval) for all Godspeed questionnaires was in the $0.7 - 0.9$ interval, indicating acceptable to good internal consistency. Both conditions from Experiment 1 score higher than both conditions from Experiment 2 on all measured characteristics.

III-3 Negative attitudes towards robots

Figure 4: Kernel Density Estimate of NARS and its three sub-scales. A lower value indicates a more negative attitude. Red is Experiment 1, blue is Experiment 2.

Figure 4 shows the Kernel Density Estimate of NARS and its three sub-scales. The full NARS scale and the two sub-scales S2 and S3 are roughly normally distributed, indicating that the participants had overall neutral feelings towards robots before starting the experiment. The sub-scale S1 skews slightly lower, indicating that the participants had slightly negative feelings towards social situations and interactions with robots.
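A kernel density estimate smooths a set of observed scores into a continuous curve by placing a small kernel on each observation and averaging. The sketch below uses a Gaussian kernel with a normal-reference rule-of-thumb bandwidth. Both choices are assumptions made for illustration, as the kernel and bandwidth behind Figure 4 are not stated, and the scores are hypothetical.

```python
import math
from statistics import stdev


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth: 1.06 * sample SD * n^(-1/5)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)


def gaussian_kde(data, bandwidth=None):
    """Return a function that evaluates the Gaussian KDE of data at a point x."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(data)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = len(data) * h * math.sqrt(2 * math.pi)

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

    return density


# Hypothetical questionnaire scores on a 1-5 scale.
scores = [2.1, 2.8, 3.0, 3.2, 3.3, 3.5, 3.6, 3.9, 4.2, 4.4]
density = gaussian_kde(scores)
for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"{x:.1f}: {density(x):.3f}")
```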

No significant differences can be seen in negative attitudes between the two experiments.

III-4 Participants’ experience with robots

Frequency	Experiment 1	Experiment 2
Daily	40%	41%
Once a week	30%	22.2%
Once a month	13.6%	14.5%
Once a year	8.2%	11.1%
Never	8.2%	11.1%

Table II: Proportions of how frequently the participants in either experiment interact with robots, AI, and other autonomous systems.

The participants in the two experiments interact with robots, AI, and autonomous systems with roughly equal frequency (see Table II), with the largest group in each experiment (about 40%) interacting with such systems daily.

IV Discussion and Conclusion

The combined results of the two experiments show that, if the robot behaves in a non-faulty manner, trust in the robot unsurprisingly remains largely unaffected, regardless of whether it can speak. However, once the robot is perceived as being faulty, having the ability to speak seems to reduce the resulting loss of trust, making the faulty robot appear about as trustworthy as the non-faulty ones. According to the results from the Godspeed questionnaire, the speaking robots were also perceived as being more animated, more likeable, and, notably, as possessing significantly higher intelligence than the non-speaking robots. This could be an indication that, for humanoid robots, the ability to speak is perceived as a sign of high intelligence. Alternatively, the speaking robot may appear to be more sophisticated or more capable than the non-speaking robot. Both high perceived intelligence and high capability are believed to have some correlation with higher trust [10].

Regarding participant-centric characteristics that may affect the trust in the robot, we controlled for mean age, gender distribution, pre-existing negative attitudes towards robots, and participant experience with robots and other autonomous systems. Of these characteristics, only age differed significantly between the two experiments, with the mean age being 12 years higher in Experiment 2. While age has been shown to have an impact on attitudes towards technology, with older people having a more negative attitude [15], the negligible difference seen in the distributions of the NARS scores (Figure 4) indicates that the difference in mean age between the experiments is likely not large enough to affect the results.

As ever, there are some limitations that should be kept in mind when using these results. First, as mentioned, the experiments were done online using pre-recorded videos of the robot rather than direct human-robot interaction. The large number of participants should safeguard against false positives; however, a live HRI study may nevertheless yield different results.

Second, the experiment scenario was different between the two experiments, with participants in Experiment 1 being told that the voice was the focus of the study. This could potentially have caused participants to ignore the gaze behaviour of the robot and focus solely on its voice, which was the same across the conditions. A follow-up study is planned to investigate this possibility.

Finally, the content of the robot’s speech was not controlled for. It is conceivable that some part of the speech is signalling to some participants that the robot is highly capable or intelligent, causing the trust to increase.

In summary, this paper presents results from two experiments in HRI that together suggest that a humanoid robot with the ability to speak may not suffer the same loss of trust when displaying faulty behaviour as a robot without the ability to speak. We theorize that this effect is due to speech increasing a humanoid robot’s perceived intelligence, which has been shown to correlate with trust in HRI [10]. Further research along these lines may help explain existing studies in HRI (See e.g. [5]) that indicate that a robot providing a verbal explanation for its errors is beneficial for user attitudes.

References

[1] P. A. Hancock, D. R. Billings, K. E. Schaefer, J. Y. C. Chen, E. J. de Visser, and R. Parasuraman, “A meta-analysis of factors affecting trust in human-robot interaction,” Human Factors, vol. 53, no. 5, pp. 517–527, Oct. 2011.
[2] S. T. Fiske, A. J. C. Cuddy, and P. Glick, “Universal dimensions of social cognition: Warmth and competence,” Trends in Cognitive Sciences, vol. 11, no. 2, pp. 77–83, Feb. 2007.
[3] S. P. Marsh, “Formalising trust as a computational concept,” Ph.D. dissertation, University of Stirling, Apr. 1994.
[4] D. J. McAllister, “Affect- and Cognition-Based Trust as Foundations for Interpersonal Cooperation in Organizations,” Academy of Management Journal, vol. 38, no. 1, pp. 24–59, Feb. 1995.
[5] D. Cameron, S. de Saille, E. C. Collins, J. M. Aitken, H. Cheung, A. Chua, E. J. Loh, and J. Law, “The effect of social-cognitive recovery strategies on likability, capability and trust in social robots,” Computers in Human Behavior, vol. 114, Jan. 2021.
[6] M. Salem, F. Eyssel, K. Rohlfing, S. Kopp, and F. Joublin, “To Err is Human(-like): Effects of Robot Gesture on Perceived Anthropomorphism and Likability,” International Journal of Social Robotics, vol. 5, no. 3, pp. 313–323, Aug. 2013.
[7] M. Salem, G. Lakatos, F. Amirabdollahian, and K. Dautenhahn, “Would you trust a (faulty) robot? Effects of error, task type and personality on human-robot cooperation and trust,” in 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Mar. 2015.
[8] B. Johansson, T. A. Tjøstheim, and C. Balkenius, “Epi: An open humanoid platform for developmental robotics,” International Journal of Advanced Robotic Systems, vol. 17, no. 2, p. 11, Mar. 2020.
[9] K. Blomqvist, “The many faces of trust,” Scandinavian Journal of Management, vol. 13, no. 3, pp. 271–286, Sept. 1997.
[10] E. Glikson and A. W. Woolley, “Human trust in artificial intelligence: Review of empirical research,” Academy of Management Annals, vol. 14, no. 2, pp. 627–660, Mar. 2020.
[11] K. E. Schaefer, “Measuring trust in human robot interactions: Development of the ‘trust perception scale-HRI’,” in Robust Intelligence and Trust in Autonomous Systems, R. Mittu, D. Sofge, A. Wagner, and W. Lawless, Eds. Boston, MA: Springer US, 2016, pp. 191–218.
[12] C. Bartneck, D. Kulić, E. Croft, and S. Zoghbi, “Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots,” International Journal of Social Robotics, vol. 1, no. 1, pp. 71–81, Jan. 2009.
[13] D. S. Syrdal, K. Dautenhahn, K. L. Koay, and M. L. Walters, “The negative attitudes towards robots scale and reactions to robot behaviour in a live human-robot interaction study,” Apr. 2009.
[14] K. Rogers, D. Bryant, and A. Howard, “Robot gendering: Influences on trust, occupational competency, and preference of robot over human,” in Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, ser. CHI EA ’20. New York, NY, USA: Association for Computing Machinery, Apr. 2020.
[15] M. M. A. de Graaf and S. Ben Allouch, “Exploring influencing variables for the acceptance of social robots,” Robotics and Autonomous Systems, vol. 61, no. 12, pp. 1476–1486, Dec. 2013.