SimOnAir Ep. 2 – Talking to Machines (Cathy Pearl)
In this 2018 episode, Sim explores conversational AI with Cathy Pearl, author of “Designing Voice User Interfaces” and Google's Head of Conversation Design Outreach. They discuss the complexities of crafting conversational AI, the progression of voice technology, and its influence on our everyday lives.
Cathy Pearl, a specialist in conversation design and voice user interface (VUI) design, is the author of “Designing Voice User Interfaces”. She currently holds the position of Head of Conversation Design Outreach at Google. With over a decade in the conversation design field, Cathy focuses on creating AI experiences that are more empathetic and human-centered. More about Cathy can be found on her website.
- Cathy talks about the progression of voice technology and its role in making technology more accessible.
- She offers insights into the triumphs and challenges of developing conversational AI that can comprehend and cater to human needs.
- The discussion delves into the amusing aspects of conversational AI, including unintentionally funny responses and the inherent silliness of conversing with a machine.
- Cathy also contemplates the future of conversational AI and the potential it offers, such as alleviating loneliness and providing companionship.
Sim (host): Cathy, correct me if I’m wrong, but this is how I see it. With Alexa, Siri, or Chatbots, we are still using a computer but instead of a mouse and a keyboard, we use our voice to give a command, to give an input and instead of a display showing windows or buttons, the computer shows us the results, gives us the results by talking back to us. And it becomes a conversation back and forth, which is both, in this case through voice, our display and our keyboard at the same time. And you used to design those conversations, just like other designers work on buttons and windows for text, right?
Cathy Pearl (guest): Right. So I've been spending the last 20 or so years working on these voice user interfaces in different modalities. I started with phone systems, IVRs as we call them, where you call up the phone system and talk to the computer. And then I moved on to multimodal apps and, more recently at Sense.ly, an avatar-based conversation.
Okay, an avatar, so it’s like if Siri had a face.
Yeah we have a… we have a bunch of different avatars and basically you can speak to the avatar, the avatar speaks back. You can also text with the avatar. But it’s a slightly more engaging way for something like healthcare, which is a very important place to have engagement, to get people more involved.
Of course, yeah. And also, I guess you need to have a specific level of empathy and…
… and care for that kind of user.
It can add to the empathy with those patients, who may be suffering from difficult chronic conditions; it helps them.
When is the first time that you tried to talk to a computer? That you’re like, “I want this computer to talk to me,” and you talk to it in order to make it.
Right. I would say it was when I was a… when I was a kid. And my dad bought our family a computer when I was 8, a Commodore VIC-20, which had 5 K of memory in it.
Oh my god! All together?
Yes. And I really fell in love with computers right away. And I was the one in the family who learned how to program. The… I can still remember sitting there with the… the user manual that came with the computer; it was so wonderfully done at telling you how to do it. I want to find the people who wrote that user manual and tell them what a great job they did. But when I was maybe 10 or… 10 or so, there were TV shows like Knight Rider that had a talking car; there was WarGames, the movie that had the talking computer; and I was very interested in that. And so I tried to write a little chatbot on my Commodore VIC-20 because I wanted the computer to talk.
What did you… what did you chat about? What were the questions that it was able to answer?
Well, it could chat about whatever you put into it. So it was very programmatic. So…
And in your case?
In my case, I would type things like, "How are you?" and it had 3 possible responses that it would give you. And if you typed something it didn't understand, it would say, "I didn't understand that, but give me 3 responses and I'll use them the next time somebody asks me that."
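The teach-and-reply mechanism Cathy describes can be sketched in a few lines. This is a hedged illustration in Python rather than VIC-20 BASIC; the prompts and stored responses here are invented for the example:

```python
# A minimal sketch of the learning chatbot described above: canned
# responses per prompt, and a way to "teach" the bot new responses
# for prompts it didn't understand.
import random

knowledge = {"how are you?": ["Fine, thanks!", "Pretty good.", "Can't complain."]}

def reply(prompt):
    # Pick one of the stored responses at random, if we know this prompt.
    responses = knowledge.get(prompt.lower().strip())
    if responses:
        return random.choice(responses)
    return "I didn't understand that."

def teach(prompt, responses):
    # Store new responses so the bot can answer next time.
    knowledge[prompt.lower().strip()] = responses

teach("what's your name?", ["I'm a VIC-20 chatbot."])
print(reply("What's your name?"))  # -> I'm a VIC-20 chatbot.
print(reply("gibberish"))          # -> I didn't understand that.
```

The real program presumably matched exact input strings the same way; anything fuzzier would have needed far more than 5 K of memory.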
So it was able to learn that way.
Yeah, I was about to say it.
Build upon that knowledge base, yeah.
Yeah, it was pretty advanced. How old were you, you said?
Probably about 10 or so.
Growing up, how did you keep cultivating this passion?
I just… I did a lot of programming in the language called BASIC. We had a magazine called Computer Gazette that I would sit and type in the programs that were in this magazine and just literally sit there and type and run them. Because of course, there was no internet, there was no way to download a program; you could buy the disk. But… so I spent a lot of time typing programs in and learning programming in that way because I didn’t really have a lot of opportunities at school to learn computers, so I had to just kind of teach myself.
You… you go to high school, and after high school, what do you do?
So I loved programming but I did not want to major in Computer Science. So I went to UC San Diego and they offered a discipline called Cognitive Science, which I'd never heard of before. But I ended up majoring in that and I loved it. It was this wonderful interdisciplinary thing of neuroscience, linguistics, psychology, artificial intelligence, and I just learned a ton of stuff. I minored in Computer Science and artificial intelligence and I did… still did some programming, but the Cog-Sci classes were fascinating. I find it very interesting to try and learn, "How do people work? Why do they do the things they do? Why are they attracted to the things they're attracted to?" And Cognitive Science gave sort of a little bit of… slightly cracked open the door of trying to understand humans.
Once you found out that this was a passion for you, and you graduated from it…
There were very few Cognitive Science graduate programs. So I ended up getting… going into the PhD program for Computer Science at Indiana University, because they had a Cognitive Science specialty. But after my first year as a Computer Science graduate student, I realized that this was not the life for me. I was not really that interested in the hard-core, you know, analysis of algorithms and theory of complexity classes. And I thought, "Why am I getting a PhD anyway? I don't want to be a professor." And so I decided to get my master's instead. And I shifted my focus to human-computer interaction classes, things like that, did my thesis in that area and…
What was it about?
It was about comparing graphical user interfaces to typed natural language interfaces. So, like, opening files and doing simple computer tasks by typing rather than with a GUI. And of course it didn't exist yet, so I had to do a Wizard of Oz scenario where I was in the next room; I hooked up a cable between the 2 computers so I could open windows and open programs and things like that. So I was faking it.
(Laughs). There was a machine going on behind… (Laughs).
Yes, exactly, right. That was the wizard behind…
This computer is intelligent and you’re moving, oh…(Laughs).
Actually what surprised me was that I thought people would be like, “Wow! I didn’t know computers could do that!” But nobody seemed all that surprised.
All your effort they’re like, “Okay, it’s doing it.”
Yeah! Like, “Sure! Sure it could,” I don’t know.
“That’s a computer, it can fly us to space.”
So what you did basically was… now I can do that with some calendar apps on my phone. For example, in Fantastical, I can go like, "Alright, I have this appointment in a couple weeks on a Tuesday at 9:00," and it's there.
I write the sentence or I say it to Siri, and it translates it into an actual event in my calendar.
That’s what your… this was about in which year?
That was in 1997.
Okay, that was advanced. (Laughs).
Well, it didn’t actually work, I was just testing if it could work.
Well, you know…
And would people like at some point?
It happened. And so you do that and… and then in the late 90s, you start to work at Nuance; am I pronouncing it right?
First, I went to… right out of graduate school, I went to work at NASA Ames here in California. They had a job opening for… I was writing code for a helicopter pilot simulator.
They were looking at cognitive load, things like, “If there’s alarms going off, can the pilot reach all the buttons? Can they… will they get confused?” So I spent a couple years writing software for that; so nothing to do with natural language processing or speech recognition.
But I… it was a neat project but we didn’t really have any customers. And I realized that something that was really important to me was to work on a product that real people would really touch and use. So I started looking for other opportunities and Nuance Communications, which is a company that’s still around today that does speech recognition, they had a job opening and they had a demo line. And I thought, “Well speech recognition, that doesn’t work.” But I called this thing on the phone and you could do this little demo where you move money from your checking account to your savings account. It was like, “Wow! This is… this is really cool!”
And then it was after that that I started at Sense.ly, with the… the virtual nurse avatar, applying speech recognition and natural language processing to that.
Because actually, speech recognition is one of the tools that allows you to design conversational user interfaces. So you wanted to have a broader perspective on it and work on a more… on a wider approach.
And… and think… and put to work your cognitive…
Your Cog-Sci, yeah. (Laughs). Your minority report.
Your (unclear)[10:02]… okay
Exactly. So that… that really changed my thinking, and the introduction of the Amazon Echo totally changed my thinking as well. Again, because it's a tool that allows you to do things…
Why the Echo and not, like…? Well, Siri was the first digital assistant, right? But why do you say that the Echo is the one that opened up your perspective more on this topic?
To me, it’s because of the frictionless nature. It’s the fact that, I mean, there’s… there’s 2 parts of technology that that were newer and we didn’t do before. One is the microphone technology. It works…
Yeah, the microphone array that they have, it works so well. I can be sitting on my couch in the other room and I can call out, you know, "Alexa, what time is it?" or whatever, and 99% of the time, I will get a response. And so it's…
And 1%, you get a laugh.
Have you gotten the laugh?
That hasn’t happened to me yet, but I’m waiting; I’m ready. Whereas with Siri or… or the other phone assistants, it’s a phone, the microphones just aren’t as good. I mean, they’re getting better of course, but you just… half the time, you’re just not even going to get a response. And the other part of it is the fact that the speech recognition itself has gotten so much better. So when I first worked at Nuance, we had to handcraft every single grammar of everything you might think someone would say. So if I just had a yes/no question, I would have to literally write out “uh, um, yes,” or, “yeah,” or, “yep,” followed by, “please,” optionally… I mean, you would literally be writing every single possible word.
And how does it work now?
And now what you can do, because the recognition is so good, is just get a stream of what is being said… every word that's coming out of one's… the user's mouth, and then you can start extracting things.
Yeah, you can look for keywords. You can do lots of different things like that. So I don't have to constrain myself to such a tiny, you know, hand-crafted way to do things. So those 2 things together, the microphone array plus the speech recognition, have made this such a frictionless experience for me. Everything I could do on my Echo or my Home, I could do on my phone, but I do not. I don't go and pick up my phone and unlock it when I've got my Echo sitting over here in the corner of my home. I just turn to it and I talk to it, and it's so simple that it makes a huge difference.
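The shift Cathy describes, from enumerating every allowed utterance by hand to extracting intent from a free-form transcript, can be sketched roughly like this. It's a toy illustration; the grammar entries and keyword lists are made up, not drawn from any real system:

```python
# Contrast the two eras: a hand-crafted yes/no "grammar" that must
# enumerate every accepted phrasing, versus keyword spotting over a
# free-form transcript returned by a modern recognizer.
import re

# Old style: every acceptable utterance written out by hand.
YES_GRAMMAR = {"yes", "yeah", "yep", "yes please", "uh yes", "um yeah"}

def grammar_match(utterance):
    # Only exact, pre-enumerated phrasings are understood.
    return utterance.lower().strip() in YES_GRAMMAR

def keyword_spot(transcript):
    # New style: scan whatever the user actually said for intent keywords.
    text = transcript.lower()
    if re.search(r"\b(yes|yeah|yep|sure|absolutely)\b", text):
        return "affirm"
    if re.search(r"\b(no|nope|nah)\b", text):
        return "deny"
    return None

print(grammar_match("totally, yes"))           # -> False (phrasing not enumerated)
print(keyword_spot("oh totally, yes, do it"))  # -> affirm
```

The same user phrasing that falls through the hand-written grammar is caught by keyword extraction, which is the practical difference she's pointing at.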
So yeah… although sometimes something makes me wonder why it's not possible for these devices to remember. And you discussed this in your book a little bit. But I was always wondering, "Really? I asked you for movie times, like, 5 minutes ago. If I asked you for directions to the movie theater and you know the movie is in 1 hour, why do I need to specify the address or the name again?" They have, like, the memory of a goldfish.
And I feel that if they were able to remember… and why is it so hard to create this kind of process of, like, memory and context and..?
I think you’ve… you’ve hit on one of the most important things were missing right now, that that is our next step in terms of this technology. I think for humans, it’s so frustrating because even a toddler can distinguish like when you say, you know, “Go get me the red ball out of the green box,” and toddler knows you mean bring me… that ‘it’ means the ball, not the box. Or when I say, “Please repeat that,” it’s so easy for humans to know exactly what I’m talking about, but it’s hard for computers. And I think there’s 2 aspects to it. One that really is technically… it… language, some of the stuff we take for granted, really is complicated; like, figuring out how to resolve references like ‘it’ and ‘they’ and ‘them’ and ‘that’. Our brain is doing a whole lot of work there. And it is… takes effort and… and it is hard to build that. But we… but we can and we’ve started to. Google home especially, I think is doing a good job at this.
Well, it was funny, because as I was working on my book, I was testing certain things. And over time I could see that examples that I was going to perhaps use as a failure in my book, when I would do it 2 months later to make a screenshot or write it down exactly, I'd find they had fixed it. Like the phrase, "Please repeat that": there was a time when "please repeat that" went off into this Wikipedia, you know, description of the phrase "please repeat that." And now if you say it on both the Echo and Home, most of the time they will actually realize you mean, "Say the thing again."
So they’re… they’re really are… (unclear)[14:11].**
I think, honestly, the other aspect of it has a lot to do with the fact that these devices and Siri were set up to be a command and control like one-off thing, like, “Turn on the lights, set a timer, play a song.”
That’s what I use them for mostly.
And a lot of developers especially are not thinking about, "What could happen next? What might a human do next?" I use the… the Google Calendar example all the time, where if I say, "What's on my calendar?" and it says, "Oh, you have a dentist's appointment." And then I say, "How long will it take to get me… how long will it take to get there?" And a lot of times… it used to be… and Google's actually fixed this, it used to be, like you said…
“Here’s the web results for…”
Well, no, not even that! It would be like, “I can’t help you with that.”
It wouldn't even have any… it's like every single turn is a brand new conversation; like you were saying, the memory of a goldfish. But they've started to expand, because in my mind, sure, we're not yet at the point technically where you could say anything at any time. But within a domain, like if I just 30 seconds ago asked about my calendar, and then I say "it," or "that," or "there," I'm probably talking about the calendar. You know, you can constrain it such that we can technically handle that concept. So I think part of it was the fact that developers often… because a lot of these things are designed by developers, not necessarily designers.
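The domain-constrained follow-up handling described here can be sketched as a toy dialogue manager that remembers the last topic, so that "it" or "there" resolves against it instead of starting a brand-new conversation. This is a hedged illustration; the entity string and phrasings are invented, not how any real assistant works:

```python
# Toy dialogue manager: remember the last entity mentioned so that
# follow-up pronouns like "it" or "there" resolve to it.
context = {"last_entity": None}

def handle(utterance):
    text = utterance.lower()
    if "calendar" in text:
        # Remember what we just talked about for the next turn.
        context["last_entity"] = "the dentist's office"
        return "You have a dentist appointment."
    if any(word in text.split() for word in ("it", "that", "there")):
        if context["last_entity"]:
            # A pronoun plus remembered context: resolve the reference.
            return "Directions to " + context["last_entity"] + "."
        return "I can't help you with that."
    return "I can't help you with that."

print(handle("What's on my calendar?"))
print(handle("How long will it take to get there?"))
# -> Directions to the dentist's office.
```

Without the `context` dict, the second turn would fall through to "I can't help you with that," which is exactly the goldfish-memory behavior being criticized.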
Not by people.
And I used to be a developer myself, so I can say this. But… so many of these were designed to be one-offs, but that was what everyone else was building.
Of course, it was original.
That’s just what was happening. And now people are starting to realize, “Oh yeah, this conversation could continue, so let’s think about the follow-up experience.”
It can become a conversation more than like, enunciating commands to a device.
We use pronouns that we would use for human beings, especially 'he' or 'she'; especially 'she'. But we say 'she', Alexa is 'she', Siri is 'she', even though we can set a male voice, even with accents; not Italian yet.
But we can. So why do you think that people refer to these devices with pronouns… with living-being pronouns? Because the way that Amazon is marketing them, for example, during the Super Bowl commercial, they really want you to think of Alexa or Siri as a family member.
That is the narrative that is being pushed. But why do you think that people immediately… how do… what does a designer think? Does a designer want people to consider them as an entity?
For sure. I mean, these are deliberate choices that these companies are making to choose a female persona, a female name. That’s very deliberate. I think it’s unfortunate; to me it’s very stereotyping.
Tell me that I’m wrong. In my mind, it is because a woman’s voice can be more reassuring and maternal and less threatening. And if you get a man that… (Laughs)… it doesn’t have the same effect. So is that it? Do you think that’s it? What is the reason behind the feminine, female characterization?
I think there's a lot to it. So when, you know, when we think of "secretary," that is the stereotype of an assistant.
It’s a… it’s a demure, helpful, non-threatening woman.
And so I think a lot of people thought, "Oh, it's an assistant." So what's my idea of an assistant? What pops into your head first? "Oh, a demure woman." So I think that was just the thing that started it, and now it's kind of like, that's what everybody… not everybody, but most people do. And, you know, when I was working on IVRs, we would always have a discussion with a client; they would say, "Should we use a male voice or a female voice?" for their phone system. Because at the time, we didn't use computer voices, text-to-speech; it wasn't good enough. We used voice talents that we brought into the… the recording booth and we would do voice coaching. So we always had voice talent tryouts and things like that. And it was always this difficult dance with the client, because they would say, "Well, a woman's voice is much more caring." We would say, "Well, no, no, it's… it's the perception. It's… it's the voice itself. There can be female voices that sound caring, there can be male voices that sound caring, there can be…" you know?
Of course, yes.
And so I… I would love it if we could get away from this idea that these assistants always have to be, by default, female.
Because I’m pretty sure we’re paying female assistants less than male assistants, right?
Exactly. So they are making a lot less money. So I hope that as time goes by and we get more computer voices and things like that, that it will not be the default that’ll just be, “Oh, it’s a woman.”
Maybe you might have a choice. Because, you know, all the menu choices that you have when you set up these devices, it doesn’t ask you if you want like a male or female voice, I think.
Yeah. Now for Siri, I know you can… you can have a male or a female voice; so some of them do offer that. There's… oh, there's that famous story of BMW, or one of the first GPS systems, that had a female voice, and all these men complained and said, "I won't take driving directions from a woman!"
And they changed it to a male voice. I mean, there’s so much…
Maybe we should have them crash.
There's so much about being human that we already… we all have these built-in, you know, unconscious biases. We all… we all have that; we all know about that. And I think it's contributed greatly to this choice for some of these assistants… interestingly, though, some of the first robots that are coming out, like Jibo and Kuri, those are male. And I'm wondering, "Does a physical…
"… one more likely to have a male voice, like a butler?" or, like, why? You know, so who knows? It's all human psychology.
And yeah, you’re right. But that is interesting. It would be interesting to… to look at the reasons why this happens for physical robots an embodiment of that. Maybe it’s because a robot is… can be stereotypically strong; that’s the thing.
I think they’re very small and cute. So.
They are, they are.
They’re not like lifting stuff or…
(Laughs) It's… for sure, voice can carry a persona. And that's what conversational designers also think about, right: to… to create a persona for… for the user. So to express… can you specifically explain why a persona and not a character? You explain that a persona is more like a… it doesn't necessarily have all the traits of a character in a script, but it can still express some kind of attitude. And maybe… I think of Siri as very sarcastic, in a smarter way; Alexa too. I don't know if you had the chance to use these assistants in another language, but the Google Assistant in Italian makes pretty sad jokes.
I think every time I hear a joke by a digital assistant, I think like, “That’s… that is the… that is the engineer level of joke that… that we are all going to have to enjoy.”
The problem is, how do you write a joke for everyone? Because if you go too far off, you're probably going to offend people. Maybe there'll be, like, 10% of your audience who loves it; I mean, you're in comedy. And then, you know, 20% are super offended. Or do you go with the one that offends nobody but also isn't that funny?
Isn't that an interesting thing to think about in general? Because you just talked about different audiences and groups and communities. Because a specific… say, a minority, okay, or an ethnic group, or a field… like, people specialized in a specific field, or workers… they all have a different language, they all have, like, a different sense of values for a conversation, and customs, and those kinds of things. And one of the reasons why these assistants need… need to be for everyone, like you said… but that's one of the elements that makes them feel a little bit detached, to me. Like, it's part of it, but are we going to have, one day, an assistant that can be closer to my way of relating to the world, to my field, to my community, and eventually learn slang?
Like, that could be… that, I think, is a problem that needs to be solved too.
I think there's a… so I think, short term, one of the ways we can express more personality is, you know, the Echo and Home have all these skills or actions. And so if you deliberately choose, like, "I want the whatever trivia about such-and-such," you could put specific humor or reactions in there. Because, you know, somebody chose that. They're not just asking for the time; they're saying, "I want to go do this specific experience." And you could… you could ramp up the personality traits. You could go a little more offbeat with your humor and things like that. But as far as the full system, I mean, one of the things you're making me think of is mirroring, which is: when we speak to somebody, we often start to mirror the way they… they speak. And you see that… you see that in speed-dating and all kinds of stuff. And I think, you know, who knows how many years down the line this will be, but these systems will hopefully be able to pick up on the way you speak. And so if they know you use a lot of sarcastic jokes, they would know that they could too. They could… it's a safe space to use sarcastic jokes because you use sarcasm, or whatever. And so, I mean, there's… there's the more manual setting where you could say, "I want my sarcasm at level 6," or whatever. But that's tough.
So maybe mirroring could be a way where it says, “I’ve seen the way you speak or the things you talk about and I know what topics I can…”
You usually care about. For example, you can say like, “Sim is crazy because he loves comedy, but he also loves like user interfaces. And so I’m going to make jokes about him.” (Laughs).
Exactly. It could get smart enough to do that.
I think my Siri already makes fun of me because, you know, with an accent, using anything like speech recognition and assistants is hit or miss. A few weeks ago I was… I tend to do stuff like… I use very specific commands now. I don't even bother using natural language. I know the key words, like, "Alexa, lights a hundred."
Oh, so you go back to the DOS command line, you know?
Yeah, basically I speak like a command line. But also, I was adding… I wanted to buy olive oil. So it was like, "Hey Siri, add olive oil to the groceries list in Things." And she goes, "Sure, I added holy oil to the groceries."
You can get that at (unclear)[24:47].
Yeah. (Laughs). So I just went to the church and got some.
I used to get… back at Nuance, one of my office mates used Dragon Dictate, which is a dictation program, to send emails sometimes. And I would totally know I'd gotten an email he had dictated when I'd see certain words and I'd be like, "That word is not…" And I'm like, "Oh, he was dictating!" and then I'd understand. But I think there are sort of 2 problems still to solve. One of them, of course, is, like you said, with accents. A lot of the problem is just that certain languages have less data. And so we… it falls down more with… with those dialects and those differences. So that's going to be a matter, I think, of just continuing to collect more and more data. But the other part of it isn't a speech recognition problem; it's a natural language problem. And it has to do with something like my mantra, which is: design for how people actually talk, not how you want them to talk. A lot of times, we designers and developers get very caught up in how we think people will respond to our prompts. And we're like, "It's so obvious. When I say this, they're going to say this, boom-boom-boom, we're done," and then you go look at the logs and you realize, "Oh wow! People speak…" There are so many different ways for people to do something as simple as ordering a tuna salad. I mean, you'd think it was so straightforward, but we have such variation. And we as designers and developers need to remember that. And so, when I have… when I have a prompt that asks you… or when you were talking about turning on… I think you were talking about turning on a camera, a security camera.
Oh I do, yes, we were talking about it.
And it needs to be able to handle the variety of ways that someone will ask for that thing, and not just expect an exact set of key words to perform it. And that can be done… the technology is there. That… that one's about design, and making sure we understand a wide breadth of ways people speak.
Yeah, I usually do that. I go… I called the device "security camera." And sometimes I just go like, "Hey, turn the camera on," or "turn the camera off," after I turn the lights on or off. So in my mind it's like, "That's the security camera," right? But if I say, "Turn the camera on or off," Siri goes like, "Here is the cam…" like it opens the camera to snap a picture. And I'm like, "No, that's not…"
Context. Yeah, it should know what…
"What are you doing, Sir?"
Yeah, there is that. Another thing that, for me personally… it's a pain for me not being able to speak 2 languages at the same time. But I realized that we need to wait for that, maybe. But…
Oh, so you want to speak to your… let's say, Siri or your Echo, sometimes in Italian and sometimes in English?
And dictation, because, you know, my phone is in English and I use it in English, but sometimes I, like, text people in Italian.
And I believe I'm not the only person in the world who speaks 2 languages. And I realize that it's not a priority. But it would be such an easier life, because I would not have to choose to use an assistant exclusively in one language.
That’s a good point.
You know, it would make it easier if I could go like, "Hey Siri, send a message to my mom saying, 'Ciao, come stai? Io sono qui'" ("Hi, how are you? I'm here"); it would… it would be easier.
Just putting it out there. (Laughs).
That makes a lot of sense.
So, how about creating this persona? Do you think voice and conversational user interface designers will eventually need help from other, non-technical fields, like the liberal arts and that kind of thing? How… how could those help?
For sure. And I think all the major companies now have writing teams that draw writers from a broad range of backgrounds. At Volio, our head writer was a comedian.
And he quickly picked up on… so I think what you need is this combination of being a good writer, but you also need to understand the limitations of the technology. Because I've seen cases where there's been a great writer, but they haven't quite adapted to the fact that you can't just say anything, like in a movie script; it'll break down.
It's not your poem. (Laughs)
Yeah, exactly. But then I've seen other writers, like we had at Volio, where he very quickly got into the groove of what it meant to talk to a speech system. And it really… I mean, I can write very clear, usable prompts. But he could take that and elevate it to a much more interesting, engaging, fun interactive experience, and we absolutely need that… that talent pool to really assist with some of these things. Sometimes I'll just hear a prompt from one of these systems and just cringe, like, "Oh yeah, that was an engineer who wrote that."
And sometimes… yeah, you’ll hear these great ones, you’re like, “Oh wow! Somebody really did it!”
It was Aaron Sorkin.
Aaron Sorkin, exactly. So I think it's a… it's an essential part of building an assistant: you have to have strong writing skills. And they can come from a wide variety of backgrounds, but that's a really important part of making your assistant engaging and useful.
I was also curious… like, you know, some user interface elements trick us. For example, checking Facebook or your email is designed in a way that works like a slot machine. You have this infinite feed and you can constantly update it and hope that somebody will make you win. You win with a nice status update, with a nice email that tells you that… I don't know, whatever. Is there a danger of creating similar tricks with natural language processing? Because I was also realizing, like, "Well, you could easily start to use tricks from neuro-linguistic programming to obtain and drive certain kinds of answers." So is there also an ethical side that, as a conversational designer, you have to consider, or have considered in the past? How does it work?
For sure. And I'm thinking about a couple things. One, I'm thinking about Sense.ly, where… it's this interesting thing, because we knew that for certain patients, if they engaged in this daily check-in with the avatar, they would probably be more healthy. They would be less likely to go back to the hospital; things like that. And so we had this very altruistic reason to make them want to come back every day and talk to the avatar. But at the same time, we had to walk that fine line, because we don't want to use gamification to unnecessarily trick them, like you said, into engaging too much. So it's walking this… this careful line. Thinking about it for… for systems like the Echo and Home, one of the reasons… it's slightly off-topic. One of the reasons I really like these systems a lot, in… in a way, is because they add to the conversation. You know, if you're sitting at dinner and someone has something they want to look up and they look down at their phone, that person is gone; they're no longer in your conversation. Whereas, if we're at dinner and I turn… one of us turns and says, you know, "Alexa, who's the richest person in the world?" we've all heard my question, we've all heard the answer; they've joined the conversation briefly and departed again. But it's different than getting sucked into, like, "Nah, I'm scrolling through my Twitter feed forever and ever."
So I love that aspect. But to your other point, there probably are these sort of, quote, ‘tricks’ we could use. And I’m thinking about things like the Alexa Prize. Not that I think the Alexa Prize is a trick, but the Alexa Prize is all about getting people to chat for as long as possible with the system; making it engaging enough that somebody would want to chat.
So this is something that Amazon does. The Alexa Prize is like a challenge for developers. It’s like, “Hey, develop it…” and the goal is to have the user interact with it for as long as you can.
Right. So they had this Alexa Prize for a million dollars, and it was for universities. And the bar they set was to get to a 20-minute conversation, and also… excuse me, you got the opportunity to rate the system, and they wanted a rating above 3.0.
And it was very interesting to play with these and see what ways they were trying to get people to keep talking, since that was the goal. And a lot of times they would fall back into games or something like that, where they would say, “Do you like X or Y?” They would say, “Do you like music more or movies more?” and if you said, “I like movies,” they would say, “Let me tell you about a movie,” and it would introduce another topic, or just keep introducing topics. And the part where it struggled and fell down was if you tried to veer slightly off and say, “Oh yeah, I like movies, but I really want to talk about, you know, sports now,” and they’d be like, “Oh, let’s talk some more about movies.” And so it would sometimes get off track. But thinking similarly about… I don’t know if you’ve heard of Xiaoice, which is in China. It’s a chatbot that is extremely popular. And people often will chat with it, it’s text chat, for 20 minutes or so. And I heard an interesting statistic recently that a lot of young people use it, and a lot of what those young people talk about is complaining about their parents. And I thought, “Is it really tricking these people into talking for 20 minutes? Is it bad that they’re talking for so long?” And I thought, “No, this is great! What a great outlet for a teenager,” let’s say, “to have a place, a safe space, to vent about things they might be upset about.”
It’s like a therapist.
It’s like a therapist!
But it doesn’t really understand you.
Yeah. And what if you had that your whole life? Would that be good or bad, if you had this AI assistant who grew up with you and knew all the things you hated and all the things you’re upset about, and never judged you, and always gave you a great response when you said you had a bad day? Is that a good thing or a bad thing?
But, you know, isn’t it like throwing words in an empty box if there is not an original point of view on the other side that can make you grow and understand something different about yourself? Isn’t it like shouting in the wind?
So that’s… that’s the philosophical…
So there was a great study recently in Scientific American about the content of what people say. And they determined that, 60% of the time, we’re talking about ourselves.
And they also put people in an MRI machine, and it turns out people get a lot of pleasure when they’re speaking about themselves…
… even if no one is listening. So…
(Laughs). “Have you been watching me at home in the morning, in front of the mirror?”
(Laughs) So if I’m talking to my AI assistant and, like you said, there’s no… maybe it doesn’t have a strong point of view, maybe it’s soulless, whatever, is it still…? It’s probably still satisfying in a way.
It is; it is.
But is that, philosophically, not so great?
I’m not sure if it’s constructive. Yeah, exactly. It is satisfying; I’m not sure it can necessarily be constructive. But it might be satisfying in the same way that you can write a page of a diary and get out everything you want.
So it can be like a record for that. I think that most projects start… even social media like Facebook starts with the intention to connect people, right? And it ends up with the intention of getting as much data as it can about you, and that becomes leveraging human addiction. And that becomes having people just refreshing and scrolling and sharing.
So I hope… I hope you will work on this, and other people will be able to prevent these outcomes for this newly born revival. (Laughs)
So I think… I mean, that’s one of the things… when I’m feeling very optimistic, that’s one of the things I feel optimistic about with voice, which is, instead of turning to my phone and getting…
It simplifies it, right.
… I just do a quick one-off or, you know, a short conversation with my device, and I’m not sitting there for half an hour scrolling through Twitter.
So that would be such an advantage. I think that’s the power of it. You’re right.
And then another statistic recently that I thought was great was that people said that half the time they’re using one of these voice assistants, it’s with other people. And I thought, “That’s… that’s lovely!” you know?
“Maybe… maybe this new tech is a slightly… slight break to get us out of our obsession with screens.”
I would love… I hope so. I hope so. As long as, you know, it becomes seamless and more integrated. As long as it’s not leveraged against us. You said in your book, “Humanity yearns to communicate. When no one is around, like you were saying, we talk to our pets and our TVs; we want to talk to our computers, too. Voice user interfaces can fundamentally change the way we interact with technology, making it so that we have to act less like computers and more like people.” I found a beautiful contrast in your book. Because on one side you explain how it’s not the book’s goal, or your goal, to touch AI, reminding us how voice user interfaces and conversational interfaces are still user interfaces; they’re not artificial intelligence. But at the end, with those remarks, you also refer to this poetic vision of humanity yearning to communicate and have conversations, even with machines, even with computers. And to have a truly bonding conversation, I believe we need empathy. That’s an important part to me. We need awareness, we need attention, and also, I believe, an original point of view. So to achieve that and not just make it merely satisfying, but to create companionship… if we create something we can talk to, do you need artificial intelligence to crack that part of a conversational user interface?
To reach the sort of true level of a true, true companion, then yes, we’ll have to crack that. But to me, the question is: what is it going to mean when these devices have feelings? Right now, I can ask Alexa, “What’s your favorite beer?” and it will tell me a beer. And I was reading an article that said, “Oh yeah, that’s Alexa’s opinion. We didn’t program that.” I was like, “What do you mean?”
“What do you mean?” Yeah.
Because every time I ask one of these things their opinion, to me, it’s… it’s a pre… I know it’s a pre-programmed thing. And it’s kind of fun and interesting to say, “What’s your favorite color?” and get the little funny response. And my son loves asking questions like that.
But that’s just a joke.
But that’s just a joke. What would it even mean if my Echo or my Home had their own opinions? Do I want that? I mean, in a real companion I do. But in an assistant where I’m saying, you know, “Can you turn on the oven? Can you answer this question? Could you do that?” I haven’t really figured out yet what I ultimately want from them. Look, we were talking about: if this AI system grows up with you as a companion your whole life, it becomes so easy to talk to it; you know it’s always going to say the right thing to make you feel better. Am I going to be less likely to call my friend, or text my friend, or call my family, because it’s harder to talk to somebody who doesn’t always have that smooth way of talking back to you? Are we going to lose some of that because I’m just going to go to the easy one, who’s going to tell me, “Oh, you’re doing fine; you’re okay,” and not be challenged by someone else? Are we going to get so far that I don’t even… I just send you a message via my assistant? I’m like, you know, “Tell so-and-so such-and-such for me,” and I don’t even talk to that person directly anymore. So it could go sort of… you know, what if my AI system is in a bad mood today and it won’t turn on the oven?
Like you think about Marvin the Paranoid Android from the Hitchhiker’s Guide to the Galaxy, which had a lot of personality and opinions and, you know, wasn’t a lot of fun to hang out with.
So there’s part of me, probably the child part of me that says, “I want it.”
Let’s keep them dumb.
And then there’s another part of me, the older more ethical thinking person, is like, “I don’t know.”
Let’s leave that to human beings.
Yeah. You know, “What does it mean? Do we want that?” I don’t know.
You know, we’ll check in again in a few. (Laughs).
We’ll see what… (Laughs).
VR assistants. Your assistant will call my assistant.
Exactly. They’ll have an interview, they’ll have a podcast; their own podcast. And they’ll play it for us because we need to listen to it because they’re working on this and they deserve it.
I am… I’m curious, you know, which kind of person are you? Did you grow up religious?
So I… I grew up in a house that my… my mother went to church and we went to church like on…
Christian but I don’t even know the…
Oh, that’s alright.
But Easter and Christmas.
One of those, okay. Yeah, that kind of Church; the Easter and Christmas.
But I… it was never for me. I remember as a kid thinking, “Okay, someday I need to sit down and figure out if I believe in God or not.” And then I got to college and I was like, “Okay, I’m going to figure this out.” I thought I was going to have months of thinking about it.
And then I was like, “No, I don’t believe in God. Okay, moving on.”
But for me, I find my spirituality in the universe and in people. I think people are so fascinating and amazing, and there are so many wonderful things that people do and create and think about, that I find so much, you know, pleasure in that. And to me, the ultimate thing is to spend time with friends and family that I like. And I love trying out new cool technology. Computers drive me absolutely insane, but I still just love them.
I can relate. (Laughs)
They drive me crazy every day, but I like them so much. And my sort of… I don’t know if you can call it spiritual, but my spiritual goal in life is to sort of always be learning. I love hanging around people who are curious and want to know stuff. And I encourage that in my son. It’s great to just learn about cool stuff and share that knowledge with people and talk about it. My goal is to always be learning, and to always try to look at myself and figure out, “How can I be a better person?” you know? I mean, I’ve still got a long way to go there, but I just think…
That’s alright, we’re all terrible.
Exactly. So I just… I think there’s plenty, plenty there to drive us and to find meaning in life that way.
And you know what? If you can put 1% of this inside your job, and that’s really important, that really gives hope.
That would be my goal: to look at things, this technology, in a way like, “How can we not just be cool, but how can we benefit people, and also not harm people?” I think, unfortunately, a lot of times in tech, we think about the problem we’re trying to solve and, “Isn’t it cool?” and we don’t always think about the ways someone might misuse the technology or the ways it might hurt someone. You have to keep that in mind as a designer as well. Like, “What are all the ways someone might use this? And is that what we want to put out there in the world?”
Mm-hmm. You have one kid, right?
How does he relate to these assistants and Alexa?
Yeah, it’s so fun watching him interact with them for the last few years. To him it is so natural. Like, doing homework the other day, he’s like, “How do I spell architect?” and my husband and I are like, “Well, look it up in the dictionary,” and he’s like, “I’ll just ask Alexa.”
And we’re like, “No!” and then we’re like, “Well, what’s the difference between looking it up in the dictionary and asking Alexa?”
I agree. Like in this specific case like, what is the difference?
As long as he… For me, I always have a hard time handling infinity in my pocket, in these devices, because it’s too much for me. My brain is not built to manage infinity. And so sometimes I realize that I tend to retain less information because it’s constantly available to me; it’s like this external memory that’s in my pocket or inside servers.
Right. Maybe fewer facts but maybe more breadth of knowledge of the…
Yes, you’re right.
… problem solving. And that’s the positive view. But I posted this on Twitter yesterday. I was thinking, you know, “We’re going to get to a point where you’ll think about something and it’ll pull that information from the cloud or whatever. And the difference between what I know and looking something up will be gone. Because I can look something up so quickly, there’s no difference between me knowing it and it being accessible to me.”
So at that point, it will all be about the procedural, about that process of thinking. It won’t be so much about retaining information, because at that point, that’s just storage.
And it will be more about how you form your brain, how you… you cultivate your brain.
That’s how I’d like to think about it, because I’m very bad at remembering things like dates and historical facts. So I’m like, “Yeah, it’s because that knowledge is…”
Yes, yes! “Implant it into me now! I want it now!”
“I know Kung Fu.”
I know, right? What is one skill that, throughout your career, helped you get through it and grow? What advice would you give to a young designer or a young engineer who is going through that?
I might say… curiosity. And by that I mean, whatever your job is, maybe when you start with an internship or something very minor, don’t just silo yourself into getting done whatever your job needs that day. Like, “Hey, I’m done; I don’t have to think about anything else.” Find out what interests you; there’s a lot of stuff going on, some of it you’ll find interesting, some you won’t. And the only way to find that out is by seeing what other people do and talking to them about what they do. I think it’s just really important to have curiosity within your company. “What are all the other groups doing? How does that impact what I’m doing? How can I help them?” That’s part of it too: always offer your help and make sure to make connections, not just in what you’re working on.
So basically, like you said earlier, learning, and then also kindness. Alright. Cathy Pearl, thank you so much for being part of the show, it was a delightful experience for me.
Thank you so much for having me on.