Voice is what makes artificial intelligence come alive, says writer James Vlahos. It is an aspect of technology that "stirs the imagination," one that has long been part of stories and science fiction. And now, Vlahos argues, it is ready to change everything.
Vlahos is the author of Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think (Houghton Mifflin Harcourt). Home assistants can already speak and show personality, and as this technology develops, it will raise many questions that we have not considered before.
The Verge spoke with Vlahos about the science of voice computing, which people will benefit most and what this means for the power of Big Tech.
This interview has been lightly edited for clarity.
What exactly happens when you talk to a device like Alexa and it responds?
If you're used to talking to Siri or Alexa, you say something and you hear something back, and it seems like a single process is taking place. But you really should think of it as multiple steps, each of which is complex to achieve.
First, the sound waves of your voice have to be converted into words; that is automatic speech recognition, or ASR. Those words must then be interpreted by the computer to work out the meaning, and that is NLU, or natural language understanding. Once the meaning has been understood in some way, the computer has to figure out something to say back, and that is NLG, or natural language generation. Once this response has been formulated, there is speech synthesis, which takes words inside a computer and converts them back into sound.
Each of these things is very difficult. It is not as simple as the computer looking up a word in a dictionary and figuring things out. The computer has to grasp something about how the world and people work to be able to respond.
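The four stages Vlahos describes can be pictured as a chain of functions. The following is only a toy sketch under loud assumptions: every name in it (`recognize_speech`, `understand`, `generate_reply`, `synthesize`, `respond`, `Intent`) is hypothetical, and each stage is a trivial stand-in (text decoding, keyword matching, fixed templates), not how Alexa, Siri or any real assistant works.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """Toy stand-in for the meaning an NLU stage extracts."""
    name: str
    slots: dict


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: pretend the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def understand(text: str) -> Intent:
    """NLU stand-in: keyword matching instead of a trained model."""
    if "weather" in text:
        return Intent("get_weather", {})
    if "time" in text:
        return Intent("get_time", {})
    return Intent("unknown", {"utterance": text})


def generate_reply(intent: Intent) -> str:
    """NLG stand-in: fixed templates instead of a neural generator."""
    templates = {
        "get_weather": "I cannot check the weather in this sketch.",
        "get_time": "I cannot check the clock in this sketch.",
    }
    return templates.get(intent.name, "Sorry, I did not understand that.")


def synthesize(text: str) -> bytes:
    """Speech synthesis stand-in: returns encoded text, not real audio."""
    return text.encode("utf-8")


def respond(audio: bytes) -> bytes:
    """Chain the four stages in the order described above."""
    words = recognize_speech(audio)    # sound -> words (ASR)
    meaning = understand(words)        # words -> meaning (NLU)
    reply = generate_reply(meaning)    # meaning -> response text (NLG)
    return synthesize(reply)           # response text -> sound (synthesis)
```

Calling `respond(b"What is the weather?")` passes the input through all four stand-ins in turn. The point of the sketch is only the hand-off between stages; in a real system each function would be a large model in its own right, which is why each step is so difficult.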
Are there any really interesting developments in this area that piqued your curiosity?
A lot of really interesting work is being done in natural language generation, where neural networks are creating original things for the computer to say. They are not just retrieving pre-scripted words; they generate responses after being trained on large volumes of human speech: movie subtitles, Reddit threads and so on. They are learning the style of how people communicate and the kinds of things that person B might say after person A says something. So the computer being creative to a certain extent is what caught my attention.
What is the ultimate goal of this? What will it look like when voice computing is ubiquitous?
The great opportunity is for the computers and phones we use now to really fade in their primacy and importance in our technological lives, and for computers to essentially disappear. When you need information or want to get something done, you just speak and the computers do your bidding.
That's a big change. We have always been tool makers and tool users. There have always been things that we hold or grasp or touch or swipe. So, when you imagine that all of that fades away and your computing power is effectively invisible, because we are talking about small microphones integrated into the environment that are connected to the cloud, that is a profound change.
A second big change is that we are starting to have relationships with computers. People like their phones, but they do not treat them like a person, per se. We are entering an era in which we are beginning to treat computers as beings. They show emotions to a certain extent and have personalities. They have troubles, and we look to them for companionship. These are things you do not expect from your toaster oven, microwave or smartphone.
Who could benefit most from the rise of voice assistants? The elderly are a group we often hear about, especially because they may have poor eyesight and find it easier to talk. Who else?
The elderly and children are really the guinea pigs for voice computing and personified AI. Older people often have the problem of being alone, so they are the most likely to converse with Alexa. There are also applications out there where voice AI is used almost as a babysitter, giving medication reminders or letting family members do remote check-ins.
However, and not to overgeneralize, some older people have dementia, and it is a bit harder for them to recognize that the computer is not really alive. Similarly, children's grasp of reality is not as firm, so they may be more willing to engage with these personified AIs as if they were really alive in some way. You also see voice AIs being used as virtual nannies: I am not at home, but the AI can keep watch. That is not fully happening yet, but it seems to be on the verge of happening.
What happens when we have virtual nannies and the like, and all the technology vanishes into the background?
The dark scenario is that we seek out less human company because we can turn to our digital friends instead. Data from Amazon already suggests that people are turning to Alexa for company and small talk.
But you can spin it positively and, sometimes, I do. It is good that we are making machines more like people. Like it or not, we spend a lot of time in front of our computers. If that interaction becomes more natural and less about pointing, clicking and swiping, then we are moving in the direction of being more authentic and human, as opposed to having to become almost machine-like ourselves when we interact with devices.
And I think we're going to hand more centralized authority to Big Tech. Especially when it comes to something like internet search, we're less likely to browse around, find the information we want, synthesize it, open magazines, open books, or whatever else we do to get information, and more likely to just ask questions aloud. It is really convenient to be able to do that, but we also give more trust and authority to a company like Google to tell us what is true.
How different is that scenario from the current concern about "fake news" and misinformation?
With voice assistants, it is neither practical nor desirable for them to answer a question with the verbal equivalent of 10 blue links. So Google has to choose which answer it will give you. Right there, they are gaining enormous gatekeeper power to select what information is presented, and history has shown that when control of information is consolidated in a single entity, that is rarely good for democracy.
Right now, the conversation is very focused on fake news. With voice assistants, we are going to veer in a different direction. Google will have to really focus on not presenting [fake news]. If you only present one answer, it had better not be garbage. I think the conversation is going to shift more toward censorship. Why do they get to choose what counts as a fact?
How much should we worry about privacy and the types of analysis that can be done with voice?
I am as concerned about the privacy implications as I am with smartphones in general. If technology companies are abusing that access to my home, they can do it just as easily with my computer as they can with Alexa sitting on the other side of the room.
That's not at all to minimize privacy concerns. I think they are very, very real. I just think it's unfair to single out voice devices as being worse. Although there is a sense that we are using them in different settings, in the kitchen and in the living room.
Changing the subject a bit, your book spends some time talking about the personalities of various voice assistants. How important is it for companies that their products have personality?
Personality is important. That is definitely key; otherwise, why have voice at all? If you want pure efficiency, you might be better off with a phone or a desktop computer. What has not happened yet is differentiation between Cortana, Alexa and Siri. We are not seeing technology companies design very different personalities with an eye to capturing different segments of the market. They are not doing what cable TV or Netflix do, where you have all these different shows that slice and dice the consumer landscape.
My prediction is that they will do that in the future. Right now, Google, Amazon and Apple just want most people to like them, so they are quite broad, but [I think they will develop] the technology so that my assistant is not the same as your assistant, which is not the same as your co-worker's assistant. I think they will because it would be appealing. With every other product in our lives we do not have one size fits all, so I do not see why we would with voice assistants.
However, there are some pitfalls there, as we see in discussions around why assistants tend to have female voices. Is there more of that in store?
We are already seeing questions about gender. There has been very little conversation about race or the perceived race of virtual assistants, but I have the feeling that this conversation is coming. It is funny. When you press the big technology companies on this topic, with the exception of Amazon, which admits that Alexa is female, everyone else says "it's an AI, it does not have a gender." That will not prevent people from picking up on cues about what kind of gender or racial identity it has.
All this to say, Big Tech will have to be very careful in navigating those waters. They may want to specialize a bit more, but they can get into dangerous waters where they do something that sounds like cultural appropriation, or something that is out of place, or stereotypical.