Voice assistance in 2019

Abstract
The end of the calendar year always seems like a good time to pause for breath and reflect on what’s been happening over the last 12 months, and that’s as true in the world of commercial NLP as it is in any other domain. In particular, 2019 has been a busy year for voice assistance, thanks to the focus placed on this area by all the major technology players. So, we take this opportunity to review a number of key themes that have defined recent developments in the commercialization of voice technology.


Introduction
For just over a year, I've been curating This Week in NLP, a weekly newsletter which highlights the key events and happenings in the world of commercial NLP. a On reviewing a wide range of sources over the last year, what I've found most striking is the significant proportion of news coverage that focuses on voice assistance. Striking, but not really surprising: the ubiquity of voice assistants means they are the most directly accessible NLP technology for the majority of end users. In this post, I reflect on what's made the news over the last year and draw out what I see as the major themes that have defined developments in voice in 2019.

The battle of the giants: Amazon Alexa versus Google Assistant
If you're looking to buy a smart speaker, it's very likely you'll be making a choice between an Amazon Echo and a Google Home. Although there are a number of other players in the market (we'll get to those further below), at this point in time the main contenders are devices powered by Amazon's Alexa, which celebrated its fifth birthday in November 2019, and Google's Assistant, which turned three in October.
Between these two, who's winning the war for your voice bandwidth, and all it might reveal, depends on what you count. Apparently more than 100 million Alexa-compatible devices have been sold; b Alexa has been integrated with 85,000 smart home products; c and the Alexa app store hosts over 100,000 skills. d Google Assistant, on the other hand, is said to be on over 1 billion individual devices; e it works on over 10,000 smart home products; and, at the beginning of the year, it supported just over 4000 actions. f

Of course, device count isn't the same as active user count: just about every Android device, and those are mostly phones, has Google Assistant preinstalled, but it's hard to find numbers for how many people actually use it. Product count isn't overly helpful either: a lot of those smart home products might be differently branded but otherwise identical light bulbs. And Amazon has made it so easy to develop for Alexa (more on that below) that a fair proportion of those 100,000 skills are likely to have (at most) one user, being the result of a developer kicking the tyres.

a You can sign up at https://www.language-technology.com/twin.
b https://www.theverge.com/2019/1/4/18168565/amazon-alexa-devices-how-many-sold-number-100-million-dave-limp
c https://www.zdnet.com/article/amazons-new-alexa-features-echo-devices-check-privacy-smart-home-integration-boxes/
d https://voicebot.ai/2019/10/01/amazon-alexa-has-100k-skills-but-momentum-slows-globally-here-is-the-breakdown-by-country/
e https://techcrunch.com/2019/01/07/google-says-assistant-will-be-on-a-billion-devices-by-the-end-of-the-month/
f https://voicebot.ai/2019/02/15/google-assistant-actions-total-4253-in-january-2019-up-2-5x-in-past-year-but-7-5-the-total-number-alexa-skills-in-u-s/
Smart speaker sales numbers might be a better indicator of which virtual assistant is gaining more traction, since the only reason you'd buy a smart speaker is because you actually want to talk to it. Here, for 2019Q3 at least, Amazon is way ahead, with 10.4 million sales g (that's a stunning 36.6% of smart speaker sales worldwide), three times more than Google's 3.5 million. And those numbers for Amazon are up 5% on the same quarter in the previous year, so its lead appears to be increasing.
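As a back-of-the-envelope check on those numbers (the unit counts and market share are as reported; the worldwide total is derived from them, not reported directly):

```python
# Q3 2019 smart speaker shipments, as reported.
amazon_units = 10.4e6   # Amazon shipments
amazon_share = 0.366    # Amazon's share of worldwide shipments
google_units = 3.5e6    # Google shipments

# Implied worldwide quarterly total (derived, not directly reported).
world_total = amazon_units / amazon_share
print(f"Implied worldwide Q3 shipments: {world_total / 1e6:.1f} million")  # ≈ 28.4 million

# Amazon's multiple over Google, consistent with "three times more".
print(f"Amazon/Google ratio: {amazon_units / google_units:.1f}")  # ≈ 3.0
```

So the reported figures imply a worldwide quarterly market of roughly 28 million units, with Amazon shipping about three units for every one of Google's.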
At the end of the day, though, you just want these things to answer your questions. So where do the various voice assistants stand in terms of actual performance? Twice a year, Loup Ventures runs what you might think of as an IQ test for virtual assistants, h asking Google, Siri, and Alexa 800 questions each on a variety of topics. In the most recent round, all three have improved (substantially, in Alexa's case), but Google is still on top, answering only 57 questions (7%) incorrectly, whereas Alexa got 162 (20%) wrong.
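For reference, those percentages follow directly from the 800-question total; a trivial check:

```python
# Error rates in the Loup Ventures test: 800 questions per assistant.
TOTAL = 800
wrong_answers = {"Google Assistant": 57, "Alexa": 162}

for name, wrong in wrong_answers.items():
    pct = 100 * wrong / TOTAL
    print(f"{name}: {wrong}/{TOTAL} wrong = {pct:.1f}% error rate")
```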
Perhaps in response to this deficit, Amazon has introduced Alexa Answers, i a kind of ''Audible Quora'' where users can provide answers for questions that leave Alexa stumped; Alexa then delivers these answers preceded by the message that the answer is ''according to an Amazon customer''. The feature has generated some criticism, particularly in regard to Amazon's apparent lack of quality control. j

Other voice assistants
Of course, there are a number of other players in the voice assistant space. In fact, for 2019Q3, Chinese manufacturers, about whom relatively little is heard in the West, had a good showing sales-wise: both Alibaba and Baidu narrowly outsold Google in smart speakers, shipping 3.9 and 3.7 million units, respectively, and Xiaomi just trailed Google at 3.4 million units.
Meanwhile, other voice assistants have been struggling. Microsoft's Cortana has had an uncertain year: introduced in 2014 to compete with Siri as a digital assistant for the now-dead Windows Phone, by the beginning of 2019 it had become a key feature of the Windows 10 interface; then around mid-year it became a separate app in the Windows store; and by year end it was being folded into Outlook and Office 365 k as a productivity aid. Microsoft is one of around 30 companies that have signed up to Amazon's Voice Interoperability Initiative, l the aim of which is to allow multiple voice assistants to comfortably exist on the same device. This looks like a clear recognition that Cortana will exist alongside other voice assistants m rather than compete with them.
Samsung's Bixby, first introduced in 2017, has also had a hard time. Despite the company selling over 500 million ''Bixby-compatible'' devices each year, it struggles to be heard above the chatter created by all the others. This hasn't stopped Samsung rolling out a number of initiatives in an attempt to gain greater acceptance in the market. Recognizing the success of Amazon's strategy of providing lots of support for third-party developers, n this year Samsung has announced Bixby Developer Studio, which lets third-party developers create skills (known as ''capsules'' in Bixby-speak); a third-party app marketplace where developers can sell those apps; Bixby Views, which lets you build voice apps for visual devices from TVs to watches; and Bixby DevJam, a developer contest for new Bixby capsules, with prizes totalling US$125,000. The company aims to put AI in every device and appliance it makes by 2020. o

Siri, the voice assistant that started it all back in 2011, has generated relatively little news in 2019. And where's Facebook in all of this, you might ask? Around mid-year, Mark Zuckerberg announced that Facebook would soon be launching a number of voice-controlled products, p but nothing has appeared yet. The widely noted product placement of a Facebook Portal q in September's season 11 premiere of Modern Family (''Hey Portal, call Dad'' was the second line in the episode) doesn't really count, since the Portal uses Alexa.

g https://voicebot.ai/2019/11/15/amazon-sold-three-times-more-smart-speakers-than-google-in-q3-2019-baidu-and-alibaba-also-beat-google-device-sales/
h https://loupventures.com/annual-digital-assistant-iq-test/
i https://alexaanswers.amazon.com/about
j https://venturebeat.com/2019/11/01/probeat-alexa-answers-devalues-amazons-virtual-assistant/
k https://www.onmsft.com/news/leaked-microsoft-video-shows-how-cortana-will-integrate-with-windows-10-and-outlook
l https://developer.amazon.com/alexa/voice-interoperability
m https://venturebeat.com/2019/11/05/microsoft-is-banking-cortanas-success-on-the-idea-of-a-multi-assistant-world/

Conversational ability
Inevitably, all this competition has contributed to a steady stream of advances in the technology underlying voice assistants. The big news in 2018 was Google Duplex, a hyper-realistic appointment-making voice dialog app that the company demo'd at Google I/O, r and piloted in New York, Atlanta, Phoenix, and San Francisco toward the end of that year.
This year saw the progressive rolling out of that technology: s by mid-year, Duplex was available for restaurant bookings in 43 US states, and a New Zealand trial t was mooted by the end of the year. In May, The New York Times reported that Duplex calls were often still made by human operators u at call centers: around a quarter of calls start with a live human voice, and of the calls that start with machines, 15% require a human to intervene. Some degree of human fallback is a sensible strategy when rolling out a service as groundbreaking as this, but it's unclear to what extent Duplex has become more self-sufficient over time.
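Taken at face value, the Times's two figures can be combined into a rough estimate of overall human involvement (a back-of-the-envelope calculation, assuming the 15% applies only to the machine-initiated calls, as reported):

```python
# Rough estimate of human involvement in Duplex calls, from the NYT figures.
human_start = 0.25                 # calls that begin with a live operator
machine_start = 1 - human_start    # calls that begin with the bot
handoff_rate = 0.15                # bot-started calls needing human intervention

human_involved = human_start + machine_start * handoff_rate
fully_automated = 1 - human_involved
print(f"Human involved at some point: {human_involved:.1%}")   # about 36%
print(f"Handled end-to-end by the bot: {fully_automated:.1%}")  # about 64%
```

In other words, at the time of the report, only around two-thirds of Duplex calls were being handled end-to-end by the machine.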
Duplex's scary realism provoked concerns that people wouldn't know whether they were talking to a human or a machine. By mid-2019, California had passed a law requiring chatbots to disclose that they're not human; v Duplex now begins a call by identifying itself as being from Google.

Tools for building voice apps
The sophistication of Duplex's performance makes the average Alexa skill or Google action seem trivial: the complexity of the conversation in the Google Duplex demo is in an entirely different league from what happens when you ask your Google Home what the weather is like.

As a small step toward narrowing the gap, all the major vendors have introduced features in their developer platforms that enable extended conversations, making it possible to go beyond simple one-shot question-and-answer dialogs. In 2018, Amazon introduced Follow-up mode, and Google responded with Continued Conversation, z features that cause the platforms to listen for additional queries or follow-up questions after an initial exchange, so that you don't have to keep saying the wake word. Baidu's DuerOS acquired the same capability this year, aa and Xiaomi's version is on the way too. ab

Taking the next step, Amazon this year introduced Alexa Conversations, ac a set of tools that lets you build multi-turn dialogs that can interconnect with other Alexa skills. The feature is based on machine learning of what skills are used in close proximity to each other (e.g., booking cinema tickets then organizing an Uber to get to the cinema), and then automatically invoking the appropriate skill for the next step in the dialog, providing a kind of data-driven mixed initiative where the application is increasingly able to predict the user's next requirement.
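The behaviour these follow-up features enable can be pictured as a simple state machine: after the assistant answers, the microphone stays open for a short follow-up window instead of dropping straight back to wake-word-only listening. A minimal simulation of that control flow (the names and the one-turn window here are illustrative, not any vendor's actual API):

```python
# Toy simulation of a "follow-up mode" dialog loop: after each answer the
# device keeps listening for a follow-up without requiring the wake word.

WAKE_WORD = "alexa"      # illustrative; not tied to any real platform
FOLLOW_UP_TURNS = 1      # wake-word-free turns allowed after each query

def run_dialog(utterances):
    """Process a stream of utterances; return the queries actually handled."""
    handled = []
    follow_up_window = 0  # > 0 means the mic is still open for a follow-up
    for utterance in utterances:
        words = utterance.lower().split()
        if words and words[0] == WAKE_WORD:
            handled.append(" ".join(words[1:]))  # query after the wake word
            follow_up_window = FOLLOW_UP_TURNS   # keep listening
        elif follow_up_window > 0:
            handled.append(" ".join(words))      # no wake word needed
            follow_up_window -= 1
        # otherwise: device is idle and the utterance is ignored
    return handled

queries = run_dialog([
    "Alexa what is the weather like",
    "and what about tomorrow",   # follow-up, no wake word required
    "nice thanks",               # window has closed: ignored
])
print(queries)  # → ['what is the weather like', 'and what about tomorrow']
```

Real implementations close the window on a timeout and use speech activity detection rather than a turn count, but the control flow is the same: one wake word can now open a short multi-turn exchange.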
On another front, both Google and Amazon announced features that support personalization in applications: Google's Personal References ad remembers data specific to you and your family, and Alexa skill personalization ae leverages voice profiles so that developers can provide personalized experiences, greetings, and prompts for recognized users.
The year also saw developments oriented toward making it easier to build these increasingly complex applications. Amazon has always seemed to care more about this than Google; from quite early on in Alexa's development, Amazon realized the importance of making it easy for outside developers to plug their apps into Alexa's code and to plug Alexa's code into all kinds of third-party devices. Taking this approach a step further, this year Amazon introduced new business-focused no-code Alexa Skill Blueprints, af which facilitate the development of a variety of common application types. In March, Voicebot reported that, since the launch of the feature a few months earlier, over 25% of the 4000+ new skills published in the Alexa skills store were Blueprint-based; but it turns out that more than two million Blueprint-based skills had been developed ag just for private family use.

Speech synthesis
A number of incremental improvements in speech recognition performance were announced by various vendors throughout the year, but these are not necessarily obvious to the end user. Much more visible (audible?) are improvements in speech synthesis.
During the year, Amazon announced the general availability of its newscaster style in the Amazon Polly text-to-speech service, ak which provides pretty impressive intonation: it's worth checking out the sample in this piece at VentureBeat. al And in the US at least, you can now ask Alexa to speak slower or faster. am Google's Cloud Text-to-Speech has also seen improvements: it now supports 187 voices, an 95 of which are neural-network-based WaveNet voices, covering 33 languages and dialects.
If you've ever been just a bit uncomfortable that the current crop of digital assistants all have female voices, you may be interested to know that a Danish team has created a voice called Q that's designed to be perceived as neither male nor female; ao a demo of the genderless voice ap is available online. Or you could switch to using Cortana, which is acquiring a masculine voice aq as one of its newest set of features.
Other use cases for speech synthesis have appeared during the year. Amazon has developed technology that mimics shifts in tempo, pitch, and volume ar from a given source voice. Facebook's MelNet closely mimics the voices of famous people; you can check out its rendering of Bill Gates at VentureBeat. as In fact, Amazon has decided that celebrity voices are going to be a thing. at First cab off the rank is Samuel L. Jackson: embodying his voice in your Echo device will cost you $4.99, complete with profanities.

Privacy
For AI applications in general, 2019 was the year in which a number of ethics issues came to the fore. For voice, the big issue of the year was data privacy. There's been a concern about listening devices ever since we first invited them into our homes, but our early fears were allayed by vendor insistence that our smart speakers only ever started listening once the appropriate wake word was uttered. That there might be other concerns was first hinted at toward the end of 2016, when the Arkansas police asked Amazon to hand over Echo voice recordings ax that might provide evidence in a murder case. And then in 2018, Alexa recorded a couple's conversation in their home ay and sent it to a random person from their contact list. How could this happen? According to Amazon:

Echo woke up due to a word in background conversation sounding like ''Alexa''. Then, the subsequent conversation was heard as a ''send message'' request. At which point, Alexa said out loud ''To whom?'' At which point, the background conversation was interpreted as a name in the customer's contact list. Alexa then asked out loud, ''[contact name], right?'' Alexa then interpreted background conversation as ''right''.
Fast forward to April 2019, when Bloomberg reported that Amazon was using third-party contractors to transcribe and annotate voice recordings. az That humans transcribe voice data for training and evaluation purposes won't come as a surprise to anyone who knows anything about the industry, but it caused quite a stir in the mainstream press. Very quickly every voice platform was in the spotlight: ba Apple, Google, Microsoft, Samsung and, thanks to voice recording in Messenger, Facebook. bb The general tone of the reporting suggested there were two regards in which the conduct was considered egregious: not only were humans listening to conversations that users thought they were having privately with their voice assistants, the human listeners in question were not even direct employees of the companies concerned, so who knows where your data might end up ...
The platform vendors quickly went into damage limitation mode in response to these events, assuring users that they could opt out of data collection, and that they could delete data that had been collected. Amazon even added a feature that allows users to delete their voice recordings by saying ''Alexa, delete what I just said'' or ''Alexa, delete everything I said today''. Apple's Tim Cook gave the commencement address at Stanford bc in June, in which he emphasized strongly the need for data privacy, a not-so-subtle reminder that, unlike Google and Amazon, Apple doesn't have an independent rationale for collecting your data.
The lesson from all of this is clear: transparency is important, so that users have a clear understanding of how and when their data are being used.
One response to concerns about privacy violations arising from your utterances being spirited up to the cloud is on-device processing. There are other good reasons for edge computing when it comes to voice processing, but developers of small-footprint stand-alone voice processors like Sensory bd and Picovoice be were quick to emphasize the data privacy benefits of keeping it local. Google also announced a new faster version of Google Assistant, bf thanks in part to an on-device language model, although the announcement doesn't acknowledge privacy as a benefit. On a side note, the ''wake word defence'' may have its limits: as it happens, Amazon has a patent that would allow your Echo to make use of what you say before the wake word, bg rather than just after it, although the company stresses that the tech is not currently in use. And in July, a Belgian news organization reported listening to over a thousand leaked Google Assistant recordings, bh of which around 150 were not activated by the wake word.

aw https://www.forbes.com/sites/jessedamiani/2019/09/03/a-voice-deepfake-was-used-to-scam-a-ceo-out-of-243000
ax https://fortune.com/2016/12/27/amazon-echo-murder/
ay https://www.zdnet.com/article/alexas-latest-creepy-move-recording-a-couples-private-conversation-and-sharing-it/
az https://www.bloomberg.com/news/articles/2019-04-10/is-anyone-listening-to-you-on-alexa-a-global-team-reviews-audio
ba https://venturebeat.com/2019/04/15/how-amazon-apple-google-microsoft-and-samsung-treat-your-voice-data/
bb https://venturebeat.com/2019/08/13/facebook-paid-contractors-to-transcribe-users-audio-from-messenger
bc https://news.stanford.edu/2019/06/16/remarks-tim-cook-2019-stanford-commencement/

Voice ubiquity
So it's been quite a year, and it seems like voice is almost everywhere. Just to reinforce that thought, here are 12 new things you can do using your voice that appeared during 2019.
(1) Talk to more machines in your kitchen: Gourmia introduced an air fryer, a crock pot, and a coffee maker bi that can be controlled by Google Assistant and Amazon's Alexa; Instant Brands announced that its Instant Pot Smart WiFi pressure cooker bj supports Google Assistant; and of course there's the new Amazon Smart Oven. bk
(2) Or, if you can't be bothered cooking, talk to the drive-thru: you can do this today to order breakfast bl at Good Times Burger & Frozen Custard in Denver, and also soon at McDonald's, who have acquired Apprente, bm a Bay Area voice recognition startup, with the aim of using it to take orders at drive-through windows.