Speech Recognition has been around a long time by technology standards; however, until about 2010 most of that time was spent languishing in Gartner’s wonderfully named “Trough of Disillusionment”. This was partly because the technology hadn’t matured enough: people were frustrated and disappointed when it didn’t live up to expectations, a common phenomenon identified by the previously alluded to Hype Cycle. There are a couple of reasons why Speech Recognition took so long to mature. It’s a notoriously difficult technical feat that requires sophisticated AI and significant processing power to achieve consistently accurate results. The advances in processing power were easy enough to predict thanks to Moore’s Law. Progress in the area of AI was a different story entirely. Speech Recognition relies first on pattern recognition, but that only takes it so far. To improve the accuracy of speech recognition, improvements in the broader area of natural language processing were needed. Thanks to the availability of massive amounts of data via the World Wide Web, much of it coming from services like YouTube, we have seen significant advances in recent years. However, there is also a human aspect to the slow uptake of speech-driven user interfaces: people just weren’t ready to talk to computers. 2016 is the year that started to change.
Siri (Apple), who was first on the scene and is now 5 years old and getting smarter all the time, came to macOS and AppleTV this year. Cortana (Microsoft) started on Windows Phone, moved to the desktop with Windows 10, made her way onto Xbox One, Android and iOS, and is soon to be embodied in all manner of devices according to reports. Unlike Siri, Cortana is a much more sociable personal digital assistant, willing to work and play with anyone. By this I mean Microsoft have made it much easier for Cortana to interact with other apps and services, and will be launching the Cortana Skills Kit early next year. As we’ve seen in the past, it’s this kind of openness and interoperability that takes technologies in directions not envisaged and often leads to adaptation and adoption as personal AT. If there were a personal digital assistant of the year award, however, Amazon Echo and Alexa would get it for 2016. Like Microsoft, Amazon have made their Alexa service easy for developers to interact with, and many manufacturers of Smart Home products have jumped at the opportunity. It is the glowing reviews from all quarters, however, that make the Amazon Echo stand out (from a self-proclaimed Luddite at the New Yorker to the geeks at CNET). Last but not least we have Google. What Google’s personal digital assistant lacks in personality (no name?) it makes up for with stunning natural language capabilities and an eerie knack of knowing what you want before you do. Called Google Now on smartphones (or just the Google app? I’m confused!), similar functionality, without some of the context relevance, is available through Voice Search in Chrome. They also offer voice to text in Google Docs, which this year has been much improved with the addition of a range of editing commands. There is also the new Voice Access feature for Android, currently in beta testing, but more on that later.
In the hotly contested area of the Smart Home, Google also have a direct competitor to Amazon’s Echo in their Google Home smart speaker. Google are a strong player in this area; my only difficulty (and it is an actual difficulty) is saying “OK Google”: rather than rolling off the tip of my tongue it kind of catches at the back, requiring me to use muscles normally reserved for sucking Polo mints. Even though more often than not I mangle this trigger phrase, it always works, and that’s impressive. So who is missing? There is one organisation conspicuous by its absence that has the resources in terms of money, user data and technology, and is already positioned in that “personal” space. Facebook would rival Google in the amount of data they have at their disposal from a decade of video, audio and text, the raw materials for natural language processing. If we add to this what Facebook knows about each of its users (what they like, their family, friends and relationships, calendar, history, interests…) you get more than a personal digital assistant; maybe “omnipersonal digital assistant” would be more accurate. The video below, which was only released today (21/12/16), is of course meant as a joke (there are any number of things I could add here, but I’ll leave it to the Guardian). All things considered, however, it’s only a matter of time before we see something coming out of Facebook in this area, and it will probably take things to the next level (just don’t expect it to be funny).
What does this all mean for AT? At the most basic level, Speech Recognition provides an alternative to the keyboard/mouse/touchscreen method of accessing a computer or mobile device, and the more robust and reliable it is, the more efficiently it can be used. It is now a viable alternative, and this will make a massive difference to the section of our community who have the ability to use their voice but, for any number of reasons, cannot use other access methods. Language translation can be accurately automated, even in real time, like the translation feature Skype launched this year. At the very least this kind of technology could provide real-time subtitling, but the potential is even greater. It’s not just voice access that is benefiting from these advances, however; personal digital assistants can be interacted with using text too. Speech Recognition is only a part of the broader area of Natural Language Processing. Advances in this area lead directly to fewer clicks and less menu navigation. Microsoft have used this to great effect in the new “Tell me what you want to do” feature in their Office range. Rather than looking through help files or searching through menus, you just type what tool you are looking for, in your own words, and it serves it right up!
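To give a feel for the idea behind that kind of feature, here is a toy sketch in Python: a user’s free-text request is fuzzily matched against a catalogue of tool descriptions and the best match is returned. Everything here (the tool names, the menu paths, the matching cutoff) is illustrative only, not how Microsoft actually implement it:

```python
import difflib

# Hypothetical catalogue: plain-language tool descriptions mapped to menu paths.
TOOLS = {
    "insert table": "Insert > Table",
    "change line spacing": "Layout > Paragraph Spacing",
    "add page numbers": "Insert > Page Number",
    "track changes": "Review > Track Changes",
}

def tell_me(query, tools=TOOLS):
    """Return the menu path whose description best matches the query,
    or None if nothing comes close."""
    matches = difflib.get_close_matches(
        query.lower(), tools.keys(), n=1, cutoff=0.3
    )
    return tools[matches[0]] if matches else None
```

So `tell_me("insert a table")` finds the closest description and points the user straight at the tool; a real system would of course use proper natural language understanding rather than string similarity, but the “describe it in your own words” interaction is the same.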
Natural Language Processing will also provide faster and more accurate results for web searches because there is a better understanding of actual content rather than a reliance on keywords. In a similar way, we are seeing this technology working to provide increased literacy supports, as the computer will be able to better understand what you mean from what you type. Large blocks of text can be summarised, and alternative phrasing can be suggested to increase text clarity. Again, the new Editor feature in Microsoft Word is made possible by this level of natural language understanding.
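The summarisation idea can be illustrated with a deliberately naive extractive sketch: score each sentence by how frequent its words are in the whole text, then keep the top-scoring sentences in their original order. This is a toy illustration of the concept, not how Word’s Editor (or any modern NLP system) actually works:

```python
import re
from collections import Counter

def summarise(text, max_sentences=2):
    """Naive extractive summary: rank sentences by average word
    frequency and keep the best ones in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return ' '.join(s for s in sentences if s in top)
```

Sentences full of the document’s most common words score highest, so the summary tends to keep the ones closest to the main topic; real systems add genuine language understanding on top of (or instead of) this kind of counting.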