I recently had a discussion with a friend about where consumer voice technology is heading. Gary Vaynerchuk has consistently stated that “Voice is the Next Frontier.” Frankly, I have to disagree.
Voice assistants are clunky at best. While I have not had much experience with Google Home, I have an Amazon Echo Dot sitting in my room and the Google Assistant on my phone. After 9 months with Amazon Alexa, I believe that voice is clunky, slow, and inaccurate.
Voice Is Clunky
Unless you are comfortable with an always-listening assistant that would have to understand context and parse your conversations, voice will always need to be activated by a wake word or prompt.
“ALEXA, SET THE ALARM”
“HEY GOOGLE, SET THE ALARM”
To me, that is a really poor user experience. What if you have a list of four tasks to get done? That means shouting the wake word four times in a row.
While voice is good at those one-off tasks, you really hit a barrier when you are trying to perform several tasks in a row or when the task is complex.
Voice Is Slow
What takes longer? Typing “What is the capital of Canada?” or saying “What is the capital of Canada?”
Wait. They actually take close to the same amount of time. I guess voice is pretty good for informational queries like that.
But beyond those simple commands and queries, what can you really do in a reasonable amount of time?
Are you going to write your next essay by voice? Dictation on the Mac has been around for a while, but people still choose to type. One reason is that typing is faster than speaking.
Unfortunately, many tasks require detailed context, instructions, inputs, and conditionals that voice cannot handle easily. In these situations, the user is led slowly down a decision tree until the voice program has all the information it needs.
The experience gets worse if you accidentally give the wrong information or the speaker fails to interpret your voice correctly. Then you have to cancel the flow and start over.
No one wants to be asked 5 questions to accomplish one task. The time spent doing this is better used elsewhere. While I can schedule events using my Alexa, this is ultimately why I still choose to type the details into Google Calendar via my phone or laptop.
To me, this limits the ecosystem of voice applications that can be built. Consumers (at least for now) will only stay engaged with tasks that can be invoked with quick one-liners, and a single short statement only carries enough information for simple tasks.
Voice Is Inaccurate / Just Doesn’t Understand
Spotify has an amazing feature called the Daily Mix. It analyzes my past listening preferences and pumps out a playlist that I always fall in love with.
“Alexa, play my daily mix on Spotify”
“I can’t find daily mix on Spotify”
This sucks. A human would have been able to understand, or at least ask a clarifying question. Instead, I’m just disappointed. Now I want to play some uplifting music.
“Alexa, play God’s Plan”
“God’s Plan by Daniel O’Donnell”
Come on. There is only one “God’s Plan” Alexa should be playing: the chart-topping hit by Drake.
The ability to understand context for a seamless experience is sorely lacking, especially on Alexa.
Google Assistant was able to successfully play “God’s Plan” by Drake when asked. However, when asked to play my daily mix…
While I’m sure voice technology will continue to advance in its ability to comprehend context, that only solves the third problem: failing to understand context properly. It will still be slow. It will still require a clunky wake word.
What I am personally much more excited about is the advent and perfection of Augmented Reality. I think a gesture-based AR system would be faster, smoother, more intuitive, and much more fun than voice can ever be. At its best, I think voice would be a complementary technology to AR, but not something worth proclaiming a revolution over.