Voice command is here, gesture command is improving (and is the future) but still…

A good friend of mine recently posted about the future of computing, saying he felt voice control was the next big thing. I both agree and disagree with him. Back in the day, when we were working together, he and I would go round and round on topics like this. I miss those days. He is brilliant and always made my thinking clearer and more concise.

One of the things he mentioned as a vision of the voice control future was the Amazon Echo. I was invited to be an early tester of the product, so we’ve had one in the kitchen for about 3 months. The voice control is solid. I knew that from having a Fire TV device in the living room as well, where I scroll through pictures on the big screen when I want to review what the kids have scanned or added to our Family History project. Amazon has added voice control features to Echo that make it quite intriguing. The kids use it every morning: “Alexa, what is the weather?” I am adding a second one in my office because Alexa (Echo) can now read Audible books.

So I agree with my friend. Voice is here now and is improving now. It is the control solution for the next 8-10 months. Beyond that horizon voice will remain with us, but it will eventually become the second option. Motion-based control is the future. Motion does not require a clean room. By clean room, by the way, I am not talking about scientists in lab coats taking apart a hard drive. I am talking about the ambient noise level in the room. Voice control is the quick way to operate an Xbox One, but the noise around the Kinect sensor will determine the success of the command. Echo is the same, and so is Siri. Ambient noise is the game changer for voice control.

The other challenge for voice control is everything else you use your voice for. You talk to other people, you talk to pets, and you talk on the telephone. All of those moments make it harder to use voice control. Add to that the noise around you, and voice control has a limited impact. Of course, you can say the same of gestures. There are times when neither voice control nor gesture/motion control will work. You don’t want to be telling your phone to do something while in a crowded subway. For that matter, you don’t want to be making motions on a subway either. There remains a time for keyboard control.

As devices move forward they will offer multiple interfaces. Touch, voice, motion and keyboard inputs will all remain available to you. When you are texting someone during a meeting in which you should be paying attention, you don’t want to suddenly say out loud, “Siri, text Jim: this meeting is boring, call me with a fake call so I can leave.” That is something you type using the on-screen keyboard. Actually, if the meeting is so boring that, well, you see the picture, there are times you might use voice command.

It becomes a viability question. What format of command is viable for where I am? Let me choose. Your choice of input type is determined by your reading of the environment. Outside in a hurricane, you are probably not using voice command. You are probably not using motion command either. You are probably using your voice to call for help while you run to the nearest shelter. You get the idea. The right “voice” at the right time, where “voice” is whatever input modality you choose.

To end this and come back to my point of agreement with my friend’s post: voice command is a reality. I’ve been using voice command (Dragon NaturallySpeaking and Siri) for more than 10 years (well, 2 years with Siri). The quality of the systems has improved immeasurably in that time. It’s no longer a gimmick or a way-cool future tech option. It works now. For years those of us trying out and considering voice command options were told to wait for the hardware to catch up. Then the hardware caught up and we were told to wait for the software. Now the only limit is waiting until you are in a fairly quiet place; once you are, you can use voice command at will.


Scott Andersen

IASA Fellow