When Tavis Rudd decided to build a system that would allow him to write computer code using his voice, he was driven by necessity.
In 2010, he tore his rotator cuff while rock-climbing, forcing him to quit climbing while the injury healed. Rather than sitting idle, he poured more of his energy into his work as a self-employed computer programmer. "I'd get in the zone and just go for hours," he says. Whether it was the increased time pounding away at a keyboard or the lack of other exercise, Rudd eventually developed a repetitive strain injury (RSI) that caused his outer fingers to go numb and cold, leaving him unable to type or code without pain.
Worried that he would not be able to do his job, Rudd turned to Dragon NaturallySpeaking voice recognition software to see if that could help. He quickly discovered that he could insert commands into Dragon using the programming language Python, and that he could use the Python-based application programming interface (API) Dragonfly to create lists of words and link them to specific actions he wanted Dragon to perform.
So he set about creating such a list, known as a grammar, of words that would cause a text editor such as Emacs to take certain actions—insert or delete characters, add a bracket, move the cursor up some number of lines. He created this grammar with strange words, such as ak or par, to avoid confusing the speech recognition software with common English words and to keep the number of syllables per command down to one or two, so programming this way would be speedy.
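The idea of a grammar can be illustrated with a minimal sketch. This is plain Python, not Dragonfly's API and not Rudd's code: the `Buffer` class, the `GRAMMAR` table, the `run` function, and the meanings assigned to each word (including "ak" and "par") are assumptions made for illustration. It shows short spoken words mapped to editor actions, with an optional number word to repeat a command.

```python
# Hypothetical sketch of a voice "grammar": short words mapped to editor actions.
# Not the Dragonfly API; word meanings here are invented for illustration.

NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


class Buffer:
    """A toy text buffer with a cursor, standing in for a real editor."""

    def __init__(self, text=""):
        self.text = text
        self.cursor = len(text)  # cursor starts at the end of the text

    def insert(self, s):
        self.text = self.text[:self.cursor] + s + self.text[self.cursor:]
        self.cursor += len(s)

    def delete_back(self):
        # Delete the character just before the cursor, if there is one.
        if self.cursor > 0:
            self.text = self.text[:self.cursor - 1] + self.text[self.cursor:]
            self.cursor -= 1

    def move(self, delta):
        # Move the cursor, clamped to the bounds of the text.
        self.cursor = max(0, min(len(self.text), self.cursor + delta))


# One- or two-syllable nonsense words keep commands short and unambiguous.
GRAMMAR = {
    "par": lambda buf: buf.insert("("),
    "rap": lambda buf: buf.insert(")"),
    "ak": lambda buf: buf.insert("["),
    "chomp": lambda buf: buf.delete_back(),
    "lef": lambda buf: buf.move(-1),
    "ri": lambda buf: buf.move(1),
}


def run(utterance, buf):
    """Apply each recognized word; a following number word sets the repeat count."""
    words = utterance.split()
    i = 0
    while i < len(words):
        action = GRAMMAR.get(words[i])
        if action is None:
            raise ValueError(f"unknown command: {words[i]!r}")
        count = 1
        if i + 1 < len(words) and words[i + 1] in NUMBERS:
            count = NUMBERS[words[i + 1]]
            i += 1  # consume the number word
        for _ in range(count):
            action(buf)
        i += 1
    return buf
```

In this sketch, saying "par rap lef" after typing `f` would produce `f()` with the cursor between the parentheses. A real system would receive the words from the speech engine rather than from a string.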
It took two or three months to develop a customized grammar, learn how to work with it and what it was capable of, and figure out just which bits of automation he needed. He ended up with a Python file that was about 2,000 lines long and contained about 1,500 commands. Though the end result works well, he says, it is not a program that someone could just download and start using; he calls it a bit "ducttapey." "I think if someone wanted to get the basics, it would take them maybe three weeks to a month to get going, but they'd also need to know how to interface that with their editor."
Rudd's RSI has cleared up, and his job has changed to include less coding and more management; he's now principal engineer at Unbounce, a digital marketing agency in Vancouver, Canada. Because of those changes, he no longer uses his system and has not updated it in about five years. He had planned to publish the code, but realized it would take too much time to clean it up.
Indeed, several voice coding projects have followed a similar trajectory. Programmers discover that they have too much pain to type, so they build their own idiosyncratic voice coding system, based on Dragon or Windows Speech Recognition (WSR). They share it with a wider community but never develop it into an easy-to-use, out-of-the-box solution, then move on to other things after a while.
That was the case with Gustav Wengel, a programmer in Denmark who co-founded two startups: the software consultancy Bambuu, and Reccoon, which is attempting to give consumers a way to digitally track how much waste they generate. In 2016, Wengel developed pain and tingling in his little finger that ran all the way up to his elbow, a complaint often called "Emacs pinkie," though a doctor would probably diagnose it as cubital tunnel syndrome, in which the ulnar nerve that runs along the arm is compressed or stretched.
He tried switching to an ergonomic keyboard, which reduced the pain but didn't end it, then tried voice coding with Dragon. His plan was to build the skeleton of a voice control system by keyboard, then continue developing it with his voice. But after a couple of months, the pain began to diminish. "My hands became better and better, and the voice coding just wasn't worth it anymore," he says.
Voice coding software, Wengel says, is difficult to develop, difficult to learn, and difficult to use. "A lot of people use it out of a sense of desperation because that's all they've got," he says.
There are several challenges to creating voice coding software. One is the need to rely on voice recognition engines, which are not optimized for the task. Dragon, owned by Nuance Communications of Burlington, MA, is primarily focused on transcribing speech, based on models of natural language built up through years of machine learning. That does not really address the needs of programmers. "When you're coding, you're not speaking in sentences," Rudd says. "You don't have a language model that you can leverage."
Furthermore, there's a lot more to programming than simply dictating code. Programmers have to be able to move around within the code, manipulate multiple windows, go to different Web pages, and test and debug their code. "Really, writing code hands-free is a subset of the problem of generally controlling a computer hands-free," says Rick Mohr, a software developer for Azavea in Philadelphia, PA, which creates geospatial Web applications. Mohr created Vocola, a voice coding program he still maintains, after he developed RSI in 2007. Vocola comes in two versions, one using Dragon and the other using Windows Speech Recognition.
Desktop computers were designed to take input from a keyboard and a mouse, and accessibility was an afterthought, Mohr says. He argues that voice is actually a superior way of controlling computers, if only they were designed to make that easy. Speaking, for instance, is much faster than typing, and there are quicker ways to move a cursor than with a mouse. "Anybody who has been forced to climb the learning curve to use a computer hands-free discovers there are many things that are actually more efficient than using their hands," he says. "Mostly the only people that have been willing to climb the curve are those that need to."
Many interfaces require users to click on a button with a mouse, and doing that by voice is difficult, says Ryan Hileman, a software engineer in Mountain View, CA, who designed and is continuing to develop the Mac-based voice coding system Talon. To overcome the mouse problem, he incorporates an eye tracker and noise recognition. The eye tracker allows a user to move a cursor across the screen very quickly, though it does not provide pixel-level accuracy because eyes tend to flicker. A head tracker lets the user refine the location of the cursor, and making a popping sound with his mouth provides the click. Hileman also can drag something across the screen by letting out a long hiss, "which is kind of silly, but it works," he says.
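The combination of inputs described above can be sketched as a small event handler. This is an illustrative model only, not Talon's API: the `Pointer` class and its method names are hypothetical, and real trackers would deliver noisy, continuous samples rather than single calls. It shows a coarse jump from gaze, a fine correction from head movement, a pop mapped to a click, and a hiss mapped to press-and-hold for dragging.

```python
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """Hypothetical pointer model driven by gaze, head movement, and mouth noises."""

    x: float = 0.0
    y: float = 0.0
    dragging: bool = False
    events: list = field(default_factory=list)  # log of (kind, x, y) tuples

    def on_gaze(self, x, y):
        # Coarse positioning: jump to where the eye tracker reports the gaze.
        self.x, self.y = x, y

    def on_head(self, dx, dy):
        # Fine positioning: nudge the cursor by a small head-movement offset.
        self.x += dx
        self.y += dy

    def on_pop(self):
        # A popping sound acts as a single click at the current position.
        self.events.append(("click", self.x, self.y))

    def on_hiss_start(self):
        # The start of a long hiss presses and holds the button.
        if not self.dragging:
            self.dragging = True
            self.events.append(("press", self.x, self.y))

    def on_hiss_end(self):
        # The end of the hiss releases the button, completing the drag.
        if self.dragging:
            self.dragging = False
            self.events.append(("release", self.x, self.y))
```

Splitting positioning into a coarse and a fine stage reflects the limitation noted above: gaze is fast but not pixel-accurate, so a second, slower input supplies the precision.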
Hileman has been working on Talon full-time since quitting his job in August 2017. He's supporting himself through savings, and collects around $1,000 per month through the website Patreon to fund his development of Talon. He gives away the software for free, but will not release the code as open source so he can maintain control of how it is developed. He wants to create a plug-in system that will allow users to create their own commands and have them mesh with those that already exist.
There are other voice coding projects in existence, such as Aenea, which runs Dragon in a virtual machine, and Caster, a collection of tools that run on top of Dragonfly. One older project is voice-code.io, a Mac-based coding platform developed by Ben Meyer, which sold for $300. "He hasn't supported it in years," Hileman says. "You can pick it up and get started and try to use it, but it seems a little janky to me. It doesn't seem polished at all." Another system, also called Voicecode, was launched by the National Research Council of Canada in 1999, but went defunct several years ago.
One fear these developers share is that the underlying software their programs rely on will stop working. The platforms are built on top of Dragon or WSR. The programmers gain access to Dragon through a backdoor interface, NatLink, installed by Dragon's original developers, but Nuance does not support it. Quintijn Hoogenboom, a Dutch software developer, and other enthusiasts try to maintain NatLink. "It gets slightly more crippled with every release of Dragon, but people have always found a way to keep it going," says Mohr, adding there may come a day when an update breaks it irrevocably.
WSR was developed as part of Windows Vista in the early 2000s, and while it still works, Mohr says there is no guarantee that either it or NatLink will continue. "Either of those could disappear at any time."
Rudd believes coding by voice will become much easier as coding methods in general evolve. "The type of coding that our industry has been doing has been quite low-level. We've been very syntax-focused," he says.
With advances in artificial intelligence and natural language processing, Rudd thinks programming will become less mechanistic. Instead of telling a computer, line by line, how to achieve a result, a programmer will tell it what he/she wants to accomplish, and the machine will search through libraries of functions to find the best way to obtain that result. "I think when that happens, the voice systems that are available for Google Now and Siri and that sort of stuff will be much more suitable for coding in that style," Rudd says.
However programming evolves, Hileman says, it is important that voice be an option for coders. "I need it. It's very important to me," he says. "I can't type sustainably, so if I want to be able to use computers—which I'm very passionate about—I need something like this."
©2019 ACM 0001-0782/19/05