Your Wish Is My CMD

As artificial intelligence (AI) techniques advance, they are beginning to automate tasks that, until recently, only humans could perform—tasks such as translating text from one language to another or making medical diagnoses. It seems only logical to turn that computer power on computers themselves and use AI to automate programming.

In fact, computer scientists are working on just that idea, using various AI techniques to develop new methods of automating the writing of code. “The ultimate goal of this is that you would have professional software engineers not actually write code anymore,” says Chris Jermaine, a professor of computer science at Rice University in Houston, TX. Instead, the engineer would tell a computer what a piece of software should do, and the AI system would write the code, perhaps stopping along the way to pose questions to the engineer. “A software engineer becomes much more of a designer than somebody who deals with the low-level details,” Jermaine says.

Such a vision, Jermaine and other computer scientists say, lies decades in the future, and it is not entirely clear how to achieve it. Meanwhile, researchers are applying AI techniques to narrower problems and coming up with some promising solutions.

A Vintage Idea

The concept of program synthesis, in which a user specifies an intention and a programming language and the machine creates the code, actually dates back to the early days of AI in the 1950s, says Swarat Chaudhuri, an associate professor of computer science at the University of Texas, Austin. These days, most people think of statistical methods such as neural networks when they talk of AI, but back then, the field was focused on symbolic descriptions, he says. Theorem provers, which use computer programs to automatically come up with formal mathematical proofs, exemplify that type of AI.

Combining both the symbolic and statistical approaches to AI can help to solve the challenge of program synthesis, Chaudhuri says. Say, for instance, that you want to write a program to read a file. You might start by providing a neural network with some keywords, such as “read” and “file.” The neural net could then go through a corpus of thousands of programs, perhaps as collected on GitHub, a Microsoft-owned repository of code. The neural net could identify the type of program structure associated with the keywords, providing a skeleton of what the desired program should look like.

Machine learning, though, cannot accomplish the whole task. “Neural nets are actually really bad at doing things precisely. They are definitely not able to do tasks like programming end to end,” Chaudhuri says. “Going from that high-level structural insight to a piece of code that’s going to pass a type checker, that you can paste into your code window and it’s not going to complain, that’s a leap.”

The next step is to use symbolic methods to fill in the low-level details, such as which variable to use in a particular place within the code, by searching through all the possible variables that could be placed there. Chaudhuri, then at Rice, and Jermaine developed a system that uses these methods to figure out the specifications of a program. The system, called PlinyCompute, was funded starting in 2014 with a four-year, $11-million grant from the U.S. Defense Advanced Research Projects Agency (DARPA).
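The division of labor described here, a rough skeleton followed by a symbolic search over variables, can be made concrete with a toy example. The Python sketch below is not the Rice system. It is a minimal illustration that assumes a hypothetical two-hole skeleton for reading a file, a made-up set of in-scope variables (SCOPE), and a single test file. It enumerates every type-compatible way of filling the holes, then keeps the first candidate that reads the test file back correctly.

```python
import os
import tempfile
from itertools import permutations

# Hypothetical skeleton for "read a file"; ?0 and ?1 are holes to be filled.
SKELETON = "with open(?0) as handle:\n    ?1 = handle.read()\n"

# Illustrative assumption: the variables in scope and their types.
SCOPE = {"path": str, "contents": str, "count": int}

# The type each hole must have (a stand-in for a real type checker).
HOLE_TYPES = {"?0": str, "?1": str}


def type_correct_fillings(scope, hole_types):
    """Yield each assignment of distinct in-scope variables to holes that type-checks."""
    holes = sorted(hole_types)
    for names in permutations(scope, len(holes)):
        if all(scope[name] is hole_types[hole] for hole, name in zip(holes, names)):
            yield dict(zip(holes, names))


def fill(skeleton, filling):
    """Substitute the chosen variable names into the skeleton."""
    for hole, name in filling.items():
        skeleton = skeleton.replace(hole, name)
    return skeleton


def behaves_correctly(code):
    """Run a candidate against one concrete test: read back a known file."""
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "sample.txt")
        with open(target, "w") as out:
            out.write("hello")
        env = {"path": target, "contents": "", "count": 0}
        try:
            exec(code, {}, env)  # candidate code reads and writes names in env
        except (OSError, ValueError):
            return False
        return env["contents"] == "hello"


def synthesize():
    """Return the first type-correct candidate that passes the test, or None."""
    for filling in type_correct_fillings(SCOPE, HOLE_TYPES):
        code = fill(SKELETON, filling)
        if behaves_correctly(code):
            return code
    return None


if __name__ == "__main__":
    print(synthesize())
```

In this toy setting the type check alone leaves two candidates, because both string variables fit both holes; only the behavioral test separates them. Real programs have vastly more holes and candidates, which is why the search is hard to scale.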

While something like PlinyCompute can identify a localized pattern such as the section of a program responsible for reading a file, nothing is yet capable of looking at the overall structure of more elaborate programs and discovering the patterns of how such smaller tasks fit together. PlinyCompute is not able to write programs longer than 50 or 60 lines. “Using machine learning to look at these kind of meta-level patterns is one thing that people just really haven’t looked at,” Jermaine says. “No one is really looking at it because it just seems so hard.”

Program synthesis is successful when it is limited to small problems in tightly defined domains, says Marc Brockschmidt, a researcher in Microsoft Research’s Programming Principles and Tools group in Cambridge, U.K. The difficulty lies in going to a more ambitious scale, because of the challenge of specifying what the programmer wants. The most common ways to tell the computer the desired outcome are either to use natural language or to show it a set of examples and ask it to learn from them. “The problem really is that both natural language and examples are a very weak way of specifying what behavior you want,” Brockschmidt says. “You wouldn’t expect that any system, no matter how far we get in research, would be able to go from ‘write an operating system’ to produce something like Windows 10 or macOS, just because when I say ‘write an operating system,’ there’s a lot of assumptions that I have and a lot of different ways of implementing this task that are not captured by my description.”

In 2017, Brockschmidt and his colleagues at Microsoft Research developed a program called DeepCoder, which performed program synthesis by having a neural network learn from a series of examples of the output that would be expected for a given input. DeepCoder required the use of a domain-specific programming language, which contains more restrictions than a full-featured programming language. They applied their approach to challenges posed on programming competition websites and found they were able to solve some of them better than other approaches could. DeepCoder only worked on the simplest challenges, however, and Brockschmidt decided to pursue other approaches to program synthesis.
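To show what synthesis from input/output examples over a restricted language looks like, here is a minimal Python sketch. It is not DeepCoder: the four-operation list language (DSL) is invented for illustration, and the optional priority argument is a hand-written stand-in for the scores a trained neural network might supply to steer the search. The function simply tries short programs first and returns the first one consistent with every example.

```python
from itertools import product

# A tiny, hypothetical domain-specific language: every primitive maps
# a list of integers to a list of integers.
DSL = {
    "sort": sorted,
    "reverse": lambda xs: xs[::-1],
    "double": lambda xs: [2 * x for x in xs],
    "drop_negatives": lambda xs: [x for x in xs if x >= 0],
}


def run(program, xs):
    """Apply the named primitives to xs, left to right."""
    for name in program:
        xs = DSL[name](xs)
    return xs


def synthesize(examples, max_length=3, priority=None):
    """Return the first program (a tuple of names) matching all examples.

    priority is an assumed mapping from primitive name to a score;
    higher-scoring primitives are tried first. A learned model could
    fill this role, but here it is supplied by hand.
    """
    scores = priority or {}
    names = sorted(DSL, key=lambda name: -scores.get(name, 0.0))
    for length in range(1, max_length + 1):
        for program in product(names, repeat=length):
            if all(run(program, given) == wanted for given, wanted in examples):
                return program
    return None


if __name__ == "__main__":
    examples = [
        ([3, -1, 2], [4, 6]),
        ([0, 5, -7, 1], [0, 2, 10]),
    ]
    program = synthesize(examples)
    print(program)
    print(run(program, [-2, 9, 4]))
```

The second example matters. With only the first one, a program that reverses the list instead of sorting it would also fit, which illustrates Brockschmidt's point that examples are a weak way of specifying behavior.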

AI has begun to find its way into commercial tools for software development. So far, the most widespread use of machine learning in the software industry is for code autocompletion, Brockschmidt says. For instance, Microsoft’s integrated development environment for programmers, Visual Studio, now includes IntelliCode. IntelliCode scans GitHub to identify patterns in coding, and uses what it learns to provide suggestions as the programmer types in statements. It also suggests arguments (the values passed to a function when it is called) and tries to infer the formatting style being used, to keep the code consistent.
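The basic idea of learning suggestions from existing code can be illustrated with a deliberately simple model. The Python sketch below is not how IntelliCode works internally; it assumes a tiny, made-up corpus of pre-tokenized lines (CORPUS) and merely counts which token most often follows another, then offers the most frequent followers as completions.

```python
from collections import Counter, defaultdict

# Made-up corpus: each line is already split into tokens by spaces.
CORPUS = [
    "f = open ( path )",
    "data = f . read ( )",
    "f . close ( )",
    "g = open ( name )",
    "text = g . read ( )",
]


def train(lines):
    """Count, for every token, how often each other token follows it."""
    following = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for previous, current in zip(tokens, tokens[1:]):
            following[previous][current] += 1
    return following


def suggest(model, previous_token, k=3):
    """Return up to k tokens most often seen after previous_token."""
    counts = model.get(previous_token, Counter())
    return [token for token, _ in counts.most_common(k)]


if __name__ == "__main__":
    model = train(CORPUS)
    print(suggest(model, "."))  # ['read', 'close']
```

Production tools draw on far larger corpora and richer context than the single preceding token, but the principle is the same: common patterns in existing code become ranked suggestions.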

Eclipse, the integrated development environment for Java, also uses AI to make autocompletion suggestions, and the startup Kite does the same for Python. Another startup, DeepCode, spun out of Swiss technical university ETH Zurich, applies machine learning to reviewing software once it has been written, in order to uncover security bugs. A beta version of the company’s software is available for code developed in Visual Studio.

A Sparsity of Data

One difficulty in teaching a machine to program is a lack of data. While there is plenty of existing code collected in GitHub, or in the in-house collections of companies such as Google, very little of it has labels describing the developer’s intention. There may be a few keywords or some textual notes, but that is uncommon and often of limited value. “Often what the user wanted when they wrote the particular piece of code is not very well documented,” says Armando Solar-Lezama, head of the computer-assisted programming group at the Massachusetts Institute of Technology, Cambridge, MA. With no way to know the intention behind existing code, the computer cannot predict how to go from what a developer asks for to new code. If a programmer has to spend a lot of time and effort writing a formal specification of what the program should do, program synthesis loses a lot of its value.

Divining a user’s intention is one key aspect of automating programming, says Justin Gottschlich, head of machine programming research at Intel Labs in Santa Clara, CA. Intel established the research program last fall to encourage the automation of programming. Gottschlich and Solar-Lezama were two of the authors of a 2018 paper describing what they call the three pillars of machine programming. The first pillar, intention, is the ability of the machine to understand the programmer’s goals. Invention is the ability of the computer to write a program that accomplishes those goals. The third pillar, adaptation, is about revising the software to make it more efficient and to correct errors.

Gottschlich considers the complete automation of writing software one of the field’s grand challenges, one that could take decades to achieve. “You basically give the computer an intention specified in some manner—input/output examples, natural language, whatever—and then it builds the entire piece of software for you. That is an outrageous goal,” he says.

Yet there are smaller aspects of the problem where machine programming is already outperforming humans, such as in generating tests to find performance bugs in software. Bugs that degrade the efficiency of a program can be hard to spot because they are not black-and-white errors. A program with performance bugs may still run, but much more slowly than intended. Intel developed a program called AutoPerf, based on some of the techniques of machine programming, and was able to detect a bug in the MySQL relational database management software that was degrading its performance by nearly 70%.

In fact, one of the benefits of applying AI to writing software should be the reduction of errors, thereby increasing efficiency and cutting development costs. It can also help with a shortage of programmers. According to a 2017 survey by Code.org, a non-profit that promotes computer education, the U.S. had more than 500,000 unfilled jobs for coders, but was producing only 50,000 computer science graduates a year.

Rather than take jobs away from people, automating software creation could free programmers to focus on the more creative parts of their jobs. A machine programming system could act as an assistant to a program designer, taking care of the nitty-gritty and querying the designer about exactly what they want. “What you could have is a magnification effect where people are able to produce more and better software,” says Jermaine. “And I think it would alleviate some of those terrible problems that we have right now with the lack of engineering capacity in the modern world.”

Gottschlich says AI could even open up the power of programming to people who have no training in writing code. “We really want to enable the global population to be what I’m calling ‘software creators’,” he says. “If we realize this dream that we’re setting out to conquer, the machines would do all the programming and the humans would focus mostly on intention.”

Further Reading

Gottschlich, J., Solar-Lezama, A., Tatbul, N., Carbin, M., Rinard, M., Barzilay, R., Amarasinghe, S., Tenenbaum, J.B., and Mattson, T.
The Three Pillars of Machine Programming, Proceedings of ACM MAPL 2018, https://dl.acm.org/doi/10.1145/3211346.3211355

Balog, M., Gaunt, A.L., Brockschmidt, M., Nowozin, S., and Tarlow, D.
DeepCoder: Learning to Write Programs, 5th International Conference on Learning Representations, 2017, arXiv:1611.01989v2

Zou, J., Barnett, R.M., Lorido-Botran, T., Luo, S., Monroy, C., Sikdar, S., Teymourian, K., Yuan, B., and Jermaine, C.
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development, SIGMOD ’18: Proceedings of the 2018 International Conference on Management of Data, doi/10.1145/3183713.3196933

Kant, N.
Recent Advances in Neural Program Synthesis, ArXiv, 2018, arXiv:1802.02353v1

Machine Programming: What Lies Ahead https://knowledge.wharton.upenn.edu/article/ai-machine-learning/