Computing Applications

What Do ChatGPT and AI-based Automatic Program Generation Mean for the Future of Software

Bertrand Meyer

Since the release of the ChatGPT interactive AI assistant it has been surprising to see some of the snide, passive-aggressive reactions from some (not all) members of the software engineering community, in the style of  “it’s just inference from bad data”. Let’s get real, folks, it is truly game-changing. The kind of thing that you witness once in a generation. (The last two times were object-oriented programming and the World-Wide Web.)

Basically, if you need a program element and can describe that need, the assistant will generate it for you. There is no particular restriction on the programming language that you choose, as long as its description and enough examples are available somewhere. The code will be pretty good. (More on the semantics of “pretty” below.) You can ask the assistant for a test suite and various other adornments.

Programmers no longer needed?

Trying this tool seriously is guaranteed to produce a  “Wow” effect and for a software engineer or software engineering educator, as the immediately following step, a shock: “Do I still have a job?“. At first sight, you don’t. Especially if you are a programmer, there is not much that you can do and ChatGPT cannot.

In assessing this observation, it is important to separate the essential from the auxiliary. Any beta release of a new technology is bound to suffer from a few pimples. Instructive in this respect is a look at some of the early reviews of the iPhone (for example those on CNET and on PCMag), lamenting such horrible deficiencies as the lack of Bluetooth stereo. I could complain that the generated code will not compile out-of-the-box, since ChatGPT believes that Eiffel has a “do” keyword for loops (it’s loop) and enumerated types introduced by “type” (it doesn’t). These bugs do not matter; the tool will learn. What does matter is that if I ask, for example, for a Levenshtein edit distance program in Eiffel, I get something that is essentially right. Plus well-formatted, equipped at the start of every routine (per good Eiffel style rules) with a header comment explaining clearly and correctly the purpose of the routine, and producing the right results. Far beyond the Turing test. (To be more precise: as readers of this blog undoubtedly know, a tool passes the Turing test if a typical user would not be able to determine whether answers come from a human or a program. In this case, actually, you will need to add a delay to the responses of ChatGPT to have it pass the test, since no human could conceivably blurt out such impressive answers in a few seconds.)

What comes after the bedazzlement? The natural question is: “What can I do with this?”. The answer — for a programmer, for a manager — is not so clear. The problem is that ChatGPT, in spite of its cocky self-assurance (This is your result! It will work! No ifs and buts!) gives you, for a non-trivial problem, an answer that may work but may also almost work. I am no longer talking here about growing pains or bugs that will be fixed, but about essential limitations.

Exact, or almost exact?

Here is an example that illustrates the phenomenon vividly.

In discussion of use cases and other requirements techniques, I like to use the example of a function that starts with explicit values: 0 for 0, 1 for 1, 4 for 2, 9 for 3, 16 for 4, 25 for 5. At this point almost everyone (and ChatGPT) will say sure, you don’t need to go on, I get it: the square function. As a specification technique (and that was my point in an earlier article in this blog, already 10 years ago, A Fundamental Duality of Software Engineering), this approach is terrible: an example or any number of examples do not provide a specification; I had some fun, in preparing that article, running the values through a curve-fitting algorithm that provided several other reasonable matching functions, along with a few unreasonable ones.

This time I fed the above values to ChatGPT and for good measure added that the result for 6 is 35. Yes, 35, not a typo. Here is the start of the iteration.

Now, lo and behold, ChatGPT still infers the square root function!

Obligingly adding instructions of how to use the function and examples of results (including 36 for 6!).

It does not stop there. The tool is truly an assistant, to which (one has to resist writing “whom”) you can talk:

It will correct itself, but by resorting to the kind of case-by-case programming reminiscent (as my colleague Jean-Michel Bruel pointed out) of the code that an an undergraduate student will enthusiastically produce just after discovering TDD: