Lessons Learned, or Not

I am adapting a recent program to take advantage of a more concise form of loop recently introduced into the language (see a short description on my other blog). An example of loop that I wanted to rewrite was:

                from
                    c.start
                until
                   c.after
                loop
                    print (c.item.name) ; if c.index < c.count then print (“, “) end
                    c.forth
                end

It prints out elements of a list (separated by commas) by advancing through the list manually: start to move the cursor to the beginning, forth to advance it, item to get to the element at cursor position, after to test if we are done. The replacement is much more concise:

                across c.item as el loop
                    print (el.item.name) ; if el.index < el.count then print (“, “) end
                end

The syntax lets us name a cursor el and uses it to traverse the structure automatically. Here it is a list, but the same code would work for any suitable data structure (hash table, integer interval and so on). An added advantage is that this automatically managed cursor is external to the list, rather than part of it as in the previous variant, so that you do not have to worry about multiple simultaneous iterations on the same structure. (You could have external cursors before, but the first variant above was the most common scheme.)

All this is great, except that having become a bit too cocksure after converting a number of loops I was now doing it quickly, directly replacing the text, and I forgot to remove the last instruction of the loop from the previous version: c.forth, which advances the cursor.

A few minutes later, as I ran the test suite (bored but out of a sense of duty since I knew I could not have done anything wrong), I immediately got an exception:

What happened? Of course I have a sophisticated debugger at my disposal (see the bottom and middle-right fields) and could have started sleuthing. No need to take that route, however: the tool told me that a precondition was violated, the precondition valid_position for the routine forth. Since the above screenshot may be hard to read here is the relevant top-right field, bigger:

The stack trace showed where this happened, and the left field on the left of the previous screenshot displayed the code. The error was obvious and I corrected it immediately. Usual stuff (see “just another day at the office”).

The reason why the mistake was caught right away as a precondition violation is that the precondition valid_position means in this case (as one readily sees by clicking on it) not after: you cannot advance the cursor if you are already on the last element. In the old version of the loop after is, correspondingly, the exit condition. But in the faulty new version every iteration would execute forth, so that the last time around we are at the wrong position. Now perhaps you think that this does not matter after all, since we do not do anything with the cursor; but hold on.

Once in a while I wonder how people manage when they do not have such mechanisms at their disposal; so just for fun I decided to put the faulty line back in, turn off contract monitoring, and re-run the test. I expected that I might get some kind of weird crash, but no: the execution proceeded smoothly. Only by looking at the results in detail did I notice that one of them at least was off: it read

whereas the correct result is

In other words, the result adds just a few elements to the “alias relation” (if you want to know what this example is about see this post and the draft article on the “alias calculus” to which it refers). The difference is sufficiently small that in the absence of automatic tools it might have remained undetected for a while, and I shudder at the thought of all the debugging effort that would then have been necessary.

Putting this quite ordinary example in the context of my work (on and off) on this particular program over the past couple of months, I can think of perhaps a hundred such mistakes; this is an estimate only — I do not track things carefully enough, Watts Humphrey would be ashamed of me — but I believe it is not too far from reality. Out of these, I had to go into a real debugging session about five times; not a pleasant experience, even with a great debugger, because I am dealing with delicate stuff, including loops that iterate until reaching a fixpoint (so that the slightest mistake can lead to non-termination). All the other cases, i.e. the vast majority, were caught as either type errors or contract violations.

Type errors are the nicest cases, because the next compilation detects them immediately; it is amazing how a good type system will catch subtle errors. Perhaps my current program is a bit special as I use complex data structures; many an error that would otherwise been hard to spot gets caught as the use of (say) a hash table whose keys are hash tables of lists of expressions made of variables, rather than a hash table whose keys are lists of hash tables of variables and expressions. Still, I keep wondering how I would do with less typing. There is a lot of fuss right now around “Dynamic languages”, and I am all for dynamism, lots of it, but static type checking is not something I will ever renounce.

Unlike with type mismatches, detecting contract violations currently still requires execution, but in practice the process is almost as effective. As in the example I described, what happens is that you end up with software that has so many built-in consistency conditions, automatically monitored during development, that if you made a mistake somewhere it will almost always trigger a violation of one of these conditions — even if the mistake and the condition it violates are in separate parts of the code.

This method of quickly getting software right works best when you do not have to write all the contracts yourself but rely, as I do, on a powerful machinery taking advantage of contracts: libraries carefully built with the support of preconditions, postconditions, class invariants and check instructions. In the example, the mistake was in my code, but was caught through a precondition in a library routine. This is a daily experience for people who have the benefit of such machinery.

Now it is possible to draw a simple conclusion from all this: that I am a sloppy programmer. Maybe. If everything you write is perfect the first time, the regrettable possibility exists that you just wasted your time reading this entry all the way to its last word.