Hickory Dickory Doc

Dear KV,

While reviewing some encryption code in our product, I came across an option that allowed for null encryption. This means the encryption could be turned on, but the data would never be encrypted or decrypted. It would always be stored “in the clear.” I removed the option from our latest source tree because I figured we did not want an unsuspecting user to turn on encryption but still have data stored in the clear. One of the other programmers on my team reviewed the potential change and blocked me from committing it, saying the null code could be used for testing. I disagreed with her, since I think the risk of accidentally using the code is more important than a simple test. Which of us is right?

NULL for Naught

Dear NULL,

I hope you are not surprised to hear me say that she who blocked your commit is right. I have written quite a bit about the importance of testing and I believe that crypto systems are critical enough to require extra attention. In fact, there is an important role that a null encryption option can play in testing a crypto system.

Most systems that work with cryptography are not single programs, but are actually frameworks into which differing cryptographic algorithms can be placed, either at build or runtime. Cryptographic algorithms are also well known for requiring a great deal of processor resources, so much so that specialized chips and CPU instructions have been produced to increase the speed of cryptographic operations. If you have a crypto framework and it does not have a null operation, one that takes little or no time to complete, how do you measure the overhead introduced by the framework itself? I understand that establishing a baseline measurement is not common practice in performance analysis, an understanding I have come to while banging my fist on my desk and screaming obscenities. I often think that programmers should not just be given offices instead of cubicles, but padded cells. Think of how much the company would save on medical bills if everyone had a cushioned wall to bang their heads against, instead of those cheap, pressboard desks that crack so easily.

Having a set of null crypto methods allows you and your team to test two parts of your system in near isolation. Make a change to the framework and you can determine if that has sped up or slowed down the framework overall. Add in a real set of cryptographic operations, and you will then be able to measure the effect the change has on the end user. You may be surprised to find that your change to the framework did not speed up the system overall, as it may be that the overhead induced by the framework is quite small. But you cannot find this out if you remove the null crypto algorithm.
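A minimal sketch of that measurement, in Python, might look like the following. Everything in it is hypothetical: `CIPHERS`, `framework_roundtrip`, and `measure` are illustrative names, and `xor_cipher` is a toy stand-in for a real algorithm, not anything secure. The point is only that the null entry lets you time the framework path by itself.

```python
import time
from typing import Callable, Dict

Cipher = Callable[[bytes], bytes]


def null_cipher(data: bytes) -> bytes:
    """Null operation: returns the data unchanged and does no crypto work."""
    return data


def xor_cipher(data: bytes) -> bytes:
    """Toy stand-in for a real algorithm. NOT secure; used only for timing.

    XOR with a fixed byte is its own inverse, so the same function
    serves as both encrypt and decrypt in this sketch.
    """
    return bytes(b ^ 0x5A for b in data)


# Hypothetical registry: the "framework" picks an algorithm by name.
CIPHERS: Dict[str, Cipher] = {"null": null_cipher, "xor": xor_cipher}


def framework_roundtrip(name: str, payload: bytes) -> bytes:
    """Stand-in for the framework: look up, frame, encrypt, decrypt, unframe."""
    cipher = CIPHERS[name]
    framed = len(payload).to_bytes(4, "big") + payload
    stored = cipher(framed)       # "encrypt"
    recovered = cipher(stored)    # "decrypt"
    length = int.from_bytes(recovered[:4], "big")
    return recovered[4:4 + length]


def measure(name: str, payload: bytes, rounds: int = 50) -> float:
    """Average seconds per round trip for the named cipher."""
    start = time.perf_counter()
    for _ in range(rounds):
        framework_roundtrip(name, payload)
    return (time.perf_counter() - start) / rounds


if __name__ == "__main__":
    payload = bytes(16 * 1024)
    baseline = measure("null", payload)   # framework overhead alone
    total = measure("xor", payload)       # framework plus algorithm
    print(f"framework only: {baseline:.6f}s  with cipher: {total:.6f}s")
```

The difference between the two numbers is a rough estimate of what the algorithm costs, and the first number is what a change to the framework can hope to improve.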

More broadly, any framework needs to be tested as much as it can be in the absence of the operations that are embedded within it. Comparing the performance of network sockets on a dedicated loopback interface, which removes all of the vagaries of hardware, can help establish a baseline showing the overhead of the network protocol code itself. A null disk can show the overhead present in file-system code. Replacing database calls with simple functions to throw away data and return static answers to queries will show you how much overhead there is in your Web and database framework.
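The database case can be sketched the same way. In this hypothetical Python example, `NullStore` throws away writes and returns a static answer to every query, while `handle_request` stands in for whatever framework code sits in front of the real database. None of these names come from a real library.

```python
import time


class NullStore:
    """Hypothetical stand-in for a database client.

    Writes are discarded and every query returns the same canned row,
    so any time spent is spent in the calling code, not the database.
    """

    def __init__(self, canned_row=None):
        self.canned_row = canned_row or {"id": 1, "name": "static"}
        self.writes_discarded = 0

    def insert(self, table, row):
        self.writes_discarded += 1  # throw the data away

    def query(self, sql, params=()):
        return [dict(self.canned_row)]  # static answer, whatever the query


def handle_request(store, user_id):
    """Hypothetical request handler: the framework code being measured."""
    store.insert("audit", {"user": user_id, "at": time.time()})
    rows = store.query("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return {"status": 200, "body": rows[0]}


def time_handler(store, requests=1000):
    """Average seconds per request against the given store."""
    start = time.perf_counter()
    for i in range(requests):
        handle_request(store, i)
    return (time.perf_counter() - start) / requests


if __name__ == "__main__":
    print(f"per-request overhead: {time_handler(NullStore()):.8f}s")
```

Run the same `time_handler` against a real client with the same `insert` and `query` methods, and the gap between the two results is the share of the time that belongs to the database.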

Far too often we try to optimize systems without sufficiently breaking them down or separating out the parts. Complex systems give rise to complex measurements, and if you cannot reason about the constituent parts, you definitely cannot reason about the whole. Anyone who claims they can is lying to you.

Dear KV,

What do you think of systems such as Doxygen that generate documentation from code? Can they replace handwritten documentation in a project?

Dickering with Docs

Dear Dickering,

I am not quite sure what you mean by “handwritten” documentation. Unless you have some sort of fancy mental interface to your computer that I have not yet heard of, any documentation, whether in code or elsewhere, is handwritten or at least typed by hand. I believe what you are actually asking is if systems that can parse code and extract documentation are helpful, to which my answer is, “Yes, but …”

Any sort of documentation extraction system has to have something to work with to start. If you believe that extracting all of the function calls and parameters from a piece of code is sufficient to be called documentation, then you are dead wrong, but, unfortunately, you would not be alone in your beliefs. Alas, having beliefs in common with others does not make those beliefs right. What you will get from Doxygen on the typical, uncommented code base is not even worth the term “API guide”; it is actually the equivalent of running a fancy grep over the code and piping that to a text formatting system such as TeX or troff.

For code to be considered documented there must be some set of expository words associated with it. Function and variable names, descriptive as they might be, rarely explain the important concepts hiding in the code, such as, “What does this damnable thing actually do?” Many programmers claim their code is self-documenting, but, in point of fact, self-documented code is so rare that I am more hopeful of seeing a unicorn giving a ride to a manticore on the way to a bar. The claim of self-documenting code is simply a cover-up for laziness. At this point, most programmers have nice keyboards and should be able to type at 40–60 words per minute; some of those words can easily be spared for actual documentation. It is not like we are typing on ancient line-printing terminals.

The advantage you get from a system like Doxygen is that it provides a consistent framework in which to write the documentation. Setting off the expository text from the code is simple and easy, and this helps in encouraging people to comment their code. The next step is to convince people to ensure their code matches the comments. Stale comments are sometimes worse than none at all because they can misdirect you when looking for a bug in the code. “But it says it does X!” is not what you want to hear yourself screaming after hours of staring at a piece of code and its concomitant comment.
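As a small illustration of expository text set off from the code, here is a sketch in Python, assuming Doxygen's `##` comment-block form and its `@brief`, `@param`, and `@return` commands. The function `clamp_delay` and the scheduler it mentions are hypothetical. Note that the comment says what the thing is for and what happens at the edges, which the signature alone would never tell you.

```python
## @brief Clamp a retry delay to the range the scheduler will accept.
#
#  The (hypothetical) scheduler rejects delays shorter than one tick and
#  longer than max_ticks. Callers compute delays from user input, so the
#  value arriving here may be zero, negative, or absurdly large. Rather
#  than raising an error, this function pulls the value back into range.
#
#  @param delay_ticks  Requested delay in scheduler ticks; any integer.
#  @param max_ticks    Largest delay the scheduler accepts. Defaults to 600.
#  @return The delay, no smaller than 1 and, when max_ticks is at least 1,
#          no larger than max_ticks.
def clamp_delay(delay_ticks, max_ticks=600):
    return max(1, min(delay_ticks, max_ticks))
```

If someone later changes the lower bound to zero and leaves the comment alone, the `@return` line becomes exactly the sort of stale comment that sends the next reader hunting in the wrong place.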

Even with a semiautomatic documentation extraction system, you still need to write documentation, as an API guide is not a manual, even for the lowest level of software. How the API’s documentation comes together to form a total system and how it should and should not be used are two important features in good documentation and are the things that are lacking in the poorer kind. Once upon a time I worked for a company whose product was relatively low level and technical. We had automatic documentation extraction, which is a wonderful first step, but we also had an excellent documentation team. That team took the raw material extracted from the code and then extracted, sometimes gently and sometimes not so gently, the requisite information from the company’s developers so they could not only edit the API guide, but then write the relevant higher-level documentation that made the product actually usable to those who had not written it.

Yes, automatic documentation extraction is a benefit, but it is not the entire solution to the problem. Good documentation requires tools and processes that are followed rigorously in order to produce something of value both to those who produced it and to those who have to consume it.

August 2015 Issue

Communications of the ACM (CACM) is now a fully Open Access publication.