Dear Diary

Dear KV,

I recently joined a medical company as a data scientist—crunching numbers on its latest drugs—and one thing I have noticed is that biologists and other non-computer people I work with keep notebooks. Every single experiment or idea is carefully recorded in a physical notebook that is then checked by someone they work with. The process seems laborious, but I am told it is required for their work. I cannot remember ever having to record experiments for my computer science courses in college, admittedly more than a few years ago, but I know you have written about keeping a debug log. Is that the same thing?

Noted

Dear Noted,

Congratulations on the new job! Yes, you have a good memory. I have written about using a debug log to figure out problems and apply the scientific method to debugging. While a debug log is helpful, it is not the same thing as a laboratory notebook. Did you know that the first actual bug was a moth found in a relay of the Harvard Mark II computer in the late 1940s by Dr. (and, eventually, Rear Admiral) Grace Murray Hopper? She not only found the bug, but also taped it into the debug log. (You can find the log entry and image online.) If only all our bugs were large enough to see!

I have to say I find it surprising a data scientist would not know about laboratory notebooks. In the physical sciences such as physics and chemistry, as well as in the medical sciences, such notebooks are required. In fact, undergraduates in those classes must turn them in for grading as part of their studies. Most lab notebooks remain on paper, although there are also expensive online systems for keeping lab notes. The online systems are meant to protect companies from predatory patent trolls and to provide a digital way of proving their invention, whatever that may be, was first.

We in computer science can learn from the long-standing processes in other sciences, and we should get ourselves up to date with the 18th century. Maintaining a proper lab notebook has been a thing in the non-computer sciences for several hundred years, and it’s time computer science earned its science badge by going back to the future.

Most people who spend their days in front of computers would probably balk at keeping a paper notebook of experiments instead of a more convenient electronic form. Let’s see how we could keep an electronic laboratory notebook without paying through the nose.

Laboratory notebooks are different from other notebooks and logs in a few ways. A laboratory notebook should contain a series of experiments and experimental notes that work out a hypothesis. The hypothesis doesn’t need to be as grand as “Light is both a wave and a particle, and, therefore, gravitation will divert the path of light.” It can be much like the debug log entries mentioned previously, or it can be a hypothesis about a particular change to the code: “Substituting a Fortran mathematical routine in this set of calculations for its C++ equivalent will reduce the time of the computation by a factor of 2.” The hypothesis is something that can be tested and should be stated simply enough that anyone else conversant in our scientific endeavor can understand it.
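
A performance hypothesis like the one about the math routine comes down to a number you can measure. What follows is a minimal Python sketch, not a real benchmark harness. The names `median_runtime` and `speedup` are made up for illustration, and the two workloads are stand-ins for whatever routines you are actually comparing.

```python
import statistics
import timeit


def median_runtime(func, repeats=5, number=10):
    """Median time, in seconds, of `number` calls to func across `repeats` runs."""
    runs = timeit.repeat(func, repeat=repeats, number=number)
    return statistics.median(runs)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


# Stand-in workloads (assumptions); replace with the two routines under test.
def baseline():
    return sum([i * i for i in range(10_000)])


def candidate():
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    factor = speedup(median_runtime(baseline), median_runtime(candidate))
    print(f"measured speedup: {factor:.2f}x (hypothesis: at least 2x)")
```

The median is used rather than the mean so that one noisy run does not decide the outcome. The measured factor belongs in the observations, and the comparison against the stated factor of 2 belongs in the analysis.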

Each experiment should have a title and an unambiguous date. KV prefers Day – Month Name – Year, since even a non-English speaker can figure out that the thing in the middle is not a numerical day or a year. KV also keeps a state variable associated with the entry: Is it not started or is it in progress (lots of head banging on the desk trying to build and run the experiment)? Once complete, it can be marked COMPLETE or FAILED. More experiments FAIL than COMPLETE for KV, but maybe you are better than that. The goal is not to have a perfect record of COMPLETEs, but to have that one experiment that satisfies the hypothesis.
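
If you generate your entries from a script, the heading can be built the same way every time. The sketch below assumes the layout used in the sidebar (date, title, then the state as a tag between colons). The function name `entry_heading` and the tag spelling `not_started` are assumptions made for illustration. The month names are spelled out in the code rather than taken from the system locale, so the heading reads the same on every machine.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# The four states described above; the exact tag spellings are an assumption.
STATES = ("not_started", "in_progress", "COMPLETE", "FAILED")


def entry_heading(title, when, state="not_started"):
    """Build a heading in Day - Month Name - Year form with a state tag."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    stamp = f"{when.day} {MONTHS[when.month - 1]} {when.year}"
    return f"{stamp} {title} :{state}:"


if __name__ == "__main__":
    print(entry_heading("Getting the time in user space",
                        date(2023, 9, 1), "in_progress"))
```

Rejecting unknown states keeps a typo such as `COMPLETED` from quietly creating a fifth state.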

Once you have your hypothesis, you have to test it, but how? The next section of the notebook lays out the procedures, calculations, equipment, software libraries, and everything that is required for the experiment. The How section should include everything about the experiment you can think of. Many experimenters have failed because they didn’t include everything they used, and so they could not tell that, for example, running the experiment with different pieces of software on the same system would affect the outcome, or even mask an effect they were looking for.

Write down everything you can think of, and then ask a colleague if you have left out anything. In the physical sciences, this type of pair verification is common.
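
Part of the How section can be collected by a script rather than from memory. Here is a minimal Python sketch under a few assumptions: the function name `describe_environment` is hypothetical, `numpy` is only an example of a package you might care about, and the list of facts is deliberately short. It does not replace writing down the procedure or asking a colleague what you missed.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=()):
    """Collect basic machine and software facts for the How section."""
    info = {
        "machine": platform.machine(),
        "system": platform.system(),
        "release": platform.release(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence too; a missing library is also a fact.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # "numpy" is just an example package name.
    for key, value in describe_environment(["numpy"]).items():
        print(f"{key} :: {value}")
```

The output can be pasted straight into the entry. Anything the script cannot see, such as other software running on the same system, still has to be written down by hand.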

Now it is time to run your experiment and record your observations, which is your next section. Much as you did in the How section, record everything you see (hear, taste, smell—whatever senses you can bring to bear on the problem will later serve you well when you do the analysis).

Note that observations are just that: Just observe, be as one with the experiment, be the observer. Later, you can analyze.

Eventually the observations will, ideally, complete, subject to the Turing limit, and it will be time to analyze the observations. This is where you summarize everything you thought you observed and try to discover if you have proven or disproven your hypothesis. You remember the hypothesis, right? It is at the top of the entry! Go reread it. Did the work you did prove or disprove anything? If you had false assumptions or observations, cross them out (there are convenient fonts for this now) but leave them in the entry.
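
In a plain-text notebook kept in Org mode, the format used for the sidebar template, text is struck out by wrapping it in plus signs. The small Python sketch below does that wrapping one line at a time. The function name `strike_out` is hypothetical, and the sketch assumes the text does not already contain strike-through markers.

```python
def strike_out(text):
    """Wrap each non-blank line in Org strike-through markers, keeping the text."""
    result = []
    for line in text.splitlines():
        body = line.strip()
        if not body:
            # Blank lines stay as they are; the markers need text inside them.
            result.append(line)
            continue
        indent = line[: len(line) - len(line.lstrip())]
        result.append(f"{indent}+{body}+")
    return "\n".join(result)


if __name__ == "__main__":
    print(strike_out("The cache is the bottleneck."))
```

The words stay in the entry, which is the point: a reader can still see what you once believed and that you later rejected it.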

Finally, the experiment may have given you ideas for further experiments. As that great Irish scientist and comedian Dara Ó Briain said: “Science knows it doesn’t know everything; otherwise, it’d stop.” Write down what you might want to figure out next.

To be presumptuous, I have included a template page from one of my own lab notebooks in this column. It should be used as a starting point and not as gospel, because the gospel according to KV would contain a lot of blaspheming.

Recommended resources

For an excellent introduction to keeping a lab notebook, check out Howard M. Kanare’s Writing the Laboratory Notebook from 1985.
Here is the GitHub repo for notebooks that I set up.
And here is the Org Mode Notebook file.

Sidebar: Example Notebook Entry

1 Sep 2023 Getting the time in user space :in_progress:

HYPOTHESIS

What are you trying to show? Be specific about what you are measuring: time, transactions, size of data, latency, etc.

HOW

(Procedures, calculations, equipment)

The details of How
Equipment::CPU, memory, disk, network features, and base performance characteristics
Software::Name, version, options
Scaffolding scripts, executables::Location, name, command arguments passed
Commands typed::Use the script(1) command before you start any interactive session that will run test or other measurement code. This command captures all commands typed and their output for later use.

OBSERVATIONS

Describe all that happens (planned or unplanned) during the experiment in narrative text.
Raw experimental data::Tables and other data that are small enough to fit directly into the notes may be placed in this section.
Large output files::Point to the files, signed and stored in a repo, that contain any relevant output.

Org mode has a way to strike out text, C-c C-x C-f +, which inserts plus-mark characters that make a strikeout as in the following line:

+- This idea did not work out and therefore is struck out.+

In a laboratory notebook we NEVER remove ideas; we leave them and strike them out. Yes, we can recover text via the version-control system but that is not sufficient for our purpose. We need to have the strikeouts remain to work as a real lab notebook.

DATA ANALYSIS

Processing of raw data, graphs, interpretations

Summarize your interpretation of the results of the experiment in narrative text.
Summary tables, graphs, images, etc. may be placed directly into the notes.
Large tables, graphs, images, etc.::Point to the files, signed and stored in a repo, that contain any relevant output.