Dear KV,
I posted a question on a mailing list recently about a networking problem and was asked if I had a tcpdump. The person who responded to my question—and to the whole list as well—seemed to think my lack of networking knowledge was some kind of affront to him. His response was pretty much a personal attack: If I couldn’t be bothered to do the most basic types of debugging on my own, then I shouldn’t expect much help from the list. Aside from the personal attack, what did he mean by this?
Dumped
Dear Dumped,
It is always interesting to me that when people study computer programming or software engineering they are taught to use the creative tools (editors to create code, compilers to take that code and turn it into an executable) but are rarely, if ever, taught how to debug a program. Debuggers are powerful tools, and once you learn to use one you become a far more productive programmer because, face it, putting printf() (or its immoral equivalent) throughout your code is a really annoying way to find bugs. In many cases, especially those related to timing issues, adding print statements just leads to erroneous results. If the number of people who actually learn how to debug a program during their studies is small, the number who learn how to debug a networking problem is minuscule. I actually don’t know anyone who was ever directly taught how to debug a networking problem.
Some people (the lucky ones) are eventually led to the program you mention, tcpdump, or its graphical equivalent, Wireshark, but I’ve never seen anyone try to teach people to use these tools. One of the nice things about tcpdump and Wireshark is that they’re multi-platform, running on both Unix-like operating systems and Windows. In fact, writing a packet-capture program is relatively easy, as long as the operating system you’re working with gives you the ability to tap into the networking code or driver at a low enough level to sniff packets.
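To give a feel for how little code a bare-bones sniffer needs, here is a minimal sketch in Python. It assumes Linux, where the AF_PACKET socket family hands a root-owned program raw Ethernet frames; other operating systems expose a different tap. The names parse_ethernet and sniff are made up for this illustration, and this is not how tcpdump itself is built.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # Linux constant: deliver frames of every protocol


def mac_to_text(mac: bytes) -> str:
    """Format six raw bytes as the usual colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in mac)


def parse_ethernet(frame: bytes):
    """Split a raw Ethernet frame into (dst MAC, src MAC, EtherType, payload)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return mac_to_text(dst), mac_to_text(src), ethertype, frame[14:]


def sniff(interface: str, count: int = 10) -> None:
    """Print one summary line per captured frame (Linux only, needs root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ALL)) as sock:
        sock.bind((interface, 0))
        for _ in range(count):
            frame, _addr = sock.recvfrom(65535)
            dst, src, ethertype, payload = parse_ethernet(frame)
            print(f"{src} -> {dst} type=0x{ethertype:04x} len={len(payload)}")
```

Calling sniff("eth0") as root would print ten frames, where "eth0" stands in for whatever your interface is called. The real tools add the parts that matter in practice: filtering, timestamps, and decoders for the layers above Ethernet.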
Those of us who spend our days banging our heads against networking problems eventually learn how to use these tools, sort of in the way that early humans learned to cook flesh. Let’s just say that though the results may have been edible, they were not winning any Michelin stars.
Using a packet-capture tool is, to a networking person, somewhat like using a thermometer is to a parent. It is likely that if you ever felt sick when you were a child at least one of your parents would take your temperature. If they took you to the doctor, the doctor would also take your temperature. I once had my temperature taken for a broken ankle—crazy, yes, but that doctor gave the best prescriptions, so I just smiled blithely and let him have his fun. That aside, taking a child’s temperature is the first thing on a parent’s checklist for the question “Is my child sick?” What on earth does this have to do with capturing packets?
By far the best tool for determining what is wrong with programs that use a network, or even the network itself, is the tcpdump tool. Why is that? Surely in the now 40-plus years since packets were first transmitted across the original ARPANET we have developed some better tools. The fact is we have not. When something in the network breaks, you want to be able to see the messages at as many layers as possible.
The other key component in debugging network problems is understanding the timing of what happens, which a good packet-capture program also records. Networks are perhaps the most nondeterministic components of any complex computing system. Finding out who did what to whom and when (another question parents often ask, usually after a fight among siblings) is extremely important.
All network protocols, and the programs that use them, have some sort of ordering that is important to their functioning. Did a message go missing? Did two or more messages arrive out of order at the destination? All of these questions can potentially be answered by using a packet sniffer to record network traffic, but only if you use it!
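Once you have a capture, the ordering questions above reduce to a little bookkeeping. The sketch below assumes you have already pulled a capture timestamp and a per-message sequence number out of each packet; the Record type and the analyze function are hypothetical names for this example, and real protocols such as TCP number bytes rather than messages.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    timestamp: float  # capture time in seconds
    seq: int          # hypothetical per-message sequence number


def analyze(records: List[Record]) -> Tuple[List[int], List[int]]:
    """Return (missing, out_of_order) sequence numbers for one flow."""
    if not records:
        return [], []
    seen = set()
    out_of_order = []
    highest = None
    # Walk the messages in the order the capture saw them arrive.
    for rec in sorted(records, key=lambda r: r.timestamp):
        if highest is not None and rec.seq < highest:
            out_of_order.append(rec.seq)  # arrived after a later message
        if highest is None or rec.seq > highest:
            highest = rec.seq
        seen.add(rec.seq)
    # Anything between the lowest and highest number that never showed up.
    missing = [s for s in range(min(seen), max(seen) + 1) if s not in seen]
    return missing, out_of_order


capture = [Record(0.001, 1), Record(0.002, 2), Record(0.004, 4),
           Record(0.005, 3), Record(0.009, 6)]
missing, late = analyze(capture)
print("missing:", missing, "out of order:", late)
```

In this made-up capture, message 5 never arrives and message 3 shows up after message 4, which is exactly the kind of thing you cannot see without the recording.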
It’s also important to record the network traffic as soon as you see the problem. Because of their nondeterministic nature, networks give rise to the worst types of timing bugs. Perhaps the bug happens only every so many hours, because of a rollover in a large counter; you really want to start recording the network traffic before the bug occurs, not after, because it may be many hours until the condition comes up again.
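The counter-rollover case is easy to put numbers on. This is a back-of-the-envelope sketch; the 32-bit width and the rate of 100,000 increments per second are assumptions picked for illustration, not figures from any particular system.

```python
def seconds_until_rollover(bits: int, increments_per_second: float) -> float:
    """Time for an unsigned counter of the given width to wrap, starting at 0."""
    return (2 ** bits) / increments_per_second


# Assumed example: a 32-bit per-packet counter on a link carrying
# 100,000 packets per second.
hours = seconds_until_rollover(32, 100_000) / 3600
print(f"rollover after about {hours:.1f} hours")
```

With those assumed numbers the counter wraps roughly every 12 hours, so a capture started after the failure means waiting half a day for the next one.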
So, here are some very basic recommendations on using a packet sniffer in debugging a network problem. First, get permission (yes, it really is KV giving you this advice). People get cranky if you record their network traffic, such as instant messages, email, and banking transactions, and then post it to a mailing list. Just because some person in IT was dumb enough to give you root or admin rights on your desktop does not mean you should just record everything and send it off.
Next, record only as much information as you need to debug the problem. If you’re new at this you’ll probably have the program suck up every packet so you don’t miss anything, but that’s problematic for two reasons: the first is the previously mentioned privacy issue; and the second is that if you record too much data, finding the bug will be like finding a needle in a haystack—only you’ve never seen a haystack that big. Recording an hour of Ethernet traffic on your LAN can capture a few hundred million packets. No matter how good a tool you have, it’s going to do a much better job at finding a bug if you narrow down the search.
If you do record a lot of data, don’t try to share it all as one huge chunk. See how these points follow each other? Most packet-capture programs have options to say, “Once the capture file is full, close it and start a new one.” Limiting files to one megabyte is a nice start.
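Here is one way the last two recommendations might look as a tcpdump invocation, assembled in Python so the pieces can be labeled. The flags are standard tcpdump options: -i picks the interface, -n skips name lookups, -s limits the bytes kept per packet, -C rotates the file after that many millions of bytes, -W caps the number of files, and -w names the output. The interface, address, port, path, and snapshot length are placeholders for this sketch, and tcpdump_command is a made-up helper.

```python
import shlex
from typing import List


def tcpdump_command(interface: str, host: str, port: int,
                    out_file: str, max_files: int = 20) -> List[str]:
    """Build a tcpdump argument list for a narrow, size-limited capture."""
    # Capture filter: only traffic to or from one host on one TCP port.
    capture_filter = f"host {host} and tcp port {port}"
    return [
        "tcpdump",
        "-i", interface,
        "-n",                  # do not resolve addresses to names
        "-s", "128",           # assumed snapshot length: mostly headers
        "-C", "1",             # start a new file after about 1,000,000 bytes
        "-W", str(max_files),  # keep at most this many files, reusing the oldest
        "-w", out_file,        # write raw packets here (use a local disk)
        capture_filter,
    ]


# Placeholder values: 192.0.2.10 is a documentation address.
cmd = tcpdump_command("eth0", "192.0.2.10", 80, "/var/tmp/web.pcap")
print(shlex.join(cmd))
```

The snapshot length is a judgment call: 128 bytes usually covers the protocol headers while leaving most of the payload, and most of the privacy problem, out of the file. If the bug lives in the payload, you will need to raise it.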
Finally, do not record your data on a network file system. There is no better way to ruin a whole set of packet-capture files than by having them capture themselves.
So there you have it: a brief introduction to capturing data so you can debug a networking problem. Perhaps now you can get yelled at on a mailing list for something more egregious than not taking your network’s temperature before calling the doctor.
KV
Related articles on queue.acm.org
Debugging in an Asynchronous World
Michael Donat
http://queue.acm.org/detail.cfm?id=945134
Kode Vicious Bugs Out
Kode Vicious
http://queue.acm.org/detail.cfm?id=1127862
A Conversation with Steve Bourne, Eric Allman, and Bryan Cantrill
http://queue.acm.org/detail.cfm?id=1413258
Dedication
I would like to dedicate this column to my first editor, Mrs. B. Neville-Neil, who passed away after a sudden illness on December 9th, 2009; she was 65 years old.
My mother took language, both written and spoken, very seriously. The last thing I wanted to hear upon showing her an essay I was writing for school was, “Bring me the red pen.” In those days I did not have a computer; all my assignments were written longhand or on a typewriter and so the red pen meant a total rewrite. She was a tough editor, but it was impossible to question the quality of her work or the passion that she brought to the writing process. All of the things Strunk and White have taught others throughout the years my mother taught me, on her own, with the benefit of only a high school education and a voracious appetite for reading.
It is, in large part, due to my mother’s influence that I am a writer today. It is also due to her influence that I review articles, books, and code on paper, using a red pen. Her edits and her unswerving belief that I could always improve are, already, keenly missed.
—George Vernon Neville-Neil III