Communications of the ACM

Practice

Postmortem Debugging in Dynamic Environments


bug illustration

Credit: Gary Neill

Many modern dynamic languages lack tools for understanding complex failures.


Comments


Berin Babcock-McConnell

Thank you for the interesting article. However, I have one comment / question regarding the broken code example. The article states that "[t]his simple program has a fatal flaw: while trying to clear each item in the ptrs array at lines 14-15, it clears an extra element before the array (where ii = -1)." I agree that the out-of-bounds access and write on the ptrs array is not good. However, wouldn't dereferencing and writing to an uninitialized pointer be more of a problem here?

I admit I am not familiar with Illumos and it is not clear to me what hardware this example was run on, but it seems like writing below the ptrs array with the negative index is probably just going to write to an area of the stack that isn't currently in use (ptrs, ii, pushed registers, previous stack frame, etc., all being above ptrs[-1]). However, since the stack isn't initialized, *(ptrs[ii]) is going to access whatever address happens to be in memory at ptrs[ii], and *(ptrs[ii]) = 0; is going to try to write a 0 to that address. Wouldn't this attempt to write to a random location in memory be more fatal to the program's execution?


David Pacheco

Berin, thanks for commenting. Yes, this program does a lot of broken things, and dereferencing one of the array elements is ultimately what led to the crash. If it hadn't, then corrupting arbitrary memory (either by writing past the end of the array or by successfully dereferencing those pointers) may well have triggered a crash sometime later. In all of these cases, there'd be little hope of root-causing the bug without some kind of memory dump (assuming code inspection is intractable, as it would be in a more realistic program).


CACM Administrator

The following letter was published in the Letters to the Editor of the April 2012 CACM (http://cacm.acm.org/magazines/2012/4/147353).
--CACM Administrator

Regarding the article "Postmortem Debugging in Dynamic Environments" by David Pacheco (Dec. 2011), I have a question regarding the broken code example in the section on native environments, where Pacheco said, "This simple program has a fatal flaw: while trying to clear each item in the ptrs array at lines 14-15, it clears an extra element before the array (where ii = -1)." I agree the out-of-bounds access and write on the ptrs array could be fatal in some cases, but wouldn't writing to an uninitialized pointer be the true cause of fatality in this case?

I am not familiar with Illumos and do not know what hardware the example was run on, but it seems like writing to an address below the ptrs array with the negative index would probably just write to an area of the stack not currently in use, since ptrs, ii, pushed registers, and previous stack frame all likely exist in memory at addresses above ptrs[-1]. However, since the stack is not initialized, *(ptrs[ii]) will access whatever address happens to be in memory at ptrs[ii], while *(ptrs[ii]) = 0; will try writing 0 to that address. Wouldn't such an attempt to write to a random location in memory be more likely to be fatal to the program's execution than writing to an unused location on the stack?

Berin Babcock-McConnell
Tokyo

---------------------------------------------------

AUTHOR'S RESPONSE

The sample program was intentionally broken. If dereferencing one of the array elements did not crash the program immediately (a stroke of luck), then the resulting memory corruption (whether to the stack or elsewhere) might have triggered a crash sometime later. In a more realistic program (where code inspection is impractical), debugging both problems would be hopeless without more information, reinforcing the thesis that rich postmortem analysis tools are the best way to understand such problems in production.

David Pacheco
San Francisco

