Saturday, July 6, 2013

Anatomy of a Memory Leak

Asking around, other developers tell me that analysing a memory leak can easily take 2 to 3 weeks. When you're working in a 40-node cluster, things can be even more complicated.

This week, I found a memory leak that is only triggered under certain circumstances. What they are is not terribly interesting to anybody outside our project, but the way I found the leak may be.

Basically, the collection of objects to be reaped by our application was empty. But by dumping the JVM's heap (with jmap -dump:live,format=b,file=XXX PID) and analysing the dump with YourKit, I saw that not only was the collection empty, it had never had anything put into it.

I did this by looking at HashMap's modCount field. According to the source code, this is the "number of times this HashMap has been structurally modified" (note: replacing the value of an existing key does not count as a structural modification).

This field is normally used to ensure that iterators are fail-fast. That is, when a call is made to the iterator() method of this data structure, the newly created iterator makes a note of the modCount. On later calls to the iterator object (next(), for example), the modCount is checked again, and if it's not what was expected, a ConcurrentModificationException is thrown.
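To make the fail-fast behaviour concrete, here is a minimal single-threaded sketch. The class and method names (FailFastDemo and friends) are made up for illustration and have nothing to do with our project; only the java.util classes are real.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {

    // Adding a NEW key is a structural modification, so an iterator
    // created beforehand should notice the changed modCount.
    static boolean iteratorFailsAfterNewKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator(); // remembers the current modCount
        map.put("c", 3);                               // new key: modCount is incremented

        try {
            it.next();                                 // compares remembered and actual modCount
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Replacing the value of an EXISTING key is not a structural
    // modification, so the iterator carries on happily.
    static boolean iteratorSurvivesValueReplacement() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        Iterator<String> it = map.keySet().iterator();
        map.put("a", 42);                              // same key: modCount is unchanged

        try {
            it.next();
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails after new key:        " + iteratorFailsAfterNewKey());
        System.out.println("survives value replacement: " + iteratorSurvivesValueReplacement());
    }
}
```

Bear in mind that the Javadoc describes fail-fast behaviour as best-effort only, so it is a debugging aid and not something to build program logic on.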

How does this help us in interpreting the entrails of a heap dump? Well, if the modCount is 0, the chances are overwhelming that nothing ever went into the collection: modCount is an int, so it would take 2^32 structural modifications for it to wrap around and read 0 again. If nothing went into it, that path through your code wasn't executed. And that was enough to tell me what the bug was.
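The difference between "empty" and "never used" can be sketched in a few lines. HashMap keeps its modCount to itself, so this sketch assumes a stand-in: an ArrayList subclass (the hypothetical CountingList below), which can read the protected modCount field it inherits from AbstractList. In a real investigation you would read the field in the heap dump instead.

```java
import java.util.ArrayList;

public class ModCountDemo {

    // Hypothetical helper for illustration: exposes the protected
    // modCount field that ArrayList inherits from AbstractList.
    static class CountingList<E> extends ArrayList<E> {
        int structuralModifications() {
            return modCount;
        }
    }

    // A list that nothing was ever added to or removed from.
    static CountingList<String> neverUsed() {
        return new CountingList<>();
    }

    // A list that is empty now but has had an element pass through it.
    static CountingList<String> usedThenEmptied() {
        CountingList<String> list = new CountingList<>();
        list.add("x");     // structural modification
        list.remove("x");  // structural modification
        return list;
    }

    public static void main(String[] args) {
        CountingList<String> a = neverUsed();
        CountingList<String> b = usedThenEmptied();
        // Both lists are empty, but only the first has a modCount of 0.
        System.out.println("never used:   empty=" + a.isEmpty()
                + " modCount=" + a.structuralModifications());
        System.out.println("used/emptied: empty=" + b.isEmpty()
                + " modCount=" + b.structuralModifications());
    }
}
```

Both lists look identical if you only check their size. The modCount is what tells them apart.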

HashMap is not the only data structure to use the idea of a modCount. Even the segments of ConcurrentHashMap use this idea. It's well worth checking when you're trying to work out what went wrong.
