Showing posts with label memory leak. Show all posts

Monday, March 17, 2014

Mind Reading a JVM on Linux


I'm currently looking at what appears to be a memory leak in a Java process that uses native Tibco libraries.

To help me see what's going on, I discovered I could read the process's memory in Linux. I could do this on a machine running the 3.6.10 kernel but had trouble on the 2.6.16 kernel.

It appears that the memory of a process is available at:

/proc/PID/mem

So, running this program:

package com.phenry.memory;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

public class AnalyserMain {

    public static void main(String[] args) {
        try {
            read(args);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void read(String[] args)
        throws FileNotFoundException,
        IOException {
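        String              filename            = "/proc/" + args[0] + "/mem";
        long                offset              = Long.parseLong(args[1], 16); // hex address, as printed by pmap
        byte[]              b                   = new byte[1024 * 64];

        // try-with-resources closes the file even if the seek or read fails
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(new File(filename), "r")) {
            randomAccessFile.seek(offset);
            // read() may return fewer bytes than requested, or -1 if nothing could be read
            int             bytesRead           = randomAccessFile.read(b, 0, b.length);
            System.out.println("output = " + new String(b, 0, Math.max(bytesRead, 0)));
        }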
    }

}

with the PID of my Java process (2801 in this case) as an argument and an address taken from pmap, so:

[henryp@corsair MyMemAnalyser]$ pmap 2801 | grep anon | head
000000000095a000    132K rw---    [ anon ]
00000006f0000000 120832K rw---    [ anon ]
00000006f7600000  28160K -----    [ anon ]
.
.

I could see this:

[henryp@corsair MyMemAnalyser]$ java -classpath bin com.phenry.memory.AnalyserMain 2801 000000000095a000 | strings 
output = 
/usr/java/jdk1.7.0_51/bin/java
org.eclipse.equinox.launcher.Main
/usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
/usr/java/jdk1.7.0_51/jre/lib/amd64/server
libjvm.so
/lib64/libm.so.6
/lib64
libm.so.6
/usr/java/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
libm.so.6
u"V9
t"V9
DCmdFactory
SharedDecoderLock
-Djava.class.path=.
.
.

I've not found the cause of the memory leak yet but this takes me closer.
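
The candidate addresses above came from pmap, but the same information is in /proc/PID/maps, so the lookup can be done from Java too. What follows is only a sketch under assumptions: the class name AnonRegions and its helper parseAnon are hypothetical, it treats any readable mapping with no pathname as "anonymous" (which only approximates pmap's [ anon ] label), and it assumes user-space addresses that fit in a signed long.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class AnonRegions {

    /** Start address and size of one mapping (hypothetical helper type). */
    static final class Region {
        final long start;
        final long sizeBytes;

        Region(long start, long sizeBytes) {
            this.start = start;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Parses one line of /proc/PID/maps, for example
     * "0095a000-0097b000 rw-p 00000000 00:00 0".
     * Returns null unless the mapping is readable and has no pathname.
     */
    static Region parseAnon(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 5) {
            return null; // malformed, or a sixth field names a file, [heap], [stack], ...
        }
        if (fields[1].charAt(0) != 'r') {
            return null; // not readable, so reading it through /proc/PID/mem would fail
        }
        String[] range = fields[0].split("-");
        long start = Long.parseLong(range[0], 16);
        long end = Long.parseLong(range[1], 16);
        return new Region(start, end - start);
    }

    public static void main(String[] args) throws IOException {
        String maps = "/proc/" + args[0] + "/maps";
        try (BufferedReader reader = new BufferedReader(new FileReader(maps))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Region region = parseAnon(line);
                if (region != null) {
                    // Same shape as the pmap output: hex start address, size in KB
                    System.out.printf("%016x %8dK%n", region.start, region.sizeBytes / 1024);
                }
            }
        }
    }
}
```

Each address it prints could then be passed as the second argument to AnalyserMain.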

UPDATE 1: On some systems, you need to execute this as root:

echo 0 > /proc/sys/kernel/yama/ptrace_scope

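as they ship "hardened" kernels for security reasons (Ubuntu, for instance), where the Yama ptrace_scope setting stops a process from attaching to, or reading the memory of, a process that is not its own child. Note: setting it to 0 makes your system more vulnerable, so only do this if you have to.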

UPDATE 2: The memory leak appeared to be coming from native Tibco libraries. Updating from 8.1 to 8.4 solved the problem.

Saturday, July 6, 2013

Anatomy of a Memory Leak

From asking other developers, it seems that analysing a memory leak can easily take two to three weeks. When you're working on a 40-node cluster, things can be even more complicated.

This week, I found a memory leak that is only triggered under certain circumstances. What they are is not terribly interesting to anybody outside our project, but the means by which I found it may be.

Basically, the collection of objects to be reaped by our application was empty. But by dumping the JVM's heap (with jmap -dump:live,format=b,file=XXX PID) and analysing the dump with YourKit, I saw that not only was the collection empty, it had never had anything put into it.

I did this by looking at HashMap's modCount field. According to the source code, this is the "number of times this HashMap has been structurally modified" (note: replacing the value of an existing key does not count as a structural modification).

This field is normally used to ensure that iterators are fail-fast. That is, when an iterator is created over one of the map's collection views (keySet(), values() or entrySet()), it makes a note of the current modCount. On each subsequent call to the iterator, the modCount is checked and, if it's not what was expected, a ConcurrentModificationException is thrown.
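
To make the fail-fast behaviour concrete, here is a small sketch. The class and method names are made up for illustration; only the HashMap and Iterator calls are standard JDK API. It shows that adding a new key after an iterator has been created makes the iterator throw, while replacing the value of an existing key does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {

    /** Returns true if the iterator throws after a new key is added behind its back. */
    static boolean failsAfterStructuralChange() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<String> it = map.keySet().iterator(); // remembers the current modCount
        map.put("c", 3);                               // new key: structural modification
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Returns true if the iterator keeps working after an existing key's value is replaced. */
    static boolean survivesValueReplacement() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        Iterator<String> it = map.keySet().iterator();
        map.put("a", 99);                              // existing key: modCount is unchanged
        try {
            it.next();
            it.next();
            return true;
        } catch (ConcurrentModificationException unexpected) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails after new key:         " + failsAfterStructuralChange());
        System.out.println("survives value replacement:  " + survivesValueReplacement());
    }
}
```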

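How does this help us in interpreting the entrails of a heap dump? Well, modCount starts at 0 and is incremented on every structural modification, so if it is still 0 it is all but certain that nothing ever went into the map (the 32-bit counter would have to wrap all the way round, after 2^32 modifications, to get back to 0). If nothing went into it, that path through your code wasn't executed. And that was enough to tell me what the bug was.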

HashMap is not the only data structure to use the idea of a modCount. Even the segments of ConcurrentHashMap (in Java 7 and earlier) use this idea. It's well worth checking when you're trying to work out what went wrong.
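
The difference between "empty" and "never touched" can be reproduced outside a heap dump. HashMap's modCount is not accessible from application code without reflection, so this sketch swaps in ArrayList, whose modCount is a protected field inherited from AbstractList. The CountingList subclass and its modifications() method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;

public class ModCountDemo {

    /** ArrayList subclass that exposes the protected modCount inherited from AbstractList. */
    static final class CountingList<E> extends ArrayList<E> {
        private static final long serialVersionUID = 1L;

        int modifications() {
            return modCount;
        }
    }

    public static void main(String[] args) {
        CountingList<String> neverUsed = new CountingList<String>();

        CountingList<String> usedThenEmptied = new CountingList<String>();
        usedThenEmptied.add("x");
        usedThenEmptied.remove("x");

        // Both lists are empty, but only one of them still has a modCount of 0.
        System.out.println("neverUsed:       empty=" + neverUsed.isEmpty()
                + " modCount=" + neverUsed.modifications());
        System.out.println("usedThenEmptied: empty=" + usedThenEmptied.isEmpty()
                + " modCount=" + usedThenEmptied.modifications());
    }
}
```

In a heap dump the same comparison is made by inspecting the field's value in the profiler rather than by calling a method.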