Saturday, September 7, 2013

Anatomy of a file handle leak

Production issues are nasty. But when the same non-deterministic state happens in the performance test environment, it's a gift from the gods. You can now examine it in a controlled environment.

The back story: occasionally, a prod server running Jetty 6.1.24 on a 1.6.13 JVM would complain that it had run out of file handles. Using lsof, we could see that over a thousand handles were indeed open and that they appeared to belong to TCP ports.

Using netstat, we could see what these ports were doing:

mds@gbl04215[Skye]:~> netstat -nap 2>/dev/null | head -2 
Active Internet connections (servers and established) 
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name 
mds@gbl04215[Skye]:~> netstat -nap 2>/dev/null | grep 10302 | more 
tcp        0      0 127.0.0.1:32007         0.0.0.0:*               LISTEN      10302/java 
tcp        0      0 :::48746                :::*                    LISTEN      10302/java 
tcp        0      0 :::1099                 :::*                    LISTEN      10302/java 
tcp        0      0 128.162.27.126:8112     :::*                    LISTEN      10302/java 
tcp        0      0 :::19601                :::*                    LISTEN      10302/java 
tcp        0      0 :::41624                :::*                    LISTEN      10302/java 
tcp        0      0 :::10008                :::*                    LISTEN      10302/java 
tcp        0      0 :::19001                :::*                    LISTEN      10302/java 
tcp        0      0 :::56639                :::*                    LISTEN      10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.13:47367     CLOSE_WAIT  10302/java 
tcp     1223      0 128.162.27.126:19001    128.164.34.13:41988     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.11:56322     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.11:49410     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.11:55043     CLOSE_WAIT  10302/java 
tcp      815      0 128.162.27.126:19001    133.13.143.184:57011    ESTABLISHED 10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.13:48896     CLOSE_WAIT  10302/java 
tcp     1769      0 128.162.27.126:19001    128.164.34.11:45830     CLOSE_WAIT  10302/java 
tcp     1768      0 128.162.27.126:19001    128.164.34.11:53254     CLOSE_WAIT  10302/java 
tcp     1770      0 128.162.27.126:19001    128.164.34.11:43783     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.11:49927     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.11:44040     CLOSE_WAIT  10302/java 
tcp     1769      0 128.162.27.126:19001    128.164.34.11:45321     CLOSE_WAIT  10302/java 
tcp     1770      0 128.162.27.126:19001    128.164.34.13:44300     CLOSE_WAIT  10302/java 
tcp     1768      0 128.162.27.126:19001    128.164.34.13:45066     CLOSE_WAIT  10302/java 
tcp     1769      0 128.162.27.126:19001    128.164.34.13:47114     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.13:40970     CLOSE_WAIT  10302/java 
tcp     1224      0 128.162.27.126:19001    128.164.34.13:39434     CLOSE_WAIT  10302/java
.
.


(where our Java PID is 10303).

Hmm, OK. Well, some interesting things to note: first is that netstat tells you more than the state of the port. As the man page says, it tells you that data is backing up on the port (the second column, Recv-Q). This is measured in bytes.

The second point to note is that the local port, that is the port of our server (19001), is the port we're using for HTTP via Jetty. 

The third is the CLOSE_WAIT state. This means that the client has closed the session (by sending a FIN packet) and that our server has acknowledged it (by returning an ACK packet) but that our application has not yet told the OS to close the socket (see diagram).


Putting this information together, I dumped the JVM's memory to a file and examined its contents.



Randomly selecting a SocketChannel that happened to be lingering in memory, I took the port associated with it and checked that this was one of my ports that is in the CLOSE_WAIT state. It was and so were all the others I looked at.

So, was Jetty not closing ports? Googling didn't find anybody complaining about this but I did find a JVM bug report here that looks awfully similar.

More research is needed but this suspiciously looks like the culprit.

No comments:

Post a Comment