Production issues are nasty. But when the same non-deterministic state happens in the performance test environment, it's a gift from the gods. You can now examine it in a controlled environment.
The back story: occasionally, a prod server running Jetty 6.1.24 on a 1.6.13 JVM would complain that it had run out of file handles. Using lsof, we could see that over a thousand handles were indeed open and that they appeared to belong to TCP ports.
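As an aside, you can watch the descriptor count from inside the JVM as well as with lsof. A minimal sketch (the class name FdCount is mine; it assumes a Unix JVM, where the OperatingSystemMXBean implements the com.sun.management extension):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCount {
    // Returns this process's open file-descriptor count, or -1 where the
    // com.sun.management bean is unavailable (e.g. non-Unix JVMs).
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFds());
    }
}
```

Logging this periodically would have shown the count climbing towards the ulimit long before the "too many open files" errors appeared.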
Using netstat, we could see what these ports were doing:
mds@gbl04215[Skye]:~> netstat -nap 2>/dev/null | head -2
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
mds@gbl04215[Skye]:~> netstat -nap 2>/dev/null | grep 10302 | more
tcp 0 0 127.0.0.1:32007 0.0.0.0:* LISTEN 10302/java
tcp 0 0 :::48746 :::* LISTEN 10302/java
tcp 0 0 :::1099 :::* LISTEN 10302/java
tcp 0 0 128.162.27.126:8112 :::* LISTEN 10302/java
tcp 0 0 :::19601 :::* LISTEN 10302/java
tcp 0 0 :::41624 :::* LISTEN 10302/java
tcp 0 0 :::10008 :::* LISTEN 10302/java
tcp 0 0 :::19001 :::* LISTEN 10302/java
tcp 0 0 :::56639 :::* LISTEN 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.13:47367 CLOSE_WAIT 10302/java
tcp 1223 0 128.162.27.126:19001 128.164.34.13:41988 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.11:56322 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.11:49410 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.11:55043 CLOSE_WAIT 10302/java
tcp 815 0 128.162.27.126:19001 133.13.143.184:57011 ESTABLISHED 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.13:48896 CLOSE_WAIT 10302/java
tcp 1769 0 128.162.27.126:19001 128.164.34.11:45830 CLOSE_WAIT 10302/java
tcp 1768 0 128.162.27.126:19001 128.164.34.11:53254 CLOSE_WAIT 10302/java
tcp 1770 0 128.162.27.126:19001 128.164.34.11:43783 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.11:49927 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.11:44040 CLOSE_WAIT 10302/java
tcp 1769 0 128.162.27.126:19001 128.164.34.11:45321 CLOSE_WAIT 10302/java
tcp 1770 0 128.162.27.126:19001 128.164.34.13:44300 CLOSE_WAIT 10302/java
tcp 1768 0 128.162.27.126:19001 128.164.34.13:45066 CLOSE_WAIT 10302/java
tcp 1769 0 128.162.27.126:19001 128.164.34.13:47114 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.13:40970 CLOSE_WAIT 10302/java
tcp 1224 0 128.162.27.126:19001 128.164.34.13:39434 CLOSE_WAIT 10302/java
.
.
.
(where our Java PID is 10302).
Hmm, OK. Well, some interesting things to note: first, netstat tells you more than just the state of each port. As the man page says, the second column (Recv-Q) shows how much received data is backing up on the socket, unread by the application. It's measured in bytes.
The second point to note is that the local port (19001, our server's end of each connection) is the port Jetty is serving HTTP on.
The third is the CLOSE_WAIT state. This means the client has closed its end of the connection (by sending a FIN packet) and our server's TCP stack has acknowledged it (by returning an ACK packet), but our application has not yet told the OS to close the socket (see diagram).
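This leak pattern is easy to reproduce in miniature. A hedged sketch (the class name CloseWaitDemo and the byte count are mine, not Jetty's code): the client writes some bytes and closes, while the server neither reads nor closes, which is exactly the Recv-Q-plus-CLOSE_WAIT signature above.

```java
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    // Reproduces the leak: the client sends data and closes, but the server
    // never reads and never calls close(). Until the application closes the
    // accepted socket, the kernel keeps the connection in CLOSE_WAIT with the
    // unread bytes sitting in Recv-Q, and the file descriptor stays open.
    static boolean serverClosedItsEnd() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // ephemeral port
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();
            client.getOutputStream().write(new byte[1224]); // piles up in the server's Recv-Q
            client.close();                                 // client sends FIN; kernel ACKs
            Thread.sleep(200);                              // let the FIN arrive
            boolean closed = accepted.isClosed();           // false: app never closed its end
            accepted.close();                               // the missing step that frees the handle
            return closed;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server closed its end before we did? " + serverClosedItsEnd());
    }
}
```

While the sleep runs, a netstat on the ephemeral port would show the accepted side in CLOSE_WAIT with a nonzero Recv-Q, just like the listing above; multiply by every request and you run out of file handles.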
Putting this information together, I dumped the JVM's heap to a file and examined its contents.
Randomly selecting a SocketChannel that happened to be lingering in memory, I took the port associated with it and checked whether it was one of the ports in the CLOSE_WAIT state. It was, and so were all the others I looked at.
So, was Jetty not closing ports? Googling didn't find anybody complaining about this but I did find a JVM bug report here that looks awfully similar.
More research is needed, but this looks suspiciously like the culprit.