Sunday, February 13, 2011

It's the environment, stupid!

During our recent release, an upgraded version of a library now pulled in MINA as a transitive dependency. Although there were no problems in our Linux QA environment, the Solaris QA environment just wouldn't work. There were no error messages, just our application saying that it was constantly trying to connect to another tier.

When the exact same code works in one environment but not in another, the problem simply has to be environmental. But proving it to your colleagues (who think the issue is with MINA) can be hard and finding what the exact problem is even harder.

If it's environmental, then we have to step away from Java (you can never live totally inside the virtual machine...) and look at what the actual machine is doing.

Let's write a test:
package com.henryp.nio;

import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.concurrent.CountDownLatch;

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;


public class SocketNioTest {

private static final int SERVER_PORT = 9999;
private static ServerSocket serverSocket;

@BeforeClass
public static void setUp() throws IOException {
serverSocket = new ServerSocket(SERVER_PORT);
Thread thread = new Thread(new Runnable() {

public void run() {
while (true) {
try {
Socket accept = serverSocket.accept();
System.out.println("client connected");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

});
}

@AfterClass
public static void tearDown() throws IOException {
serverSocket.close();
}

@Test
public void testNio() throws IOException, InterruptedException {
SocketChannel socketChannel = SocketChannel.open();

System.out.println("getLocalHost...");
InetAddress inetAddress = InetAddress.getLocalHost() ;
System.out.println("binding...");
SocketAddress bindpoint = new InetSocketAddress(inetAddress, SERVER_PORT);
System.out.println("connecting...");
socketChannel.socket().connect(bindpoint );

sendRandomData(socketChannel);

System.out.println("closing...");
socketChannel.socket().close();
}

private void sendRandomData(SocketChannel socketChannel) throws IOException {
ByteBuffer byteBuffer = ByteBuffer.allocate(256);
socketChannel.write(byteBuffer);
}

}

OK, it's not a real unit test (as it can't fail) but JUnit makes a nice test harness. Let's put some break points in and run tcpdump.

[root@vmwareFedoraII henryp]# tcpdump -i lo port 9999

Note that you need to be root to do this. Also note that you need to monitor the loopback interface. This is a very useful feature of Linux as you can then monitor packets that are passed around within the same process - packets that never actually leave the network card.

Now this line:

socketChannel.socket().connect(bindpoint );

causes tcpdump to print:

21:09:59.414367 IP localhost.localdomain.44985 > localhost.localdomain.distinct: Flags [S], seq 625354929, win 32792, options [mss 16396,sackOK,TS val 22409432 ecr 0,nop,wscale 6], length 0

21:09:59.414409 IP localhost.localdomain.distinct > localhost.localdomain.44985: Flags [S.], seq 630583436, ack 625354930, win 32768, options [mss 16396,sackOK,TS val 22409432 ecr 22409432,nop,wscale 6], length 0

21:09:59.414420 IP localhost.localdomain.44985 > localhost.localdomain.distinct: Flags [.], ack 1, win 513, options [nop,nop,TS val 22409432 ecr 22409432], length 0


This is the famous three-way handshake (SYN, SYN/ACK, ACK).

  1. It's saying that our client (that happens to be using port 44985 this is random) sends a SYN packet (Flags [S]) to the server socket with sequence number 625354929.
  2. The server responds with a a SYN/ACK (Flags [S] ... ack) with sequence number 630583436 acknowledging 625354930 (the clients sequence plus 1)
  3. Then the client acknowledges receipt of this with an ACK (ack).

This is a healthy normal exchange. Note that this is only one line of Java code. Even if you single step into the Java source code, you'll see just all this chatter taking place in one native call.

Now, we send some random data and it is this line:

socketChannel.write(byteBuffer);

that produces this output in tcpdump:


21:12:21.380171 IP localhost.localdomain.44985 > localhost.localdomain.distinct: Flags [P.], seq 1:257, ack 1, win 513, options [nop,nop,TS val 22551398 ecr 22409432], length 256

21:12:21.380220 IP localhost.localdomain.distinct > localhost.localdomain.44985: Flags [.], ack 257, win 529, options [nop,nop,TS val 22551398 ecr 22551398], length 0


This is the client sending a PUSH packet (Flags [P.]) of 256 bytes (remember our ByteBuffer.allocate(256)) and the server acknowledging receipt (ack).

seq 1:257 refers to the sequence number for this packet (relative to the first sequence number 625354929). If we drop the stack frame and run the lines in this method again, we see:

21:16:02.165005 IP localhost.localdomain.44985 > localhost.localdomain.distinct: Flags [P.], seq 257:513, ack 1, win 513, options [nop,nop,TS val 22772183 ecr 22551398], length 256

21:16:02.165014 IP localhost.localdomain.distinct > localhost.localdomain.44985: Flags [.], ack 513, win 546, options [nop,nop,TS val 22772183 ecr 22772183], length 0


and the sequence number for the client has moved to seq 257:513.

And finally:

socketChannel.socket().close();

produces:

21:21:48.188898 IP localhost.localdomain.44985 > localhost.localdomain.distinct: Flags [F.], seq 513, ack 1, win 513, options [nop,nop,TS val 23118207 ecr 22772183], length 0

21:21:48.228956 IP localhost.localdomain.distinct > localhost.localdomain.44985: Flags [.], ack 514, win 546, options [nop,nop,TS val 23118247 ecr 23118207], length 0


where the client says it's FINished and the server ACKnowledges this.

Actually, this is not the entire story. Look what happens when the JVM closes down:

21:26:26.173044 IP localhost.localdomain.distinct > localhost.localdomain.44985: Flags [R.], seq 1, ack 514, win 546, options [nop,nop,TS val 23396191 ecr 23118207], length 0

This is the client sending the RST packet. This says: whatever data is currently being processed in the TCP/IP stack, forget it. The JVM/OS is telling the server this to save it from worrying any more about the sockets that have been opened. Apparently, firewalls use these types of packets to stop people from connecting.

So, FINishing the TCP/IP is not sufficient to close the socket. In fact, if we run the test again until after we have closed the socket but before we close the JVM, we see something like this using netstat:


[henryp@vmwareFedoraII ~]$ netstat -nap | grep 9999
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 :::9999 :::* LISTEN 8099/java
tcp 257 0 ::ffff:127.0.0.1:9999 ::ffff:127.0.0.1:52088 CLOSE_WAIT -
tcp 0 0 ::ffff:127.0.0.1:52088 ::ffff:127.0.0.1:9999 FIN_WAIT2 -


And this state can stay around for some time (typically of the order of 10s of seconds, depending on the OS). Note that it's also disconnected from the Java process at this point as it appears to be totally under the control of the OS.

With RST, there is no waiting. And this is what brings me to what we were seeing in our QA environment. But the RST was being sent when the JVM was still alive and during the three-way handshake (SYN, SYN/ACK, RST... WTF??). This infers that the problem could not possibly be Java code since this is far too low a level.

It's easy to blame the new NIO/MINA as that was the only thing new in our stack but since Linux was handling it fine, we set the System Administrators looking at this Solaris box to see what could be wrong. Finally, by comparing all environment files with that of root who could run our code, we found that for the chroot-ed user under which our code runs could not write to /dev/poll. I don't pretend to be a Solaris expert but looking at this link:

"The /dev/poll pseudo-device [is] maintaining the full set of file descriptors being polled in the kernel."

Which sounds a little like (OK, I'm clutching at straws) the opening page on the chapter on Selectors in Java NIO:

"multiplexing make it possible for a single thread to efficiently manage many I/O channels simultaneously. C/C++ coders have had the POSIX select( ) and/or poll( ) system calls in their toolbox for many years."

Now, I know my MINA code is using selectors, so I am assuming that this asks the operating system to add an entry to /dev/poll. Since the thread finds this unexpectedly read-only (a mistake by the admin setting up the chroot jail) things go horribly wrong and a RST is sent just as we're connecting.

Whether this is true or not, chmoding /dev/poll to being read/write for this user solved the problem.

1 comment:

  1. you interested in work in the west london area?
    Java Developer
    £300-£400/day
    Let me know
    jd_hellfire@hotmail.com

    ReplyDelete