Last year, I was using JDK 1.6.0_05 to run our app. I monitored it with YourKit and saw my application threads being blocked by the Reference Handler thread on the monitor of class java.lang.ref.Reference$Lock.
Upon upgrading to JDK 1.6.0_20, there was none of this particular contention. What's more, the app was running about 25% faster without me doing anything!
So, what was causing this contention?
In one sample, the JVM ran for less than 9 minutes and was only doing something “interesting” for about 2 of those yet the total time spent contending for this lock was about 111 seconds. This time was spread across roughly 50-60 threads and YourKit was saying that there are about 50 different methods that appear to be attempting to attain the lock. Of the handful of methods I looked at, I couldn’t see anything in the code that was obviously suspicious. I mean, it was literally blocking in a simple get method!
Let's see what happens when the JDK blocks. Let's take this code:
package com.henryp.lang;
public class ThreadWaitingMain {
public static void main(String[] args) {
final Object lock = new Object();
System.out.println("\nlock.hashCode = " + Integer.toHexString(lock.hashCode()) + "(" + lock.hashCode() + ")"); //
Thread firstThread = new Thread(createLockContender(lock));
firstThread.setDaemon(false);
firstThread.setName("Thread no. 1");
firstThread.start();
Thread otherThread = new Thread(createLockContender(lock));
otherThread.setDaemon(false);
otherThread.setName("Thread no. 2");
otherThread.start();
System.out.println("Main thread finishing");
}
private static Runnable createLockContender(final Object lock) {
return new Runnable() {
@Override
public void run() {
while (true) {
synchronized (lock) {
try {
System.out.println(Thread.currentThread().getId() + ":" + Thread.currentThread().getName() + ": Have lock. About to sleep.");
Thread.currentThread().sleep(1000L);
lock.notify();
System.out.println(Thread.currentThread().getId() + ":" + Thread.currentThread().getName() + ": Relinquishing lock...");
lock.wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
};
}
}
Let's run it.
lock.hashCode = 1172e08(18296328)
Main thread finishing
.
.
.
Which PID is it?
[henryp@vmwareFedoraII Test]$ ps aux | grep java
henryp 12148 0.9 0.9 399372 10028 pts/6 Sl+ 22:13 0:00 /usr/java/jdk1.6.0_18/bin/java -server -Dfile.encoding=UTF-8 -classpath /home/henryp/workspace/Test/bin com.henryp.lang.ThreadWaitingMain
henryp 12168 0.0 0.0 4204 704 pts/3 S+ 22:14 0:00 grep java
OK, it's 12148.
By the way, running:
[henryp@vmwareFedoraII Test]$ ps -eLf | grep java
gives us not just the processes but all of the threads:
henryp 12148 11567 12148 0 12 22:13 pts/6 00:00:00 /usr/java/jdk1.6.0_18/bin/java -server -Dfile.encoding=UTF-8 -classpath /home/henryp/workspace/Test/bin com.henryp.lang.ThreadWaitingMain
henryp 12148 11567 12154 0 12 22:13 pts/6 00:00:00 /usr/java/jdk1.6.0_18/bin/java -server -Dfile.encoding=UTF-8 -classpath /home/henryp/workspace/Test/bin com.henryp.lang.ThreadWaitingMain
henryp 12148 11567 12155 0 12 22:13 pts/6 00:00:00 /usr/java/jdk1.6.0_18/bin/java -server -Dfile.encoding=UTF-8 -classpath /home/henryp/workspace/Test/bin com.henryp.lang.ThreadWaitingMain
.
.
.
But which ones are ours?
[henryp@vmwareFedoraII Test]$ jstack 12148 | grep "Thread no."
"Thread no. 2" prio=10 tid=0xb6a7d400 nid=0x2f84 waiting on condition [0x9f388000]
"Thread no. 1" prio=10 tid=0xb6a7bc00 nid=0x2f83 in Object.wait() [0x9f3d9000]
Very useful is our old friend jstack. Right, let's convert those hex values into decimal:
[henryp@vmwareFedoraII Test]$ printf "%d\n" 0x2f84
12164
[henryp@vmwareFedoraII Test]$ printf "%d\n" 0x2f83
12163
Now, let's see which Linux kernel commands are being called by these threads:
[henryp@vmwareFedoraII openjdk7]$ strace -p 12163 > ~/strace_12163.txt 2>&1 &
[henryp@vmwareFedoraII openjdk7]$ strace -p 12164 > ~/strace_12164.txt 2>&1 &
Let's tail one of these:
[henryp@vmwareFedoraII ~]$ tail -f strace_12164.txt
clock_gettime(CLOCK_MONOTONIC, {16852, 857324137}) = 0
clock_gettime(CLOCK_MONOTONIC, {16852, 857391406}) = 0
gettimeofday({1294611927, 576256}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1294611927, 576322753}) = 0
futex(0xb6a7e9a4, FUTEX_WAIT_PRIVATE, 1, {0, 999933247}) = -1 ETIMEDOUT (Connection timed out)
futex(0xb6a7e228, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {16853, 859389474}) = 0
futex(0x9f80090c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f800908, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0xb6a7c828, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xb6a7e96c, FUTEX_WAIT_PRIVATE, 689, NULL) = 0
futex(0xb6a7e028, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0xb6a7e028, FUTEX_WAKE_PRIVATE, 1) = 0
And it's this futex - "fast userspace mutex" - that we're interested in. It's how Linux does it's locking.
But that's where the story ends as I am not a Linux guru and need to do more research. So, I'm afraid this is an incomplete post. I wrote it to keep a record of what I am working on in my spare time.
Thread.currentThread().sleep(1000L);
ReplyDeleteshould be written as
Thread.sleep(1000L);