Tuesday, August 23, 2011

Optimising Collection Access

I had an interview question recently that asked me to write code that accesses a collection as efficiently as possible while multiple threads intensively read from and write to it.

Giving the caveat that everything should be subject to stress tests anyway, I went ahead and used a java.util.concurrent.ConcurrentHashMap. I knew it to be a very efficient piece of code and that I wouldn't have to worry about synchronization.

When I got home, I wondered about my answer and started to put together JMeter tests to establish some empirical data. JMeter provides a lot of the plumbing for stress tests that makes it superior to just writing your own class.

[Incidentally, if you do write your own executable classes for stress tests, don't forget to pause for 4 seconds before running your code. Any object created in the first 4s of the JVM's life is not eligible for an optimization called Biased Locking (see this email from a Sun engineer). Biased Locking is an optimization for the use case where a lock is mostly sought by just one thread. In this case, the favoured thread finds that acquiring the lock is cheap.]
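For a hand-rolled harness, the pause might look like the sketch below. The class and method names are made up for illustration, and the 4-second figure is simply the one quoted above. The important detail is that the objects being locked are created after the pause, which is why the task is built by a factory rather than passed in ready-made.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical hand-rolled stress harness; the names are illustrative only. */
final class DelayedStressRunner {

    // The 4-second startup delay mentioned in the text (an assumption for any given JVM).
    static final long DEFAULT_STARTUP_DELAY_MS = 4_000;

    /**
     * Sleeps for delayMs, then asks the factory for the task, so that anything
     * the task locks on is allocated after the pause. Returns elapsed nanoseconds.
     */
    static long runAfterDelay(long delayMs, Supplier<Runnable> taskFactory, int iterations) {
        try {
            TimeUnit.MILLISECONDS.sleep(delayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during the startup pause", e);
        }
        Runnable task = taskFactory.get(); // objects are created here, after the pause
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = runAfterDelay(DEFAULT_STARTUP_DELAY_MS, () -> {
            Object lock = new Object();   // created after the pause
            long[] counter = new long[1];
            return () -> {
                synchronized (lock) {     // uncontended lock taken by a single thread
                    counter[0]++;
                }
            };
        }, 1_000_000);
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

Timings from something this crude are only a rough guide, which is rather the point of using JMeter instead.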

First, if you wish to plug your code into JMeter, it's helpful to have a superclass that implements the JMeter interface:
package com.henryp.stress;


import org.apache.jmeter.protocol.java.sampler.JavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

public abstract class AbstractSamplerClient implements JavaSamplerClient {

    public SampleResult runTest(JavaSamplerContext javaSamplerContext) {
        SampleResult result = new SampleResult();
        result.sampleStart();

        doSample();

        result.sampleEnd();
        result.setSuccessful(true);
        result.setResponseCodeOK();
        result.setResponseMessageOK();

        return result;
    }

    protected abstract void doSample();

}
Then, subclass it. For my ConcurrentHashMap, it looks something like this:
package com.henryp.stress;


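import java.util.Map;

import org.apache.jmeter.config.Arguments;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;

public abstract class AbstractMapSamplerClient extends AbstractSamplerClient {

    // Shared by all sampler instances. The type parameters are an assumption;
    // a raw Map would not compile with the for-each loop below.
    protected static Map<String, String> concurrentHashMap;
    protected final MapPopulator mapPopulator = new MapPopulator();

    @Override
    public Arguments getDefaultParameters() {
        return mapPopulator.getDefaultParameters();
    }

    @Override
    protected void doSample() {
        for (Map.Entry<String, String> entry : concurrentHashMap.entrySet()) {
            // NoOp while I think of something to do here
        }
    }

    @Override
    public void setupTest(JavaSamplerContext javaSamplerContext) {
        // Lock on the class, not "this": the field is static and each JMeter
        // thread is assumed to get its own sampler instance.
        synchronized (AbstractMapSamplerClient.class) {
            if (concurrentHashMap == null) {
                System.out.println("setupTest: " + javaSamplerContext);

                concurrentHashMap = mapPopulator.makePopulatedMap(javaSamplerContext);
            }
        }
    }

    @Override
    public void teardownTest(JavaSamplerContext javaSamplerContext) {
        synchronized (AbstractMapSamplerClient.class) {
            concurrentHashMap = null;
        }
    }
}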
The MapPopulator is just a utility that instantiates a Map of a user-defined type and populates it with a user-defined number of elements. Setting the default parameters in that class looks something like:
.

.
.
    protected static final int DEFAULT_COLLECTION_SIZE = 100;
    protected static final String COLLECTION_SIZE_KEY = "COLLECTION_SIZE_KEY";

    public Arguments getDefaultParameters() {
        System.out.println("getDefaultParameters");

        Arguments arguments = new Arguments();

        Argument argument = new Argument();
        argument.setName(COLLECTION_SIZE_KEY);
        argument.setValue(String.valueOf(DEFAULT_COLLECTION_SIZE));
        arguments.addArgument(argument);

        return arguments;
    }
.
.
.

To make JMeter pick up my classes, I put this in $JMETER_HOME/bin/jmeter.properties:

search_paths=/Users/henryp/Documents/workspace/TestMaven/target/TestMaven-0.0.1-SNAPSHOT.jar

This way, I only had to execute Maven to compile my code and rebuild the JAR. It would be nice to hot-deploy the code but I don't know if JMeter supports this. So, instead I have to restart JMeter each time I change the code :-(

My code that iterates over the entry set of a ConcurrentHashMap was producing a throughput of about 25 000/s when using 100 read-threads while another 10 write-threads added an element to the collection with a constant throughput of 100/s (I've not shown the code for this).
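Outside JMeter, the shape of that test can be approximated with plain threads. The following is only a sketch with made-up names and thread counts, and it makes no attempt to throttle the writers to a fixed rate. What it does show is that iterating a ConcurrentHashMap's entry set while other threads put entries is safe: the iterators are weakly consistent and do not throw ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-alone approximation of the JMeter test; names are illustrative. */
final class ConcurrentMapIterationDemo {

    /** Readers iterate the entry set until every writer has finished; returns the final map size. */
    static int readWhileWriting(int initialSize, int readers, int writers, int writesPerWriter) {
        Map<String, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < initialSize; i++) {
            map.put("initial-" + i, "value");
        }

        CountDownLatch writersDone = new CountDownLatch(writers);
        // One thread per task, so readers can never starve writers of a pool thread.
        ExecutorService pool = Executors.newFixedThreadPool(readers + writers);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int w = 0; w < writers; w++) {
                final int id = w;
                futures.add(pool.submit(() -> {
                    try {
                        for (int i = 0; i < writesPerWriter; i++) {
                            map.put("writer-" + id + "-" + i, "value");
                        }
                    } finally {
                        writersDone.countDown();
                    }
                }));
            }
            for (int r = 0; r < readers; r++) {
                futures.add(pool.submit(() -> {
                    do {
                        for (Map.Entry<String, String> entry : map.entrySet()) {
                            // no-op, as in the sampler above
                        }
                    } while (writersDone.getCount() > 0);
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows anything a reader or writer threw
            }
            return map.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("final size: " + readWhileWriting(100, 4, 2, 1_000));
    }
}
```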

Not bad. But could it be improved easily? Let's try using a java.util.List (perhaps surprisingly, ArrayList and LinkedList gave me similar figures) and use read/write locks to synchronize access. So, the read-code looks something like this:
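    static ReadWriteLock readWriteLock;


    @Override
    protected void doSample() {
        Lock readLock = readWriteLock.readLock();
        // Acquire outside the try block: if lock() itself failed, the finally
        // block would otherwise try to unlock a lock we never held.
        readLock.lock();
        try {
            for (String aValue : list) {
                // no-op
            }
        } finally {
            readLock.unlock();
        }
    }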

and the write-code looks something like this:
 private final AtomicInteger callId = new AtomicInteger();

.
.
.
    @Override
    protected void doSample() {
        Lock writeLock = readWriteLock.writeLock();
        // Acquire outside the try block so that unlock() only runs if we hold the lock.
        writeLock.lock();
        try {
            String aValue = this.toString() + callId.getAndIncrement();
            SingleListReaderWithRWLocksSamplerClient.list.add(aValue);
        } finally {
            writeLock.unlock();
        }
    }
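
The reader and writer fragments above can be put together in a stand-alone class. This is a minimal sketch, assuming a ReentrantReadWriteLock as the ReadWriteLock implementation and an ArrayList underneath; the class name and the thread counts in main are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical list wrapper guarded by a read/write lock; names are illustrative. */
final class ReadWriteLockedList {

    private final List<String> list = new ArrayList<>();
    private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock();

    /** Iterates the whole list under the read lock; many readers may do this at once. */
    int countEntries() {
        Lock readLock = readWriteLock.readLock();
        readLock.lock();
        try {
            int count = 0;
            for (String ignored : list) {
                count++;
            }
            return count;
        } finally {
            readLock.unlock();
        }
    }

    /** Appends under the write lock, which excludes all readers and other writers. */
    void add(String value) {
        Lock writeLock = readWriteLock.writeLock();
        writeLock.lock();
        try {
            list.add(value);
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReadWriteLockedList shared = new ReadWriteLockedList();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < 2; w++) {
            final int id = w;
            threads.add(new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    shared.add("writer-" + id + "-" + i);
                }
            }));
        }
        for (int r = 0; r < 4; r++) {
            threads.add(new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    shared.countEntries();
                }
            }));
        }
        for (Thread thread : threads) {
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("final size: " + shared.countEntries());
    }
}
```

Keeping the lock private to the wrapper means no caller can touch the list without going through it, which the static fields in the sampler classes do not guarantee.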

The subsequent JMeter test (100 read threads, 10 write threads with a throughput of 100/s) indicated that this was about twice as fast as using ConcurrentHashMap.

What's more, it seemed to suffer a far smaller degradation in performance when I increased the throughput of write-threads to 1000/s.

Conclusion: they're both pretty nifty and perform at the same order of magnitude. Your mileage may vary but it seems java.util.concurrent.locks.ReadWriteLock is a more efficient approach.

Friday, August 12, 2011

Tomcat/Maven plugin

My consultancy contract has come to an end so now I have time to attend to my much-neglected blog.

In the last few months, I have been Maven-izing and modularizing a legacy application. (Architectural tip: err on the side of being too modularized. It's much easier to fuse modules together than to split them.)

Anyway, one of the jobs was to improve the developer experience by having a web app running via Maven. The advantages include the project working straight out of the box with no awkward manual configuration. But it was only partially working with the Tomcat Maven Plugin. Some parts of the page were not loading because of NullPointerExceptions in our code. And yet, when I pointed a standalone version of Tomcat at the same exploded WAR, everything worked.

Using Firebug's debugger and single-stepping through the JavaScript code revealed nothing amiss. But stepping through the code of the Tomcat embedded in the Maven plugin, I found this in org.apache.catalina.authenticator.AuthenticatorBase:

if (session != null && changeSessionIdOnAuthentication) {
    Manager manager = request.getContext().getManager();
    manager.changeSessionId(session);
    request.changeSessionId(session.getId());
}

So, session IDs were changing on each call. Indeed, Firebug did show that each AJAX call had a different JSESSIONID.

Further investigation showed that the Maven Tomcat Plugin uses Tomcat 6.0.29 (you can tell this by looking at the plugin's pom.xml). Our non-Maven Tomcat was 6.0.14.

What could be different? The change of a patch version number suggests bug-fixes and not new functionality. And what is this changeSessionIdOnAuthentication field?

It happens that we had a bespoke security plugin which extends Tomcat's AuthenticatorBase and authenticates on every HTTP call. With the default value of changeSessionIdOnAuthentication being true, this was causing the session to be discarded on the 6.0.29 Tomcat instance and so all fields in the session object to be uninitialized, ergo the NullPointerExceptions.

The changeSessionIdOnAuthentication field was added some time between releases 6.0.14 and 6.0.29. According to the Tomcat documentation, it is used "to prevent session fixation attacks".

These attacks work like this: somebody sends you a link to a website on which you have an account. The link includes the JSESSIONID that the attacker has been given by the website. You enter your credentials and the HTTP session associated with this JSESSIONID now carries some flag to say you have logged in. After this, the person who sent you the link can go to the website and, as long as he is using the same JSESSIONID as you, he too can see your account since the associated HTTP session says you are authenticated.

The changeSessionIdOnAuthentication field being set to true means that as soon as you log in, you get a new JSESSIONID and the attacker is foiled.
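
The effect is easy to see with a toy session store. The sketch below is not Tomcat's implementation; the class, its methods and the use of random UUIDs as session IDs are all invented for illustration. Only the name of the flag is borrowed from Tomcat.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Toy session store (hypothetical, not Tomcat code) showing why rotating the ID helps. */
final class SessionFixationDemo {

    static final class Session {
        volatile String authenticatedUser; // null until somebody logs in
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final boolean changeSessionIdOnAuthentication;

    SessionFixationDemo(boolean changeSessionIdOnAuthentication) {
        this.changeSessionIdOnAuthentication = changeSessionIdOnAuthentication;
    }

    /** Creates an anonymous session and returns its ID, as a site would for any visitor. */
    String newSession() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new Session());
        return id;
    }

    /** Logs the user in and returns the session ID the browser should use from now on. */
    String login(String sessionId, String user) {
        Session session = sessions.get(sessionId);
        if (session == null) {
            throw new IllegalArgumentException("unknown session: " + sessionId);
        }
        session.authenticatedUser = user;
        if (!changeSessionIdOnAuthentication) {
            return sessionId; // the ID the attacker already knows is now authenticated
        }
        // Re-key the same session under a fresh ID; the old ID stops resolving.
        String freshId = UUID.randomUUID().toString();
        sessions.remove(sessionId);
        sessions.put(freshId, session);
        return freshId;
    }

    /** Returns the logged-in user for this session ID, or null if there is none. */
    String userFor(String sessionId) {
        Session session = sessions.get(sessionId);
        return session == null ? null : session.authenticatedUser;
    }

    public static void main(String[] args) {
        SessionFixationDemo site = new SessionFixationDemo(true);
        String attackerKnownId = site.newSession();            // sent to the victim in a link
        String victimId = site.login(attackerKnownId, "victim");
        System.out.println("attacker sees: " + site.userFor(attackerKnownId));
        System.out.println("victim sees:   " + site.userFor(victimId));
    }
}
```

It also shows why an authenticator that runs on every request is a problem with the flag on: each call hands back a new ID, so a client that keeps using the old one finds no session behind it.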

Since our developers don't need to worry about this, we turned the functionality off in the server.xml file that the Maven plugin is told to use with:

<Host name="localhost"... >
.
.
.
<Valve className="our.implementation.of.AuthenticatorBase" changeSessionIdOnAuthentication="false"...

Thursday, August 4, 2011

In praise of checked exceptions

I am swimming against the tide but I quite like checked exceptions - when done properly. I find them self-documenting and they force engineers to consider the possibility that their request cannot be serviced.

My partner, who is a lawyer but a wannabe geek, laughs when I tell her that lots of the problems I encounter at work are because somebody did not consider a certain set of circumstances. She tells me that in law, it is a very poor contract that does not cover all eventualities.

When I discuss my unpopular view with other engineers, I often hear that if you can't do anything then you have no choice but to re-throw an Exception. This is true but people often underestimate the number of options still open to them upon catching one.

For example, in the event of a database deadlock, the database will roll back one of the transactions involved. The JDBC driver will throw an SQLException in that thread, but the thread can then try again.

To be fair, Java's JDBC API has left something to be desired. For most JDBC drivers, an update that violates a constraint and is never going to succeed throws the same exception as one that was deadlocked and might work upon a retry. (Classes were added to the SQLException hierarchy to address this - java.sql.SQLRecoverableException and java.sql.SQLTransientException among them - but they only appeared as late as JDK 1.6.)
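
A retry loop along those lines might look like the sketch below. It assumes the driver reports a deadlock as java.sql.SQLTransactionRollbackException (a subclass of SQLTransientException, also from JDK 1.6), which plenty of drivers do not; the helper and its interface are made-up names. Any other SQLException is passed straight up, since retrying a constraint violation is pointless.

```java
import java.sql.SQLException;
import java.sql.SQLTransactionRollbackException;

/** Hypothetical retry helper; assumes deadlocks surface as SQLTransactionRollbackException. */
final class DeadlockRetry {

    /** A unit of database work that may fail with a checked SQLException. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /**
     * Runs the work, retrying only when the transaction was rolled back.
     * After maxAttempts failures the last rollback exception is rethrown.
     */
    static <T> T withRetry(int maxAttempts, SqlWork<T> work) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLTransactionRollbackException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLTransactionRollbackException e) {
                last = e; // we lost the deadlock; the work can simply be tried again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws SQLException {
        int[] attempts = {0};
        String result = withRetry(3, () -> {
            // Simulated work that loses a deadlock on the first attempt only.
            if (++attempts[0] == 1) {
                throw new SQLTransactionRollbackException("simulated deadlock");
            }
            return "committed";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In real code the work passed in would need to redo the whole transaction, not just the failed statement.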

I am content to adopt the house-style. If my client prefers RuntimeExceptions then so be it. But an area where I feel checked exceptions should be mandatory is when one team creates an API for another.

A problem I saw recently was when the core team wrote a Java API and published it via Java RMI. A second team called that API but, upon encountering a non-deterministic deadlock, was surprised when the JVM barfed with an Error. It transpired that DeadlockLoserDataAccessException was not in their classpath. This is a Spring class, thrown when Spring converts the SQLException to something a little more descriptive.

With checked exceptions, APIs are not only somewhat self-documenting but better define what the client needs in its classpath.
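
As a sketch of what that could look like, the API module below owns its one checked exception and translates implementation-specific failures at the boundary. Every name here is hypothetical; InternalDeadlockException merely stands in for something like Spring's DeadlockLoserDataAccessException. Note that the original exception is deliberately not attached as the cause, because a serialized cause would drag its class back onto the client's classpath.

```java
/** Hypothetical API module; all names are invented for illustration. */
final class CheckedApiDemo {

    /** The only exception type a client needs on its classpath. */
    static class AccountUnavailableException extends Exception {
        private static final long serialVersionUID = 1L;
        private final boolean retryable;

        AccountUnavailableException(String message, boolean retryable) {
            super(message);
            this.retryable = retryable;
        }

        boolean isRetryable() {
            return retryable;
        }
    }

    /** The published interface: the throws clause documents what can go wrong. */
    interface AccountService {
        long balanceOf(String accountId) throws AccountUnavailableException;
    }

    /** Stand-in for an implementation-side runtime exception from a third-party library. */
    static class InternalDeadlockException extends RuntimeException {
        private static final long serialVersionUID = 1L;

        InternalDeadlockException(String message) {
            super(message);
        }
    }

    /** Wraps an implementation so that only API-owned exceptions cross the boundary. */
    static AccountService translating(AccountService delegate) {
        return accountId -> {
            try {
                return delegate.balanceOf(accountId);
            } catch (InternalDeadlockException e) {
                // No cause attached, so the client never has to load the internal class.
                throw new AccountUnavailableException(
                        "deadlock while reading account " + accountId, true);
            }
        };
    }

    public static void main(String[] args) {
        AccountService service = translating(id -> {
            throw new InternalDeadlockException("simulated deadlock");
        });
        try {
            service.balanceOf("acc-1");
        } catch (AccountUnavailableException e) {
            System.out.println(e.getMessage() + " (retryable: " + e.isRetryable() + ")");
        }
    }
}
```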

This proposal is not a panacea - objects passed over the wire might reference non-standard implementations. For instance, objects re-hydrated with Hibernate might have Hibernate implementations of standard Java collection interfaces (you might want to look at Gilead to clean up the object graph in this case).

[Interestingly, the proposal to use Gilead on this particular project also prompted a discussion on what was reasonable tech debt. Not all debt is bad - a mortgage allowed me to buy my house. Only when it becomes unmanageable is it a problem. When the team who wrote the API that passed Hibernate re-hydrated objects across the wire first started coding, they were the only clients of it. Therefore, they considered the use of Gilead as unnecessary. "Do the simplest thing that works," they said and put Hibernate in their client's classpath. The problem became a little more intractable when a second team used the API since they used a different version of Hibernate.]

One last point: if you publish APIs and both you and your clients use Maven, you can have the pom.xml of the API module pull in any necessary third-party libraries as transitive dependencies. However, do remember that not all teams use Maven. What's more, not all APIs use just Java.