Stress tests are not just for your code. They're for the full stack.
We recently updated our build to use Ivy. This is no small piece of work and (as we discovered) can be prone to error.
Since we work in an investment bank, we're encouraged to use internal Ivy repositories. The developer tasked with this work could not find the JBossCache library we were using (1.2.3 - yes, the code really is that old) and so used version 1.2.2. He regarded this downgrade as the lesser of the evils as transitive dependency management seemed more important than a small difference in the revision number of a library. What could possibly go wrong?
We passed the QA process and deployed to production. A few days later, the app became unresponsive and kept puking this error:
org.jboss.cache.lock.TimeoutException: write lock for /tranchebook could not be acquired after 10000 ms
Why had this cut of the code passed QA?
I ran a soak test and within half an hour saw the same problem. So, I changed the version of the JBossCache back to 1.2.3 (which is what we have been using these last few years). After an hour, I still had not seen this exception. Cue a hasty, overnight deployment to production.
Looking at the JBossCache code, I saw the problem. If you put a break point in org.jboss.cache.interceptors.UnlockInterceptor in the invoke(MethodCall m) method at line:
With jboss-cache-1.2.3.jar, you’ll see that lock_table is a Collections$SynchronizedMap.
With jboss-cache-1.2.2.jar, you’ll see that lock_table is an implementation of HashMap – bad since neither this method nor the methods calling it hold a lock.
It’s easy to see that if multiple threads are changing a HashMap, we’re going to get non-deterministic behaviour (from the JavaDocs: "Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.").
Version Number Methodology
Version numbers are less arbitrary than most developers imagine. Pressman says:
"Configuration object 1.0 undergoes revision and becomes object 1.1. Minor corrections and changes result in versions 1.1.1 and 1.1.2, which is followed by a major update that is object 1.2. The evolution of object 1.0 continues through 1.3 and 1.4, but at the same time, a major modification to the object results in a new evolutionary path, version 2.0."
(Software Engineering: A Practitioner's Approach, 4th Edition, p233.)
Exact convention is moot but a popular one is:
This is what Maven uses  , is similar to what GNU  uses and (apart from the use of an underscore) what Sun uses:
phillip:~ henryp$ java -version
java version "1.6.0_22"
Cleverly, Sun let their marketing people use the minor number in "soft" literature - hence the reason we often call JDK 1.5, Java 5.0 etc. Apparently, they did this for Solaris also.
The beauty of this system is that it helps me with compatibility. For instance, when moving from a wildcard based classpath to an Ivy managed classpath, we noticed that something like Xalan 2.7.1 and Xalan 2.1.0 were both being pulled in as transitive dependencies for other libraries we're using. Since we 're not using OSGi, we can't use both and as it happened Xalan 2.1.0 was appearing first in the classpath. This was causing a problem since another library needed methods in Xalan 2.7.1. What to do?
Since both Xalan's were 2.x, we could be fairly sure that something written for version 2.1.0 will work with version 2.7.1 since they share the same major number. Note: the opposite is not true - something written for 2.7.1 has no guarantee that it will work when running against 2.1.0. So, we removed 2.1.0 by hand confident that 2.7.1 would serve both libraries. So far so good.
This is similar to the JDK convention. Java compiled with JDK 1.0 will play nicely with JDK 1.6. Similarly, I don't expect massive differences between JDK 1.6.0_22 and 1.6.0_5 except bug fixes and performance improvements (see my previous post). Presumably, the JDK will only move to verison 2.x.y_z when deprecated methods like Thread.stop are removed as this constitutes an API change .