Since I have my copy of Pressman sitting next to me, I'll use his definition:
Stress tests are designed to confront programs with abnormal situations. In essence, the tester who performs stress testing asks: "How high can we crank this up before it fails?"
(Software Engineering: A Practitioner's Approach - Roger S. Pressman)
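Pressman's question can be turned into a small search. The sketch below is illustrative only. BreakingPoint, highestPassingLevel and the fails predicate are hypothetical names, and each call to fails stands for a complete load run at that level (for example, that many concurrent users), judged by whatever failure criterion you settle on. It doubles the load until something breaks, then binary-searches between the last level that passed and the first that failed. It assumes failure is monotonic, which real systems don't always honour, so treat the answer as a starting point.

```java
import java.util.function.IntPredicate;

class BreakingPoint {
    /**
     * Returns the highest load level that does not fail, maxLevel if nothing up to
     * maxLevel fails, or 0 if even startLevel fails. Each fails.test(n) call stands
     * for a full load run at level n. Assumes that if level n fails, so does n + 1.
     */
    static int highestPassingLevel(IntPredicate fails, int startLevel, int maxLevel) {
        if (startLevel < 1 || startLevel > maxLevel) {
            throw new IllegalArgumentException("need 1 <= startLevel <= maxLevel");
        }
        if (fails.test(startLevel)) return 0;
        int passed = startLevel;
        int failed;
        // Crank it up: double the load until something breaks.
        while (true) {
            if (passed == maxLevel) return maxLevel;
            int next = (int) Math.min(2L * passed, maxLevel);
            if (fails.test(next)) {
                failed = next;
                break;
            }
            passed = next;
        }
        // Binary search between the last level that passed and the first that failed.
        while (failed - passed > 1) {
            int mid = passed + (failed - passed) / 2;
            if (fails.test(mid)) failed = mid; else passed = mid;
        }
        return passed;
    }

    public static void main(String[] args) {
        // Stand-in for a real load run: pretend the system falls over above 350 users.
        int limit = highestPassingLevel(users -> users > 350, 10, 10_000);
        System.out.println("Highest passing level: " + limit); // 350
    }
}
```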
The trouble is: what defines "fails"? For a web application, it might be when the time it takes to service a request exceeds a predefined limit. But which metric, and at which point in time? The average time? And which average: mean or median? Or maybe the 90% line (the 90th percentile)? And what if it spikes momentarily and then settles down? Has the test failed at that point? I don't have the answers.
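To see why the choice of metric matters, here is a small sketch run against made-up response times: nine requests between 100 and 180 ms plus one 5-second spike. The class and method names are my own. percentile uses the nearest-rank definition; JMeter and other tools may interpolate differently, so don't expect identical numbers from them.

```java
import java.util.Arrays;

class ResponseStats {
    static double mean(long[] millis) {
        long sum = 0;
        for (long m : millis) sum += m;
        return (double) sum / millis.length;
    }

    static double median(long[] millis) {
        long[] s = millis.clone();
        Arrays.sort(s);
        int n = s.length;
        // Odd count: the middle value. Even count: the average of the two middle values.
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    static long percentile(long[] millis, double p) {
        long[] s = millis.clone();
        Arrays.sort(s);
        int rank = (int) Math.ceil(p * s.length / 100.0); // 1-based rank
        return s[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up response times: nine normal requests and one 5-second spike.
        long[] millis = {100, 110, 120, 130, 140, 150, 160, 170, 180, 5000};
        System.out.println("mean:   " + mean(millis));           // 626.0
        System.out.println("median: " + median(millis));         // 145.0
        System.out.println("90%:    " + percentile(millis, 90)); // 180
    }
}
```

A single spike drags the mean up to 626 ms, while the median (145 ms) and the 90% line (180 ms) barely notice it. Which of those counts as "failed" is still your call.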
For us, it was sufficient to say the test had failed when our system threw its first ConcurrentModificationException. These were non-deterministic, appearing after the code naively tried five times to modify a data structure that other threads were also modifying (this poor design is fixed in the next release).
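ConcurrentModificationException comes from the fail-fast iterators of java.util collections such as ArrayList and HashMap: they throw it, on a best-effort basis, when the collection is structurally modified during iteration by anything other than the iterator itself. With several threads involved, whether that happens depends on timing, hence the non-determinism. The sketch below reproduces it deterministically in a single thread and shows that a java.util.concurrent collection with a snapshot iterator doesn't throw. FailFastDemo and throwsCme are my own names, and CopyOnWriteArrayList is only one possible fix (I'm not describing what the next release actually did). Because it copies the whole array on every write, it suits read-mostly data.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class FailFastDemo {
    // Appends a copy of every element while iterating; returns true if the iterator threw CME.
    static boolean throwsCme(List<Integer> list) {
        try {
            for (Integer value : list) {
                list.add(value); // structural modification in the middle of iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> snapshot = new CopyOnWriteArrayList<>(List.of(1, 2, 3));
        System.out.println("ArrayList throws CME: " + throwsCme(plain));               // true
        System.out.println("CopyOnWriteArrayList throws CME: " + throwsCme(snapshot)); // false
        // The iterator walked a snapshot of the original 3 elements, so exactly 3 were added.
        System.out.println("CopyOnWriteArrayList size: " + snapshot.size());           // 6
    }
}
```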
Look out for deadlocks in your code during these tests. The annoying thing about deadlocks is that they tend to happen when your system is under the most stress, so this is your chance to catch them, rather than after a production release has gone live and is getting hammered. JConsole, which ships with JDK 6, can detect them for you.
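If you'd rather the test harness checked for deadlocks itself than rely on someone watching JConsole, the same information is available through ThreadMXBean.findDeadlockedThreads() (Java 6 and later). This is a minimal sketch with my own class and method names: startDeadlock deliberately deadlocks two daemon threads so there is something to find. In real use you'd call findDeadlocks periodically inside the JVM under test, or read the same MXBean over a remote JMX connection, which is how JConsole gets at it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockCheck {
    // Names of threads deadlocked on monitors or java.util.concurrent locks; empty if none.
    static String[] findDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) return new String[0];
        ThreadInfo[] infos = threads.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            names[i] = infos[i] == null ? "<exited>" : infos[i].getThreadName();
        }
        return names;
    }

    // Polls findDeadlocks() until it reports something or the timeout expires.
    static String[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String[] found = findDeadlocks();
        while (found.length == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            found = findDeadlocks();
        }
        return found;
    }

    // Deliberately deadlocks two daemon threads that take the same two locks in opposite order.
    static void startDeadlock() {
        Object a = new Object(), b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirstLock), "worker-1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirstLock), "worker-2");
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: each thread waits forever for the other's lock
            }
        }
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.println("Deadlocked: " + String.join(", ", awaitDeadlock(5000)));
    }
}
```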
For want of a better reference, Wikipedia defines Soak Tests as "testing a system with a significant load extended over a significant period of time".
For most practical purposes, this boils down to memory leaks. My advice is to attach JConsole and watch the JVM's memory usage. Then start the stress tests and go home. The following morning, look at the pretty graph JConsole has produced. It should be a nice saw-tooth shape (the heap fills up, then garbage collection empties it) with the average remaining pretty much constant. If the average keeps creeping upward, you probably have a leak.
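If you want a number rather than a picture, the data JConsole plots is exposed through the java.lang.management MXBeans. The sketch below (HeapTrend and its methods are my own names) sums heap usage as measured just after each pool's most recent garbage collection, i.e. roughly the bottom of each saw-tooth, and fits a least-squares slope through the samples. A slope that stays positive over a long run is the numeric version of a rising graph. It samples the JVM it runs in; to watch a separate server you'd read the same MXBeans over a remote JMX connection. The sample count and interval in main are placeholders.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class HeapTrend {
    // Sum of heap used across pools, as measured right after each pool's most recent GC.
    // Pools that don't support collection usage return null and are skipped.
    static long heapUsedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (pool.getType() == MemoryType.HEAP && afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    // Least-squares slope of samples taken at equal intervals, in bytes per sample.
    static double slope(long[] samples) {
        int n = samples.length;
        if (n < 2) return 0.0;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long s : samples) meanY += s;
        meanY /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    public static void main(String[] args) throws InterruptedException {
        int count = 10;
        long[] samples = new long[count];
        for (int i = 0; i < count; i++) {
            samples[i] = heapUsedAfterLastGc();
            Thread.sleep(1000); // for a real soak test, sample every few minutes overnight
        }
        System.out.printf("post-GC heap trend: %.1f bytes/sample%n", slope(samples));
    }
}
```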
Apache's JMeter should be good enough for most of your needs. But the question of "who watches the watchmen?" [quis custodiet ipsos custodes?] has never been more important. Two gotchas that got me were:
- too much monitoring in JMeter (for example, too many listeners) makes it use a lot of memory and start performing poorly, which skews the results.
- my test code had a memory leak. I had to connect to the system under test using in-house client code. For each request to the system, this code tries to send an audit message to a messaging system. If it cannot connect, it quietly stores the messages in memory to send later. So it was the test harness, not the system under test, that slowly ran out of memory. This is generally not your first thought. I fixed the problem by getting the source code of the client JAR, commenting out this call, and redeploying the test harness.
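I can't show the in-house client, so this is a hypothetical reconstruction of the pattern rather than its real code. AuditBuffer, UnboundedBuffer and BoundedBuffer are invented names. The first buffer grows for as long as the messaging system is unreachable; the second is a bounded alternative that drops (and counts) messages once it's full.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the in-house audit client's "send later" store.
class AuditBuffer {
    // The leaky pattern: an unbounded queue that grows for as long as the broker is down.
    static final class UnboundedBuffer {
        final Queue<String> pending = new ConcurrentLinkedQueue<>();
        void store(String msg) { pending.add(msg); }
        int size() { return pending.size(); }
    }

    // A bounded alternative: once full, new messages are dropped and counted.
    static final class BoundedBuffer {
        final ArrayBlockingQueue<String> pending;
        final AtomicLong dropped = new AtomicLong();
        BoundedBuffer(int capacity) { pending = new ArrayBlockingQueue<>(capacity); }
        void store(String msg) {
            if (!pending.offer(msg)) dropped.incrementAndGet(); // offer() returns false when full
        }
        int size() { return pending.size(); }
    }

    public static void main(String[] args) {
        UnboundedBuffer leaky = new UnboundedBuffer();
        BoundedBuffer bounded = new BoundedBuffer(1000);
        // Simulate 100,000 requests while the messaging system is unreachable.
        for (int i = 0; i < 100_000; i++) {
            leaky.store("audit " + i);
            bounded.store("audit " + i);
        }
        System.out.println(leaky.size() + " " + bounded.size() + " " + bounded.dropped.get());
        // prints "100000 1000 99000"
    }
}
```

Dropping audits may well be unacceptable in production, but in a test harness a bounded store at least fails visibly instead of slowly eating the heap.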
Put together a test based on typical usage metrics, such as the number of concurrent users. I once worked for a company that had outsourced a lot of its coding. The draft contract stated that page X would load in no more than 2 seconds. The vendor must have been rubbing its hands with glee, as that should have been easy to achieve with just one person using the system. The draft never specified the number of concurrent users, so if the system didn't scale, it was legally not the vendor's problem: it had met its contractual obligation.
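A minimal sketch of measuring the same request under different numbers of concurrent users, assuming the request can be wrapped in a Runnable (in practice, an HTTP call to page X). ConcurrentUsers, measure, fractionWithin and the 50 ms sleeping stand-in are all hypothetical. A sleep doesn't slow down under contention the way a real server does, so against the stand-in every run passes; the interesting numbers come from pointing it at the real system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentUsers {
    // Runs `request` requestsPerUser times on each of `users` threads at once
    // and returns every latency in milliseconds.
    static long[] measure(int users, int requestsPerUser, Runnable request) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Callable<long[]>> sessions = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                sessions.add(() -> {
                    long[] latencies = new long[requestsPerUser];
                    for (int r = 0; r < requestsPerUser; r++) {
                        long start = System.nanoTime();
                        request.run();
                        latencies[r] = (System.nanoTime() - start) / 1_000_000;
                    }
                    return latencies;
                });
            }
            long[] all = new long[users * requestsPerUser];
            int i = 0;
            for (Future<long[]> f : pool.invokeAll(sessions)) {
                for (long ms : f.get()) all[i++] = ms;
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    // Fraction of requests that completed within limitMillis.
    static double fractionWithin(long[] latencies, long limitMillis) {
        if (latencies.length == 0) return 1.0;
        int ok = 0;
        for (long ms : latencies) if (ms <= limitMillis) ok++;
        return (double) ok / latencies.length;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for loading page X: sleep for 50 ms.
        Runnable pageX = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int users : new int[] {1, 10, 50}) {
            long[] ms = measure(users, 20, pageX);
            System.out.printf("users=%d: %.0f%% within 2 s%n", users, 100 * fractionWithin(ms, 2000));
        }
    }
}
```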
"Performance tests ... often require both hardware and software instrumentation," says Pressman. On Linux, running the top command should be enough to get a good idea of CPU usage (vmstat is also useful if you want to monitor IO). The most interesting statistic is the load average. If this number is greater than the number of CPUs, work is starting to back up. Or, to put it another way:
“If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds”
(Linux Journal, Dec 1 2006)
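The same one-minute load average is available from inside the JVM via OperatingSystemMXBean.getSystemLoadAverage(), which can be handy for logging it next to your test results. A short sketch; LoadCheck and loadPerCpu are my own names. Note that getAvailableProcessors() reports the processors available to the JVM, which can be fewer than the machine has, and that the load average comes back negative on platforms where it isn't available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

class LoadCheck {
    // Load per CPU: 1.0 means every processor was busy; above 1.0 means work is queuing.
    static double loadPerCpu(double loadAverage, int cpus) {
        return loadAverage / cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // last minute; negative if unavailable
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("Load average not available on this platform");
        } else {
            System.out.printf("load %.2f on %d CPUs = %.2f per CPU%s%n",
                    load, cpus, loadPerCpu(load, cpus),
                    load > cpus ? " (work is backing up)" : "");
        }
    }
}
```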
You can use your performance tests to drive your system while you profile the JVM. There are (as ever) some caveats:
- There's a testing version of the Heisenberg Uncertainty Principle: the act of observation affects the system. Profiling adds overhead, so do it in addition to your non-profiling performance tests, not instead of them.
- Just because your profiler shows that a particular method takes a long time, it doesn't mean that the method did a lot of work. Message-driven systems typically spend a lot of time waiting for something to happen, which shows up in the profiler as a method "hogging" CPU time. That is a misreading: the thread is waiting, not working. Another common time "hog" is code waiting on IO, which in a server typically takes an order of magnitude longer than your own code. So don't be surprised if you see a lot of time spent here:
java.net.SocketInputStream.read(byte[], int, int)
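One way to tell waiting from working is to compare wall-clock time with the CPU time the thread actually consumed, which ThreadMXBean.getCurrentThreadCpuTime() reports. If a profiler is showing elapsed time, a blocked thread looks just as busy as a computing one. This sketch (WallVsCpu and time are my own names) times a method that sleeps, standing in for a thread blocked on a socket read, against one that does arithmetic. The sleeping method should show plenty of wall-clock time and next to no CPU time; the arithmetic should show the two roughly equal.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class WallVsCpu {
    // Returns {wallNanos, cpuNanos} for running task on the current thread.
    static long[] time(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime(); // -1 if CPU time measurement is disabled
        task.run();
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return new long[] {wall, cpu};
    }

    public static void main(String[] args) {
        if (!ManagementFactory.getThreadMXBean().isCurrentThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time not supported on this JVM");
            return;
        }
        // Waiting: lots of wall-clock time, very little CPU.
        long[] waiting = time(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Working: wall-clock time and CPU time roughly match.
        long[] working = time(() -> {
            double x = 0;
            for (int i = 1; i < 50_000_000; i++) x += Math.sqrt(i);
            if (x < 0) System.out.println(x); // uses x so the JIT can't discard the loop
        });
        System.out.printf("waiting: wall %d ms, cpu %d ms%n", waiting[0] / 1_000_000, waiting[1] / 1_000_000);
        System.out.printf("working: wall %d ms, cpu %d ms%n", working[0] / 1_000_000, working[1] / 1_000_000);
    }
}
```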