Sunday, December 9, 2012

Queueing Theory and Practice

Being a Brit, I love queueing. However, I hate it when my software does it.

This week, I've been wondering why we're seeing so much contention in our application. Most of it comes via JAXP code as it makes calls to the class loader (see the YourKit screen grab below).

Quite why the Sun XML classes keep calling loadClass on the Tomcat class loader for the same class is currently beyond me.

[Aside: set the jaxp.debug system property for greater visibility into what the JAXP code is doing. I saw this used in javax.xml.xpath.XPathFactoryFinder. Check whether your Java implementation also reads it to set a debugging field.]

Anyway, when I thought I had identified the problem, it occurred to me that this contention only became a major problem when I started profiling. As in quantum mechanics, taking measurements was affecting the results.

It seemed that the slowdown caused by attaching monitors to the JVM was enough to push it over the edge: thread contention went crazy. A quick Google gave me the equations that describe an M/D/1 queue (that is, a queue where arrivals are randoM, servicing a request takes a Deterministic time, and there is only 1 servicer). This models our situation well: any thread can make a call to loadClass, but all calls should be serviced in roughly the same time.

I used Mathematica to plot these queue equations. Here is the expected average wait time, g(p, u):

g[p_, u_] := p/(2 u (1 - p))
g::usage = "Expected average waiting time"

Manipulate[
 Plot[{g[p, 1], g[p, u], g[p, 10]}, {p, 0, 1.0},
  ImageSize -> {600, 320},
  ImagePadding -> {{25, 10}, {25, 10}},
  Frame -> True,
  AxesLabel -> {"arrival/service rate", "expected average waiting time"}],
 {{u, 2, "Service Rate"}, 0.1, 100, Appearance -> "Labeled"},
 FrameLabel ->
  Style["M/D/1 case (random Arrival, Deterministic service, and one service \
channel)", Medium]]

where p is the utilisation:

p = rate work arrives / rate work is serviced

and u is the rate at which work is serviced.
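The same formula is easy to sanity-check outside Mathematica. Here is a minimal Python sketch (the function `g` simply mirrors the definition above; it is not from any library):

```python
def g(p, u):
    """Expected average waiting time in an M/D/1 queue.

    p -- utilisation: rate work arrives / rate work is serviced (0 <= p < 1)
    u -- rate at which work is serviced
    """
    return p / (2 * u * (1 - p))

# With a service rate of 2, watch the wait time climb as utilisation nears 1:
for p in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"p = {p:4}: expected wait = {g(p, 2):.3f}")
```

Note how the denominator's (1 - p) term is what makes the curve shoot up as utilisation approaches 100%.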

The graph looks like this (with u fixed at 1 and 10, plus the adjustable u from the slider):

The important thing here is that the graph is not linear. With a service rate of 2 for the adjustable curve (the purple line), the expected wait time is about 1.0 when the ratio of work arriving to work being serviced is 80%, but if that ratio goes to 90%, the expected wait climbs to 2.25.

At this usage profile, a 10% change means the average wait time more than doubles.

This could explain why the system became so badly behaved when experiencing only a moderate increase in load.
