Sunday, August 15, 2010

Emergent (Mis)behavior

I've been puzzling for a while over why our bond allocation program would suddenly hang when under stress.

Fortunately for us, we have a good test harness and I could attach a profiler and watch and wait for it to happen. Eventually, it did:

Emergent behaviour as all threads spontaneously start contending for a resource

All the threads found themselves in the same logjam - but for no discernible reason. It was like a spontaneous "phantom jam" on a motorway (there is an interesting computer simulation of exactly this in a video here).
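For the curious, a phantom jam is easy to reproduce in a few lines. The sketch below is not the model from the video; it is the Nagel-Schreckenberg cellular automaton, a well-known toy traffic model, and the function names and parameter values (30 cars, 100 cells, a 30% chance of dawdling) are my own choices for illustration. Each driver follows three purely local rules, yet jams form with no obstacle on the road.

```python
import random


def step(positions, speeds, road_length, v_max, p_slow, rng):
    """Advance every car one tick on a circular road (all cars update in parallel).

    positions[i + 1] is always the car directly ahead of positions[i];
    cars cannot overtake, so that ordering never changes.
    """
    n = len(positions)
    new_speeds = []
    for i in range(n):
        ahead = positions[(i + 1) % n]
        gap = (ahead - positions[i] - 1) % road_length  # empty cells in front
        v = min(speeds[i] + 1, v_max)  # rule 1: accelerate if possible
        v = min(v, gap)                # rule 2: brake to avoid the car ahead
        if v > 0 and rng.random() < p_slow:
            v -= 1                     # rule 3: occasionally dawdle
        new_speeds.append(v)
    new_positions = [(x + v) % road_length
                     for x, v in zip(positions, new_speeds)]
    return new_positions, new_speeds


def simulate(n_cars=30, road_length=100, v_max=5, p_slow=0.3,
             ticks=200, seed=1):
    """Run the model and return a list of (positions, speeds) per tick."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(road_length), n_cars))
    speeds = [0] * n_cars
    history = []
    for _ in range(ticks):
        positions, speeds = step(positions, speeds, road_length,
                                 v_max, p_slow, rng)
        history.append((positions, speeds))
    return history


def render(positions, speeds, road_length):
    """Draw the road as text: '.' is empty, a digit is a car's speed."""
    road = ["."] * road_length
    for x, v in zip(positions, speeds):
        road[x] = str(v)
    return "".join(road)


if __name__ == "__main__":
    for positions, speeds in simulate(ticks=40):
        print(render(positions, speeds, 100))
```

Printing one row per tick gives a space-time picture of the road. Clusters of slow or stopped cars should show up as dense runs of low digits. Nothing in the three rules mentions a jam, which is rather the point.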

It was clear where the threads were choking (mentioned here).

But why did an innocent - if not particularly efficient - piece of code cause the system to grind to a halt?

Well, it's all down to something called emergent behaviour - or Emergent (Mis)behavior as Jeffrey C. Mogul calls it in his excellent paper here. One (of many) definitions is:

Emergent behavior is that which cannot be predicted through analysis at any level simpler than that of the system as a whole.

Mogul's paper does not address the question of whether emergent behaviour is a property of the system or a function of our ignorance. But he does give many examples of where it has occurred in software systems and gives a warning to the architects of SOA systems. Service Oriented Architecture emphasises a separation of concerns:

The fundamental benefit to solving problems this way is that a number of the solution logic units can be designed to solve immediate concerns while still remaining agnostic to the greater problem.
(SOA, Principles of Service Design, p70 - Thomas Erl)

Although it's a good design principle that helps minimize maintenance costs, it can lead the unwary into a false sense of security. To avoid this, one can only watch a system's behaviour in a test environment for a long period of time - time that most projects don't have. Most projects fix these problems once they have gone live.

There is very little literature in this area but Mogul attempts to build a taxonomy of emergent misbehaviour with the hope that one day we'll be better equipped to say whether a particular system may be prone to it.

Emergent behaviour also occurs in non-computer systems. As examples: it was not obvious from the constitution of the Weimar Republic that it could allow the rise of a tyrant like Hitler; the book Freakonomics describes many examples where unexpected behaviour arises in otherwise well-understood systems; the designers of the Millennium Bridge in London understood footbridges and understood how crowds behave on them but did not predict how the crowd's response to the bridge would feed back into how the bridge behaved under load.

Regarding this last example, Dr Mogul points out that although this phenomenon was known, no quantitative analysis existed. The same problem still afflicts the IT community.
