Occasionally I come across something that seems really obvious to me but isn’t obvious from an outsider’s perspective. A few weeks ago I spent some time with a client (and their client) trying to understand the behaviour of a message-driven platform that made heavy use of ActiveMQ. Without going into too many details, they had a number of production incidents in which, as far as they could see from time-series charts of queue depths, the queues started behaving erratically. In reasoning about what was going on, we put together a visual vocabulary to help them analyse the behaviour not only of the broker but also of the systems around it. Here is that vocabulary.
Basic Curves
Flat line. Here the rate of messages being placed onto a queue matches the rate of their consumption. This should be considered the natural order of things where both producers and consumers exist on a system.
Angle up. Here the production rate is higher than the consumption rate. While this is not a problem in the short term, over a long period of time a message store may fill up. This is typically seen when the producers' message rate increases temporarily and the time it takes to consume each message means that the consumers cannot keep up. To get around this you need to increase the consumption rate by adding more consumers or speeding them up.
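To put a number on "may fill up", here is a rough back-of-the-envelope sketch. It assumes the store limit can be expressed as a message count and that both rates stay constant, neither of which is guaranteed in practice (real brokers usually limit by bytes). The function name and parameters are made up for illustration.

```python
def seconds_until_full(depth, capacity, produce_rate, consume_rate):
    """Estimate seconds until the queue reaches capacity.

    Returns None if the queue is not growing. All arguments are
    hypothetical inputs: depth and capacity in messages, rates in
    messages per second.
    """
    net_rate = produce_rate - consume_rate  # messages added per second
    if net_rate <= 0:
        return None  # flat or draining, so it never fills at these rates
    return max(capacity - depth, 0) / net_rate


# 1,000 msg/s in, 800 msg/s out, 100,000 queued, room for 1,000,000
print(seconds_until_full(100_000, 1_000_000, 1_000, 800))  # 4500.0
```

In this example the queue grows by 200 messages per second, so the remaining 900,000 slots last 75 minutes.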
Angle down. The consumption rate is exceeding the production rate. This normally indicates a queue draining down after a burst of producer activity.
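The three basic shapes boil down to the sign of the net rate (production minus consumption), which you can estimate from the depth samples alone. The following is a minimal sketch, assuming evenly spaced samples; the function name, the sample interval and the tolerance are illustrative choices, not anything a broker provides.

```python
def classify_slope(depths, interval_s=1.0, tolerance=0.5):
    """Label a window of queue-depth samples as a basic curve.

    depths: queue depths sampled every interval_s seconds (assumed even).
    tolerance: net rate (msg/s) below which the line is treated as flat.
    """
    if len(depths) < 2:
        raise ValueError("need at least two samples")
    # Average net rate over the window: change in depth / elapsed time
    elapsed = (len(depths) - 1) * interval_s
    net_rate = (depths[-1] - depths[0]) / elapsed
    if abs(net_rate) <= tolerance:
        return "flat"
    return "angle up" if net_rate > 0 else "angle down"


print(classify_slope([100, 100, 101, 100]))  # flat
print(classify_slope([100, 150, 200]))       # angle up
print(classify_slope([200, 120, 40]))        # angle down
```

The tolerance matters on real charts, where a "flat" line usually jitters by a few messages between samples.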
Composite Curves
Rate of change increases. In the curves above, the rate of production has increased relative to the rate of consumption. This means that either:
- your consumers are working slower than before; or
- the number of messages being sent by your producers has increased.
Rate of change decreases. In the curves above, the rate of consumption has increased relative to the rate of production. This means that either:
- your consumers are working faster; or
- the producers are not putting as many messages onto the queue.
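A change in the rate of change can be spotted by comparing the slope in the first half of a window with the slope in the second half. This is only a sketch under the same assumptions as before (evenly spaced samples, hypothetical names and thresholds). Note that the depth chart alone cannot tell you whether the producers or the consumers changed; it only tells you that the balance between them shifted.

```python
def slope(depths, interval_s=1.0):
    """Average net rate (msg/s) across a run of evenly spaced samples."""
    return (depths[-1] - depths[0]) / ((len(depths) - 1) * interval_s)


def slope_change(depths, interval_s=1.0, tolerance=0.5):
    """Compare the net rate in the first and second half of a window."""
    if len(depths) < 3:
        raise ValueError("need at least three samples")
    mid = len(depths) // 2
    before = slope(depths[: mid + 1], interval_s)  # first half, up to midpoint
    after = slope(depths[mid:], interval_s)        # second half, from midpoint
    delta = after - before
    if abs(delta) <= tolerance:
        return "steady"
    if delta > 0:
        # production has gained on consumption
        return "rate of change increases"
    # consumption has gained on production
    return "rate of change decreases"


print(slope_change([0, 10, 20, 50, 80]))  # rate of change increases
print(slope_change([0, 30, 60, 70, 80]))  # rate of change decreases
```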
Curve flattens. This is a really interesting one; it indicates the following possibilities:
- if depth is 0, your queue is empty; otherwise
- the consumption rate has all of a sudden started matching your production rate exactly (not likely); or
- producers have stopped producing and there is no consumption going on, i.e. you are seeing a pause. If you are convinced that messages should be being sent (and possibly that consumption is going on), then there are a number of possible external causes, including a long garbage collection or some other process taking over the I/O on the storage being used by the broker; in a virtualised or shared environment this may come from another server that has access to that same storage.
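Depth alone cannot separate these cases, but depth plus message counts can. The sketch below assumes you are able to sample how many messages were enqueued and dequeued during the flat stretch (for example by differencing cumulative counters, if your broker exposes them). The function and its parameter names are hypothetical.

```python
def diagnose_flat(depth, enqueued_delta, dequeued_delta):
    """Interpret a flat stretch of a queue-depth chart.

    depth: queue depth during the flat stretch.
    enqueued_delta / dequeued_delta: messages added / removed during the
    stretch (assumed to be available from some monitoring source).
    """
    if depth == 0:
        return "empty"     # consumers are keeping the queue drained
    if enqueued_delta == 0 and dequeued_delta == 0:
        return "pause"     # nothing is moving: look outside the broker
    if enqueued_delta == dequeued_delta:
        return "balanced"  # rates match exactly (the unlikely case)
    return "not flat"      # counts disagree, so depth must be changing


print(diagnose_flat(0, 500, 500))     # empty
print(diagnose_flat(2500, 0, 0))      # pause
print(diagnose_flat(2500, 300, 300))  # balanced
```

A "pause" result is the cue to check for the external causes listed above, such as garbage collection or contention on shared storage.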
In broker-based messaging systems, it is important to understand that the broker is only one part of the story and that the performance of both the producers and consumers will have an impact on the overall flow of messages. Hopefully this reference will help you to reason about what is going on.