Alarms lost in the noise

Adding an alert or alarm for a situation you need to monitor should be a good thing, but depending on how it is done, it can make things worse or, at a minimum, fail to work as expected.

In the book Upstream by Dan Heath, there is a section about alarms:

Have you ever rolled your eyes when you heard a fire alarm? That’s alarm fatigue, and it’s a critical problem. A group of researchers studied five ICUs (intensive care units), treating 461 patients, for a month in 2013. There were almost 400,000 audible alarms logged in a month, which broke down to 187 audible alarms per bed per day. When everything is cause for alarm, nothing is cause for alarm. As we design early-warning systems, we should keep these questions in mind:

  • Will the warning give us enough time to act effectively? (If not, why bother?)
  • What rate of false positives can we expect? Our comfort with that level of false positives may, in turn, hinge on the relative cost of handling false positives versus the possibility of missing a real problem.

I had two experiences with alarms in the last week.

We were using an office in a large building in Vienna when the fire alarm suddenly went off. As we were only using the office for the day, we checked whether it was a test or whether we had to leave the building. It was real, so we evacuated along with hundreds of other people.

By the time we got outside, the first fire engine had arrived. Within five minutes there were at least nine more appliances on site. Fortunately, they were able to deal with the situation and we could return after just over an hour.

This is an example of an alarm that should always be investigated. Fire alarms are relatively rare and can signal a risk to life or significant damage to property. Yet our previous experience of fire alarms meant there was an element of doubt about whether we needed to act on this one.

The second experience was at work. We recently added a new monitoring tool that alerts us if data is missing. This should never happen, but if it does, we want to know about it as quickly as possible.

When the tool was deployed to one of our environments, we noticed that every few hours it raised an alert saying there was an issue, then a short while later effectively cancelled it by reporting that everything was okay.

When we investigated, we found that one part of the workflow was not reliably generating all the data required by the next step. Sometimes it did not provide everything needed, so the tool raised an alert even though there was no real problem with the system being monitored.

In this instance we were getting frequent alerts about something important, but they were false. As a result, we would not have noticed a “real” alert because it would have been lost in the noise, which made the tool effectively useless. In the short term we have disabled the monitoring tool while the issue is fixed.
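To make the failure mode concrete, here is a minimal Python sketch of a missing-data check like the one described above. It assumes the tool inspects the output of each workflow run for a set of required fields, raises an alert on the first incomplete run and reports a recovery once a complete run arrives. The field names, `check_batch` and `AlertState` are hypothetical and are not taken from the actual tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: the fields the next workflow step needs.
REQUIRED_FIELDS = frozenset({"timestamp", "device_id", "reading"})


@dataclass
class AlertState:
    """Tracks whether the missing-data alert is currently raised."""
    alerting: bool = False


def check_batch(batch: dict, state: AlertState) -> Optional[str]:
    """Check one workflow run's output; return a message only when the alert changes state.

    Returns None when nothing changes, so a persistent problem alerts only once.
    """
    missing = REQUIRED_FIELDS.difference(batch)  # iterating a dict yields its keys
    if missing and not state.alerting:
        state.alerting = True
        return "ALERT: missing " + ", ".join(sorted(missing))
    if not missing and state.alerting:
        state.alerting = False
        return "RESOLVED: all required data present"
    return None


if __name__ == "__main__":
    state = AlertState()
    # Simulated runs: the upstream step occasionally omits "reading",
    # although the system being monitored is healthy throughout.
    runs = [
        {"timestamp": 1, "device_id": "a", "reading": 3.2},
        {"timestamp": 2, "device_id": "a"},
        {"timestamp": 3, "device_id": "a", "reading": 3.4},
    ]
    for run in runs:
        event = check_batch(run, state)
        if event:
            print(f"run {run['timestamp']}: {event}")
```

Run against the simulated data, it prints an alert for run 2 and a recovery for run 3, even though nothing is wrong with the system being monitored. The check is doing exactly what it was asked to do; the noise comes from the unreliable upstream step.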

Alarms are important and useful, but they need a high signal-to-noise ratio; otherwise there is a risk they will be ignored and fail to achieve their aim.

Links

Upstream: The Quest to Solve Problems Before They Happen

As an Amazon Associate I earn from qualifying purchases.
