Oh no second

An oh no second is that moment in time when you realise that you have done something potentially catastrophic and your stomach turns to mush with the dread and realisation.

One of the most common examples is performing some operation on a production environment when you thought you were on a “lesser” environment; classics include deleting data. If you are fortunate then you will be able to recover the data.

I have had it happen to me, although not in the same league as this one …

This article is brilliant and well worth a read. It tells the story of how the author had an oh no second when working on the $500M Mars Rover. I won’t spoil what happens; you need to read it.

I will only say that I particularly like the bit where he explains one of the outcomes of his error:

And I still remember the shock when Project Manager Pete delivered the decision and the follow-on news: ‘These tests will continue. And Chris will continue to lead them as we have paid for his education. He’s the last person on Earth who would make this mistake again.’

There is definitely some truth to that.

We had an incident recently where a bunch of things conspired to break a non-production environment right before a demo to a potential customer. It would have been very easy to look for people to blame; however, that rarely ends well.

The team had a post-mortem meeting, followed by a second meeting to agree what would change to ensure it didn’t happen again. I explained to the team that we were not looking to assign blame - in fact, I am happy to take all the blame - but that we need to work out how to reduce the risk of it happening again. Also, if we don’t learn from this and it happens again, then I would be sharing some of the blame with the relevant people.

The point of failure, whether a mistake or an oh no second, is not the time to spread blame. It is the time to work out how to fix it, and then later to work out how to prevent it in future.

Links

My $500M Mars Rover Mistake: A Failure Story - Chris Lewicki
