What To Do When Things Go Terribly Wrong

A good question popped up in our forum: what happens when things go wrong with your software? Not just wrong, but terribly wrong? From a testing point of view, the question is about unforeseen failures, the ones that reasonable and adequate testing failed to identify.

There are a lot of answers for this, but the question should be measured in terms of impact. In other words, does it matter if this application fails unpredictably, and if so, how? For example, nobody will care that much if Solitaire occasionally crashes. A failure in a nuclear reactor control system, on the other hand, matters a lot more.

Here are a few tips:

  • Prioritize what your software does. Pick out the most important features that offer the most value to your end user.
  • Identify the impact of unforeseen bugs and failures in those high-priority components. What could go wrong if those components failed or behaved incorrectly?
  • Decide if it actually matters. Not everything out there deserves herculean planning efforts for failure management.
  • Once you know what matters and what doesn’t, plan for the impact and not the failure. That is, plan for the outcome of the failure and not the failure itself.  In the event of a nuclear meltdown, nobody cares what you do with the unforeseen thrown exception that halted the control systems.  They want plans for stopping a catastrophe.
  • Work with the customer to make sure they understand failure impacts and how to manage them appropriately.
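
The first three tips can be sketched as a small ranking exercise. This is only an illustration, not a method from the original discussion: the `Feature` class, the 1 to 5 scoring scales, the example features, and the threshold of 12 are all assumptions made up for the example. The idea is simply to score each feature by how much it matters to the user and how bad a failure would be, then plan only for the ones that clear the bar.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    user_value: int      # assumed scale: 1 (minor) to 5 (critical)
    failure_impact: int  # assumed scale: 1 (annoyance) to 5 (catastrophe)


def needs_contingency_plan(features, threshold=12):
    """Return features whose value * impact score meets the threshold, worst first."""
    scored = [(f.user_value * f.failure_impact, f) for f in features]
    flagged = [(score, f) for score, f in scored if score >= threshold]
    flagged.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in flagged]


# Hypothetical features and scores, chosen only to illustrate the ranking.
features = [
    Feature("card game animations", 1, 1),
    Feature("reactor coolant control", 5, 5),
    Feature("nightly report export", 3, 2),
    Feature("payment settlement", 5, 4),
]

for feature in needs_contingency_plan(features):
    print(feature.name)
```

Anything that falls below the threshold is, by this rough measure, not worth herculean planning. The features that remain are the ones to take to the customer when discussing failure impacts.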

Nobody should expect complete perfection, and the reality is that stuff happens sometimes.  This sort of planning is ultimately a part of disaster recovery, something that gets a lot of discussion in terms of natural disasters.  However, good disaster recovery planning should include a wide range of failure scenarios.  If the mainframe at a financial services company has a software glitch, that could be just as bad as an earthquake.  Good planning should accommodate both.

If you are a software tester, you are in a unique position to see software failures better than most people. If a failure in your product would have a significant impact on your customers, work with them to identify the most crucial components and put together contingency plans that they can use for their own disaster recovery planning. The end result should be better plans for worst-case scenarios and customers who have more confidence in their own failure readiness.

2 Responses to “What To Do When Things Go Terribly Wrong”

  1. IT Project Failures mobile edition said:

    [...] tips to improve implementation testing. –> In a thoughtful blog post on software testing, Stanton Champion suggests four ways to avoid these problems: From a testing point of view, the question relates to [...]

  2. Now THAT’S Inflation | Software Testing Blog said:

    [...] in October, I wrote about what to do when things go very wrong – when catastrophic bugs get through the testing net.  It’s not clear if VISA did or [...]
