We had snow falling for a few hours where I live this week. This is remarkable only to the extent that the last time we had any snow fall was over 21 years ago. The falling snow got me thinking about how most things in our neighborhood, such as cars, roads, houses, and our plumbing, are not subjected to the wonders of snow with any regularity. On days like that, I am thankful that the people who designed most of the things we rely on took into account what impact different extremes, such as hot and cold weather, would have on their design. Designing a system to operate or degrade gracefully in rare operating conditions is a robust design concept that seems to be missing in so many “scientific or technical” television shows.
Designing systems so that they fail in a safe way is an important engineering concept, and it is often invisible to the end user. Developing a failsafe system is an exercise in trading the consequences and probability of a failure against the cost of mitigating those consequences. There is no single best way to design a failsafe system, but two main tools available to designers are to incorporate interlocks or safeties into the system and/or to implement processes that the user needs to be aware of to mitigate the failure state. Take, for example, the simple inflatable beach ball: the ones I have seen have such a long list of warnings and disclaimers printed on them that it is quite humorous, until you realize that every item printed on that ball probably has a legal case associated with it.
Until a few months ago, I was completely unaware that a rodent could make an automobile inoperable. Worse, our vehicle became unsteerable while it was being driven. Fortunately, no one got hurt (except the rat that caused the failure). In this case, it looks like the rat got caught in one of the belts in the engine compartment, which ultimately made the power steering fail. When I looked it up on the Internet, I was surprised to find out that this is actually a common failure. I am not aware of a way to design better safety into the vehicle, so we have changed our process for using automobiles. We now periodically check the engine compartment for signs of an animal living in there, and we have sprinkled peppermint oil around the compartment because we heard that rodents hate the smell.
The ways to make a system failsafe are numerous, and I suspect a lot of great ideas have been used over the years. As an example, let me share a memorable failsafe mechanism we implemented on a Space Shuttle payload I worked on for two years. The payload was actually going to fly around the Space Shuttle, which meant it would be firing its engines more than once. This was groundbreaking, as launching satellites involves firing the engines only once. As a result, we had to go to great lengths to ensure that the engines could not misfire or, worse, that the payload could not receive a malicious command from the ground directing it onto a collision course with the Shuttle. All of the fault-tolerant systems and failsafe mechanisms made the design quite complicated. In contrast, the mechanism we implemented to prevent acting on a malicious command was simple: a table of random numbers that was loaded onto the payload 30 minutes before launch and was known to only two people. Using encryption was not a feasible option at that time because we just did not have the computing power to do it.
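To make the idea concrete, here is a minimal Python sketch of one way a preloaded table of random numbers could gate commands. It is an illustration only, not the flight design: the names `CommandGate` and `make_table`, the code length, and the rule that each command must carry the next unused number from the table are all assumptions.

```python
import hmac
import secrets


def make_table(n, nbytes=4):
    """Generate n random codes (hypothetical helper; code size is an assumption)."""
    return [secrets.token_bytes(nbytes) for _ in range(n)]


class CommandGate:
    """Authorizes a command only if it carries the next unused code.

    Assumed scheme: codes are consumed in order, so a code overheard
    on the link cannot be replayed for a later command.
    """

    def __init__(self, codes):
        self._codes = list(codes)
        self._next = 0  # index of the next unused code

    def authorize(self, code):
        if self._next >= len(self._codes):
            return False  # table exhausted: refuse rather than act
        expected = self._codes[self._next]
        # Constant-time comparison of the received and expected codes.
        if not hmac.compare_digest(expected, code):
            return False  # wrong code: ignore the command
        self._next += 1  # each code is valid exactly once
        return True


if __name__ == "__main__":
    table = make_table(3)         # loaded shortly before launch
    gate = CommandGate(table)     # copy held on the payload
    print(gate.authorize(table[0]))   # first use of the first code
    print(gate.authorize(table[0]))   # same code sent again
```

The appeal of a scheme like this is that checking a code is just a table lookup and a comparison, which costs almost nothing in computing power. The trade-off is that its security rests entirely on keeping the table secret and on never accepting a code twice.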
Another story of making a system more failsafe involved an X-ray machine. I was never able to confirm whether this actually occurred or was a local urban legend, but the lesson is still valid. The model of X-ray machine in question exposed patients to larger doses of radiation than it was supposed to when the technician pressed the backspace key during a small time window. The short-term fix was to send out an order to remove the backspace key from all of the keyboards. The takeaway for me was that there are quick and cheap ways to alleviate a problem that buy you the time to find a better way to fix it.
Have you ever used a clever approach to make your designs more failsafe? Have you ever run across a product that implemented an elegant failsafe mechanism? Have you ever seen a product and thought of a better way its designers could have made it failsafe or degrade gracefully?