Robust Design: Fault Tolerance – Nature vs. Malicious

Tuesday, June 1st, 2010 by Robert Cravotta

[Editor's Note: This was originally posted on the Embedded Master]

For many applications, the primary focus of robust design principles is on making the design resilient to rare or unexpected real world phenomenon. Embedded designers often employ filters to help mitigate the uncertainty that different types of signal noise can cause. They might use redundant components to mitigate or compensate for specific types of periodic failures within various subsystems. However, as our end devices become ever more connected to each other, there is another source of failure that drives and benefits from a strong set of robust design principles – the malicious attack.

On most of the systems I worked on as an embedded developer, we did not need to spend too much energy addressing malicious attacks on electronics and software within the system because the systems usually included a physical barrier that our electronics control system could safely hide behind. That all changed when we started looking at supporting remote access and control into our embedded subsystems. The project that drove this concept home for me was on a Space Shuttle payload that would repeatedly fire its engines to maneuver around the Shuttle. No previous payload had ever been permitted to fire its engines for multiple maneuvers around the Shuttle before. The only engine fire they performed was to move away from the Shuttle and move into their target orbital position.

The control system for this payload was multiple fault-tolerant, and we often joked among ourselves that the payload would be so safe that it would not ever be able to fire its own engines to perform its tasks because the fault tolerant mechanisms were so complicated. This was even before we knew we had to support one additional type of fault tolerance – ensuring that none of the maneuvering commands came from a malicious source. We had assumed that because we were working in orbital space and the commands would be coming from a Shuttle transmitter, that we were safe from malicious attacks. The NASA engineers were concerned that a ground-based malicious command could send the payload into the Shuttle. The authentication mechanism was crude and clunky by today’s encryption standards. Unfortunately, after more than two years of working on that payload, the program was defunded and we never actually flew the payload around the Shuttle.

Tying this back to embedded systems on the ground, malicious attacks often take advantage of the lack of fault tolerance and security in a system’s hardware and software design. By deliberately injecting fault conditions onto a system or into a communication stream, an attacker with sufficient access to and knowledge of how the embedded system operates, can create physical breaches that provide access to the control electronics or expose vulnerabilities in the software system through techniques such as forcing a buffer overflow.

Adjusting your design to mitigate the consequences of malicious attacks can significantly change how you approach analyzing, building, and testing your system. With this in mind, this series will include topics that would overlap with a series on security issues, but with a mind towards robust design principles and fault tolerance. If you have experience designing for tolerance to malicious attacks in embedded systems, not only with regards to the electronics and software, but also from a mechanical perspective, and you would like to contribute your knowledge and experience to this series, please contact me at Embedded Insights.


2 Responses to “Robust Design: Fault Tolerance – Nature vs. Malicious”

  1. A.S. @EM says:

    Looking forward to the series.

  2. There are so many free services offered by the security firms (, web platform people, any generalist independents left, etc.) it seems odd to call it without scope (e.g. BOP service maintenance in the Gulf of Mexico.)

    Then again, use cases and rude power management schemes (as for compromised servers) are nice to scope things by.
    If there’s no space affect (0G physiology) to the Captain’s voice, either they’re not waiting for orbit to start payload missions or that’s not her directly.
    If the user replaces battery #4 with a museum piece, ‘normal operation’ is out, but compost and mixed drinks are kind of normal electrolytes.
    Is that Febreeze or fuming nitric acid?
    When should the requester offering barfight functionality be presented?
    Is the compromised honeypot core-dumping and resetting in a controlled fashion?

Leave a Reply