Exploration & Prototyping Channel

Design exploration and prototyping are important tools for embedded designers: they reduce the risk that a design will not perform acceptably by letting the team test the feasibility and effectiveness of ideas without building the entire system. The more approaches a design team can explore, the more confidence it can have that the approach chosen for the complete system best balances the cost, schedule, and complexity risks associated with the project.

What should design reviews accomplish?

Wednesday, September 7th, 2011 by Robert Cravotta

I remember my first design review. Well, not exactly the review itself, but I remember the lessons I learned while doing it because it significantly shifted my view of what a design review is supposed to accomplish. I was tasked with reviewing a project and providing comments about the design. It was the nature of my mentor’s response to my comments that started to shape my understanding that there can be disconnects between idealism and practicality.

In this review, I was able to develop a pretty detailed understanding of how the design was structured and how it would work. The idealist in me compelled me to identify not only potential problems in the design but also better ways of implementing portions of it. My mentor’s response to my suggestions caught me completely by surprise – he did not want to hear them. According to him, the purpose of the review was to determine whether the design did or did not meet the system requirements. The time for optimizing design decisions had passed – would the design accomplish the requirements or not?

His response baffled and irked me. Wasn’t a design review part of the process of creating the best design possible? Also, I had some really blindingly brilliant observations and suggestions that were now going to go to waste. Looking back, I think the hardline approach my mentor took helped make me a better reviewer and designer.

As it turns out, my suggestions were not discarded without a look; however, the design review is not the best point in the design cycle to explore the subtle nuances of one design approach versus another. Those types of discussions should have occurred and been completed before the design review process even started. On the other hand, for areas where the design does not or might not meet the system requirements, it is imperative that a discussion be initiated to identify where and why there might be some risk in the current design approach. My mentor’s harsh approach clarified the value of focusing observations and suggestions on those parts of the design that will yield the highest return for the effort spent doing the review.

Does this sound like how your design reviews proceed or do they take a different direction? What should be the primary accomplishment of a successful design review and what are those secondary accomplishments that may find their way into the engineering efforts that follow the review process?

Is testing always essential?

Wednesday, August 24th, 2011 by Robert Cravotta

This month’s audit of the Army’s armor inserts by the Pentagon’s inspector general finds that testing of the body armor ballistic inserts was not conducted consistently across the 5 million inserts purchased under seven contracts. According to the audit, the PM SEQ (Army Program Manager Soldier Equipment) did not conduct all of the required tests on two contracts because it had no protection performance concerns about those inserts. Additionally, the PM SEQ did not always use a consistent methodology for measuring the proper velocity or for enforcing the humidity, temperature, weathering, and altitude requirements for the tests.

The audit also reports that the sampling process used did not provide a statistically representative sample for the LAT (lot acceptance test), so the results of the tests cannot be relied on to project identified deficiencies onto the entire lot. At this point, no additional testing has been performed as part of the audit, so there is no conclusion on whether the ballistic performance of these inserts was adversely affected by the test and quality assurance methods that were applied.
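To see why a representative sample matters, consider a simple back-of-the-envelope illustration. The defect rates and sample sizes below are hypothetical placeholders, not figures from the audit; the point is only that if a fraction p of a lot is defective and n units are sampled at random, the probability that the sample contains no defective unit at all is (1 - p)^n.

```c
/* Illustrative sketch only: the defect rates and sample sizes are made-up
 * placeholders, not figures from the audit. It shows how the chance of a
 * lot acceptance test missing a defect entirely depends on sample size.
 * Compile with -lm for the math library. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double defect_rates[] = { 0.01, 0.001 };  /* assumed true defect fractions */
    const int    sample_sizes[] = { 10, 50, 200 };  /* assumed units tested per lot  */

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 3; j++) {
            /* Probability that a random sample of n units contains zero
             * defective units when a fraction p of the lot is defective. */
            double p_miss = pow(1.0 - defect_rates[i], sample_sizes[j]);
            printf("defect rate %.3f, sample of %3d units: %5.1f%% chance of seeing no defect\n",
                   defect_rates[i], sample_sizes[j], 100.0 * p_miss);
        }
    }
    return 0;
}
```

The specific numbers matter less than the shape of the result: a small or haphazardly chosen sample can pass cleanly even when the lot contains defects, which is why a non-representative sample cannot be projected onto the whole lot.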

Tests on two lots of recalled inserts so far have found that all of them met “the maximum level of protection specified for threats in combat” according to Matthew Hickman, an Army spokesman. Another spokesman released a statement that “The body armor in use today is performing as it was intended. We are continuing to research our data and as of now have not found a single instance where a soldier has been wounded due to faulty body armor.”

This audit highlights a situation that can impact any product that experiences a significant increase in demand coupled with time-sensitive availability. High-profile examples in the consumer electronics space include game consoles and smart phones. Some of these products underwent recalls or aftermarket fixes. However, just as the recalled inserts are passing additional testing, sometimes a product that has not undergone complete testing can still meet all of its performance requirements.

Is every test you can perform essential every time? Is it ever appropriate to skip a test because “there are no performance concerns”? Do you use a process for modifying or eliminating tests that would otherwise disproportionately affect the product’s pricing or availability without a significant offsetting benefit? Is the testing phase of a project an area ripe for optimization, or is it an area where we can never do enough?

How does your company handle test failures?

Wednesday, August 17th, 2011 by Robert Cravotta

For many years, most of the projects I worked on were systems that had never been built before in any shape or form. As a consequence, many of the iterations on these projects included significant and sometimes spectacular failures as we moved closer to a system that could perform its tasks successfully across an increasingly wider range of environmental conditions. These path-finding designs needed to operate in a hostile environment (low earth orbit), and they needed to make decisions autonomously because there was no way to guarantee that instructions from a central location would arrive in a timely fashion.

The complete units themselves were unique prototypes, with no more than two iterations in existence at a time. It would take several months to build each unit and to develop the procedures by which we would stress and test what the unit could do. The testing process took many more months as the system integration team moved through ground-based testing and eventually on to space-based testing. A necessary cost of deploying a unit was losing it when it reentered the Earth’s atmosphere, but a primary goal for each stage of testing was to collect as much data as possible from the unit until it was no longer able to operate and/or transmit telemetry about its internal state of health.

During each stage of testing, the unit was placed into an environment that would minimize the physical damage the unit could suffer (such as operating the unit within a netted room that would prevent it from crashing into the floor, walls, or ceiling). The preparation work for each formal test consisted of weeks of refining all of the details in a written test procedure that fortyish people would follow exactly. Any deviation during the final test run could flag a possible abort of the run.

Despite all of these precautions, sometimes things just did not behave the way the team expected. In each failure case, it was essential that the post mortem team be able to explicitly identify what went wrong and why so that future iterations of the unit would not repeat those failures. Because we were learning how to build a completely autonomous system that had to properly react to a range of uncertain environmental conditions, it could sometimes take a significant effort to identify root causes for failures.

Surprisingly, it also took a lot of effort to prove that the system did not experience any failures beyond those we were able to identify by simple observation during operation. It took a team of people days of analyzing the telemetry data to determine whether the interactions between the various subsystems had behaved correctly or had merely, and coincidentally, behaved in an expected fashion during the test run.

The company knew we were going to experience many failures during this process, but the pressure to produce a system that worked flawlessly was always present. However, when the difference between a flawless operation and one that experienced a subtle but potentially catastrophic anomaly rests on a nuanced interpretation of the telemetry data, it is essential that the development team not be afraid to identify possible anomalies and follow them up with robust analysis.

In this project, a series of failures was the norm, but for how many projects is a sequence of system failures acceptable? Do you feel comfortable raising a flag for potential problems in a design or test run? Does how your company handles failure affect what threshold you apply to searching for anomalies and teasing out true root causes? Or is it safer to search a little less diligently and let said anomalies slip through and be discovered later when you might not be on the project anymore? How does your company handle failures?

How much trial and error do you rely on in designs?

Wednesday, August 10th, 2011 by Robert Cravotta

My wife and I have been watching a number of old television series via DVD and video streaming services. We have both noticed (in a distressing way) a common theme among the shows that purport to have a major character who happens to be a scientist – the scientists know more than any reasonable person would, they accomplish tasks faster than anyone (or a team of a thousand people) reasonably could, and they make the proper leaps of logic in one or two iterations. While these may be useful mechanisms for keeping a 20- to 40-minute story moving along, they in no way reflect our experience in the real engineering world.

Tim Harford’s recent TED talk addresses trial and error as a mechanism for creating successful complex systems and how it differs from design approaches built around a God complex. The talk resonates with my experience and echoes, in a different form, a statement I have floated a few times over the years. The few times I have suggested that engineering is a discipline of best guesses, the suggestion has generated some vigorous dissent. The people offering the most dissent claim that, given a complete set of requirements, they can provide an optimum engineering design to meet those requirements. But my statement refers not just to the process of choosing how to satisfy a requirement specification, but also to making the specification in the first place. Most systems that must operate in the real world are just too complex for a specification to completely describe the requirements in a single iteration – there is a need for some trial and error to discover what is more or less important to the specification.

In the talk, Tim provides an industrial example involving the manufacture of powdered detergent. The process involves pumping a fluid, under high pressure, through a nozzle that disperses it so that, as the water evaporates from the spray, a powder with specific properties lands in a pile to be boxed up and shipped to stores for end users to purchase. The company in this example originally tried an explicit design approach that reflects the God-complex mode of design: it hired an expert to design the nozzle. Apparently the results were unsatisfactory; however, the company was eventually able to come up with a satisfactory nozzle using a trial and error method. The designers created ten random nozzle designs and tested them all. They chose the nozzle that performed best and created ten new variations based on that “winning” nozzle. The company repeated this iterative process 45 times and ended up with a nozzle that performed its function well. The resulting nozzle works, yet the process that produced it required no understanding of why it works.
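A minimal sketch of that variation-and-select loop might look like the following. The design parameters, the mutation step, and the score() merit function are hypothetical placeholders – in a real project the “score” would come from building and measuring each candidate, as it did for the nozzles – and only the loop structure is the point.

```c
/* Minimal sketch of a variation-and-select (trial and error) loop in the
 * spirit of the nozzle story. The parameters, mutation step, and score()
 * merit function are hypothetical placeholders; in practice each candidate
 * would be built and measured rather than evaluated by a formula. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_PARAMS   4     /* assumed number of design parameters  */
#define NUM_VARIANTS 10    /* candidates tried per generation      */
#define GENERATIONS  45    /* iterations, as in the nozzle example */

typedef struct { double p[NUM_PARAMS]; } design_t;

/* Placeholder merit function with a peak at p[i] == 1.0 for every i. */
static double score(const design_t *d)
{
    double s = 0.0;
    for (int i = 0; i < NUM_PARAMS; i++)
        s -= (d->p[i] - 1.0) * (d->p[i] - 1.0);
    return s;
}

static double rand_uniform(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}

int main(void)
{
    srand((unsigned)time(NULL));

    design_t best;
    for (int i = 0; i < NUM_PARAMS; i++)
        best.p[i] = rand_uniform(0.0, 2.0);          /* random starting design */
    double best_score = score(&best);

    for (int gen = 0; gen < GENERATIONS; gen++) {
        for (int v = 0; v < NUM_VARIANTS; v++) {
            design_t trial = best;                   /* vary the current winner */
            for (int i = 0; i < NUM_PARAMS; i++)
                trial.p[i] += rand_uniform(-0.1, 0.1);

            double s = score(&trial);
            if (s > best_score) {                    /* down-select: keep the best */
                best = trial;
                best_score = s;
            }
        }
        printf("generation %2d: best score %.4f\n", gen + 1, best_score);
    }
    return 0;
}
```

One difference worth noting: this sketch only keeps a variation when it beats the current best, which is slightly greedier than simply picking the best of each batch of ten. Either way, the loop needs a measurable figure of merit, not an explanation of why the winner works.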

Over the years, I have heard many stories about how a similar process yielded a superior solution to a problem compared with an explicit design approach. Do you use a trial and error approach in your designs? Do you introduce variations in a design, down-select the variations based on measured performance, and repeat this process until the level of improvement suggests you are close enough to an optimum configuration? I suspect many people do use a variation-and-select process of trial and error; however, I am not aware of many tools that facilitate this type of approach. What are your thoughts and experiences on this?

The importance of failing quickly and often

Friday, December 3rd, 2010 by Robert Cravotta

When do recent kindergarten graduates outperform recent business school graduates? Believe it or not, according to Tom Wujec, kindergarteners consistently perform better than business school graduates in the Marshmallow Challenge. This is not a challenge to see who can eat the most marshmallows; rather, it is an exercise in teamwork and rapid prototyping.

The challenge consists of a team of four members building the tallest structure they can using only a single marshmallow, strands of dry spaghetti, a roll of masking tape, and some string. The major constraint is that the marshmallow must go on the top of the structure. The mass of the marshmallow makes this project more challenging than you might first assume.

In Tom’s video, he explains that kindergarteners do better than the business school graduates because they approach building the structure as a more iterative, prototyping-driven sequence than the business graduates do. The kindergarteners start building and placing the marshmallow at the top of the structure right away, and the immediate feedback of the structure standing or falling enables them to make improvements in the next attempt. In contrast, the business graduates discuss a plan of action, choose a direction, and typically do not place the marshmallow on top of the structure until near the end of the challenge; when the structure fails, there is not enough time left to rebuild it.

I bring up the Marshmallow Challenge because it augments Casey Weltzin’s recent article “To Design Innovative Products, You Must Fail Quickly” about the importance of prototyping and the role of failures during the prototyping process. Engineers are intimately familiar with failure – in fact, I remember there was a unit on errors and failure as part of my engineering undergraduate studies. Not surprisingly, the people who consistently do the best in the challenge are engineers and architects.

The unrelenting and almost predictable pace of technological improvements that engineers deliver decade after decade belies the number of failures that engineers experience and iterate through behind each of those publicly visible successes. In a sense, our repeated success as an industry at delivering ever more functional systems at ever lower price points makes these feats of innovation look easier to repeat than they truly are.

Another interesting observation in Tom’s presentation is that adding an executive admin to a team of CEOs and company executives significantly improves their performance in the challenge compared with an executive team without an admin. One takeaway I see from this is that it is important to remind your management that design is an iterative process: we apply our assumptions to the real world, and the real world smacks us down by pointing out the hidden or unspoken assumptions that do not quite align with reality.

To Design Innovative Products, You Must Fail Quickly

Friday, November 12th, 2010 by Casey Weltzin

While making incremental changes to existing embedded designs may be straightforward, engineers and scientists working on new, innovative designs live in a much different world. They are tasked with building complex electrical or electro-mechanical systems that require unique combinations of I/O and processing elements. Rather than starting by budgeting time and resources, these designers often need to begin the design process by asking “is this even possible?”

One example of this kind of innovative application is a system created by KCBioMedix that teaches premature infants how to feed. With up to one-third of premature infants born in the United States suffering from feeding problems, the device, called NTrainer, helps coordinate sucking, swallowing, and breathing movements to accelerate feeding without a tube. It is essentially a computerized pacifier that emits gentle pulses of air into an infant’s mouth.

Of course, this kind of innovation seldom takes place without skeptics. Innovative designs require investment that is often heavily competed for and scrutinized within organizations. Or, in the case of startup ventures, entrepreneurs require investment from venture capitalists who have many other places to put their funding. Ultimately, to make a commitment, management and third-party sources of capital require the same things – proof that the concept will work and a sound business plan.

Let’s concentrate on the former. Complex devices and machines typically require tens or even hundreds of iterations during the design process; in short, failures. These iterations can be time-consuming and expensive. While making software modifications is relatively easy, changing I/O or processing hardware can take weeks to months. Meanwhile, business leaders and investors become increasingly impatient.

How can both large organizations and startups mitigate the risk of redesigns? One solution commonly employed is to carefully study the design requirements and come up with an architecture that is unlikely to need modification. This is a poor solution for two reasons. First, even the most capable designers may fail to foresee the challenges associated with a new, innovative design – resulting in cut traces or a rat’s nest of soldered wires to modify a piece of hardware. Second, because engineers are likely to reuse the architectural patterns and design tools they are used to, innovative features are more likely to be traded off to fit the constraints that those patterns impose.

A better solution is to use a COTS (commercial off-the-shelf) prototyping platform with a combination of modular I/O, reconfigurable hardware, such as FPGAs (field programmable gate arrays), and high-level development tools. Using this approach, extra I/O points can be “snapped-in” when needed rather than requiring an entire board or daughterboard redesign. Additionally, FPGAs enable designers to implement high-performance custom logic at several orders of magnitude less upfront cost than ASICs (application-specific integrated circuits). Finally, high-level design tools enable both experienced embedded designers and application experts to take advantage of FPGA, real-time operating system, and other technologies without prior expertise or a large team of experts in each technology. In other words, when equipped with the right tools, a small team can “fail quickly” and accelerate the innovation process.

There are a number of economic concerns that must be addressed when using COTS platforms for prototyping. First, since these platforms typically carry a much higher up-front cost than the BOM (bill of materials) components used in a final design, organizations must weigh the productivity savings they provide to determine the time to break even on the investment. For many complex projects, COTS solutions have the potential to reduce the time to first prototype by weeks or months while also reducing the overall size of the development team required. It may also be possible to reuse these tools across multiple projects in innovation centers, amortizing the up-front cost over a longer period of time.

Another economic consideration is how much the transition from prototype to final deployment will cost. For small or medium-size deployments, it may be beneficial to use COTS hardware embedded in the final device (provided that it meets size and power constraints) – essentially a trade-off between a higher BOM cost and reduced development time. On the other hand, for large deployments the benefits of a low BOM cost may warrant moving to a custom, cost-optimized design after prototyping. In this case, organizations can save cost by choosing prototyping tools that provide a minimal-investment path to the likely deployment hardware.
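A rough sketch of that volume trade-off is shown below; every dollar figure is a hypothetical placeholder rather than a number from the article. The break-even deployment size is simply the extra engineering (NRE) cost of the custom design divided by the per-unit BOM savings it buys.

```c
/* Illustrative break-even sketch for shipping COTS hardware in the product
 * versus doing a custom cost-optimized redesign. Every figure below is a
 * made-up placeholder; substitute your own project's numbers. */
#include <stdio.h>

int main(void)
{
    double cots_bom_per_unit   = 900.0;    /* assumed COTS hardware cost per shipped unit */
    double custom_bom_per_unit = 300.0;    /* assumed BOM cost of the custom design       */
    double custom_nre          = 200000.0; /* assumed extra engineering (NRE) cost of the
                                              custom redesign                             */

    /* Deployment volume at which the per-unit BOM savings pay back the redesign NRE. */
    double breakeven_units = custom_nre / (cots_bom_per_unit - custom_bom_per_unit);

    printf("Break-even volume: about %.0f units\n", breakeven_units);
    printf("Below this volume, keeping the COTS hardware in the product is cheaper overall;\n");
    printf("above it, the custom cost-optimized design wins on total cost.\n");
    return 0;
}
```

The same arithmetic works in the other direction for the prototyping tools themselves: the up-front tool cost is paid back once the engineering weeks saved, multiplied by the loaded cost of the team, exceed it.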

Returning to the example of KCBioMedix, the company was able to reduce prototyping time of their premature infant training system from 4 months to 4 weeks using COTS tools – providing an estimated savings of $250,000. COTS hardware is also being used in the final NTrainer product to maximize reuse of IP from the prototyping stage.

The bottom line is that for both the aspiring entrepreneur and the large organization that wishes to maintain an entrepreneurial spirit, prototyping is an essential part of producing innovative designs in time to beat the competition. Organizations that encourage prototyping are more nimble at separating the good ideas from the bad, and they ultimately produce differentiated products that command strong margins in the marketplace.