Question of the Week Channel

The Question of the Week challenges how designers think about embedded design by touching on the full range of issues affecting embedded developers, such as how and why different trade-offs are made to survive in a world of resource- and time-constrained designs.

Is testing always essential?

Wednesday, August 24th, 2011 by Robert Cravotta

This month’s audit of the Army’s armor inserts by the Pentagon’s inspector general finds that testing of the body armor ballistic inserts was not conducted consistently across 5 million inserts purchased under seven contracts. According to the audit, the PM SEQ (Army Program Manager Soldier Equipment) did not conduct all of the required tests on two contracts because it had no protection performance concerns about those inserts. Additionally, the PM SEQ did not always use a consistent methodology for measuring the proper velocity or enforcing the humidity, temperature, weathering, and altitude requirements for the tests.

The audit also reports that the sampling process used did not provide a statistically representative sample for the LAT (Lot Acceptance Test), so the results of the tests cannot be relied on to project identified deficiencies onto the entire lot. At this point, no additional testing has been performed as part of the audit, so there is no conclusion on whether the ballistic performance of these inserts was adversely affected by the testing and quality assurance methods that were applied.

Tests on two lots of recalled inserts so far have found that all of them met “the maximum level of protection specified for threats in combat” according to Matthew Hickman, an Army spokesman. Another spokesman released a statement that “The body armor in use today is performing as it was intended. We are continuing to research our data and as of now have not found a single instance where a soldier has been wounded due to faulty body armor.”

This audit highlights a situation that can impact any product that experiences a significant increase in demand coupled with time sensitivity for availability of that product. High profile examples in the consumer electronics space include game consoles and smart phones. Some of these products underwent recalls or aftermarket fixes. However, similar to the recalled inserts that are passing additional testing, sometimes a product that has not undergone complete testing can still meet all of the performance requirements.

Is all the testing you can do essential to perform every time? Is it ever appropriate to skip a test because “there are no performance concerns?” Do you use a process for modifying or eliminating tests that might otherwise disproportionately affect the product’s pricing or availability without significant offsetting benefit? Is the testing phase of a project an area ripe for optimization or is it an area where we can never do enough?

How does your company handle test failures?

Wednesday, August 17th, 2011 by Robert Cravotta

For many years, most of the projects I worked on were systems that had never been built before in any shape or form. As a consequence, many of the iterations of each of these projects included significant and sometimes spectacular failures as we moved closer to a system that could perform its tasks successfully in an increasingly wider circle of environmental conditions. These path-finding designs needed to operate in a hostile environment (low earth orbit), and they needed to make autonomous decisions because there was no way to guarantee that instructions from a central location would arrive in a timely fashion.

The complete units themselves were unique prototypes, with no more than two iterations in existence at a time. It would take several months to build each unit and develop the procedures by which we would stress and test what the unit could do. The testing process took many more months as the system integration team moved through ground-based testing and eventually on to space-based testing. A necessary cost of deploying a unit was losing it when it reentered the Earth’s atmosphere, but a primary goal for each stage of testing was to collect as much data as possible from the unit until it was no longer able to operate and/or transmit telemetry about its internal state of health.

During each stage of testing, the unit was placed into an environment that would minimize the amount of physical damage it could be subjected to (such as operating the unit within a netted room that would prevent it from crashing into the floor, walls, or ceiling). The preparation work for each formal test consisted of weeks of refining all of the details in a written test procedure that fortyish people would follow exactly. Any deviation during the final test run would flag a possible abort of the run.

Despite all of these precautions, sometimes things just did not behave the way the team expected. In each failure case, it was essential that the post mortem team be able to explicitly identify what went wrong and why so that future iterations of the unit would not repeat those failures. Because we were learning how to build a completely autonomous system that had to properly react to a range of uncertain environmental conditions, it could sometimes take a significant effort to identify root causes for failures.

Surprisingly, it also took a lot of effort to prove that the system did not experience any failures that we were unable to identify by simple observation during operation. It took a team of people days of analyzing the telemetry data to determine whether the interactions between the various subsystems were behaving correctly or had only coincidentally behaved in an expected fashion during the test run.

The company knew we were going to experience many failures during this process, but the pressure was always present to produce a system that worked flawlessly. However, when the difference between a flawless operation and one that experienced a subtle but potentially catastrophic anomaly rests on nuanced interpretation of the telemetry data, it is essential that the development team not be afraid to identify possible anomalies and follow them up with robust analysis.

In this project, a series of failures was the norm, but for how many projects is a sequence of system failures acceptable? Do you feel comfortable raising a flag for potential problems in a design or test run? Does how your company handles failure affect what threshold you apply to searching for anomalies and teasing out true root causes? Or is it safer to search a little less diligently and let said anomalies slip through and be discovered later when you might not be on the project anymore? How does your company handle failures?

How much trial and error do you rely on in designs?

Wednesday, August 10th, 2011 by Robert Cravotta

My wife and I have been watching a number of old television series via DVD and video streaming services. We have both noticed (in a distressing way) a common theme among the shows that purport to have a major character who happens to be a scientist – the scientist(s) know more than any reasonable person would, they accomplish tasks quicker than anyone (or a team of a thousand people) reasonably could, and they make the proper leaps of logic in one or two iterations. While these may be useful mechanisms to keep a 20 to 40 minute story moving along, they in no way reflect our experience in the real engineering world.

Tim Harford’s recent TED talk addresses trial and error as a mechanism for creating successful complex systems and how it differs from approaches built around a God complex. The talk resonates with my experience and echoes a statement I have floated a few times over the years in a different form. The few times I have suggested that engineering is a discipline of best guesses, the claim has generated some vigorous dissent. Those offering the most dissent claim that, given a complete set of requirements, they can provide an optimum engineering design to meet those requirements. But my statement refers not just to the process of choosing how to meet a requirement specification, but also to producing the specification in the first place. Most systems that must operate in the real world are just too complex for a specification to completely describe the requirements in a single iteration – there is a need for some trial and error to discover what is more or less important for the specification.

In the talk, Tim provides an industrial example regarding the manufacturing of powdered detergent. The process of making the powder involves pumping a fluid, under high pressure, through a nozzle that disperses the fluid in such a way that, as the water evaporates from the spray, a powder with specific properties lands in a pile to be boxed up and shipped to stores for end users to purchase. The company in this example originally tried an explicit design approach that reflects a God-complex mode of design: it hired an expert to design the nozzle. Apparently the results were unsatisfactory; however, the company was eventually able to come up with a satisfactory nozzle by using a trial and error method. The designers created ten random nozzle designs and tested them all. They chose the nozzle that performed the best and created ten new variations based on that “winning” nozzle. The company repeated this iterative process 45 times and was able to create a nozzle that performed its function well. The nozzle performs well, but the process that produced it did not require any understanding of why it works.
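For readers who want to see the shape of that loop in code, here is a minimal sketch of the variation-and-select iteration in C. The nozzle is reduced to a small array of hypothetical shape parameters, and the physical spray test is replaced by a toy scoring function, so the numbers are purely illustrative; the point is the generate-ten-variants, keep-the-winner, repeat structure described above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PARAMS       4   /* hypothetical nozzle parameters (angle, bore, ...) */
#define VARIANTS    10   /* variants tested per generation, as in the talk    */
#define GENERATIONS 45   /* iterations described in the talk                  */

/* Stand-in for the physical spray test: score a parameter set (higher is better). */
static double evaluate(const double p[PARAMS])
{
    double score = 0.0;
    for (int i = 0; i < PARAMS; i++)
        score -= (p[i] - 1.0) * (p[i] - 1.0);   /* toy objective only */
    return score;
}

static double rand_unit(void) { return (double)rand() / RAND_MAX; }

int main(void)
{
    double best[PARAMS] = {0};
    double best_score = evaluate(best);

    for (int g = 0; g < GENERATIONS; g++) {
        double winner[PARAMS];
        double winner_score = best_score;
        memcpy(winner, best, sizeof(winner));

        for (int v = 0; v < VARIANTS; v++) {
            double trial[PARAMS];
            for (int i = 0; i < PARAMS; i++)        /* perturb the current best */
                trial[i] = best[i] + (rand_unit() - 0.5) * 0.2;
            double s = evaluate(trial);
            if (s > winner_score) {                 /* keep the best variant */
                winner_score = s;
                memcpy(winner, trial, sizeof(winner));
            }
        }
        memcpy(best, winner, sizeof(best));
        best_score = winner_score;
        printf("generation %2d: score %.4f\n", g + 1, best_score);
    }
    return 0;
}
```

Note that nothing in the loop explains why the final parameter set works; it simply converges toward whatever the scoring function rewards, which is exactly the point of the detergent example.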

Over the years, I have heard many stories about how a similar process yielded a better solution to a problem than an explicit design approach did. Do you use a trial and error approach in your designs? Do you introduce variations in a design, down-select the variations based on measured performance, and repeat this process until the level of improvement suggests you are close enough to an optimum configuration? I suspect more people do use a variation-and-select process of trial and error; however, I am not aware of many tools that facilitate this type of approach. What are your thoughts and experiences on this?

What is driving lower data center energy use?

Wednesday, August 3rd, 2011 by Robert Cravotta

A recently released report from a consulting professor at Stanford University finds that the growth in electricity use by data centers over the years 2005 to 2010 is significantly lower than the expected doubling based on the growth rate from 2000 to 2005. Based on the estimates in an earlier report on electricity usage by data centers, worldwide electricity usage increased by only about 56% from 2005 to 2010 instead of the expected doubling. In the United States, data center electricity use increased by 36%.

Based on estimates of the installed base of data center servers for 2010, the report points out that growth in installed volume servers slowed substantially over the 2005 to 2010 period, growing only about 20% in the United States and 33% worldwide. The installed base of mid-range servers fell faster than the 2007 projections, while the installed base of high-end servers grew rapidly instead of declining per the projections. While Google’s data centers could not be included in the estimates (because Google assembles its own custom servers), the report estimates that Google’s data centers account for less than 1% of electricity used by data centers worldwide.

The author suggests the lower energy use is due to impacts of the 2008 economic crisis and improvements in data center efficiency. While I agree that improving data center efficiency is an important factor, I wonder if the 2008 economic crisis has a first or second order effect on the electricity use of data centers. Did a dip in the growth rate for data services cause the drop in the rate of new server installs or is the market converging on the optimum ratio of servers to services?

My data service costs are lower than they have ever been – although I suspect we are flirting with a local minimum in data service costs, as it has been harder to renew or maintain discounts for these services this year. I suspect my perceived price inflection point is the result of service capacities finally reflecting service usage. The days of huge excess capacity for data services are fading fast, and service providers may no longer need to sell those services below market rate to gain users of that excess capacity. The migration from all-you-can-eat data plans to tiered or throttled accounts may also be an indication that the excess capacity of data services is finally being consumed.

If the lower than expected energy use of data centers is caused by the economic crisis, will energy use spike back up once we are completely out of the crisis? Is the lower than expected energy use due more to the market converging on the optimum ratio of servers to services – and if so, does the economic crisis materially affect energy use during and after the crisis?

One thing this report was not able to do was ascertain how much work was being performed per unit of energy. I suspect the lower than expected energy use is analogous to the change in manufacturing within the United States, where productivity continues to soar despite significant drops in the number of people actually performing manufacturing work. While counting the number of installed servers is relatively straightforward, determining how the efficiency of their workload is changing is a much tougher beast to tackle. What do you think is the first order effect that is slowing the growth rate of energy consumption in data centers?

What tools do you use to program multiple processor cores?

Wednesday, July 27th, 2011 by Robert Cravotta

Developers have been designing and building multi-processor systems for decades. New multicore processors are entering the market on a regular basis. However, it seems that the market for new development tools that help designers analyze, specify, code, test, and maintain software targeting multi-processor systems is lagging further and further behind the hardware offerings.

A key function of development tools is to abstract the complexity that developers must deal with to build the systems they are working on. The humble assembler abstracted the zeros and ones of machine code into more easily remembered mnemonics that enabled developers to build larger and more complex programs. Likewise, compilers have evolved to provide yet another important level of abstraction for programmers and have all but replaced assemblers for the vast majority of software projects. A key value of an operating system is that it abstracts the configuration, access, and scheduling of the increasing number of hardware resources available in a system from the developer.

If multicore and multi-processor designs are to experience an explosion in use in the embedded and computing markets, it seems that development tools should provide more abstractions to simplify the complexity of building with these significantly more complex processor configurations.

In general, programming languages do not understand the concept of concurrency, and the extensions that do exist usually require the developer to explicitly identify where and when such concurrency exists. Developing software as a set of threads is one approach for abstracting concurrency; however, it is not clear how a threading design method will scale as systems approach ever larger numbers of cores. How do you design a system with enough threads to occupy more than a thousand cores – or is that even the right question?
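As a concrete illustration of that burden, here is a minimal POSIX-threads sketch in C (not tied to any particular vendor tool) in which the programmer, not the language, decides where the parallelism lives by explicitly carving a data set into per-thread slices and joining the results. The thread count and workload are illustrative assumptions; scaling this style to a thousand cores would mean inventing a thousand such slices by hand.

```c
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4      /* assumption: one worker per available core */
#define N_SAMPLES 4096

static float samples[N_SAMPLES];
static float partial[N_THREADS];

struct slice { int first; int count; int id; };

/* Each thread sums its own slice; the programmer, not the compiler,
 * decided where the parallelism is and how to split the data. */
static void *worker(void *arg)
{
    struct slice *s = arg;
    float sum = 0.0f;
    for (int i = s->first; i < s->first + s->count; i++)
        sum += samples[i];
    partial[s->id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    struct slice sl[N_THREADS];

    for (int i = 0; i < N_SAMPLES; i++)
        samples[i] = (float)i;

    for (int t = 0; t < N_THREADS; t++) {
        sl[t].first = t * (N_SAMPLES / N_THREADS);
        sl[t].count = N_SAMPLES / N_THREADS;
        sl[t].id    = t;
        pthread_create(&tid[t], NULL, worker, &sl[t]);
    }

    float total = 0.0f;
    for (int t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("total = %f\n", total);
    return 0;
}
```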

What tools do you use when programming a multicore or multi-processor system? Does your choice of programming language and compiler reduce your complexity in such designs or does it require you to actively engage more complexity by explicitly identifying areas for parallelism? Do your debugging tools provide you with adequate visibility and control of a multicore/multi-processor system to be able to understand what is going on within the system without requiring you to spend ever more time at the debugging bench with each new design? Does using a hypervisor help you, and if so, what are the most important functions you look for in a hypervisor?

Will flying cars start showing up on the road?

Wednesday, July 20th, 2011 by Robert Cravotta

People have been dreaming of flying cars for decades. The Aerocar made its first flight in 1949; however, it never entered production. The Terrafugia Transition recently passed a significant milestone when it was cleared for takeoff by the U.S. National Highway Traffic Safety Administration. Does this mean flying cars will soon start appearing on the roads? To clarify, these vehicles are not flying cars so much as they are roadable light sport aircraft – in essence, they are aircraft that could be considered legal to drive on the streets. The approximately $230,000 price tag is also more indicative of an aircraft than an automobile.

The Transition incorporates automotive safety features such as a purpose-built energy absorbing crumple zone, a rigid carbon fiber occupant safety cage, and automotive-style driver and passenger airbags. According to the company, the Transition can take off or land at any public use general aviation airport with at least 2,500′ of runway. On the ground, the Transition can be driven on any road and parked in a standard parking space or household garage. The wings can fold and be stowed vertically on the sides of the vehicle in less than 30 seconds. Pilots will need a Sport Pilot license to fly the vehicle, which requires a minimum of 20 hours of flight time and passing a simple practical test in the aircraft. Drivers will also need a valid driver’s license for use on the ground.

So what makes this vehicle different from the many earlier, and unsuccessful, attempts at bringing a flying car or roadable aircraft to market? In addition to relying on modern engines and composite materials, this vehicle benefits from computer-based avionics. Are modern embedded systems sufficiently advanced and powerful to finally push the dream of a roadable aircraft into reality within the next few years? Or will such a dual-mode vehicle make more sense only after automobiles are better able to drive themselves around on the ground? While the $230,000 price tag will limit how many people can gain access to one of these vehicles (if they make it to production), I wonder if aircraft flying into homes will become an issue. Is this just another pipe dream, or are things different enough this time around that such a vehicle may start appearing on our roads?

Will the Internet become obsolete?

Wednesday, July 13th, 2011 by Robert Cravotta

I saw an interesting question posed in a video the other day: “How much money would someone have to pay you to give up the internet for the rest of your life?” A professor in the video points out the huge gap between the value of using the Internet and the cost to use it. An implied assumption in the question is that the Internet will remain relevant throughout your entire lifetime, but the more I thought about the question, the more I began to wonder if that assumption is reasonable.

While there are many new technologies, devices, and services available today that did not exist a few decades ago, there is no guarantee that any of them will exist a few decades hence. I recently discovered a company that makes custom tables, and their comment on not integrating technology into their table designs illustrates an important point.

“We are determined to give you a table that will withstand the test of time. For example, if you wanted a music player in your table in the 1970s, you wanted an 8-track tape deck, 1980s a cassette tape deck, 1990s a CD player, 2000s an iPod docking station, 2010s a streaming device, and 2020s small spike that you impale into the listener’s tympanic bone, which is now the only way to listen to music, rendering the installation of any of the previous technology a useless scar upon your beautiful table. (No, we don’t actually know if that last one is where music is heading, but if it does, you heard it here first.) The same goes for laptop electrical cords. We can install attachments to deal with power cords, but at the rate battery technology is changing, like your cellular phone or mp3 player, you may just have a docking station you set it on at night, rendering the need for cords obsolete.”

I have seen a number of electronic technologies disappear from my own home and work office over the past few years. When I first set up a home office, I needed a fax machine and a dedicated phone line for it. Both are gone today. I watched as my VHS tape collection became worthless, and as a result, my DVD collection is a bit more modest – thank goodness, because now I hardly ever watch DVDs anymore since I can stream almost anything I want to watch on demand. While we still have the expensive and beautiful cameras my wife and I bought, we never use them because some of our devices with integrated digital cameras are of good enough quality, much easier to use, and much cheaper to operate. My children would rather text their friends than actually talk to each other.

So, will the Internet become obsolete in a few decades’ time as something with more or better functions that is cheaper and easier to use replaces it? I am not sure, because the Internet seems to embody a different concept than all of those other technologies that have become obsolete. The Internet is not tied to a specific technology, form factor, access method, or function other than connecting computing devices together.

In a sense, the Internet may be the ultimate embedded system because nearly everyone who uses it does not care how it is implemented. Abstracting the function of connecting two sites from the underlying implementation technology may allow the Internet to avoid becoming obsolete and being replaced. Or does it? Some smartphones differentiate themselves by how they access the Internet – 3G or 4G. Those smartphones will definitely become obsolete in a few years because the underlying technology of the Internet will keep changing.

Will the Internet be replaced by something else? If so, what is your guess as to what will replace it? If not, how will it evolve to encompass new functions that do not currently exist? As more people and devices attach to the Internet, will it make sense to have separate infrastructures to support data for human and machine consumption?

What does the last Space Shuttle flight mean?

Wednesday, July 6th, 2011 by Robert Cravotta

The final Space Shuttle launch is scheduled for July 8, 2011. This upcoming event is a bittersweet moment for me and, I suspect, for many other people. I spent many years working in aerospace on projects that included supporting the Space Shuttle Main Engines as well as a payload that was cancelled for political (rather than technical) reasons after two years of pre-launch effort.

Similar to the tip of an iceberg, the Space Shuttle is just the visible face of the launch and mission infrastructure that was the Space Shuttle program. Like the embedded systems contained within end products, a huge amount of ground equipment and many technical teams work behind the scenes to make the Space Shuttle a successful endeavor. So one question is – what is the future of that infrastructure once the Space Shuttle program is completely closed down?

While the United States space program has been a largely publicly funded effort for many decades, the door is now opening for private entities to step up and take the stage. I am hopeful this type of shift will enable a resurgence in the space program because more ideas will be able to compete on how to best deliver space-based services, rather than relying on a central group driving the vast majority of the direction the space program could take. The flurry of aerospace activity and innovation that the Orteig Prize spawned demonstrated that private groups of individuals can accomplish Herculean feats – in this case, flying non-stop across the Atlantic Ocean, in either direction, between New York and Paris.

However, I am not sure that a public prize is necessary to spawn a resurgence in aerospace innovation. There are a number of private space ventures already underway, including Virgin Galactic, SpaceX, as well as those companies in the list of private spaceflight companies on Wikipedia.

Does the end of the Space Shuttle program as it has been for the past few decades mean the space program will change? If so, how will it change – especially the hidden (or embedded) infrastructure? Is space just an academic exercise or are there any private/commercial ventures that you think will crack open the potential of space services that become self-sustaining in a private world?

What game(s) do you recommend?

Thursday, June 30th, 2011 by Robert Cravotta

I have been thinking about how games and puzzles can help teach concepts and strengthen a person’s thought patterns for specific types of problem solving. However, there are literally thousands of games available across a multitude of forms, whether card, board, or computer-based. The large number of options can make it challenging to know when one might be particularly well suited to helping you train your mind for a type of design project. Discussion forums like this one can collect lessons learned and make you aware of games or puzzles that others have found useful in exercising their minds – as well as being entertaining.

I have a handful of games that I could suggest, but I will start by offering only one recommendation in the hopes that other people will share their finds and thoughts about when and why the recommendation would be worthwhile to someone else.

For anyone who needs to do deep thinking while taking into account a wide range of conditions from a system perspective, I recommend checking out the ancient game of Go. It is a perfect information game played between two players, and it has a ranking and handicap system that makes it possible for two players of slightly different strengths to play a game that is challenging for both. Rather than explaining the specifics of the game here, I would instead like to focus on what the game forces you to do in order to play competently.

The rules are very simple – each player alternates turns placing a stone on a grid board. The goal of the game is to surround and capture the most territory. The grid is of sufficient size (19×19 points) that your moves have both a short term and a long term impact. Understanding the subtlety and depth of the long term impact of your moves grows in richness with experience and practice – not unlike designing a system in such a way as to avoid shooting yourself in the foot during troubleshooting. If you are too cautious, your opponent will capture too much of the board for your excellent long term planning to matter. If you play too aggressively – say, trying to capture as much territory as directly or as quickly as possible – you risk having to defend what you have laid claim to with a structure that is too weak to withstand any stress from your opponent.

The more I play Go, the more easily I can see how the relationships between decisions and trade-offs affect how well the game – or a project – will turn out. Being able to find an adequate balance between building a strong structure and progressing forward at an appropriate pace is a constant exercise in reading your environment and adjusting to changing conditions.

I would recommend Go to anyone who needs to consider the system level impacts of their design decisions. Do you have a game you would recommend for embedded developers? If so, what is it and why might an embedded developer be interested in trying it out?

How is embedded debugging different?

Wednesday, June 22nd, 2011 by Robert Cravotta

Of all the different embedded designs I have worked on, the project that stands out the most is the first embedded project I worked on – despite the fact that I already had ten years of experience programming computers before that. I had been paid to write simulators, database engines, an assembler, a time-share system, as well as several automation tools for production systems. All of these projects executed on mainframe systems or desktop computers. None of them quite prepared me for how different working on an embedded design is.

My first embedded design was a simple box that would reside on a ground equipment test rack that supported the flight system we were building and demonstrating. There was nothing particularly special about this box – it had a number of input and select lines and it had a few output lines. What surprised me most when putting it through its first checkout tests was how clueless I was as to how to troubleshoot the problems that did arise.

While I was aware of keyboard debounce routines from using my desktop system, I had never had to so completely understand the characteristics of different types of switches before. I had never had to be aware of the wiring within the system, nor had I ever considered doing an end-to-end check on every wire in a system before. While putting this simple box together, I became aware of so many new ways a design could go wrong that I had never had to consider in my earlier designs.
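For readers who have not yet run into this, the sketch below shows the kind of counter-based debounce routine that paragraph alludes to, written in C. It assumes a periodic sample tick (the 1 ms period and 5-tick threshold are illustrative) and a hypothetical read_switch_raw() that returns the raw level of the input pin on whatever part you are using.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_TICKS 5   /* assumption: ~5 ms of stable input at a 1 ms sample tick */

/* Hypothetical hardware access; on a real part this reads a GPIO input register. */
extern bool read_switch_raw(void);

/* Call once per timer tick; returns the debounced switch state. */
bool debounce_switch(void)
{
    static bool stable_state = false;
    static uint8_t counter = 0;

    bool raw = read_switch_raw();

    if (raw == stable_state) {
        counter = 0;                     /* input agrees with current state, stay put */
    } else if (++counter >= DEBOUNCE_TICKS) {
        stable_state = raw;              /* new level held long enough, accept it */
        counter = 0;
    }
    return stable_state;
}
```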

On top of the new ways that the system could behave incorrectly, the system had no file system, no display, and no way to print out a trace log or memory dump. This made debugging a very different experience. Printf statements would be of no use, and there was no single-step debugger available. Worse yet, running the target program on my desktop computer to simulate the code was mostly useless because I could not bring the real-world inputs and outputs that the box worked with into the desktop system.
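A common workaround on targets like this is to log events into a small circular buffer in RAM and read the buffer back afterward through whatever path the hardware does provide (a debugger’s memory view, a spare serial line, or telemetry words). The C sketch below is a minimal illustration; the event IDs, buffer depth, and readout path are assumptions for the example rather than anything from the original box.

```c
#include <stdint.h>

#define TRACE_DEPTH 64          /* power of two so wraparound is a simple mask */

struct trace_entry {
    uint16_t event;             /* application-defined event ID            */
    uint16_t data;              /* small payload, e.g. a state number      */
};

static struct trace_entry trace_buf[TRACE_DEPTH];
static volatile uint32_t trace_head;

/* Cheap enough to call from interrupt handlers; the buffer can later be
 * dumped through a debugger, a spare UART, or spare telemetry words. */
void trace_event(uint16_t event, uint16_t data)
{
    uint32_t i = trace_head++ & (TRACE_DEPTH - 1);
    trace_buf[i].event = event;
    trace_buf[i].data  = data;
}
```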

As I tackled each debugging issue, I went from a befuddled state of having no idea how to proceed to a state where I adopted new ways of thinking that let me gain the insights I needed to infer how the system was (or was not) working and what needed to change. I worked on that project alone, and it welcomed me into the world of embedded design and working with real world signals with wide open arms.

How did your introduction to embedded systems go? What insights can you share to warn those that are entering the embedded design community about how designing, debugging, and integrating embedded components is different from writing application-level software?