Articles by Robert Cravotta

As a former Technical Editor covering Embedded Processing at EDN, Robert has been following and commenting on the embedded processing space since 2001 (see article index). His expertise includes software development and system design using microprocessors, microcontrollers, digital signal processors (DSPs), multiprocessor architectures, processor fabrics, coprocessors, and accelerators, plus embedded cores in FPGAs, SOCs, and ASICs. Robert's embedded engineering background includes 16 years as a Member of the Technical Staff at Boeing and Rockwell International working on path-finding avionics, power and laser control systems, autonomous vehicles, and vision sensing systems.

Is the collider closure cause for concern?

Wednesday, October 5th, 2011 by Robert Cravotta

The closure of the Tevatron proton-antiproton collider last week marked the end of twenty-eight years of discovery. The shutdown comes as scientists around the world are trying to see whether they can replicate measurements made by physicists at CERN of neutrinos traveling faster than the speed of light.

The Tevatron has been the most powerful atom smasher in the United States since 1983. Analysis of the data the collider collected will continue for the next few years, but the lab will no longer be collecting data from collisions at the highest possible energies. The Large Hadron Collider, an accelerator capable of pushing particles to even higher energies, is replacing the Tevatron. The scientists at the Fermi National Accelerator Laboratory (Fermilab), the home of the Tevatron, will instead be pursuing the “intensity frontier,” which focuses on working with very intense beams containing very large numbers of particles.

To date, the United States government has been a primary source of funding for large and expensive research projects such as the Tevatron collider and the Space Shuttle – both of which have closed down their programs this year. It is unlikely that these are the only research projects operating with aging equipment. Do these two recent program closures portend a slowing down of research, or are they signs that research efforts are progressing so well that closing these projects is part of refining and reallocating research resources toward more challenging discoveries?

How important are reference designs?

Wednesday, September 28th, 2011 by Robert Cravotta

A recent white paper about a study exploring issues facing engineers during the design process identified a number of items that are “essential to the design process” but can be difficult to find. At the top of the list were reference designs and application notes. This is in spite of engineers being able to access a wider range of materials across multiple sources via the internet than ever before.

I think part of the reason why these two types of information are identified as essential and difficult to find stems from the growing complexity of contemporary processors. The number and variety of peripherals and special purpose processing engines being integrated into today’s newest processors create a steep learning curve for any developer trying to use these devices in their projects. Providing a compiler and debugger alone does not offset the effort required to master the complexity of today’s processors without negatively impacting project schedules.

The term reference design can apply to a wide range of materials. An article about reference designs presents a taxonomy for reference materials based on application complexity and design completeness versus broad to application-specific implementation details. If, as the article identifies, reference designs are a sales and marketing tool, why is such material difficult for developers to find?

One possible reason is that developers do not consider reference materials essential. Another is that reference designs, by their nature, apply to a small swath of processors in a huge sea of options, and this makes classifying these reference designs and getting them in front of interested developers challenging at best. Attempts by third-party information sources to aggregate and connect the appropriate reference materials with relevant processors have had limited success. As evidenced by the conclusions of the referenced study, even processor vendors themselves are experiencing limited success getting the word out about their own reference materials.

How important are reference materials in choosing and working with processors and other complex components in your designs? Are all types of reference materials equally important or are some types of information more valuable than others? Is aggregating reference material with the appropriate component the best way to connect developers with reference material? What good ways to classify reference material have you seen that help you better find the material you are looking for without having to wade through a bunch of irrelevant reference material?

The State of Voice User Interfaces

Tuesday, September 20th, 2011 by Robert Cravotta

While touch interfaces have made a splash in the consumer market, voice-based user interfaces have been quietly showing up in more devices. Interestingly, voice user interfaces were expected to become viable long before touch interfaces. The technical challenges of implementing a successful speech recognition capability far exceeded what research scientists expected. That did not, however, stop story writers and film productions from adopting voice user interfaces in their portrayal of the future. Consider the ship’s computer in the Star Trek series. In addition to using proximity sensors that worked uncannily well in understanding when to open and close doors, the ship’s computer was able to tell when a person was issuing a request or command versus when they were just talking to another person.

Today, the quiet rise of speech recognition in consumer devices is opening up a different way to interact with devices – one that does not require the user to focus their eyes on a display to know where to place their fingertips to issue commands. Improving speech recognition technology is also providing an alternative way to interact with devices for people with dyslexia. However, there are a number of subtle challenges facing systems that rely on speech recognition, and these challenges make it difficult to provide a reliable and robust voice user interface.

For a voice interface to be useful, there are a number of ambiguities the system must be able to resolve. In addition to accurately identifying what words are spoken, the system must be able to reliably filter out words that are not issued by the user. It must also be able to distinguish between words from the user that are intended for the system and those intended for another person or device.

One way that systems enable a user to actively assist the speech recognition module to resolve these types of ambiguity is to force the user to press and/or hold a button indicating that they are issuing a voice command. By relying on an unambiguous input, such as a button press, the speech recognition module is able to leverage the system’s processing capacity at the time a command is most likely being issued. This approach supports a lower power operation because it enables the system to avoid operating in an always-on mode that can drain the system’s energy store.
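
To make the idea concrete, here is a minimal sketch of this kind of push-to-talk gating, assuming hypothetical Button and Recognizer interfaces; the class names and power-management calls are illustrative stand-ins rather than any particular vendor's API:

```python
# Minimal sketch (not production code) of push-to-talk gating: the speech
# recognizer only runs while the user presses a button, so the audio front
# end can stay in a low-power state the rest of the time. Button and
# Recognizer are stand-ins for real hardware/driver APIs.

import random
import time


class Button:
    """Stand-in for a GPIO push-to-talk button."""
    def is_pressed(self):
        return random.random() < 0.2  # pretend the user presses the button occasionally


class Recognizer:
    """Stand-in for a speech recognition module with power management."""
    def wake(self):
        print("recognizer: powering up the audio front end")

    def capture_and_decode(self):
        return "lights on"  # pretend this utterance was captured and decoded

    def sleep(self):
        print("recognizer: returning to low-power state")


def listen_loop(button, recognizer, polls=20):
    """Only run the recognizer while the button is pressed; otherwise stay idle."""
    for _ in range(polls):
        if button.is_pressed():
            recognizer.wake()
            command = recognizer.capture_and_decode()
            recognizer.sleep()
            print("heard command:", command)
        else:
            time.sleep(0.01)  # recognizer stays asleep; only the button is polled


listen_loop(Button(), Recognizer())
```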

The positive button press also prompts the user, even unconsciously, to make accommodations based on the environment in which they are talking. If the environment is noisy, users may move to a quieter location, or position themselves so that the device microphone is shielded from the noise in the area, such as by cupping the device with their hand or placing it close to their mouth. This helps the system act more reliably in a noisy environment, but it relies on the user’s actions to improve the effectiveness of the noise immunity. An ideal speech recognition module would have high immunity to noisy environments while consuming very little energy, without having to rely on the user.

But detecting when the user is speaking and issuing a command to the device is only the first step in implementing a viable voice user interface. Once the system has determined that the user is speaking a command, the system has four more steps to complete to close the loop between the system and the user. Following voice activation, the module needs to perform the actual speech recognition and transcription step. This stage of speech processing also relies on a high level of immunity to noise, but the noise immunity does not need to be as robust as it is for the voice activation stage because this stage is only active when the system has already determined that the user is speaking a command. This stage relies on high accuracy to successfully separate the user’s voice from the environmental noise and transcribe the sound waves into symbols that the rest of the speech module can use.

The third stage of processing takes the output of the transcribed speech and determines the intent and meaning of the speech so as to be able to accurately understand what the user is asking for. This stage of processing may be as simple as comparing the user’s input to a constrained set of acceptable words or phrases. If a match is found, the system acts on it. If no acceptable match is found, the system may prompt the user to reissue the command or ask the user to confirm the module’s guess of the user’s command.
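
A rough illustration of this constrained-vocabulary approach is sketched below: the transcription is compared against a small list of acceptable phrases, a confident match is acted on, a weaker match triggers a confirmation prompt, and anything else asks the user to repeat the command. The command list and thresholds are made up for the example:

```python
# Illustrative sketch of matching a transcription against a constrained set
# of acceptable commands, with a confirm/re-prompt fallback when the match
# is weak. The commands and thresholds below are invented for the example.

import difflib

COMMANDS = ["call home", "play music", "set alarm", "what time is it"]

def interpret(transcription, accept=0.85, guess=0.6):
    """Return (action, command) based on how closely the transcription matches."""
    scored = [(difflib.SequenceMatcher(None, transcription.lower(), c).ratio(), c)
              for c in COMMANDS]
    score, best = max(scored)
    if score >= accept:
        return ("execute", best)   # confident match: act on it
    if score >= guess:
        return ("confirm", best)   # plausible match: ask "did you mean ...?"
    return ("reprompt", None)      # no acceptable match: ask the user to repeat

print(interpret("play some music"))  # ('confirm', 'play music') with these thresholds
print(interpret("porcupine"))        # ('reprompt', None)
```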

For more sophisticated speech recognition, this stage of processing resolves ambiguity in the semantics of the issued command. This may involve considering each part of the speech in context with the whole message spoken by the user to identify contradictions that could signal an inappropriate way to interpret the user’s spoken words. If the system is able to process free form speech, it may rely on a significant knowledge of language structure to improve its ability to properly identify the meaning of the words the user actually spoke.

The next stage of processing involves acting on the issued command. Is the command a request for information? Is it a request to activate a component in the system? The processing performed during this stage is as varied as the tasks a system can perform. The final stage, though, is to ensure that there is appropriate feedback to the user that their command was received, properly interpreted, and that the appropriate actions were started, are in progress, or have even completed. This might involve an audio tone, haptic feedback, an audio acknowledgement, or even a change on the display.
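
A minimal sketch of these last two stages might look like the following, where the handler functions and the beep/speak feedback calls are hypothetical placeholders for whatever actions and feedback channels a real system provides:

```python
# Sketch of dispatching an interpreted command to a handler and then closing
# the loop with explicit feedback. The handlers and the beep()/speak()
# feedback calls are hypothetical placeholders.

import datetime

def lights_on():
    print("actuating: lights on")   # stand-in for driving a real actuator
    return "Lights are on."

def report_time():
    return "It is " + datetime.datetime.now().strftime("%H:%M") + "."

HANDLERS = {"lights on": lights_on, "what time is it": report_time}

def dispatch(command, speak=print, beep=lambda: print("*beep*")):
    handler = HANDLERS.get(command)
    if handler is None:
        speak("Sorry, I did not understand that.")  # close the loop even on failure
        return
    beep()            # immediate acknowledgement that the command was heard
    speak(handler())  # report the result of the completed action

dispatch("lights on")
dispatch("open the pod bay doors")
```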

There are a number of companies providing the technology to implement speech recognition in your designs. Two of them are Sensory and Nuance. Nuance provides software for speech recognition while Sensory provides both hardware and embedded software for speech recognition. Please share the names and links of any other companies that you know provide tools and resources for speech recognition in the comments.

Does adding an IP address change embedded designs?

Thursday, September 15th, 2011 by Robert Cravotta

A recent analysis from McAfee titled “Caution: Malware Ahead” suggests that the number of IP-connected devices will grow by a factor of fifty over a ten-year period based on the number of IP-connected devices last year. The bulk of these devices are expected to be embedded systems. Additionally, connected devices are evolving from a one-way data communication path to a two-way dialog – creating potential new opportunities for hacking embedded systems.

Consider that each Chevy Volt from General Motors has its own IP address. The Volt uses an estimated 10 million lines of code executing over approximately 100 control units, and the number of test procedures to develop the vehicle was “streamlined” from more than 600 to about 400. According to Meg Selfe at IBM, they use the IP-connection for a few things today, like finding a charging station, but they hope to use it to push more software out to the vehicles in the future.

As IP-connected appliances become more common in the home and on the industrial floor, will the process for developing and verifying embedded systems change – or is the current process sufficient to address the possible security issues of selling and supporting IP-connected systems? Is placing critical and non-critical systems on separate internal networks sufficient in light of the intent of being able to push software updates to both portions of the system? Is the current set of development tools sufficient to enable developers to test and ensure their system’s robustness from malicious attacks? Will new tools surface or will they derive from tools already used in high safety-critical application designs? Does adding an IP address to an embedded system change how we design and test them?

What should design reviews accomplish?

Wednesday, September 7th, 2011 by Robert Cravotta

I remember my first design review. Well, not exactly the review itself, but I remember the lessons I learned while doing it because it significantly shifted my view of what a design review is supposed to accomplish. I was tasked with reviewing a project and providing comments about the design. It was the nature of my mentor’s response to my comments that started to shape my understanding that there can be disconnects between idealism and practicality.

In this review, I was able to develop a pretty detailed understanding of how the design was structured and how it would work. The idealist in me compelled me to identify not only potential problems in the design but also to suggest better ways of implementing portions of the design. My mentor’s response to my suggestions caught me completely by surprise – he did not want to hear the suggestions. According to him, the purpose of the review was to determine whether the design did or did not meet the system requirements. The time for optimizing design decisions had passed – either the design would meet the requirements or it would not.

His response baffled and irked me. Wasn’t a design review part of the process of creating the best design possible? Also, I had some really blindingly brilliant observations and suggestions that were now going to go to waste. Looking back, I think the hardline approach my mentor took helped make me a better reviewer and designer.

As it turns out, my suggestions were not discarded without a look; however, the design review is not the best point in the design cycle to explore the subtle nuances of one design approach versus another. Those types of discussions should have occurred and been completed before the design review process even started. On the other hand, for areas where the design does not or might not meet the system requirements, it is imperative that a discussion be initiated to identify where and why there might be some risks in the current design approach. My mentor’s harsh approach clarified the value of focusing observations and suggestions to those parts of the design that will yield the highest return for the effort spent doing the review.

Does this sound like how your design reviews proceed or do they take a different direction? What should be the primary accomplishment of a successful design review and what are those secondary accomplishments that may find their way into the engineering efforts that follow the review process?

Is “automation addiction” a real problem?

Wednesday, August 31st, 2011 by Robert Cravotta

A recent AP article highlights a draft FAA study (I could not find a source link; please add one in the comments if you find it) that finds that pilots sometimes “abdicate too much responsibility to automated systems.” Despite all of the redundancies and fail-safes built into modern aircraft, a cascade of failures can overwhelm pilots who have only been trained to rely on the equipment.

The study examined 46 accidents and major incidents, 734 voluntary reports by pilots and others as well as data from more than 9,000 flights in which a safety official rides in the cockpit to observe pilots in action. It found that in more than 60 percent of accidents, and 30 percent of major incidents, pilots had trouble manually flying the plane or made mistakes with automated flight controls.

A typical mistake was not recognizing that either the autopilot or the auto-throttle — which controls power to the engines — had disconnected. Others failed to take the proper steps to recover from a stall in flight or to monitor and maintain airspeed.

The study cites a fatal airline crash near Buffalo, New York, in 2009, in which the actions of the captain and co-pilot combined to cause an aerodynamic stall, and the plane crashed into the ground. Another crash two weeks later in Amsterdam involved the plane’s altimeters feeding incorrect information to the plane’s computers; the auto-throttle reduced speed such that the plane lost lift and stalled. The flight’s three pilots had not been closely monitoring the craft’s airspeed and experienced “automation surprise” when they discovered the plane was about to stall.

Crash investigators from France recently recommended that all pilots get mandatory training in manual flying and in handling a high-altitude stall. In May, the FAA proposed that pilots be trained on how to recover from a stall and be exposed to more realistic problem scenarios.

But other new regulations are going in the opposite direction. Today, pilots are required to use their autopilot when flying at altitudes above 24,000 feet, which is where airliners spend much of their time cruising. The required minimum vertical safety buffer between planes has been reduced from 2,000 feet to 1,000 feet. That means more planes flying closer together, necessitating the kind of precision flying more reliably produced by automation than human beings.

The same situation is increasingly common closer to the ground.

The FAA is moving from an air traffic control system based on radar technology to more precise GPS navigation. Instead of time-consuming, fuel-burning stair-step descents, planes will be able to glide in more steeply for landings with their engines idling. Aircraft will be able to land and take off closer together and more frequently, even in poor weather, because pilots will know the precise location of other aircraft and obstacles on the ground. Fewer planes will be diverted.

But the new landing procedures require pilots to cede even more control to automation.

These are some of the challenges that the airline industry is facing as it relies on using more automation. The benefits of using more automation are quite significant, but it is enabling new kinds of catastrophic situations caused by human error.

The benefits of automation are not limited to aircraft. Automobiles are adopting more automation with each passing generation. Operating heavy machinery can also benefit from automation. Implementing automation in control systems enables more people with less skill and experience to operate those systems without necessarily knowing how to correct from anomalous operating conditions.

Is “automation addiction” a real problem or is it a symptom of system engineering that has not completely addressed all of the system requirements? As automation moves into more application spaces, it becomes more important to define a sharp answer to this question. Where and how should the line be drawn for recovering from anomalous operating conditions? How much of that responsibility should the control system shoulder versus the operator?

Is testing always essential?

Wednesday, August 24th, 2011 by Robert Cravotta

This month’s audit of the Army’s armor inserts by the Pentagon’s inspector general finds that testing of the body armor ballistic inserts was not conducted consistently across the 5 million inserts covered by seven contracts. According to the audit, the PM SEQ (Army Program Manager Soldier Equipment) did not conduct all of the required tests on two contracts because it had no protection performance concerns about those inserts. Additionally, the PM SEQ did not always use a consistent methodology for measuring the proper velocity or for enforcing the humidity, temperature, weathering, and altitude requirements for the tests.

The audit also reports that the sampling process used did not provide a statistically representative sample for the LOT (Lot Acceptance Test) so that the results of the test cannot be relied on to project identified deficiencies to the entire lot. At this point, no additional testing was performed as part of the audit, so there is no conclusion on whether the ballistic performance of these inserts was adversely affected by the test and quality assurance methods that were applied.
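
To see why a statistically representative sample matters, consider a back-of-the-envelope calculation (the numbers below are illustrative only and are not taken from the audit): if a lot has a true defect rate p and n inserts are chosen at random for testing, the probability that the sample contains no defective insert at all is (1 - p)^n. A sample that is too small can easily miss a real problem in the lot:

```python
# Illustrative-only sampling arithmetic; defect rates and sample sizes are
# made up and do not come from the Pentagon audit.

def prob_miss_all_defects(defect_rate, sample_size):
    """Chance that a random sample contains no defective insert at all."""
    return (1.0 - defect_rate) ** sample_size

for n in (5, 20, 60):
    print(n, round(prob_miss_all_defects(0.02, n), 3))
# With a 2% true defect rate, a 5-piece sample misses every defect about 90%
# of the time, and even a 60-piece sample still misses them about 30% of the time.
```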

Tests on two lots of recalled inserts so far have found that all of them met “the maximum level of protection specified for threats in combat” according to Matthew Hickman, an Army spokesman. Another spokesman released a statement that “The body armor in use today is performing as it was intended. We are continuing to research our data and as of now have not found a single instance where a soldier has been wounded due to faulty body armor.”

This audit highlights a situation that can impact any product that experiences a significant increase in demand coupled with time sensitivity for availability of that product. High profile examples in the consumer electronics space include game consoles and smart phones. Some of these products underwent recalls or aftermarket fixes. However, similar to the recalled inserts that are passing additional testing, sometimes a product that has not undergone complete testing can still meet all of the performance requirements.

Is all the testing you can do essential to perform every time? Is it ever appropriate to skip a test because “there are no performance concerns?” Do you use a process for modifying or eliminating tests that might otherwise disproportionately affect the product’s pricing or availability without significant offsetting benefit? Is the testing phase of a project an area ripe for optimization or is it an area where we can never do enough?

How does your company handle test failures?

Wednesday, August 17th, 2011 by Robert Cravotta

For many years, most of the projects I worked on were systems that had never been built before in any shape or form. As a consequence, many of the iterations for each of these projects included significant and sometimes spectacular failures as we moved closer to a system that could perform its tasks successfully in an increasingly wider circle of environmental conditions. These path-finding designs needed to operate in a hostile environment (low earth orbit), and they needed to make decisions autonomously because there was no way to guarantee that instructions could arrive from a central location in a timely fashion.

The complete units themselves were unique prototypes with no more than two iterations in existence at a time. It would take several months to build each unit and develop the procedures by which we would stress and test what the unit could do. The testing process took many more months as the system integration team moved through ground-based testing and eventually moved on to space-based testing. A necessary cost of deploying a unit was losing it when it reentered the Earth’s atmosphere, but a primary goal for each stage of testing was to collect as much data as possible from the unit until it was no longer able to operate and/or transmit telemetry about its internal state of health.

During each stage of testing, the unit was placed into an environment that would minimize the amount of damage the unit would physically be subjected to (such as operating the unit within a netted room that would prevent it from crashing into the floor, walls, or ceiling). The preparation work for each formal test consisted of weeks of refining all of the details in a written test procedure that fortyish people would follow exactly. Any deviation during the final test run would flag a possible abort of the test.

Despite all of these precautions, sometimes things just did not behave the way the team expected. In each failure case, it was essential that the post mortem team be able to explicitly identify what went wrong and why so that future iterations of the unit would not repeat those failures. Because we were learning how to build a completely autonomous system that had to properly react to a range of uncertain environmental conditions, it could sometimes take a significant effort to identify root causes for failures.

Surprisingly, it also took a lot of effort to prove that the system did not experience any failures that we were not able to identify by simple observation during operation. It took a team of people days of analyzing the telemetry data to determine whether the interactions between the various subsystems had behaved correctly or had merely coincidentally behaved in an expected fashion during the test run.

The company knew we were going to experience many failures during this process, but the pressure was always present to produce a system that worked flawlessly. However, when the difference between a flawless operation and one that experienced a subtle, but potentially catastrophic anomaly rests on nuanced interpretation of the telemetry data, it is essential that the development team is not afraid to identify possible anomalies and follow them up with robust analysis.

In this project, a series of failures was the norm, but for how many projects is a sequence of system failures acceptable? Do you feel comfortable raising a flag for potential problems in a design or test run? Does how your company handles failure affect what threshold you apply to searching for anomalies and teasing out true root causes? Or is it safer to search a little less diligently and let said anomalies slip through and be discovered later when you might not be on the project anymore? How does your company handle failures?

How much trial and error do you rely on in designs?

Wednesday, August 10th, 2011 by Robert Cravotta

My wife and I have been watching a number of old television series via DVD and video streaming services. We have both noticed (in a distressing way) a common theme among the shows that purport to have a major character who happens to be a scientist – the scientist(s) know more than any reasonable person would, they accomplish tasks quicker than anyone (or a team of a thousand people) reasonably could, and they make the proper leaps of logic in one or two iterations. While these may be useful mechanisms to keep a 20 to 40 minute story moving along, they in no way reflect our experience in the real engineering world.

Tim Harford’s recent TED talk addresses trial and error as a successful mechanism for creating complex systems that work, and how it differs from systems built based on a God complex. The talk resonates with my experience and echoes, in a different manner, a statement I have floated a few times over the years. The few times I have suggested that engineering is a discipline of best guesses, the suggestion has generated some vigorous dissent. Those people offering the most dissent claim that given a complete set of requirements, they can provide an optimum engineering design to meet those requirements. But my statement refers not just to the process of choosing how to solve a requirement specification, but also to making the specifications in the first place. Most systems that must operate in the real world are just too complex for a specification to completely describe the requirements in a single iteration – there is a need for some trial and error to discover what is more or less important for the specification.

In the talk, Tim provides an industrial example regarding the manufacturing of powdered detergent. The process of making the powder involves pumping a fluid, under high pressure, through a nozzle that distributes the fluid in such a way that, as the water evaporates from the sprayed fluid, a powder with specific properties lands in a pile to be boxed up and shipped to stores for end users to purchase. The company in this example originally tried an explicit design approach that reflects a God complex mode of design: it hired an expert to design the nozzle. Apparently the results were unsatisfactory; however, the company was eventually able to come up with a satisfactory nozzle by using a trial and error method. The designers created ten random nozzle designs and tested them all. They chose the nozzle that performed the best and created ten new variations based on that “winning” nozzle. The company repeated this iterative process 45 times and was able to create a nozzle that performed its function well. The nozzle performs well, even though the process that produced it did not require any understanding of why it works.
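
In the abstract, the select-and-vary loop described above can be sketched in a few lines of code. The numeric fitness function below is just a stand-in for the physical “spray the fluid and measure the powder” test, and the step size and search range are arbitrary; the point is that the loop improves the design without ever modeling why a given nozzle works:

```python
# Abstract sketch of the select-and-vary loop: keep the best of ten variants,
# mutate it into ten new variants, and repeat. The fitness function is a
# stand-in for a physical test, not a model of any real nozzle.

import random

def fitness(design):
    # Stand-in for "test the nozzle and measure the powder quality";
    # higher is better, and nothing below needs to know why.
    x, y = design
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)

def mutate(design, step=0.3):
    return tuple(v + random.gauss(0, step) for v in design)

# Start with ten random designs and keep the best performer.
population = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(10)]
best = max(population, key=fitness)

# Repeat the select-and-vary cycle 45 times, as in the anecdote.
for generation in range(45):
    variants = [mutate(best) for _ in range(10)]  # ten variations of the current winner
    best = max(variants, key=fitness)             # keep the best of this round

print("best design found:", best, "with fitness", fitness(best))
```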

Over the years, I have heard many stories about how using a similar process yielded a superior solution to a problem compared with an explicit design approach. Do you use a trial and error approach in your designs? Do you introduce variations in a design, down-select the variations based on measured performance, and repeat this process until the level of improvement suggests you are close enough to an optimum configuration? I suspect more people do use a variation-and-select process of trial and error; however, I am not aware of many tools that facilitate this type of approach. What are your thoughts and experiences on this?

What is driving lower data center energy use?

Wednesday, August 3rd, 2011 by Robert Cravotta

A recently released report from a consulting professor at Stanford University identifies that the growth in electricity use in data centers over the years 2005 to 2010 is significantly lower than the expected doubling based on the growth rate of data centers from 2000 to 2005. Based on the estimates in an earlier report on electricity usage by data centers, worldwide electricity usage increased by only about 56% over the 2005 to 2010 period instead of the expected doubling. In contrast, data center electricity use in the United States increased by only about 36%.
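
Converting those five-year totals into compound annual growth rates makes the slowdown easier to see; this is a quick back-of-the-envelope calculation based only on the percentages quoted above:

```python
# Five-year growth totals from the report, converted to compound annual rates.
for label, five_year_growth in [("expected doubling", 1.00),
                                ("worldwide actual", 0.56),
                                ("United States actual", 0.36)]:
    cagr = (1 + five_year_growth) ** (1 / 5) - 1
    print(f"{label}: about {cagr:.1%} per year")
# expected doubling: about 14.9% per year
# worldwide actual: about 9.3% per year
# United States actual: about 6.3% per year
```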

Based on estimates of the installed base of data center servers for 2010, the report points out that the growth in installed volume servers slowed substantially over the 2005 to 2010 period, with the installed base growing only about 20% in the United States and 33% worldwide. The installed base of mid-range servers fell faster than the 2007 projections anticipated, while the installed base of high-end servers grew rapidly instead of declining as projected. While Google’s data centers could not be included in the estimates (because they assemble their own custom servers), the report estimates that Google’s data centers account for less than 1% of the electricity used by data centers worldwide.

The author suggests the lower energy use is due to impacts of the 2008 economic crisis and improvements in data center efficiency. While I agree that improving data center efficiency is an important factor, I wonder if the 2008 economic crisis has a first or second order effect on the electricity use of data centers. Did a dip in the growth rate for data services cause the drop in the rate of new server installs or is the market converging on the optimum ratio of servers to services?

My data service costs are lower than they have ever been before – although I suspect we are flirting with a local minimum in data service costs, as it has been harder to renew or maintain discounts for these services this year. I suspect my perceived price inflection point is the result of service capacities finally reflecting service usage. The days of huge excess capacity for data services are fading fast, and service providers may no longer need to sell those services below market rate to gain users of that excess capacity. The migration from all-you-can-eat data plans to tiered or throttled accounts may also be an indication that the excess capacity of data services is finally being consumed.

If the lower than expected energy use of data centers is caused by the economic crisis, will energy use spike once we are completely out of the crisis? Or is the lower than expected energy use due more to the market converging on the optimum ratio of servers to services – and if so, does the economic crisis materially affect energy use during and after the crisis?

One thing this report was not able to do was ascertain how much work was being performed per unit of energy. I suspect the lower than expected energy use is analogous to the change in manufacturing within the United States, where productivity continues to soar despite significant drops in the number of people actually performing manufacturing work. While counting the number of installed servers is relatively straightforward, determining how the efficiency of their workload is changing is a much tougher beast to tackle. What do you think is the first order effect that is slowing the growth rate of energy consumption in data centers?