Extreme Processing Channel

Processors can only be so small, so cheap, and so fast. Extreme Processing explores the principles behind the practical limits of the smallest, largest, fastest, and most energy efficient embedded devices available so that when a new device pushes the limits, embedded developers can more easily see how it affects them.

Extreme Processing Thresholds: Low Power On-Chip Resources

Friday, April 16th, 2010 by Robert Cravotta

[Editor's Note: This was originally posted on the Embedded Master.]

In the previous post in this series I pointed out that the “sweet spot” clock rate for active power consumption for some microcontrollers is lower than the maximum operating clock rate for that part. However, looking only at the rated power consumption of these microcontrollers at a steady always-on operating state ignores the fact that many low-power microcontrollers employ on-chip resources that can significantly impact the overall energy consumption of the part as the system transitions through multiple operating scenarios.

For low power applications, designers usually focus on the system's overall energy draw during operation rather than on the peak draw. This focus on overall energy efficiency justifies the additional design, build, and testing complexity of using low power and sleep modes when the system does not need the full processing capacity of the processor. In fact, many power-constrained systems spend the majority of their operating time in some type of sleep mode and transition to full active mode only when needed. This relationship begins to hint at why a single uA/MHz benchmark is insufficient to evaluate a processor's energy performance.

A variety of low power modes is available in processor architectures. Shutting down just the CPU and leaving all of the other on-chip resources functional is one type of sleep mode. Deeper sleep modes can turn off individual peripherals, or all of them, until the only current drawn is for RAM retention. Always-on resources may include a power supervisor circuit with brown-out and power-on-reset functions; these functions must be enabled 100% of the time because the events they are designed to detect cannot be predicted.

So in addition to the active power draw, low power designers need to understand the system's static or leakage current draw when the system is inactive. Another important metric is wake-up time: the time it takes the system to transition from a low-power mode to the active operating mode, which is dominated by how long the system clock needs to stabilize. The longer the clock takes to stabilize, the more energy is wasted, because the system performs no useful work during that time.

A DMA controller is an on-chip resource that affects a system's power consumption by offloading the task of moving data from one location to another, say from a peripheral to memory, from the expensive CPU to the much cheaper-to-operate DMA controller. The following chart from an Atmel whitepaper demonstrates the value of using a DMA controller to offload the CPU, especially as the data rate increases. However, effectively using the DMA controller can add complexity for the developer because configuring it is not an automated process.

[Chart: 100416-dma.jpg]

Some microcontrollers, such as those from Atmel and Energy Micro, allow developers to configure the DMA controller and peripherals, through some type of peripheral controller, so that they can collect or transmit data autonomously without waking the CPU. On some devices, the autonomous data transfer can even run directly from one peripheral to another. The following chart from Energy Micro's technology description demonstrates the type of energy reduction autonomous peripherals can deliver. The caveat is that the developer needs to create the highly autonomous setup, as no tools can perform this task automatically at this time.

[Chart: 100416-autonomous.jpg]

On-chip accelerators not only speed up the execution of frequent computations or data transformations, they also do it for less energy than performing those functions in software. Other on-chip resources that save energy include ROM-based firmware, such as that being adopted by NXP and Texas Instruments on some of their parts. There are countless approaches available to chip architects for minimizing energy consumption, but each involves trade-offs in system speed, current consumption, and accuracy whose suitability differs by application. This makes it difficult to develop a single benchmark for comparing energy consumption between processors that overlap in capabilities but target slightly different application spaces.

Extreme Processing Thresholds: Low Power #2

Friday, April 9th, 2010 by Robert Cravotta

[Editor's Note: This was originally posted on the Embedded Master.]

In the previous post in this series I asked whether reporting uA/MHz is an appropriate way to characterize the energy profile of a processor. In this post, I assume uA/MHz is appropriate for you and offer some suggestions of additional information you might want processor vendors to include with this benchmark when they use it. I will explore how uA/MHz is insufficient for many comparisons in the follow-on post in this series.

One problem with reporting a processor's power draw as uA/MHz is that this value is not constant across the processor's entire operating range. Consider the chart for the Texas Instruments MSP430F5438A operating at 3V, executing from its 256-kbyte Flash, and using its integrated LDO. This processor has an operating range up to 25MHz, and its consumption ranges from 230 to 356 uA/MIPS across that range. Additionally, the energy sweet spot for this device is at 8MHz; using the part at higher (and lower) clock rates consumes more energy per additional unit of processing performance.

Adrian Valenzuela, MSP430 MCU Product Marketing Engineer at Texas Instruments, shares that many designers operate this part at its 8MHz energy sweet spot precisely because it is most energy efficient at that clock rate rather than at its highest operating speed.

[Chart: 100409-ti-graph.jpg]

The chart for Microchip’s PIC16LF1823 device illustrates another way to visualize the energy sweet spot for a processor. In this example, the energy sweet spot is at the “knee” in the curve, which is at approximately 16 MHz – again short of the device’s maximum operating clock rate of 32 MHz. 

[Chart: 100409-microchip-graph.jpg]

At a minimum, if a processor vendor is going to specify a uA/MHz (or uA/MIPS) metric, it should also specify the operating frequency at which the device realizes that energy efficiency sweet spot. To give a sense of the processor's energy efficiency across the full operating range, the vendor could also quote the uA/MHz metric at the device's highest operating frequency; the implied assumption is that the energy efficiency varies with clock rate in some proportion between these two operating points.

Using a single-value uA/MHz energy metric is further complicated when you consider usage profiles that include waking up from standby or low power modes. In the next post in this series I will explore the challenges of comparing energy efficiency between different processors when the benchmarking parameters differ, such as what software is executing, how efficient the compiler and memory system are, and which peripherals are active.

Extreme Processing Thresholds: Low Power #1

Friday, April 2nd, 2010 by Robert Cravotta

[Editor's Note: This was originally posted on the Embedded Master.]

In the previous Extreme Processing post about low cost processing options, I touched on the techniques processor vendors are using to drive down the price of their value line devices. However, the focus of these companies is not just on low price, but on delivering the best parts to match the performance, power, and price demands across the entire processing spectrum. Semir Haddad, Marketing Manager of 32-bit ARM microcontrollers at STMicroelectronics, shares, “Our goal ultimately is to have one [processor part] for each use case in the embedded world, from the lowest-cost to the highest-end.”

In addition to extreme low cost parts, there is increasing demand for processors that support longer battery life. As with low cost processor announcements, there is a bit of marketing specmanship in releasing a device that pushes the leading edge of the lowest energy usage by a microcontroller. The ARM Cortex-M3 based EFM32 Gecko microcontrollers from Energy Micro claim a 180 μA/MHz active mode power consumption. Texas Instruments’ 16-bit ultra-low power line of MSP430 microcontrollers claims a 165 μA/MIPS active mode power consumption. Microchip’s new 8-bit PIC1xF182x microcontrollers claim a less than 50 μA/MHz active current consumption.

There are many ways to explore and compare low power measurements, and there have been a number of exchanges between the companies, including white papers and YouTube videos. We can explore some of these claims over the next few posts and discussions, but for this post, I would like to focus on whether the μA/MHz benchmark is appropriate or whether there is a better way for low power processor vendors to communicate their power consumption to you. In the case of the Texas Instruments part, 1 MHz = 1 MIPS when there is no CPU clock divider.

If the μA/MHz benchmark for active operation is appropriate for you, is there any additional information you need disclosed with the benchmark so that you can make an educated judgment and comparison between similar and competing parts? The goal here is to help suppliers communicate the information you need to more quickly make decisions. I have a list of characteristics I think you might need along with the benchmark value, and I will share it in the next post after you have a chance to discuss it here.

If the μA/MHz benchmark is not appropriate for you, what would be a better way to communicate a device’s relevant power consumption scenarios? I suspect the μA/MHz benchmark is popular for the same reason MIPS benchmarks are popular – each is a single, simple number that is easy to measure and compare. The goal here is to highlight how to get the information you most need more quickly, easily, and consistently. I have some charts and tables to share with you in the follow-on post.

Extreme Processing Thresholds: Low Price

Friday, March 26th, 2010 by Robert Cravotta

[Editor's Note: This was originally posted on the Embedded Master.]

Exploring processing thresholds is a tricky proposition. There is a certain amount of marketing specmanship when a product is released that extends some limit of a processing option – say price, power, performance, or integration. It is helpful to understand how the supplying semiconductor vendor is able to meet the new threshold so you can better understand how those trade-offs will or will not affect any design in which you might consider using that new part.

To lead off this series, I am looking at processors that cross new low price thresholds, because there have been a handful of announcements for such parts in the past few months. Texas Instruments’ 16-bit MSP430 value line represents the lowest publicly priced parts, starting at $0.25. Moving up the processing scale, NXP’s 32-bit Cortex-M0 processors start at $0.65. Rounding out the top end of this batch of value-priced processors, STMicroelectronics’ 32-bit Cortex-M3 processors start at $0.85.

In looking at these announcements, be aware that the pricing information is not an apples-to-apples comparison. While the parts of the announced processor families address overlapping ranges of application spaces, each specific announcement is significant to a different application space. What is most relevant is that each processor potentially crosses a cost threshold for a given level of processing capacity: existing designs that use a processor at the same price point, but with less capability, can now consider incorporating new features with a larger processor than was previously available at that price. The other relevant opportunity is that applications that did not use processors before, because they cost too much, can now economically implement a function with a processor.

When looking at these types of announcements, there are a few questions you might want answered. For example, what volume of parts must you purchase to get that price? The Cortex-M0 and -M3 pricing is for 10,000 units. This is a common price point for many processor announcements, but you should never assume that all announced pricing is at that level; the MSP430 announcement pricing, for example, is for 100,000 units. The announced 1,000-unit pricing for the MSP430G2001 is $0.34. To give an idea of how much volume purchasing can drop the price, VC Kumar, MSP430 MCU product marketing at Texas Instruments, shares that the pricing for the G2001 part drops to around $0.20 at 1,000,000 units. Fanie Duvenhage, Director of Product Marketing/Apps/Architecture for the Security, Microcontroller & Technology Development Division at Microchip, points out that for around five years now, very high-volume, small microcontrollers have been available for a unit price in the $0.10 to $0.15 range. So there is a wide range of processing options at a variety of price points.

So what do these suppliers have to do to be able to sell their processors at these lower prices? According to Joe Yu, Strategic Business Development at NXP Semiconductors, choosing the right core with the right process technology has the largest impact on lowering the price threshold of a processor. The packaging choice represents the second largest impact. After that, reducing Flash, then RAM, and then individual features are choices a processor supplier can make to lower the price point further.

VC Kumar shares that the latest MSP430 part at this price point uses the same process node as other MSP430 devices. The lower price is driven by smaller on-chip resources and by taking into account the boundary conditions the processor will have to contend with. By constraining those boundary conditions, certain value-priced parts can use less expensive, lower fidelity IP blocks for different functions. As an example, standard MSP430 parts can include a clock module configuration that supports four calibrated frequencies with ±1% accuracy, while the value-line sister parts use a clock module configuration that supports a single calibrated frequency with no guarantee of ±1% accuracy.

Another area of controversy for processors that push the low end of the pricing model is how many on-chip resources they provide. To reach these price points, the on-chip resources are quite constrained. For example, the Cortex-M3 part includes 16 kbytes of Flash, while the Cortex-M0 part includes 8 kbytes of Flash. The MSP430 part includes 512 bytes of Flash and 128 bytes of SRAM. These memory sizes are not appropriate for many applications, but there are growing application areas, including thermometers, metering, and health monitoring, that might be able to take advantage of these resource-constrained devices.

One thing to remember when considering those devices at the lowest end of the pricing spectrum is that they might represent a new opportunity for designs that do not currently use a processor. Do not limit your thinking to tasks that processors are already doing or you might miss out on the next growth space. Are you working on any projects that can benefit from these value-priced processors or do you think they are just configurations that give bragging rights to the supplier without being practical for real world use?

Extreme Processing Thresholds

Friday, March 19th, 2010 by Robert Cravotta

[Editor's Note: This was originally posted on the Embedded Master.]

Just in the past few weeks there have been two value-line processor announcements that push the lower limit for pricing. STMicroelectronics’ 32-bit Cortex-M3 value line processors are available starting at $0.85, and Texas Instruments’ 16-bit MSP430 parts are available starting at $0.25. These announcements follow the earlier announcement that NXP’s 32-bit Cortex-M0 processors are available for as low as $0.65.

These value pricing milestones map out the current extreme thresholds for pricing for a given level of processing performance. These types of announcements are exciting because every time different size processors reach new pricing milestones, they enable new types of applications and designs to incorporate new or more powerful processors into their implementation for more sophisticated capabilities. An analogous claim can be made when new processor power and energy consumption thresholds are pushed.

There are many such thresholds that determine whether it is feasible to include a given level of processing performance in a design. Sometimes the market is slower than desired in pushing past a key threshold. Consider, for example, the Wal-Mart mandate to apply RFID labels to shipments: the mandate began in January 2005, and progress toward full adoption has been slow.

In this new series, I plan to explore extreme processing thresholds such as pricing and power efficiency. What are the business, technical, hardware, and software constraints that drive where these thresholds currently are and what kinds of innovations or changes does it take for semiconductor companies to push those thresholds a little bit further?

I am planning to start this series by exploring the low-end or value pricing thresholds, followed by low energy device thresholds. However, there are many other extreme thresholds we can explore, such as the maximum amount of processing work you can perform within a given time or power budget. This might be addressed through higher clock rates as well as parallel processing options, including hardware accelerators for vertically targeted application spaces. Other types of extreme thresholds could include interrupt service response latency; how much integrated memory is available; how much peripheral integration and CPU offloading is available; higher I/O sampling rates as well as accuracy and precision; wider operating temperature tolerances; and how many integrated connectivity options are available.

I need your help to identify which thresholds matter most to you. Which types of extreme processing thresholds do you want to see more movement on and why? Your responses here will help me to direct my research to better benefit your needs.