Articles by Max Baron

Max Baron is a 35-year veteran of the electronics industry with expertise in low-power, efficient-performance SoC configurations, CPU core architectures, DSP, and multicore implementations. He comes to Embedded Insights from In-Stat’s Microprocessor Report, where he was principal analyst and organizer of In-Stat’s worldwide Microprocessor Forum. He has managed R&D efforts at leading semiconductor firms including National Semiconductor, Sun Microsystems, Sun Microelectronics, and Fujitsu. As a technical microprocessor analyst and consultant, he also continues to publish electronic business research reports covering microcontrollers and DSP.

One Picture Is Worth a Thousand Words

Wednesday, April 18th, 2012 by Max Baron

The San Jose Mercury News described Facebook’s recent acquisition of Instagram for $1 Billion as a deal that “surprised and stunned the tech world.” The surprise will, however, turn to shock and admiration once the free Instagram app is put through its paces: shock at the program’s engineering simplicity, and admiration for those who gained millions of followers and ultimately sold the app to Facebook for the ten-figure sum.

Designed for a cellular phone, the application also works on a tablet such as the iPad, but keeps its cell phone format. The program lets logged-in Instagram members process and post their own photos as well as download images posted by other members.

Having loaded an existing image, or a new one taken using the incorporated camera controls, the Instagram user can apply a number of simple image convolution filters to modify the image’s tint and luminosity. A preset contrast filter can also provide a feeling of sharpness enhancement, while area-specific convolution can introduce artificial focus. The application does not let knowledgeable users define their own convolution filters. The app supports basic image zooming, cropping and orientation. Users can also frame the image to be posted. Browsing through other members’ photos is supported, among other features, by a display of 20 thumbnails and a “like” vote button.
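The kind of preset filter described above boils down to sliding a small kernel across the image. The following is a minimal illustrative sketch in NumPy, not Instagram’s actual code; the sharpening kernel shown is a common textbook choice, not one taken from the app:

```python
import numpy as np

def convolve2d(image, kernel):
    """Apply a small kernel across a grayscale image (correlation form,
    as image-processing code typically does)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return np.clip(out, 0, 255)  # keep results in 8-bit pixel range

# A common textbook sharpening kernel of the preset-filter kind.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)
```

Because the kernel’s weights sum to 1, flat areas pass through unchanged while edges are exaggerated, which is what produces the perceived sharpness.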

Facebook’s recent acquisition of Instagram for $1 Billion reminds one of the time when PC programs began to offer simple custom photo processing and, later, support for sharing. Word processing and spreadsheets were not immediately dethroned, but had to cede a seat in popularity to multimedia: still and video imaging plus sound, and the processing required to do them justice. The popularity of Facebook and other community sites will accelerate the penetration of shared multimedia among mobile systems, but in ways that are not exactly similar to the PC’s adaptation to multimedia workloads. This time around, mobile multimedia will have to support existing quality requirements that were not immediately present for the cell phone, or earlier for PC software. The performance demand posed by high-resolution images on tablets and cellphones is further magnified by the high-performance, high-pixel-density digital cameras incorporated in tablets and in prosumer (professional-consumer) quality cameras.

Even higher performance is required by the mobile processing of video and sound, which have barely been touched. Stand-alone camera manufacturers may want to introduce their own direct WiFi and cellular connections to help users share quality photos and video with mobiles whose displays will do them justice. Olympus is already offering a cellphone-targeted Bluetooth module that can be attached to the auxiliary port present in some of the company’s new mirrorless cameras.

Accelerated by Facebook and others soon to follow, the trend to enhance and share photos and video on mobile systems will require the high image-processing capability that can be delivered by SoCs employing multiple processing cores. Without them, tablets and cell phones will feel much warmer than some already do today, and the loss of energy will result in shorter battery life between charges. Many high-performance cameras, having already learned the lesson, employ multiple-core SoCs of their own design or purchased from expert sources such as Ambarella. The resulting high-quality images will have to be stored on higher-capacity servers.

With the acquisition of Instagram, Facebook has started a trend that brings to mind the saying “one picture is worth a thousand words.” For Instagram, the acquisition means $1 Billion. For the semiconductor industry the trend can mean multiple billions of dollars.

Is hardware customization obsolete?

Wednesday, September 21st, 2011 by Max Baron

It used to be that you could install plug-in boards and peripherals in your computer, much as can still be done today at the box level in component stereo and video systems. In today’s computers, however, that option seems to be rapidly disappearing. During the next few years, with desktops falling out of grace, these aftermarket components will see reduced sales as easy-to-customize desktops are replaced by fully integrated systems that are difficult to change or upgrade internally or externally.

The trend may impact design houses connected directly or indirectly to desktop systems, whether by hardware or software products. Owner-customization options have been decreasing all along, but because the process has been slow, one may not have fully realized the implications. During recent months, however, it has become impossible to avoid noticing the events reflecting on the technology and business of desktop computers: Hewlett-Packard announced its intention to sell its PC business; Fry’s Electronics, a major computer and electronics store in our area, has cut a few daily advertisements in the local newspaper; and most stores are reducing the number of desktops displayed to make space for increasing offerings of smart phones, tablets and notebooks. And, more indicative than anything else, we see tablets and smart phones used by people who have never before used desktops or laptops.

The plunging prices of computers have already taken a bite out of aftermarket internal components like add-on boards and memory, as desktop manufacturers began to integrate more functions into the motherboard to maintain company revenues. Customization received a further setback with the quickly rising adoption of mobile devices that are nearly impossible to upgrade. You can’t add internal memory, change graphics boards, or add a multimedia board or peripherals. Mobile devices are simply too small: they require all internal components to be tightly packed, and for proprietary reasons some manufacturers will not allow the addition of external flash memory and USB devices. Also, any customization, even if it were allowed, might increase battery consumption and reduce the time between charges.

Computer software has followed hardware. System software that depends on aftermarket components will share their fate. Applications software suffers from the limitations that battery life places on internal memory, on processor performance, and on the number of processor cycles available under low energy consumption.

But, we may be looking at a more significant cause for the trend than the adoption of mobile devices: the separation of professional applications from entertainment and communication. MS Excel spreadsheets, complex MS Word documents, database management, MS PowerPoint, simulators, calculators etc., can continue to be delivered on powerful desktops whose volume sales are defined by corporate use — sales that will pale in comparison with the combined volume sales expected for consumer-targeted mobile computing appliances. These appliances are already providing news, information, email, access to internet communities, opinions, video and audio, games, internet-enabled purchases of goods, etc., all delivered on simple and easy to use systems.

The general-purpose computer is experiencing defeat: consumers who want just entertainment and communication no longer need to buy bulky, complex desktops or laptops, or avoid buying them for fear of complexity. They can buy an appliance that does exactly what they want.

Most of today’s mobile computing devices can be upgraded only by software that can provide additional functions, faster processing or more secure communications—but as perceived at present, these computing appliances will otherwise remain unchanged. Like the several consumer digital cameras one may own and use for different purposes, one may have to buy different mobile devices from several manufacturers and/or keep up with new generations coming from the same manufacturer. But mobile device prices forbid such luxury, and the opportunity for aftermarket customization needs to be explored.

Assuming that the world will again be separated into closed systems and open systems, with the latter gaining more traction, it is interesting to envision how these systems might be customized to fit individual preferences. If old computers could be customized via boards plugged into system buses and external peripherals connected to high-speed I/O, in mobile devices we might see the emergence of new and old functions packaged, for example, in thin 1 in² – 2 in² modules that could be introduced or swapped via a removable panel. Different modules could offer features such as higher security, additional codecs, ROM-ed applications, better graphics, higher-quality still and video photography, USB and Ethernet support and wireless battery charging.

Do you see open mobile systems triumphing once more over their closed versions and if so, what would be the most important functions to support and how would they be best packaged?

Forward to the Past: A Different Way to Cope with Dark Silicon

Tuesday, February 8th, 2011 by Max Baron

Leigh’s comment on whether dark silicon is a design problem or a fundamental law presents an opportunity to explore an “old” processor architecture, the Ambric architecture, whose implementation made use of dark silicon but did not escape the limitations imposed on Moore’s Law by power budgets.

Mike Butts introduced the Ambric architecture at the 2006 Fall Microprocessor Forum, an event at which I served as technical content chairperson. Tom Halfhill, my colleague at the time, wrote an article about Ambric’s approach and in February 2007 Ambric won In-Stat’s 2006 Microprocessor Report Analysts’ Choice Award for Innovation.

I’ll try to describe the architecture for those that are not familiar with it.

The Ambric architecture’s configuration went beyond the classical MIMD definition. It was described as a globally asynchronous – locally synchronous (GALS) architecture — a description that for chip designers held connotations of clockless processing. The description, however, does not detract in any way from the innovation and the award for which I voted.

The streaming Ambric architecture as I saw it at the time could be described as a heterogeneous mix of two types of processing cores plus memories and interconnect.

Ambric’s programming innovation involved software objects assigned to specific combinations of cores and/or memory whose execution could proceed in their own time and at their own clock rate — this probably being the reason for the software-defined term “asynchronous architecture.” But the cores were clocked, and some could be clocked at different rates — though probably in sync to avoid metastability.

The two types of processor cores provided by Am2045 — the chip introduced at the event — were described as SRs (Streaming RISC) engaged mainly in managing communications and utilities for the second type of cores, the high performance SRDs (Streaming RISC with DSP Extensions) that were the heavy lifter cores in the architecture.

Perhaps the most important part of Ambric’s innovation was the concept of objects assigned to combinations of one or more cores that could be considered software/hardware black boxes. The black boxes could be interconnected via registers and control logic that made them behave as if they were FIFOs.

I believe that this is the most important part of the innovation because it almost removes the overhead of thread synchronization. With the removal of this major obstacle to exploiting highly parallelizable workloads such as those encountered in DSP applications, Ambric opened the architecture to execution by hundreds and possibly thousands of cores — but at the price of reduced generality and the need for more human involvement in routing objects onto interconnects for the best performance of processor cores and memory.
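The FIFO-connected black-box objects described above can be loosely mimicked in software. The sketch below is an analogy in Python, assuming nothing about Ambric’s actual tools: two “objects” run at their own pace and synchronize only through the bounded FIFO channel between them, with no explicit locking in the objects themselves:

```python
from queue import Queue
from threading import Thread

def producer(out_ch):
    # One "object": streams values into its output channel at its own pace.
    for i in range(5):
        out_ch.put(i * i)
    out_ch.put(None)  # end-of-stream marker

def consumer(in_ch, results):
    # A downstream "object": synchronized with the producer only by the
    # FIFO channel between them, like Ambric's register-based links.
    while (v := in_ch.get()) is not None:
        results.append(v)

channel = Queue(maxsize=2)  # bounded FIFO between the two objects
results = []
threads = [Thread(target=producer, args=(channel,)),
           Thread(target=consumer, args=(channel, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bounded channel is the whole synchronization story: a full FIFO stalls the producer and an empty one stalls the consumer, which is the property that lets such objects be composed by the hundreds.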

At a smaller technology node, the Ambric architecture can cover with cores and memories a die that provides, for example, four times more transistors, but it can’t quadruple its computing speed (switchings per second) because of power budget limitations, whether imposed by temperature limits or by battery capacity. Designers can only decrease the chip’s VDD to match the chip’s power dissipation to its power budget, but in doing so, they must reduce clock frequency and the associated performance.
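The VDD/frequency trade-off above can be illustrated with the textbook first-order model in which dynamic CMOS power scales as C·V²·f and attainable frequency scales roughly with V, so a fixed budget implies P ∝ V³. The numbers below are illustrative, not Ambric-specific measurements:

```python
def vdd_for_power_budget(transistor_factor, v_nominal=1.0):
    # First-order CMOS model: P ~ C * V^2 * f and f ~ V, so P ~ k * V^3.
    # With n times more switching transistors under a fixed power budget,
    # solve n * V^3 = 1  =>  V = n ** (-1/3) of the nominal voltage.
    return v_nominal * transistor_factor ** (-1 / 3)

v = vdd_for_power_budget(4)  # four times the transistors, same budget
f_ratio = v                  # frequency falls with V in this model
```

With four times the transistors, VDD drops to about 63% of nominal and, with it, clock frequency: more cores, but each running markedly slower.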

The idea of connecting “black boxes” originated with designers of analog computers and hybrid analog/digital computers at least 60 years ago. It was the approach employed in designing computers just before the introduction of the von Neumann architecture. Ambric’s innovation that created a software/hardware combination is probably independent of the past.

Compared with Ambric’s approach, the UCSD/MIT idea is based on a number of different, compiler-created, efficient small cores specialized to execute short code sequences critical to the performance of the computer. The UCSD/MIT architecture can enjoy more generality in executing workloads, on condition that specific small cores are created for the types of workloads targeted. By raising small-core frequency without creating dangerous hot spots, the architecture can deliver performance yet keep within power-budget boundaries — but it, too, can’t deliver increased compute performance at the same rate as Moore’s law delivers transistors.

Dark Silicon Redux: System Design Problem or Fundamental Law?

Tuesday, February 1st, 2011 by Max Baron

Like a spotlight picking out an object in total darkness, the presentation of a solution to a problem may sometimes highlight one aspect while obscuring others. Such were the dark silicon problem and the solution by UCSD and MIT that was presented at Hot Chips 2010. Such was also the article I published in August describing the two universities’ idea that could increase a processor’s efficiency.

At the time of that writing, it appeared that the idea would be followed in time by many others that together would overcome the dark silicon problem. All would be well: Moore’s Law that provides more transistors would also provide higher compute performance.

The term ‘dark silicon’ was probably coined by ARM, which described it as a problem that must be solved by innovative design. But can it be completely solved? Can design continue to solve the problem ‘forever’? To answer the question, we next take a qualitative look at the dependencies among the system, the die, and compute performance.

According to a 2009 article published in EE Times, ARM CTO Mike Muller said: “Without fresh innovations, designers could find themselves by 2020 in an era of ‘dark silicon,’ able to build dense devices they cannot afford to power.” Mr. Muller also noted in the same article that “. . . a 11nm process technology could deliver devices with 16 times more transistors . . . but those devices will only use a third as much energy as today’s parts, leaving engineers with a power budget so pinched they may be able to activate only nine percent of those transistors.”

The use of “only” in the quote may be misunderstood to indicate lower power consumption and higher efficiency. I believe that it indicated disappointment that compared with today’s parts the power consumption would not drop to at least one sixteenth of its 2009 value — to match the rise in the number of transistors.

The term “power budget” can have more than one interpretation. In tethered systems pursuing peak-performance, it can be the worst-case power that is die-temperature related. In mobile systems, it may have a different interpretation: it may be related to the battery-capacity and the percentage of overall system power allocated to the processor. Both interpretations will limit a chip’s power-performance but the limiting factors will be different.

The architects at UCSD/MIT made the best of the unusable silicon problem by surrounding a general-purpose processor core with very efficient small cores located in the dark silicon area. The cores could execute very short sequences of the application code faster and more efficiently than a general-purpose processor but, to keep within the boundary of a power budget, they were probably activated only when needed by the program.

The universities have shown a capability to use part of the dark silicon transistors. It would be interesting to find whether, as transistor numbers increase, the power budget might be dictated by some simple parameters. Finding some limits would rule out dark silicon as a mere problem whose solution will allow designers to utilize 100% of a die to obtain increased performance. In some implementations, the limits could define the best die size and technology of a SoC.

In a system willing to sacrifice power consumption for performance, the power budget should be equal to or smaller than the power that can be delivered to the die without causing damage: the power (energy/time) that, in steady state, can be removed from the die by natural and forced cooling without raising the die’s temperature to a level that would reduce the die’s reliability or even destroy it.

If we allow ourselves the freedom sometimes employed by physicists in simplifying problems, we can say that for a uniformly cooled die of infinite heat conductivity (hot spots can’t occur), the heat generated by circuits, and therefore the power budget, are both distributed evenly across the area of the die and are proportional to it (P_budget ∝ A_die . . . the larger the die, the higher the power budget).

Simplifying things once more, we define a die-wide average energy E_avg in joules required for one single imaginary circuit (the average circuit) to switch state. The power budget (energy divided by time) can now be expressed as the power consumed by the single circuit: P_budget ~ f * E_avg, where f is the frequency of switching the single average circuit. The actual frequency of all logic on the chip would be f_actual = f / n, where n is the average number of switchings occurring at the same time.
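Plugging assumed numbers into the relation above shows how a budget translates into an allowable switching rate; the 2 W budget and 1 pJ average switching energy below are hypothetical values chosen only for illustration:

```python
P_budget = 2.0        # watts (hypothetical power budget)
E_avg = 1e-12         # joules per average switching event (hypothetical)

# P_budget ~ f * E_avg  =>  f = P_budget / E_avg
f = P_budget / E_avg  # allowable "average circuit" switchings per second

n = 1e6               # average number of simultaneous switchings (assumed)
f_actual = f / n      # sustainable clock rate for the logic as a whole
```

With these numbers the die as a whole may perform 2×10¹² average switchings per second, but with a million circuits switching simultaneously the sustainable clock rate is only 2 MHz: the budget, not the transistor count, sets the ceiling.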

In other words, assuming die-area cooling, with all other semiconductor conditions (a given technology node, fabrication, leakage, environment parameters and the best circuit-design innovations) and cooling all kept constant, the peak computing performance obtainable (the allowable number of average switchings per second) is directly related to the die area; otherwise the chip will be destroyed. The fate of 3D multi-layer silicon will be worse, since the sandwiched layers will enjoy less cooling than the external layers.

Power budgets assigned to processors in mobile systems are more flexible but can be more complex to determine. Camera system designers, for example, can trade-off finder screen size and brightness or fps (frames per second), or zoom and auto focus during video capture — for more processor power. Smart phones that allow non-real-time applications to run slower can save processor power. And, most mobile systems will profit from heterogeneous configurations employing CPUs and hard-wired low power accelerators.

Power budgets in mobile systems will also be affected by software and marketing considerations. Compilers affect the energy consumed by an application based on the number and kind of instructions required for the job to complete. Operating systems are important in managing a system’s resources and controlling the system power states. And, in addition to software and workload considerations the ‘bare core’ power consumption associated with a SoC must compete with claims made by competitors.

Just as local die temperature and power dissipation terminated the period when higher clock frequency meant more performance, the limitations imposed by allocated power budget or die area will curtail the reign of multiple-core configurations as a means of increasing performance.

Most powerful 3D computer

Many computer architects like to learn from existing architectures. It was interesting therefore to see how the most powerful known 3D computer is working around its power limitations. It was however very difficult to find much data on the Internet. The data below was compiled from a few sources and the reader is asked to help corroborate it and/or provide more reliable numbers and sources:

An adult human brain is estimated to contain 10^11 (100 billion) neurons. A firing neuron consumes an average energy of 10^-9 joules. The neuron’s maximum firing rate is estimated by some papers to be 1,000 Hz. Normal operating frequencies are lower, at 300 Hz to 400 Hz.

The maximum power that would be generated by the human brain with all neurons firing at the maximum frequency of 1,000 Hz is 10^3 * 10^11 * 10^-9 = 10^5 joules/second = 100,000 watts — enough to destroy the brain and some of its surroundings.

Some papers estimate the actual power consumption of the brain at 10W while others peg it at 100W. According to still other papers, the power averaged over 24 hours is 20W. Yet even the highest number seems acceptable, since the brain’s 3D structure is blood-and-evaporation cooled and kept at optimal temperature. Imagine keeping a 100W heat source cool by blood flow! Performance-wise, the 10W and 100W power estimates imply that the brain is delivering 10^10 or 10^11 neuron firings per second. Using the considerations applied to semiconductor die usage, the brain may be running at 0.01% or up to 0.1% of its neuron capacity, possibly turning semi-“dark brain” sections fully “on” or partly “off” depending on workload. Compare these percentages with the much higher 9% utilization factor forecasted for 11nm silicon.
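The arithmetic above can be checked directly; the inputs are the article’s own rough estimates, not measurements:

```python
neurons = 1e11     # estimated neurons in an adult brain
e_firing = 1e-9    # estimated joules per neuron firing
f_max = 1e3        # estimated maximum firing rate, Hz

# Worst case: every neuron firing at the maximum rate.
p_max = neurons * e_firing * f_max            # watts

# At a 10 W budget: how many firings/s, and what fraction of capacity?
firings_at_10w = 10 / e_firing                # firings per second
utilization_10w = firings_at_10w / (neurons * f_max)
```

The worst case works out to 10^5 W, and a 10 W budget corresponds to 10^10 firings per second, i.e. 0.01% of capacity, matching the figures in the text.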

The highly dense silicon chip and the human brain are affected by the same laws of physics.

In semiconductor technology, as Moore’s law places more transistors on the same-sized die or makes the die smaller, the power budget needed for full transistor utilization moves in the opposite direction, since full utilization requires larger die areas. Unless cost-acceptable extreme cooling can track technology nodes by removing, for example at 11nm, about five times more heat from the reference die, or technology finds ways to reduce a core’s power dissipation by the same factor, Moore’s Law and computing performance will be following different roadmaps.

In mobile applications, the limit is affected by battery capacity versus size and weight. According to some battery developers, capacity is improving slowly, as vendors spend more effort creating custom batteries for big suppliers of mobile systems than on research. I estimate battery capacity to improve at approximately 6% per year, leaving Moore’s law without support, since it doubles transistor numbers every two years.
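The gap between a 6%-per-year battery improvement and a transistor count that doubles every two years compounds quickly; over a decade:

```python
years = 10
battery_growth = 1.06 ** years        # ~1.79x capacity in a decade
transistor_growth = 2 ** (years / 2)  # 32x transistors (doubling every 2 yr)
gap = transistor_growth / battery_growth
```

After ten years the transistor count has grown roughly eighteen times faster than battery capacity, which is why battery-limited power budgets cannot track Moore’s law.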

UCSD/MIT’s approach is not a ‘waste’ of transistors if its use of dark silicon can deliver higher performance within the boundaries of the power budget. The von Neumann architecture was built to save components, since it was created at a time when components were expensive, bulky and hard to manufacture. Our problem today and in the near future is to conceive of an architecture that can use an abundance of components.

‘DRM’ For Systems: Protecting Data and Engineering Intellectual Property

Friday, November 19th, 2010 by Max Baron

Freescale Semiconductor has just launched a low cost chip that can be used to protect network-connected low power systems from unauthorized access to system-internal resources. Freescale’s target: a chip that can secure the growing number of network endpoints.

When it comes to e-books, images, music, TV episodes and movies, the authors’ and producers’ rights are protected by encryption. Encryption makes it impossible to easily and legally take literature or multimedia on trips or vacations away from the device to which it was originally attached. Further, it makes it impossible to create important backups, even though optical, magnetic and flash memory media can lose part of their content more easily than books or film.

Priceless art in its different forms must be protected. If, however, we separate the unique talent and genius from the money and time invested, we find that DRM (Digital Rights Management) protects investments ranging from a few tens of thousands of dollars to the sometimes high cost of two to three hundred million dollars per movie (exception: the cost of Avatar was estimated at $500M).

Yet nobody protects the brainchildren of system architects and software and hardware engineers, or the investments and hard work that have produced the very systems that made feasible the art and the level of civilization we enjoy today. They are not protected by a DRM where “D” stands for “Designers.” Separating priceless engineering genius and talent from investment, we find similar sums of money invested in hardware and software, aside from the value of sensitive data in all its forms that can be stolen from unprotected systems.

Freescale Semiconductor’s recent introduction is adding two new members to the QorIQ product family (QorIQ is pronounced ‘coreIQ’). They are the company’s Trust Architecture-equipped QorIQ P1010 and the less featured QorIQ P1014. The QorIQ P1010 is designed to protect factory equipment, digital video recorders, low cost SOHO routers, network-attached storage and other applications that would otherwise present vulnerable network endpoints to copiers of system HW and SW intellectual property, data thieves and malefactors. 

It’s difficult to estimate the number of systems that have been penetrated, analyzed and/or cloned by competitors, or modified to offer easy access to data thieves, but some published indirect losses can be used to understand the problem.

In its December 2008 issue, WIRED noted that, according to the FBI, hackers and corrupt insiders have stolen more than 140 million records from US banks and other companies since 2005, accounting for a loss of $67 billion each year. The loss was owed to several factors. In a publication dated January 19, 2006, CNET discussed the results of FBI research involving 2,066 US organizations, of which 1,324 had suffered losses from computer-security problems over a 12-month period. Respondents spent nearly $12 million to deal with virus-type incidents, $3.2 million on theft, $2.8 million on financial fraud, and $2.7 million on network intrusions. The last number represents mostly system end-user loss, since it’s difficult to estimate the annual damage to the system and software companies that created the equipment.

Freescale Semiconductor’s new chip offers a two-phase secure access to the internals of network-connected low cost systems. The first phase accepts passwords and checks the authorization of the requesting agent be it a person or machine. The second phase provides access to the system’s HW and SW internals if the correct passwords have been submitted.
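As a loose software analogy to the two-phase scheme just described (not Freescale’s actual implementation), phase one verifies the requesting agent’s credential and phase two grants access only on success; the password and hashing scheme here are invented purely for illustration:

```python
import hashlib
import hmac

# Hypothetical provisioned secret: stored as a hash, never in the clear.
STORED_HASH = hashlib.sha256(b"correct-horse").hexdigest()

def phase1_authenticate(password: bytes) -> bool:
    """Phase 1: check the authorization of the requesting agent."""
    candidate = hashlib.sha256(password).hexdigest()
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, STORED_HASH)

def phase2_access(password: bytes) -> str:
    """Phase 2: expose internals only after phase 1 succeeds."""
    if not phase1_authenticate(password):
        raise PermissionError("authentication failed")
    return "access granted"
```

The essential property mirrored here is that the second phase is unreachable except through the first: no credential, no path to the system’s internals.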

Fabricated in 45nm SOI, Freescale Semiconductor’s QorIQ P1010 shown in the Figure is configured around an e500 core — a next generation core that’s downward code-compatible with the e300 core that populates the company’s PowerQUICC II Pro communications processors. QorIQ P1010’s Power Architecture e500 core is designed to be clocked at up to 800MHz and is estimated by Freescale Semiconductor to consume in some applications less than 1.1W.

The chip’s single e500 core configuration follows the same concept employed in Freescale Semiconductor’s higher performance QorIQ chip family where protected operating system and applications are executed by multiple e500 cores at higher frequencies. The common configuration has the family’s processing cores and supporting cache levels surrounded by application-specific high bandwidth peripherals that include communication with the network, system-local resources and system-external peripherals.

External peripherals such as cameras will be encountered by the QorIQ P1010 in digital video recorders (DVR) accepting analog video streams from surveillance cameras. The DVRs may employ local storage for locally digitized and encoded video and/or make use of a network to access higher capacity storage and additional processing.

FlexCAN interfaces are the most recent on-chip application-specific peripherals encountered in the QorIQ P1010. The chip’s architects and marketing experts are probably responding to requests coming from potential customers building factory equipment. Aside from e500 cores and peripherals, the denominator common to most chips in the family is the little-documented Trust Architecture.

An approximate idea of the Trust Architecture’s internals and potential can be gleaned from a variety of documents and presentations made by Freescale Semiconductor, from ARM’s Cortex-A series of cores employing a similar approach under that company’s “TrustZone” brand, and from today’s microcontroller technologies used in the protection of smart cards.

The basic components of a secure system must defend the system whether it’s connected to the network or not, under power or turned off, monitored externally or probed for activity.

In the QorIQ P1010’s configuration of the first phase we should expect to find encrypt-decrypt accelerators used in secure communication with off-chip resources and on-chip resident ROM. The content of these resources should be inaccessible to any means that don’t have access to the decryption keys.

Off-chip resources could include system-internal resources such as SDRAM, Flash memory, optical drives and hard drives. Examples of system-external resources are desktops, network-attached storage, and local or remote servers.

The on-chip resident ROM probably contains the code-encrypted security monitor and boot sequence. A set of security fuses may be used to provide the necessary encryption-decryption keys. Different encryption-decryption keys defined for different systems would tend to limit the damage done by malefactors, but in case of failure could render encrypted data useless unless the same keys could be used in a duplicate system. Keys of 256 bits or more will make breaking into the system very costly in time and money.

The block diagram of the QorIQ P1010 shows the peripherals incorporated on-chip but understandably offers less information about the Trust Architecture protecting the SoC. The QorIQ P1014 should cost less and should be easier to export since it lacks the Trust Architecture and shows a more modest set of peripherals. (Courtesy of Freescale Semiconductor)

Initial system bring-up, system debug and upgrade processes require access to chip internals. According to Freescale Semiconductor, the chip’s customers will be provided with a choice among three modes of protected access via JTAG: open access until the customer locks it; access locking without notification after a period allowing access for debug; and JTAG delivered as permanently closed, without access. Customers also have the option to access internals via the chip’s implementation of the Trust Architecture.

Tamper detection is one of the most important components in keeping a secure chip from unauthorized analysis. Smart cards detect tampering in various ways: they monitor voltage, die temperature, light, and bus probing, and some implementations also employ a metal screen to protect the die from probing. We should assume that the system engineer can use the QorIQ P1010 to monitor external sensors through general-purpose inputs/outputs protecting the system against tampering — to ensure that the system will behave as intended by the original manufacturer.

The QorIQ P1010’s usefulness depends on the system configuration using it. A “uni-chip” system, one that employs only the QorIQ P1010 for all of its programmable processing functions even if it incorporates hardwired peripherals such as data converters and compressors, will be protected. According to Freescale Semiconductor’s experts, a system’s data communications with other systems on the network will be protected if the other systems also employ the QorIQ P1010 or other members of the QorIQ chip family. Simple systems of this kind can use the QorIQ P1010 to protect valuable system software and stored data, since, except in proprietary custom designs, the hardware may be easy to duplicate.

Note that systems employing the QorIQ P1010 plus additional programmable SoCs are more vulnerable.

Freescale has not yet announced the price of the QorIQ P1010. To gain market share, the added cost of the QorIQ P1010 compared with a SoC lacking protection should be less than 2%-3% of the system cost; otherwise, competing equipment lacking protection will sell at lower prices. Freescale Semiconductor has introduced a ready-to-use chip that can become important to end users and system designers. Now all we need to see is pricing.

The Express Traffic Lane (It’s Not the Computer, It’s How You Use It)

Friday, September 24th, 2010 by Max Baron

Less than a week ago, a section of the diamond lane on California’s southbound Interstate 680 freeway became sensor-controlled or camera-computerized. A diamond lane, for those of us not familiar with the term, is an express traffic lane reserved for high-occupancy automobiles or for vehicles that use environmentally friendly fuels or less gasoline.

Also known as the carpool lane, the diamond lane is usually marked by white rhombuses (diamonds) painted on the asphalt to warn solo drivers that they are not allowed to use it. The diamond lane provides fast, free commuting for carpoolers, motorcyclists, and diamond-lane sticker owners. Solo drivers must use the remaining lanes, which are usually slower during periods of peak traffic. These single drivers, however, are now allowed to use a section of the diamond lane on California’s southbound Interstate 680 freeway, but they have to pay for it.

The camera-computerized or sensor-activated system introduced just a few days ago doesn’t make sense considering the state-of-art of available technology.

Here is how the complex system works. An automobile carrying only its driver must have a FasTrak transponder, allowing a California-designated road authority to charge a fee for using this newly created toll diamond lane. Mounted on a car’s windshield, the FasTrak transponder uses RFID technology to read the data required to subtract the passage fee from the car owner’s prepaid debit account. The fee reflects the traffic level and changes with the time of day; it is displayed on pole-mounted digital displays.

To avoid being charged when there are also passengers in the automobile, a FasTrak transponder owner must remove the transponder from the car’s windshield. Caught by traffic enforcement (the California Highway Patrol), a solo driver without a FasTrak transponder is fined for using the diamond lane without paying for the privilege. Other schemes, implemented for instance at toll plazas, trigger a camera to photograph the delinquent vehicle and its license plate, after which a violation notice is sent to the registered owner of the vehicle.

Considering the complexity of the system from the viewpoint of existing digital cameras, embedded computers, cellular telephony and the presence of police enforcement on the freeway, one can wonder about the necessity of FasTrak devices or police involvement. 

If we are to follow descriptions found in publications such as San Jose’s Mercury daily newspaper (reference) and the freeway’s information brochure (reference), the system seems unnecessarily disconnected: an RFID tag is used to pay for solo driving, but police have to check whether a vehicle without a transponder is occupied by just the driver or by additional people. If a FasTrak-less solo driver is detected, police must stop the delinquent car and write a ticket. Based on the description, it may seem that the cameras or sensors implemented are unable to differentiate between illegal solo drivers and multiple-passenger cars. If true, these cameras or sensors are using technology that was state of the art in the ’90s. They only seem able to detect a large object well enough to report to police the number of vehicles using the lane without transponders, leaving the rest to law enforcement.

Today’s embedded computers equipped with simple cameras can read numbers and words, so FasTrak transponders should not be required. Existing systems can identify human shapes and features in cars well enough to differentiate between multiple-occupant and solo-driver cars, and with adequate software and illumination they can continue to function correctly despite most adverse weather or lighting conditions. The word “CARPOOL,” displayed by the driver of a multiple-person car, could be read by the computer to ensure that the system will not charge for the use of the toll lane. The license plate of a solo driver’s automobile can be linked in a database to a debit account or to the name and address of the owner.

We estimate the price of a pole-mounted low-power system of this kind, including wireless communication, at a pessimistic $9,800, as follows: a ruggedized camera ($1,500); a video road-image recognition engine plus software, such as designed by Renesas for automobiles ($2,000, including software) (reference); a controlling embedded computer including a real-time OS ($900); a wireless communication block ($600); components for remote testing, upgrades, and servicing ($1,000); battery and power supply ($1,000); solar panels if required ($800); and an enclosure ($2,000).
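The itemized estimate can be checked in a few lines; all of the figures are the article's own:

```python
# Line items from the pole-mounted system estimate above.
items = {
    "ruggedized camera": 1500,
    "video road-image recognition engine + software": 2000,
    "embedded computer with real-time OS": 900,
    "wireless communication block": 600,
    "remote testing/upgrade/servicing components": 1000,
    "battery and power supply": 1000,
    "solar panels (if required)": 800,
    "enclosure": 2000,
}
total = sum(items.values())
print(f"total: ${total:,}")  # prints: total: $9,800
```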

In a modern system there would be no fines—just charges made to driver bank accounts if so elected–or monthly statements to be paid along with electrical, gas, and other services for which monthly payments have found acceptance. But, have we been told everything? Do we really know what types of systems are looking today at the traffic on freeway 680’s toll-enabled express lane? This may be just step one.

UCSD Turns On the Light on Dark Silicon

Friday, August 27th, 2010 by Max Baron

The session on SoCs at Hot Chips 22 featured only one academic paper among several presentations that combined technical detail with a smidgeon of marketing. Originating from a group of researchers from UCSD and MIT, the presentation titled “GreenDroid: A Mobile Application Processor for a Future of Dark Silicon,” introduced the researchers’ solution to the increase of dark silicon as the fabrication of chips evolves toward smaller semiconductor technology nodes.

The reference to dark silicon seems to have been picked up by the press in 2009, when Mike Muller, ARM’s CTO, described the increasing limitations imposed by power consumption on driving and utilizing the growing numbers of transistors provided by technology nodes down to 11nm. As described by the media, Mike Muller’s warning spoke of power budgets that could not be increased to keep up with the escalating number of transistors provided by smaller geometries.

Why have power budgets? The word “budget” seems to imply that designers may increase power simply by setting a higher budget. Carrying power increases to extreme levels, however, will generate temperatures that destroy the chip or drastically reduce its lifetime. Thus a reference die, whose power budget is essentially fixed by its dimensions, will reach a semiconductor technology node at which only a small percentage of its Moore’s Law–predicted transistors can be driven. The remaining transistors are the dark silicon.

The solution presented at Hot Chips 22 by UCSD cannot increase the power budget of a SoC, but it can employ more of the dark silicon that would otherwise remain unused. The basic idea was simplicity itself: instead of employing a large power-hungry processor that expends a lot of unnecessary energy driving logic that may not be needed for a particular application, why not create a large number of very efficient small C-cores (UCSD’s term) that could execute very short sequences of the application code very efficiently?

Imagine a processor tile such as the one in MIT’s original design that, through further improvement, became Tilera’s first tile-configured chip. UCSD envisions a similar partition using tiles, but the tiles are different. The main and comparatively power-hungry processor of UCSD’s tile is still in place, but now, surrounding the processor’s data cache, we see a number of special-purpose compiler-generated C-cores.

According to UCSD, these miniature Tensilica-like or ARC-like workload-optimized ISA cores can execute the short repetitive code common to a few applications more efficiently than the main processor. The main processor in UCSD’s tile – a MIPS engine – still needs to execute the program sequences that will not gain efficiency if they are migrated to C-cores. We don’t know whether the C-cores should be considered coprocessors to the main processor such as might be created by a Critical Blue approach, or slave processors.

UCSD’s presentation did not discuss the limitations imposed by data cache bandwidths on the number of C-cores that by design cannot communicate with one another and must use the cache to share operands and results of computations. Nor did the presentation discuss the performance degradation and delays related to loading instructions in each and every C-core or the expected contention on accessing off-chip memory. We would like to see these details made public after the researchers take the next step in their work.

UCSD did present many charts describing the dark silicon problem, plus charts depicting an application of C-cores to Android. A benchmark comparison chart was used to illustrate that the C-core approach could show up to 18x better energy efficiency (13.7x on average). The chart would imply that one could run up to 18x more processing tiles on a dense chip that had a large area of dark silicon ready for work, but the presentation did not investigate the resulting performance; we know that in most applications the relationship will not be linear.
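The back-of-envelope implied by that chart can be made explicit. The 18x and 13.7x figures are from the UCSD presentation; the power-budget and per-tile numbers below are my illustrative assumptions, and the linear scaling is exactly the idealization the presentation did not validate:

```python
# Under a fixed power budget, an Nx energy-efficiency gain lets roughly
# Nx as many tiles run concurrently -- *if* performance scaled linearly,
# which in most applications it does not (shared caches, memory contention).
power_budget_w = 2.0     # assumed mobile SoC power budget (illustrative)
tile_power_w = 0.5       # assumed per-tile power on the main CPU (illustrative)
efficiency_gain = 13.7   # UCSD's reported average gain

tiles_before = power_budget_w / tile_power_w
tiles_after = power_budget_w / (tile_power_w / efficiency_gain)
print(f"{tiles_before:.0f} tiles -> {tiles_after:.1f} tiles (ideal linear scaling)")
```

The cache-bandwidth and off-chip-memory questions raised above are precisely what would pull the real number well below this ideal.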

I liked the result charts and the ideas but was worried that they were not carried out to the level of a complete SoC plus memory to help find the gotchas in the approach. I was disappointed to see that most of the slides presented by the university reminded me of marketing presentations made by the industry. The academic presentation reminded me once more that some universities are looking to obtain patents and trying to accumulate IP portfolios while their researchers may be positioning their ideas to obtain the next year’s sponsors and later, venture capital for a startup.

Who Will Make the Digital Health System?

Friday, August 13th, 2010 by Max Baron

Until a few days ago, I was asking myself how Intel would transfer results of its research to system manufacturers. I was wondering about the strategy Intel might use to turn the Digital Health system into a real product that could sell to tens of millions of people. Will the company become a paid system IP provider in addition to its semiconductor business, or will it offer the accumulated system expertise free of charge just to generate more sockets for its processors? Intel’s strategy could indicate to us one way in which the company might transfer ideas coming from its freshly announced Interaction and Experience Research (IXR) group to potential OEMs.

The Digital Health system has not been totally dormant. It has seen some adoption, although mainly abroad, but compared with products such as the desktop PC, or even the relatively new netbook, Digital Health has remained practically a prototype.

But, to go back to my question, it’s not just about who will fabricate and market the Digital Health system. There are other important details to learn, such as who will develop the hardware and software, and what business strategy will be needed to equip the system with medical peripherals that can be deployed at home. These questions remained unanswered until a few days ago, when we learned at least one of the many possible answers.

Intel will transfer the development and marketing of the Digital Health system to . . . Intel and GE or more precisely to a separate company jointly owned by the two partners.

The joint announcement by the two companies answered some of the original questions but left most of the details to be communicated at a later time. We know for instance that the two partners will each provide 50% of the funding for the joint venture and we can assume that they will share profits in the same way. We know that the partnership has created a fully owned company. The two partners have not yet selected a name for the company but they have communicated that Louis Burns, V.P. and general manager of Intel’s Digital Health Group, will be CEO of the new company, and Omar Ishrak, senior V.P. of GE and president and CEO of GE’s Healthcare Systems, will be the chairman of the board.

The partnership seems to be a perfect match. General Electric has developed and continues to design health-care monitoring instrumentation, but it needs a processor system to provide the required user-patient interface and the communication to the doctor at the clinic. Intel’s Digital Health system can provide the user interface, the control, and the communications, but it needs the additional medical peripherals that can turn it into a complete health-care system.

The new company has evolved from an earlier alliance announced in 2009 according to which Intel and GE would invest $250 million over five years in marketing and developing home-based health technologies targeting seniors living independently and patients with chronic conditions to help manage their care from the comfort of their home.

The 2009 alliance, which had GE Healthcare sell and market the Intel Health Guide, a care management tool, has thus blossomed into a new well-funded startup, and for good reason: at the time, according to a GE press release, Datamonitor predicted that the US and European telehealth care market would grow from $3 billion in 2009 to an estimated $7.7 billion by 2012.

I’m optimistically translating the forecast to imply that in 2012 the market will see shipments of approximately 6.16 million units priced at an average of $1,250 in the US and Europe. The worldwide number could come close to 10 million processors if we think in terms of chips—that’s not much for the semiconductor giant that sells hundreds of millions of chips annually—but very promising for an OEM business.
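The translation from the Datamonitor dollar figure to unit shipments is simple division; the $7.7 billion is from the forecast quoted above, while the $1,250 average price is my optimistic assumption:

```python
# Converting the 2012 telehealth revenue forecast into unit shipments.
market_2012_usd = 7.7e9   # Datamonitor forecast, US + Europe
avg_price_usd = 1250      # assumed average system price (my estimate)
units = market_2012_usd / avg_price_usd
print(f"{units / 1e6:.2f} million units")  # prints: 6.16 million units
```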

If the rapid growth of the health-care-at-home market materializes as predicted one year ago, it will offer opportunities for more semiconductor companies such as Analog Devices, Freescale, Microchip, Renesas, STMicroelectronics, Texas Instruments, and many others that can provide inexpensive microcontrollers, hybrid MCU/DSPs, AD and DA converters, plus semiconductor transducers such as accelerometers, gyros, and capacitive, resistive, temperature, and pressure sensors (reference: Embedded Processing Directory). According to a Q&A session held a few days ago, Intel and GE’s still-to-be-named company will develop all of the needed technologies internally, including all of the hardware and software, a statement that, if taken literally, may involve hundreds of experts in different technical disciplines. However, we should not be surprised if some of the related work is contracted out to other companies.

Having learned the answers to a few macro questions, we should ask questions at the next level of detail: will the new system employ a standard operating system and, if so, how will it keep up with the innumerable updates (security and bug fixes) that we see coming over the Internet today, irrespective of the operating-system software vendor?

How will it protect the security of health-sensitive information? Who will be creating and maintaining the software needed to run on the medical clinic servers – software that needs to communicate with the patient, access the patient’s database, interface with the doctor, and even make some simple decisions by itself?

Will non-GE or non-Intel makers of medical peripherals be supported by appropriate open hardware and software standards allowing them to extend and improve the system? Once the opportunity in remote health care is proven, which other manufacturers and alliances already looking at it will compete with Intel and GE? Will the cellphone take advantage of the opportunity? Will Intel and GE also create a system that involves a cellphone?

The Digital Health system is an important innovation in health care that can improve the quality of life for millions of users and reduce the cost of delivering it. Its implementation will also provide a lesson in complex system design that combines analog and digital embedded peripherals with advanced user interfaces, PC-like execution applications, and Wi-Fi and Internet communication with advanced software running on servers.

The Next Must-Have System

Friday, July 30th, 2010 by Max Baron

The 9th annual Research@Intel Day held at the Computer History Museum showcased more than 30 research projects demonstrating the company’s latest innovations in the areas of energy, cloud computing, user experience, transportation, and new platforms.

Intel CTO Justin Rattner made one of the most interesting announcements about the creation of a new research division called Interaction and Experience Research (IXR).

I believe IXR’s task will be to determine the nature of the next must-have system and the processors and interfaces that will make it successful.

According to Justin Rattner, you have to go beyond technology; better technology is no longer enough, since individuals nowadays value a deeply personal information experience. This suggests that Intel’s target is the individual, the person who could be a consumer and/or a corporate employee. But how do you find out what the individual who represents most of us will need beyond the systems and software already available today?

To try to hit that moving target, Intel has been building up its capabilities in the user experience and interaction areas since the late nineties. One of the achievements was the Digital Health system, now a separate business division. It started out as a research initiative in Intel Labs, formerly the “Corporate Technology Group” (CTG), with the objective of finding out how technology could help in the health space.

Intel’s most recent effort has been to assemble the IXR research team, consisting of both user-interface technologists and social scientists. The IXR division is tasked with helping define and create new user experiences and platforms in many areas, among them the end-use of television, automobiles, and signage. The new division will be led by Intel Fellow Genevieve Bell, a leading user-centered design advocate at Intel for more than ten years.

Genevieve Bell is a researcher who was raised in Australia. She received her bachelor’s degree in Anthropology from Bryn Mawr College in 1990 and her master’s and doctorate degrees in Anthropology in 1993 and 1998 from Stanford University, where she also was a lecturer in the Department of Anthropology. In her presentation, Ms. Bell explained that she and her team will be looking into the ways in which people use, re-use, and resist new information and communication technologies.

To envision the next must-have system, Intel’s IXR division is expected to create a bridge of research incorporating social research, design enabling, and technology research. The team’s social-science, design, and human-computer-interaction researchers will continue the work that’s already been going on, asking questions to find what people will value and what will fit into their lives. New systems, software, user interactions, and changes in media content and consumption could emerge from the obtained data on one hand, and from research into the next generation of user interfaces on the other.

Bell showed a photo of a child using his nose to key in data on his mobile—an example of a type of user-preferred interface that may seem strange, but it can be part of the data used by social scientists to define an innovation that may create the human I/Os for 2020. The example also brought out a different aspect that was not addressed: how do you obtain relevant data without placing in the hands of your population sample a scale model or an actual system to help review it, improve it or even reject it and start from scratch?

In addition to the very large investment Intel makes in US-based research it also owns labs, and it collaborates with or supports over 1,000 researchers worldwide. According to Intel, 80% of Intel Labs China focuses on embedded system research. Intel Labs Europe conducts research that spans the wide spectrum from nanotechnologies to cloud computing. Intel Labs Europe’s website shows research locations and collaboration in 17 sites – and the list doesn’t even include the recent announcement of the ExaScience Lab in Flanders, Belgium.

But Intel is not focused only on the long term. Two examples that speak of practical solutions for problems encountered today are the Digital Health system that can become a link between the patient at home and doctor at the clinic, and the connected vehicle (misinterpreted by some reporters as an airplane-like black box intended for automobiles).

In reality, according to Intel, the connected vehicle’s focus was on personal and vehicle safety. For example, when an attempt is made to break into the vehicle, captured video can be viewed via the owner’s mobile device. Or, for personal safety and experience, a destination-aware connected system could track vehicle speed and location to provide real-time navigation based on information about other vehicles and detours in the immediate area.

Both life-improving systems need to find wider acceptance from the different groups of people that perceive their end-use in different ways.

Why is Intel researching such a wide horizon of disciplines, and what, if anything, is still missing? One of the answers has been given to us by Justin Rattner himself.

I believe that Rattner’s comment “It’s no longer enough to have the best technology,” reflects the industry’s trend in microprocessors, cores, and systems. The SoC is increasingly taking the exact configuration of the OEM system that will employ it. SoCs are being designed to deliver precisely the price-performance needed at the lowest power for the workload—and for most buyers the workload has become a mix of general-purpose processing and multimedia, a mix in which the latter is dominant.

The microprocessor’s role can no longer be fixed or easily defined since the SoCs incorporating it can be configured in countless ways to fit systems. Heterogeneous chips execute code by means of a mix of general purpose processors, DSPs and hardwired accelerators. Homogeneous SoC configurations employ multiple identical cores that together can satisfy the performance needed. And, most processor architectures have not been spared; their ISAs are being extended and customized to fit target applications.

Special-purpose ISAs have emerged, trying and, most of the time, succeeding in reducing power and silicon real estate for specific applications. Processor ISA IP owners and enablers are helping SoC architects that want to customize their core configuration and ISA. A few examples include ARC (now Virage Logic and possibly soon Synopsys), Tensilica, and suppliers of FPGAs such as Altera and Xilinx. ARM and MIPS offer their own flavors of configurability. ARM is offering so many ISA enhancements, available in different cores, that aside from its basic compatibility it can be considered a “ready-to-program” application-specific ISA, while MIPS leaves most of its allowed configurability to the SoC architect.

In view of this rapidly morphing scenario, the importance of advanced social research for Intel, and for that matter for anybody in the same business, cannot be overstated. Although it may not be perceived as such, Intel has already designed processors to support specific applications.

The introduction of the simple, lower-performance but power-efficient Atom core was intended to populate mobile notebooks and netbooks. The Moorestown platform brings processing closer to the needs of mobile low-power systems, while the still-experimental SCC, Intel’s Single-chip Cloud Computer, is configured to best execute data searches in servers.

It’s also interesting to see what can be done with an SCC-like chip employed in tomorrow’s desktop.

If Intel’s focus is mainly on the processing platform, as it may be, what seems to be missing, and who is responsible for the rest? The must-have system of the future must run successful programs and user-interface software. While Intel is funding global research that’s perceived to focus mostly on processors and systems, who is working on the end-use software? I don’t see the large software companies engaging in similar research by themselves or in cooperation with the efforts of Intel or other chip and IP companies. And we have not been told precisely how a new system incorporating hardware and software created by leading semiconductor and software scientists will be transferred from the drawing board to system OEMs. An appropriate variant on Taiwan’s ITRI model comes to mind, but only time will tell.