How should embedded systems handle battery failures?

Wednesday, November 30th, 2011 by Robert Cravotta

Batteries – increasingly we cannot live without them. We use batteries in more devices than ever before, especially as the trend to make a mobile version of everything continues its relentless advance. However, the investigation and events surrounding the battery fires in the Chevy Volt are yet another reminder that every engineering decision involves tradeoffs. In this case, damaged batteries, especially large ones, can cause fires. This is not the first time we have seen issues related to damaged batteries – remember the exploding cell phone batteries from a few years ago? That problem has not been completely licked, either; there are still reports of exploding cell phones even today (in Brazil).

These incidents remind me of when I worked on a battery charger and controller system for an aircraft. We put a large amount of effort into ensuring that the fifty-plus-pound battery could not and would not explode, no matter what types of failures it might endure. We had to develop a range of algorithms to constantly monitor each cell of the battery and respond appropriately if anything improper started to occur in any of them. One additional constraint on our responses, though, was that the battery had to deliver power whenever the system demanded it, even while parts of the battery were damaged or failing.
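
The per-cell monitoring described above can be sketched as follows. This is a minimal sketch and not the algorithm used on that aircraft program. The limits are hypothetical, and it assumes the pack has some way to bypass a faulty cell while the healthy cells keep supplying the load.

```python
from dataclasses import dataclass

# Hypothetical per-cell limits, for illustration only; real limits come from
# the cell manufacturer's datasheet and the system's safety analysis.
CELL_V_MIN = 2.8       # volts
CELL_V_MAX = 4.2       # volts
CELL_T_MAX_C = 60.0    # degrees Celsius

@dataclass
class CellReading:
    voltage: float
    temperature_c: float

def classify_cell(cell: CellReading) -> str:
    """Return 'ok' or the name of the first limit this cell violates."""
    if cell.temperature_c > CELL_T_MAX_C:
        return "overtemp"       # checked first: heat is the most urgent sign
    if cell.voltage > CELL_V_MAX:
        return "overvoltage"
    if cell.voltage < CELL_V_MIN:
        return "undervoltage"
    return "ok"

def assess_pack(cells, min_healthy_cells):
    """Return (faults, can_deliver).

    faults maps cell index -> fault name, so a (hypothetical) bypass circuit
    can isolate those cells; can_deliver says whether enough healthy cells
    remain to keep supplying the load.
    """
    faults = {}
    for index, cell in enumerate(cells):
        status = classify_cell(cell)
        if status != "ok":
            faults[index] = status
    healthy = len(cells) - len(faults)
    return faults, healthy >= min_healthy_cells
```

The design choice is to keep the fault report and the delivery decision separate. This lets the system keep running on a partly damaged pack while the flagged cells are handled elsewhere.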

Keeping the battery operating as well as it can under all conditions may sound like an extreme requirement, but I do not believe it is all that extreme when you realize that automobiles, and possibly even cell phones, sometimes demand similar levels of operation. I recall discussing the exploding batteries a number of years ago. One comment was that the explosions were a system-level design concern rather than just a battery manufacturing issue: in most of the exploding-phone cases at that time, the explosions were the consequence of improperly charging the battery at an earlier time. Adding intelligence to the battery so that it rejects a charging load that is out of specification was a system-level method of minimizing the opportunity to damage the batteries through improper charging.
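
Here is a minimal sketch of that kind of charge-acceptance check. The acceptance window values and the names `ChargeInputSpec` and `accept_charger` are hypothetical, chosen only to show the idea of refusing an out-of-spec source before it reaches the cell.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChargeInputSpec:
    # Hypothetical acceptance window for a charger feeding one Li-ion cell.
    v_in_min: float = 4.5   # volts at the charge port
    v_in_max: float = 5.5   # volts at the charge port
    i_max: float = 1.0      # amps of charge current the cell may accept

def accept_charger(v_in: float, i_measured: float,
                   spec: ChargeInputSpec = ChargeInputSpec()) -> tuple:
    """Decide whether to connect a charging source to the cell.

    Returns (accepted, reason). An out-of-spec source is rejected before
    it can push the cell outside its safe charging conditions.
    """
    if not (spec.v_in_min <= v_in <= spec.v_in_max):
        return False, "input voltage out of range"
    if i_measured > spec.i_max:
        return False, "charge current too high"
    return True, "ok"
```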

Given the wide range of applications that batteries are finding use in, what design guidelines do you think embedded systems should follow to provide the safest operation of batteries despite the innumerable ways that they can be damaged or fail? Is disabling the system appropriate?

Food for thought on disabling the system: consider how CFLs (compact fluorescent lights) handle the end-of-life condition, when too much of the mercury has migrated to the other end of the lighting tube. They purposely burn out a fuse so that the controller board is unusable. While this simple approach avoids operating a CFL beyond its safe range, it has caused much concern among users, as more and more people are scared by the burning components in their lamps.

How should embedded systems handle battery failures? Is there a one size fits all approach or even a tiered approach to handling different types of failures so that users can confidently use their devices without fear of explosions and fire while knowing when there is a problem with the battery system and getting it fixed before it becomes a major problem?
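
A tiered approach could look something like the sketch below. The tiers, the example faults in the comments, and the action names are all hypothetical. The point is that the system responds to the worst active fault, and responses escalate from informing the user, to degraded operation, to disabling the battery.

```python
from enum import IntEnum

class Severity(IntEnum):
    NONE = 0
    ADVISORY = 1    # e.g. capacity fading: tell the user
    DEGRADED = 2    # e.g. one cell drifting: derate charge/discharge
    CRITICAL = 3    # e.g. overtemperature while charging: shut down

# Hypothetical policy table: which actions each tier triggers.
ACTIONS = {
    Severity.NONE: [],
    Severity.ADVISORY: ["notify_user"],
    Severity.DEGRADED: ["notify_user", "limit_current"],
    Severity.CRITICAL: ["notify_user", "open_battery_switch", "latch_fault"],
}

def respond(faults: list) -> list:
    """Return the actions for the worst active fault (tiers only escalate)."""
    worst = max(faults, default=Severity.NONE)
    return list(ACTIONS[worst])
```

Latching the critical fault, rather than clearing it when readings look normal again, is one way to make sure a damaged battery gets serviced before it is trusted again.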


19 Responses to “How should embedded systems handle battery failures?”

  1. B. @ LI says:

    Robert; is the question about handling battery failures, as they occur, or is it about preventing battery failures?

    It seems that, in many embedded systems, battery failure is a “Fatal” event, unless one wants to classify battery degradation as ‘failure’…

    There is a constant trade-off of battery capacity, charge cycles, charge times, lifetimes, all of which have an effect on when, and how, the battery may fail.

    The other aspect is that many embedded systems are ‘mission-critical’, where failure results in significant losses to the user.

    Lots to consider: can we think about fully disabling the system, disabling battery operation, using an alternate battery/energy source, or a graceful failure mode in the battery (e.g., reducing capacity during charge to reduce battery stress), or something else?

    • Barry,

      The question is about both handling and preventing failures – this includes degraded operation and/or disabling the system when appropriate. I raise the question now because even though there is a reasonable difference in thought process by the people working on mission critical or man-rated systems versus consumer level designs, the automotive and cell phone battery fires suggest that the mission critical perspective might be trickling into the consumer design space. If batteries are able to significantly increase their energy storage density, I suspect embedded developers will need to understand batteries even more than they do today to keep their systems operating safely.

      Hopefully responses to this question will highlight how much designers are considering the failure/degrade modes of their batteries and in what types of designs.

  2. B. @ LI says:

    Hi Robert;
    Speaking from the handset design space, rather than ‘trickling into the consumer design space’, these issues have been a part of handset design (at least at my previous employer), for a long while.

    A lot of the protective mechanisms are incorporated into the ‘battery’, which has a small circuit board, typically on the contact end of the battery/cell, housing temperature sensing, current limiting and low-voltage cut-off circuitry. The intent of most of the built-in circuitry is to protect, first, the user, next, the handset, and, finally, the cell itself (I guess it’s like Asimov’s Three Laws).

    In addition, handsets often include charging protection, both over- and under-temperature, as well as ‘unauthorised’ or counterfeit charger types, which may have improperly-regulated voltages/currents.
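
Temperature-qualified charging can be sketched as a simple lookup. The breakpoints (0, 10 and 45 °C) and the half-current derating are assumptions for illustration, loosely in the style of common Li-ion charging guidelines. A real design takes its values from the cell datasheet.

```python
def allowed_charge_current(temp_c: float, nominal_a: float) -> float:
    """Return the charge current to allow at a given cell temperature.

    Illustrative windows (hypothetical; use the cell datasheet values):
      below 0 C    -> no charging (risk of lithium plating)
      0 to 10 C    -> reduced current
      10 to 45 C   -> full current
      above 45 C   -> no charging
    """
    if temp_c < 0.0 or temp_c > 45.0:
        return 0.0
    if temp_c < 10.0:
        return 0.5 * nominal_a
    return nominal_a
```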

    In normal use, the battery is protected against deep discharge by both software- and hardware-based under-voltage cut-off, typically at about 3.0-3.3 V in software, and a somewhat lower voltage in hardware. This protects the charge cycle life of the cell.

    Deep discharge of the cell may be further protected by the built-in circuit in the battery, which will disconnect the cell’s output when the cell is discharged to some voltage, usually below 2.5 V.
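
The layered under-voltage protection described above can be modelled roughly as follows:

- The software threshold of 3.2 V sits inside the 3.0-3.3 V range quoted above.
- The 2.5 V pack cutoff comes from the comment.
- The 3.4 V restart threshold, which adds hysteresis so the device does not reboot as soon as the loaded voltage rebounds, is a hypothetical value.
- The handset's own hardware cutoff, which sits between the two, is left out for brevity.

```python
SW_CUTOFF_V = 3.2    # software shutdown threshold (within the quoted 3.0-3.3 V)
SW_RESTART_V = 3.4   # hypothetical hysteresis: stay off until the cell recovers
HW_CUTOFF_V = 2.5    # protection circuit inside the battery pack

class UnderVoltageGuard:
    def __init__(self):
        self.shut_down = False

    def update(self, cell_v: float) -> str:
        if cell_v < HW_CUTOFF_V:
            # In a real pack the protection IC does this, not firmware;
            # it is modelled here only to show how the layers are ordered.
            return "pack_disconnected"
        if self.shut_down:
            if cell_v >= SW_RESTART_V:
                self.shut_down = False
                return "run"
            return "off"
        if cell_v < SW_CUTOFF_V:
            self.shut_down = True
            return "orderly_shutdown"   # save state, then power down
        return "run"
```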

    Further areas of interest in battery failure are related to contact design, for both long-term reliability and protection against momentary disconnection during the high-G events often seen when your phone escapes your grasp ;-)

    Additionally, protection for memory and clock contents may be provided by short-term backup via capacitors or small ‘coin cells’, and/or by writing important data to NV-memory when a dangerously-low voltage is detected, before the CPU is shut down…
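
One way to decide what can be saved when the low-voltage warning fires is to estimate how long a hold-up capacitor can carry the load. The usable energy is ½·C·(V_start² − V_min²), and dividing by the load power gives the time available. The sketch below assumes a constant-power load behind a regulator. The item names and write times in the usage check are hypothetical.

```python
def holdup_time_s(c_farads, v_start, v_min, load_w):
    """Time a hold-up capacitor can carry a constant-power load.

    Usable energy is 0.5 * C * (v_start^2 - v_min^2) joules; dividing by the
    load power gives seconds until the supply falls below v_min.
    """
    energy_j = 0.5 * c_farads * (v_start ** 2 - v_min ** 2)
    return energy_j / load_w

def plan_emergency_save(items, budget_s):
    """Pick items to write to non-volatile memory, highest priority first,
    stopping at the first item that would exceed the time budget.

    items: list of (name, priority, write_time_s); higher priority saves first.
    """
    saved, used = [], 0.0
    for name, _priority, write_s in sorted(items, key=lambda it: it[1], reverse=True):
        if used + write_s > budget_s:
            break
        saved.append(name)
        used += write_s
    return saved
```

For example, 0.1 F discharging from 3.3 V to 2.7 V into 0.1 W gives about 1.8 s, which is enough time for small items like clock and settings but not for a long log.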

    Given the performance that consumers demand from handsets, today, it gets to be difficult to make the tradeoffs between maximum daily use-time available, vs. long-term lifetime… that said, I believe that the Energy Management people make operational safety their first priority, thus the inclusion of the most critical safety-related protection in the battery, itself, rather than a dependence on engineers/designers who do not have the depth of understanding of the electrical, mechanical and electrochemical characteristics and pitfalls.

  3. B. @ LI says:

    There must be _someone_ among our esteemed members who wants to comment? There seems to be a lot of room for embedded algorithms and heuristics that analyse charge/discharge profiles and use them (among other things) to adjust the way the battery is treated, and to inform the user with accurate charge/discharge rates and states, usable operating time, battery lifetime…
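
The simplest building block for such software is coulomb counting: integrate the measured current to track remaining charge, then divide by a smoothed current to estimate usable operating time. This is a minimal sketch under stated assumptions:

- `capacity_mah` is the current full-charge capacity.
- Positive current means discharge.
- The sample interval is known.

Real gauges also correct drift using open-circuit voltage, temperature and aging data.

```python
class CoulombCounter:
    """Minimal state-of-charge tracker that integrates current over time."""

    def __init__(self, capacity_mah, soc=1.0):
        self.capacity_mah = capacity_mah
        self.remaining_mah = capacity_mah * soc
        self.avg_ma = 0.0

    def update(self, current_ma, dt_s, alpha=0.1):
        """Integrate one current sample (positive = discharge)."""
        self.remaining_mah -= current_ma * dt_s / 3600.0
        self.remaining_mah = min(max(self.remaining_mah, 0.0), self.capacity_mah)
        # Exponential moving average smooths bursty loads for time estimates.
        self.avg_ma += alpha * (current_ma - self.avg_ma)

    @property
    def soc(self):
        return self.remaining_mah / self.capacity_mah

    def time_to_empty_h(self):
        if self.avg_ma <= 0:
            return float("inf")   # charging or idle: no meaningful estimate
        return self.remaining_mah / self.avg_ma
```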

    It may be left to battery manufacturers (well, maybe not) and charger/battery “gauge” makers to provide the basic information to put into this software. Or will we leave all of this to the semiconductor guys to embed in the charger ICs, and _not_ learn about battery care?

  4. V. @ LI says:

    Barry, amazing comment and demonstration of real-life, hands-on experience and knowledge!

    I’m a software engineer, so I can speak only for the software side of things.

    The fundamental principle of separation of concerns dictates that each layer in the software+hardware stack takes on certain duties. Having adjacent layers (or, even worse, non-adjacent ones) assume similar duties only introduces complexity and harms operational quality, IMO.

    Another fundamental principle, that of high cohesion, can and must be (and, as far as I can see, is) applied to the layers of the software+hardware stack. That is, each layer should have a well-defined scope of operations it performs.

    With these principles in mind, I’d separate the issues this topic raised into three categories: best use of available energy, efficiency of battery charging and safety (keeping equipment within safe operating parameters).

    I think that the first two should be handled by software, which can more easily make “knowledgeable” decisions by watching more parameters (available energy, user activity, processing performed, etc.). Probably some combination of kernel-space and user-space logic should be able to do that, and I guess this is already done. I’d look in these parts of the stack for better overall performance.

    The third issue, which is also the most critical, should be handled by something as close as possible to the equipment itself, and this has to be some logic hard-wired to the adjacent hardware layer. The battery should be able to more-or-less protect itself from destruction. The phone should be able to handle some battery-related problems, like the momentary disconnection you mentioned.

    Just my 2c, I’m not an expert in embedded systems (far from it) and I hope I’m not boring you or, worse, talking nonsense! :)

  5. B. @ LI says:

    Vagelis;
    Good value at only 2¢ ! ;-)
    Yes, indeed, I agree with the appropriate ‘layering’ of the ‘tasks’ … As each level becomes more critical, then the reliability of the implementation becomes more critical. As it is difficult to achieve the required latency, speed, predictability and reliability in software for tasks such as short-circuit protection, these are best done in ‘provably-correct’ hardware, with rather short, and predictable, latency ;-)

    The evolution of charger and gauging ICs has moved quite a bit of ‘intelligence’ into a hardware implementation, but this also means that a lot of information can be available in ‘real time’ for software to act upon… There is still, I believe, a lot of room for developing more complete software solutions to evaluate the ‘state of health’ of battery subsystems. Such solutions can alleviate lifetime degradation and provide better prediction of impending end-of-life, or impending failures, at least in modes that are a bit more ‘graceful’ than short-circuits or massive internal failures!
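
A very simple state-of-health estimate divides the measured full-charge capacity by the design capacity. A least-squares line through capacity versus cycle count then gives a crude end-of-life prediction. The 80% end-of-life threshold is an assumption (a commonly used criterion), and real capacity fade is rarely perfectly linear.

```python
def state_of_health(full_charge_mah, design_mah):
    """Fraction of design capacity the battery still holds."""
    return full_charge_mah / design_mah

def predict_eol_cycle(cycles, capacities_mah, design_mah, eol_fraction=0.8):
    """Fit capacity vs cycle count with least squares and return the cycle
    at which capacity is predicted to reach eol_fraction of design capacity,
    or None if the data show no fading trend."""
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(capacities_mah) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, capacities_mah))
    if sxx == 0:
        return None                  # all samples at one cycle count
    slope = sxy / sxx
    if slope >= 0:
        return None                  # capacity is not decreasing
    intercept = mean_y - slope * mean_x
    return (eol_fraction * design_mah - intercept) / slope
```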

    There are also mechanical effects in Li-ion cells, and there may even be some more room/methods for monitoring these changes, over cell lifetime, in order to allow more intelligent approaches to dealing with approaching end-of-life, including modified charging or energy use profiles.

    In accordance with your comments, some of the battery ‘control’ can even be moved quite close to the user (up the stack), to allow the user some choice, and control, over how the system may modify the charging/energy use of the particular equipment…

    But, then, I’m not a battery systems expert, merely someone who has had the privilege of working with some first-class, ‘real’ experts :-)

  6. V. @ LI says:

    I’m a Java guy, so, by definition, way up the hardware-software stack. Even with high-level platforms like Java, though, it _is_ possible to control resource usage, like CPU and memory, and thus affect energy consumption. Examples that come to mind are stopping animations in applications, lowering frame rates in games, and using fewer threads or even only one (which should matter more on multi-core systems).
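
The application-level adjustments listed above amount to a policy that maps battery state to resource usage. The sketch below is written in Python for consistency with the other examples here, even though the comment refers to Java. The thresholds and the `RenderPolicy` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RenderPolicy:
    frame_rate: int
    animations: bool
    worker_threads: int

def choose_policy(battery_pct: float, charging: bool, cpu_cores: int) -> RenderPolicy:
    """Trade responsiveness for energy as the battery drains.

    The 50 % and 20 % thresholds are hypothetical tuning values.
    """
    if charging or battery_pct > 50:
        return RenderPolicy(frame_rate=60, animations=True, worker_threads=cpu_cores)
    if battery_pct > 20:
        return RenderPolicy(frame_rate=30, animations=True,
                            worker_threads=max(1, cpu_cores // 2))
    return RenderPolicy(frame_rate=15, animations=False, worker_threads=1)
```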

    In the end, I think there is a clear conclusion in this discussion, which is that the software part of the stack has much room for improving its monitoring, controlling and “reacting” capabilities regarding power sources. Perhaps with some more support from the hardware layer, the OS could collect more data sooner and provide user-space software with power-related data (state, user-selected configuration and events). Applications could then behave in a much smarter way, potentially resulting in much higher efficiency in power usage.

    The latest devices, with much, much higher processing capability, can in my opinion get impressive gains.

  7. B. @ LI says:

    - – - Yes – - –

    :-}

    And, what says the estimable Mr. Cravotta ??

  8. F. @ LI says:

    Barry has given a good description of the issues involved with battery management, crash-avoidance, and system backup. I would just like to add that the crash-related problems are worse with alkaline batteries than with Li-ion or Li-poly. Alkaline batteries have a power profile that includes a sudden, rapid loss of voltage near the end of capacity, so it is hard to detect the end of capacity with an alkaline battery. Add to this the fact that most alkaline batteries are removable and are just wedged in between spring contacts, and it becomes possible to lose power by bumping the system hard. With alkaline batteries, the system has to make itself impervious to crashing unexpectedly at any time. Either that, or it must simply suffer the consequences of a crash.
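
One common way to make stored data survive a power loss at any moment is to alternate writes between two slots. Each record is tagged with a sequence number and a CRC, so a torn write is detected and the previous copy is used instead. This is a general technique rather than anything Franz described. The record layout and `TwoSlotStore` are hypothetical, and byte strings stand in for two flash or EEPROM sectors.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte sequence number, 4-byte payload length,
# payload, then a 4-byte CRC-32 over everything before it.
def encode_record(seq: int, payload: bytes) -> bytes:
    body = struct.pack("<II", seq, len(payload)) + payload
    return body + struct.pack("<I", zlib.crc32(body))

def decode_record(raw: bytes):
    """Return (seq, payload), or None if the slot is blank or torn."""
    if len(raw) < 12:
        return None
    seq, length = struct.unpack_from("<II", raw, 0)
    end = 8 + length
    if len(raw) < end + 4:
        return None
    (crc,) = struct.unpack_from("<I", raw, end)
    if zlib.crc32(raw[:end]) != crc:
        return None
    return seq, raw[8:end]

class TwoSlotStore:
    """Alternate writes between two slots so that a power loss mid-write can
    only corrupt the slot being written; the other keeps the last good copy.
    Sequence-number wrap-around is ignored in this sketch."""

    def __init__(self):
        self.slots = [b"", b""]   # stand-in for two non-volatile sectors

    def load(self):
        good = [r for r in map(decode_record, self.slots) if r is not None]
        return max(good)[1] if good else None

    def save(self, payload: bytes):
        good = [(r[0], i) for i, r in enumerate(map(decode_record, self.slots))
                if r is not None]
        newest_seq, newest_idx = max(good) if good else (0, 1)
        target = 1 - newest_idx          # never overwrite the newest good copy
        self.slots[target] = encode_record(newest_seq + 1, payload)
```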

  9. B. @ LI says:

    Thanks very much for the comments on Alkaline… Can you point me/us at some solid information on alkaline cell performance & characteristics?

  10. M. @ LI says:

    The best place to get information on good-quality alkaline batteries is the manufacturers’ websites, e.g. http://www.duracell.com (follow the B2B trail). Or you can get to the same information via distributors like Farnell.
    I don’t fully agree with Franz about the sudden loss of voltage – take a look at the Duracell data for the standard alkaline AA at
    http://professional.duracell.com/downloads/datasheets/product/Simply/Simply_AA_MN1500.pdf

    In my experience, the big problem is that even if you provide a good-quality battery holder and electronics that work correctly down to 0.8 V per cell, the end user then defeats all your efforts by using rubbish batteries from some no-name source. These are not backed by any specification and may very well fail as Franz describes.

  11. B. @ LI says:

    @Michael; thanks for that link… one thing I noticed immediately is that the discharge graphs are not shown in the forms that are most useful to modern designers. Instead of simple fixed-resistance discharge-voltage curves, they should show energy-available discharge curves or constant-power discharge curves, plotted both as discharge-time/voltage and as discharge-time/remaining energy (or percentage of life).

    Many loads that are in use today are likely to be constant-power; even LED flashlights, typically, have a DC/DC converter to deliver constant power to the LED over the discharge voltage of the cell/battery pack.
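
The difference between the two load models is easy to show numerically:

- A fixed resistor draws I = V/R, so its current falls as the cell sags.
- A regulated constant-power load draws I = P/V, so its current rises as the cell sags.

That is why energy-based curves are more useful for such loads. The sketch ignores converter efficiency. Its runtime estimate integrates a voltage-versus-capacity curve, and it assumes the curve was measured at a comparable load, since delivered capacity depends on the discharge rate. The curve in the usage check is made up for illustration and is not Duracell data.

```python
def load_current(v: float, model: str, value: float) -> float:
    """Current drawn from a cell at terminal voltage v.

    model 'resistance': fixed resistor of `value` ohms, I = V / R
    model 'power':      regulated load of `value` watts, I = P / V
    """
    if model == "resistance":
        return v / value
    if model == "power":
        return value / v
    raise ValueError(f"unknown load model: {model}")

def energy_wh(curve):
    """Integrate a discharge curve of (capacity_mAh, voltage_V) points,
    sorted by capacity, with the trapezoid rule; returns watt-hours."""
    total_mwh = 0.0
    for (q0, v0), (q1, v1) in zip(curve, curve[1:]):
        total_mwh += (q1 - q0) * (v0 + v1) / 2.0
    return total_mwh / 1000.0

def runtime_h_constant_power(curve, power_w):
    """Approximate runtime of a constant-power load from the curve's energy."""
    return energy_wh(curve) / power_w
```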

  12. S. @ LI says:

    And many loads are anything but constant power – if we consider cell phones, laptops, tablets or… hybrid vehicles. The problems are far more complex, and I know enough stories of apparently straightforward designs that went wrong for various reasons. I will return with one at another time. For now, I will stick to the discharge problem.

    The rapid discharge phenomenon is rather visible in the case of NiMH or NiCad chemistry. The following story is just an example — this is when I became aware of it.

    I have a relatively old and cheap digital radio (a roughly 15-year-old design) that was probably designed for alkaline batteries. To change the batteries, you have to turn off the radio (a digital on/off that puts the electronics in standby) and rely on an internal capacitor capable of maintaining the settings for about 1…2 minutes (depending on the battery voltage at the time of replacement).

    Because alkaline batteries have lower capacity than modern NiMH rechargeables, I switched to NiMH but with a “side effect”. When the battery is discharged enough to make the radio barely usable on AM and unusable on FM, I cannot turn it off (put it in standby) anymore and I cannot change the batteries without losing all the settings because the internal capacitor gets drained quickly by the still active electronics.

    It is obvious that the designer never had in mind such a sudden voltage drop (usually from 1.2V to 0.9V in 10…15 seconds) and the controller that handles the user keys (including the on/off button) does not respond to any user actions anymore. So, I end up reprogramming the settings after changing the batteries (i.e. setting the clock/alarm and 15 stations).
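
A static under-voltage threshold can miss a collapse as fast as 1.2 V to 0.9 V in 10…15 seconds if it is set too low. One mitigation is to watch the rate of voltage drop as well, and trigger the save/standby path while the controller still has enough voltage to run. The sketch below shows this for a single cell. Both thresholds and the window length are hypothetical tuning values.

```python
from collections import deque

class DropDetector:
    """Flag a cell whose voltage is at or below warn_v, or is falling faster
    than max_drop_v_per_s over the sample window, so firmware can save its
    settings early. Thresholds are hypothetical, per design and chemistry."""

    def __init__(self, warn_v=1.0, max_drop_v_per_s=0.01, window=5):
        self.warn_v = warn_v
        self.max_drop = max_drop_v_per_s
        self.samples = deque(maxlen=window)   # (time_s, volts)

    def update(self, t_s, v):
        self.samples.append((t_s, v))
        if v <= self.warn_v:
            return True
        if len(self.samples) >= 2:
            (t0, v0), (t1, v1) = self.samples[0], self.samples[-1]
            if t1 > t0 and (v0 - v1) / (t1 - t0) > self.max_drop:
                return True
        return False
```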

    So, as designers we need to anticipate far more than our educated minds tell us in the first place. Users are always unpredictable and very inventive. Conditions may vary a lot as well in patterns that may never be anticipated enough (see the Chevy Volt case).

    Like most of the current software that we have to deal with every day (see patches, updates, upgrades, etc) – perfection is an almost unachievable goal. Designing embedded systems, especially those used in safety critical applications, becomes a daunting task that very few are willing to appreciate. It is probably one of the most difficult jobs in the high-tech industry – and I’m not saying this because I am an Embedded Engineer.

    The battery management problem is just another issue that we have to deal with. A really challenging one…

  13. S. @ LI says:

    I’m not an expert in batteries and I don’t work in such industry, but I’ve been curious and I also had to implement some power management algorithms in my projects.
    Here are some resources that I found useful:

    http://batteryuniversity.com/

    The guy who keeps this site started his own company, Cadex, years ago. The company produces battery analyzers, chargers and testers.

    I would also recommend “Linden’s Handbook of Batteries” – the 4th edition was released in 2010 (see http://www.amazon.com/Lindens-Handbook-Batteries-Thomas-Reddy/dp/007162421X ).

    I hope it may help someone.

  14. B. @ LI says:

    Hi Stefan!
    RE: constant-power load curves: I wasn’t trying to intimate that most loads are constant-power, but that constant-resistance loads are not at all common, and other ways of presenting battery discharge information can yield useful, indeed critical, insight… anyway…

    It’s a nice point about a system being so tightly designed around a particular energy curve that it can’t handle an ‘unforeseen’ battery chemistry (or quality), especially if the batteries are replaceable, standard-sized cells like AA/AAA, et al… I would have hoped that under-voltage detection would be used, but it seems that someone forgot that aspect in your radio; maybe it saved 3 cents ;-)

    In cell phones, the volumes are so high, the cells are built-to-fit, and every milli-Joule needs to be dragged out, plus safety concerns, plus user expectation, that we really design a lot of the circuitry to maximise energy use in just that narrow band of specifications… and, now, with digital interfacing being built into the cells, it’s possible to get quite intimate knowledge of a cell that is plugged in. That, again, goes to the earlier comments on putting protection and monitoring at the ‘right’ layer of the hardware/software stack.
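
Reading a smart battery over its digital interface usually means decoding 16-bit words into engineering units. The sketch below assumes the command codes and units of the Smart Battery Data Specification (SBS 1.1) as we understand them; check them against the specification and the gauge datasheet before relying on them. The bus read itself is left out, and `raw` is assumed to be the two bytes returned for a word read.

```python
import struct

# Command codes and units as we understand SBS 1.1 (verify against the spec).
SBS_TEMPERATURE = 0x08   # unsigned, 0.1 K
SBS_VOLTAGE = 0x09       # unsigned, mV
SBS_CURRENT = 0x0A       # signed, mA (positive = charging)
SBS_RSOC = 0x0D          # unsigned, relative state of charge in percent

def decode_word(command: int, raw: bytes):
    """Convert a 2-byte little-endian SMBus word into engineering units."""
    if command == SBS_CURRENT:
        (value,) = struct.unpack("<h", raw)
        return value / 1000.0              # amps
    (value,) = struct.unpack("<H", raw)
    if command == SBS_TEMPERATURE:
        return value / 10.0 - 273.15       # degrees Celsius
    if command == SBS_VOLTAGE:
        return value / 1000.0              # volts
    if command == SBS_RSOC:
        return value                       # percent
    raise ValueError(f"no decoder for command 0x{command:02X}")
```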

  15. S. @ LI says:

    Hi Barry,

    It was your name attached to this topic that triggered my attention. And I couldn’t resist. I have so many stories related to batteries. My interest grew significantly after I purchased a hybrid car in 2007. And, yes, there is a story attached to this car and… some frustrations. As I said, I will return with details because my story has something to do (indirectly) with our own jobs. But I need more time to order the facts and put it in the context.

    I agree that batteries are “smarter” every day. There is a danger here as well: being smart means being more complex and being complex means being more exposed to “soft”(ware) failures. Are we relying too much on the intelligence we embed into all these devices? Is this slowing the penetration of other technologies that may not require sophistication to solve the same problem?

    When I asked this question, I had in mind MRAM vs. flash memory: despite all its qualities, MRAM has not been widely adopted by the market. Instead, companies spend a lot of resources and effort on error correction and wear-leveling algorithms (see the latest news regarding Anobit, which developed very complex algorithms that allow unreliable MLC NAND to be used, well… more reliably!!!). Aren’t we in the same situation with Li-ion batteries? Maybe we have just lost the ability to see (and to search for) fundamentally simpler solutions…

    At 3 am in the morning I’m becoming too emotional :-)

  16. B. @ LI says:

    I think that in embedded systems the battery has to be treated as a black box that provides energy and tells the software, with substantial accuracy, when the energy supply is going to end, a sort of progress bar. This gives the software a way to do its own maintenance and optimization, and to learn how to run longer. The complexity of a battery implementation should be completely confined to the battery. The system that uses it should have no knowledge other than exactly when the energy will run out.

  17. B. @ LI says:

    @Barry Kogan: I think that the approach you suggest is usable in embedded systems that have no user interaction at all, and no control from other devices, BUT, any system can use more significant battery information to extend the battery lifetime.

    From the cost perspective, embedding more intelligence in the battery pack moves a cost factor out of the BOM control of the embedded system developer. Either all systems would need to pay for it, or there would be an explosion of battery types, each with their own level of battery control/protection… As I stated previously, it makes a lot of sense to have protection against catastrophic failure modes included in the battery/cell, but I believe that it makes more sense to have ‘gas gauges’ and lifetime estimates done in higher-level software and hardware… The world is open to many opinions :-)

  18. A. @ LI says:

    Money talks – the economics of production would say that if the battery is not replaceable, then one can design all protection and capacity estimation using a layered model and hide the whole implementation in an expensive rechargeable battery. However, if the battery is relatively cheap and disposable, then it needs to stay inexpensive (low device service cost), and it makes sense to keep the logic in the embedded system that consumes the energy rather than throw away another small embedded system each time you replace a battery. Of course, protection against critical failure (explosion) should be included with the battery because it is safety-critical.

    As a side comment, I haven’t seen a “gas” car battery with such protection, and one such car battery did explode on me some twenty years ago. Fortunately, it did not cause me any permanent damage. /* I know, a car as a whole is not an embedded system. */
