Do you ever think about endianness?

Wednesday, February 8th, 2012 by Robert Cravotta

I remember when I first learned about this thing called endianness as it pertains to ordering the higher- and lower-order bytes of data that spans more than a single byte. The two most common ordering schemes are big and little endian. Big endian stores the most significant bytes ahead of the least significant bytes; little endian stores data in the opposite order, with the least significant bytes ahead of the most significant bytes. The times when I was most aware of endianness were when we were defining data communication streams (telemetry data in my case) that transferred data from one system to another that did not use the same type of processors. The other context where knowing endianness mattered was when the program needed to perform bitwise operations on data structures (usually for execution efficiency).

If what I hear from semiconductor and software development tool providers is correct, only a small minority of developers deal with assembly language anymore. Additionally, I suspect that most designers are no longer involved in driver development either. With the abstractions that compiled languages and standard drivers offer, does endianness affect how software developers write their code? In other words, are you working with data types that abstract how the data is stored and used, or are you implementing functions in ways that require you to know how your data is internally represented? Have software development tools successfully abstracted this concept away from most developers?


140 Responses to “Do you ever think about endianness?”

  1. Dan says:

    Almost without exception, if you need to worry about endianness, as the saying goes, “you’re doing it wrong”.

    The typical, common and legitimate exception to this is data communications, where packets / frames are serialized & re-assembled, but this cat has been skinned many times, and there are always macros or functions to put data in “network byte order”, so in theory the programmer is relieved of these worries.

    When casting pointers from one data type to another of a different size, endianness is a concern, but this is almost always a ticking timebomb. As with everything, there are exceptions, I’ve done it myself, but generally speaking, data should be accessed (read) the same way it was written.

    When you talk about bitwise operations, I’m a little confused. If I want to set the upper bit of a 32-bit integer, the compiler & CPU will figure out which bit to turn on when I say something like “foo |= 1UL << 31” – I don’t have to worry about the endianness. Even if I’m working in assembly. Endianness only matters when considering the layout of data in memory. Once data is loaded by the processor to be operated on, endianness is irrelevant. Shifts, bitwise operations, etc. just “do the right thing” regardless of endianness. (Unless, again as I said, you’re writing things as a string of bytes, and then you’re trying to interpret them as larger “chunks” (or vice versa).) Can you elaborate please?
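
    A minimal sketch of the distinction (hypothetical snippet, assuming a 32-bit unsigned long): the OR and the shift behave identically on either architecture; only a byte-by-byte view of memory reveals the difference.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned long foo = 0;
        foo |= 1UL << 31;                    /* sets the same logical bit on any machine */

        unsigned char bytes[sizeof foo];
        memcpy(bytes, &foo, sizeof foo);     /* byte order appears only in memory */
        printf("first byte in memory: 0x%02x\n", bytes[0]);
        /* prints 0x00 on a little-endian host, 0x80 on a big-endian one
           (assuming a 32-bit unsigned long) */
        return 0;
    }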

    • Dan,

      I was referring to data that exceeds the word length of the processor – so, using your words, I would expect endianness to be a concern when operating within a data structure that represents larger “chunks” of memory than the natural word length.

  2. Govinda Padaki says:

    Endianness becomes irrelevant outside of assembly language (or low-level C/C++ design). Most compilers handle it themselves and make it trivial for programmers.

    At one point in time, endianness mattered for optimizing program execution speed. Today, with a wide variety of processors, families, and supporting peripheral chips, endianness is no longer important to programmers (from an optimization point of view).

    When I asked a recent college graduate about endianness, he drew a blank… It is like asking “What is a resistor, and what are the color codes?”!! Long ago, with the advent of SMT/SMD, the concept of ‘color-coded resistors’ vanished (I can’t say it is ‘extinct’… but people don’t talk about it anymore).

  3. D.A. @ LI says:

    You become quite aware of “endianness” in communication protocols, where you must send 16-bit ints and 32-bit floats byte-by-byte.

  4. J.K. @ LI says:

    I agree — most of the time you don’t worry about it. But if you have a processor that can run either big-endian or little-endian, the OS maintainers and the compiler folks have to make sure they get it exactly right. And the application people have to make sure their code runs both ways. Likewise, if you’re communicating with an opposite-endian machine, you can have issues. I’ve done this twice from the OS/tool-chain side. It makes your head spin.

    See: ON HOLY WARS AND A PLEA FOR PEACE by Danny Cohen:
    http://www.ietf.org/rfc/ien/ien137.txt

    if you want a thorough discussion of endian issues.

  5. J.K. @ LI says:

    One more issue: the I/O hardware is often wired for one endianness no matter which way the processor is running. Files written to disk or data sent over the wire may not be what you expect. Often the only way to tell is to experiment unless you have really good hardware folks around or exceptionally clear documentation.

  6. J.K. @ LI says:

    Last comment: C/C++ compilers will lay out bit fields in structs differently according to the endianness of the target executable (and the preferences of the tools folks). Mixing bit fields with bit-wise operators isn’t always intuitive when you switch ends.
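
    A quick, hypothetical probe shows where a given compiler puts the first-declared field; the answer is implementation-defined either way:

    #include <stdio.h>
    #include <string.h>

    struct flags {
        unsigned int a : 1;      /* first-declared field */
        unsigned int b : 7;
        unsigned int c : 24;
    };

    int main(void)
    {
        struct flags f;
        unsigned int raw;
        memset(&f, 0, sizeof f);
        f.a = 1;                           /* set only the first-declared field */
        memcpy(&raw, &f, sizeof raw);
        printf("raw = 0x%08x\n", raw);     /* 0x00000001 or 0x80000000, per compiler */
        return 0;
    }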

  7. S.D. @ LI says:

    I work at a company that gets both kinds of boards. We take care of endianness in our C code. I agree the OS/kernel should take care of it, but in some places we still have to handle it ourselves for external communication.

  8. D.H. @ LI says:

    Within a single machine, endianness doesn’t make too much difference any more. Talking to external devices (like a graphics card or communication stack) is typically where knowing endianness comes into play.

  9. A.P. @ LI says:

    Ever? Yes, definitely. Often? It depends on what I am doing. As has been said, if you work with comms protocols or I/O hardware, you have to be aware of it. There are probably more people out there writing these kinds of drivers than you think.

  10. C.R. @ LI says:

    @Jeff Kenton said: “Last comment: C/C++ compilers will lay out bit fields in structs differently according to the endianness of the target executable (and the preferences of the tools folks). Mixing bit fields with bit-wise operators isn’t always intuitive when you switch ends.”

    The C and C++ standards do not specify the bit order of bit fields, and there is no prescribed bit order with regard to endianness. This is entirely left open to the compiler implementation (I presume these are the tools folks you are talking about). I find this to be one of the failures of the standards committees.

  11. V.M. @ LI says:

    Writing binary numeric data to files and non-volatile memory in embedded systems can get you in trouble. Endianness and structure padding are two problems to watch for.
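
    A minimal sketch of the padding half of that warning (the layout below is whatever your compiler chooses):

    #include <stdio.h>

    struct record {
        char tag;     /* 1 byte ...                                     */
        long value;   /* ... but the compiler may pad before this field */
    };

    int main(void)
    {
        /* typically prints 8 or 16, not 5 or 9: the padding (and the byte
           order of 'value') gets baked into any file written with fwrite() */
        printf("sizeof(struct record) = %zu\n", sizeof(struct record));
        return 0;
    }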

  12. RVDW @ LI says:

    It comes up fairly often for me, in loaders for multiprocessors, some network code, and register-level drivers. I think little-endian is less expensive and works in more types of hardware.

  13. VAH @ LI says:

    Unfortunately, byte swapping comes up far too often — sure wish the world would settle on a standard and eliminate this nuisance!!

  14. M.S. @ LI says:

    Like Jeff and Dave said, when you are inside a single system there is no problem: C code within one system is self-consistent.
    You meet problems at communication or multi-system interfaces.
    I have a C2000 DSP/microcontroller interfaced to a ColdFire of opposite endianness through a 16-bit dual-port RAM: when I have to transfer 32-bit data, I use an interface function that swaps the words instead of doing the normal write/read. That is really annoying!
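
    A sketch of that kind of interface function (names invented, assuming the 16-bit halves must be swapped on their way across the dual-port RAM):

    typedef unsigned long u32;

    /* swap the 16-bit halves of a 32-bit value before/after it crosses
       the 16-bit dual-port RAM between the opposite-endian CPUs */
    static u32 swap_words(u32 x)
    {
        return ((x >> 16) & 0xFFFFUL) | ((x & 0xFFFFUL) << 16);
    }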

  15. JDR @ LI says:

    I think that precisely in the embedded world, endianness often matters. That’s definitely my experience. For example, regarding the C2000 DSP that Maurizio mentioned, its smallest addressable unit is 16 bits, so a byte or a char is 16 bits. Those 16 bits are stored in memory as if they were two bytes in big-endian order; however, for 32-bit ints it uses the little-endian convention, storing the least significant 16 bits first. It is a nightmare! This illustrates very well the problem of porting software between architectures.
    And I wish to add one more case to what has been said: when you are developing mathematics software, for example with big numbers.

  16. M.S. @ LI says:

    Correct, Julio; it is a word-structured endianness … a nightmare!
    Furthermore, the C2000 is the only case I know of where sizeof() returns the number of words instead of the number of bytes, in keeping with the 16-bit architecture of that DSP/microcontroller family.

  17. J.K. @ LI says:

    Maurizio: that sounds like a legal C implementation — it’s just unusual. If a “byte” is 16 bits, that’s what you would expect. You think of it as a word, but the compiler writers have to treat 16 bits as a byte.

  18. J.C. @ LI says:

    The endianness situation came about because of the two big CPU manufacturers of the time, Intel and Motorola. Motorola used big endian (high byte first) and Intel little endian. Since the Intel part was adopted as the PC CPU, all hardware on that platform was little endian by default. Motorola, on the other hand, continued integrating its CPUs into stand-alone hardware, which today has become known as “embedded systems”.

    So in general, PC products (expansion boards, etc.) use little endian and embedded systems use big endian. Many protocols (TCP/IP, image formats) therefore provide calls to convert the data. If you are communicating between a PC and an embedded system, expect to do some byte swapping somewhere. Also keep in mind that some embedded systems have byte swapping built into the interface, so byte swapping done at the other end may cancel out.

  19. G.L. @ LI says:

    The whole endianness debate was launched by an Israeli engineer named Danny Cohen in the October 1981 issue of IEEE _Computer_ magazine:
    - http://facstaff.bloomu.edu/bobmon/readings/ien137.Cohen-Holy_Wars.html

    I recall reading it and thinking, “Ah, yes, that’s what’s been keeping me confused all these years.”

    He described the problem in exquisite detail, and early enough in the evolution of the technology to avert a chaos of incompatible transmission orders. But his arguments fell largely on deaf ears.

    A lot of people think endianness is a scalar attribute, but life is not that simple. You can talk about the endianness a particular architecture assigns to:
    - Bits within a byte,
    - Bytes within a word, or
    - Words within the whole address space.

    As Cohen presciently points out in his treatise, the IBM 360 was the only processor to make all three of these endianness components the same. Every other machine in history (including all microprocessors) is a bastardization of both types: what you would expect from a committee.

    As other authors here have pointed out, this matters most when you are attempting to send data over a serial channel between processors that do not agree.

    I am right now attempting to get two such endian-incompatible processors to communicate with each other over a CAN bus. We’ve had 30+ years to heed that warning, but I had to work out the byte order by hand over a couple of iterations before the bytes came out in the right order in both directions. Worse yet, if we ever switch one of the processors, someone will have to repeat my effort all over again.

  20. J.D. @ LI says:

    @John: It is true that many embedded processors derive from Motorola parts, but they are far from defining the standard for all embedded processors. A large portion of manufacturers today license 32-bit cores from ARM. The default endianness of ARM cores is little endian, although the implementer can choose to build a big-endian system.

    For 8-bit processors and microcontrollers, this is actually a compiler choice rather than a hardware aspect.

    In industrial automation systems, you have to be very aware of PLC endianness if you manufacture devices that can communicate via different fieldbus protocols. The encoding of words and doublewords in data packets can differ widely. You can have word-little/dword-big endian, full-little, full-big, and word-big with dword-little endian. It is so confusing that we have a “data template” command in our systems, containing a record (4 independent bytes, two 16-bit integers, one 32-bit integer, and one 32-bit floating point) for the customer to test how his/her PLC will interpret the data.

    Almost any streaming protocol that has data objects larger than 8 bits will require endian awareness. There are some exceptions to this, like Modbus-RTU, which specifies the endianness of 16-bit words.

    However, even when writing assembly language, you can have endian-agnostic software.
    Endianness issues only arise when you store data in a different width than you process it. For example, if you only store 32-bit words and have assembly code accessing bit fields in those words, it is completely endian-insensitive and will work regardless of the processor’s endian choice. Cortex-R4 cores, for example, can be configured at run time for big- or little-endian memory systems. You can have code that works correctly whether booted big or little endian; just be consistent with the data usage.

    Although the C language standard does not specify endianness, you can also write C code that is completely endian-insensitive. The problems arise when you assume some object representation and access types under the hood, for example using unions or pointer casts to get at the chars under floats or ints.
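
    A minimal illustration of both sides of that advice (hypothetical values):

    #include <stdio.h>

    int main(void)
    {
        unsigned long u = 0x12345678UL;

        /* representation-dependent: peeking under the hood */
        unsigned char *peek = (unsigned char *)&u;   /* peek[0] depends on endianness */

        /* endian-insensitive: define the byte order yourself with shifts */
        unsigned char out[4];
        out[0] = (unsigned char)(u >> 24);   /* MSB first: big-endian on the wire */
        out[1] = (unsigned char)(u >> 16);
        out[2] = (unsigned char)(u >> 8);
        out[3] = (unsigned char)(u);

        printf("in-memory: 0x%02x  on-wire: 0x%02x\n", peek[0], out[0]);
        return 0;
    }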

    But to answer the thread’s question: YES, always be concerned with endianness issues.

    - Jonny

    • Jeet says:

      Hi, you wrote: “Although the C language standard does not specify endianness, you can also write C code that is completely endian-insensitive. The problems arise when you assume some object representation and access types under the hood, for example using unions or pointer casts to get at the chars under floats or ints.”

      What are some techniques to deal with this issue? I am currently trying to write a communications module using Modbus. If the struct containing ints, floats, etc. is typecast to chars for Tx/Rx, how does one dynamically detect a variable change and thus a required byte swap?

      Thanks!
      Jeet

  21. JDR @ LI says:

    Sincerely, I don’t share the opinion that you must have issues because you are communicating between processors of different endianness. Yes, they behave differently, but you can always agree on one convention, independently of the specific processors.
    This is what is done in networking, with big endian as the network byte order. I always design communications, etc., around one convention (I like big endian) and without considering the final architectures; I don’t mind whether they are little or big endian or mixed. The fact is that different architectures exist, and your code, sooner or later, will face that situation.

  22. M.G. @ LI says:

    I primarily write embedded software for big-endian processors. Big/little-endian issues have cropped up in two major areas:
    1) Communicating with PCI devices (which require little-endian data).
    2) Writing tools that share some of the source code from the embedded project, but run on an Intel target.

    In particular with the tools issue, what caused our problem was not the endian issue, but the fact that we had used bitfields for some of our record definitions. They caused enough grief for our tools that we removed the bitfields in favor of hex constants.

  23. J.D. @ LI says:

    @Michael: use of bitfields across different compilers can be a source of grief.
    I use bitfields extensively, but never as an interface between systems.

    Use of explicit bit positions, e.g. ((m >> n) & 0x0000000f), is more portable. But then you either have to keep bit fields in byte containers or be aware of the endianness of the two systems.
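
    For example, hypothetical helpers in that style:

    /* explicit position/mask access, portable across compilers
       (names invented for illustration) */
    #define GET_FIELD(word, pos, mask)      (((word) >> (pos)) & (mask))
    #define SET_FIELD(word, pos, mask, val) (((word) & ~((mask) << (pos))) | \
                                             (((val) & (mask)) << (pos)))

    /* e.g., read bits 4..7 of a 32-bit register image:
       mode = GET_FIELD(reg, 4, 0x0FUL);  */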

    - Jonny

  24. M.G. @ LI says:

    @Jonny – The code in question was originally written only for our big-endian product. As soon as we tried to reuse that code in the tools, we switched it to use the more portable bit positioning. A lesson-learned in planning ahead for reuse.

  25. J.D. @ LI says:

    @Michael: Yes, that is a lesson learned long ago also. We only use streams of unsigned chars, mapped as words and dwords using a known mapping at the byte footprints. And all bitfields use bit shift positioning.

    - Jonny

  26. C.L. @ LI says:

    When we work on the drivers and on getting data into and out of the processors, we have to deal with endianness. I’ve been there, and it sounds like we all have been. However, I also work on higher-level systems design. In some systems we need higher-end processors; in others, little ones like the MSP430. I try to pull along the same high-level algorithms if possible (filters, control loops, etc.). I would like to abstract the endianness as much as possible so that the algorithms remain the same. Are you able to have much luck in this (hiding the endianness inside the OS or drivers and keeping the algorithms the same).

  27. J.D. @ LI says:

    @Chris:
    >> “Are you able to have much luck in this (hiding the endianness inside the OS or drivers and keeping the algorithms the same).”

    - was this a question? It looks like one. My answer: YES, we write endianness-insensitive algorithms as much as possible. The key is to declare and use the types without any tricks on the data accesses, like pointer casting. For example, do not access an array of short with a pointer to int.
    It is possible to keep the endianness concerns localized at the communications boundaries, and file I/O boundaries. Which is a lot already.

    - Jonny

  28. Anonymous says:

    @Jonny
    Thanks. It was a question; I forgot the question mark. It’s something that I’ve tried to do also. Usually it works, but I was wondering if I was missing some other pitfalls that I hadn’t come across. I find making the argument for separation to people difficult at times. Sometimes it seems that people have written entangled code so much that they just assume there is no other way. I have more experience abstracting algorithms from each other. When faced with endianness one time, making the argument that abstracting the endianness is possible was more difficult for me; I lacked the same confidence with that argument due to less experience. Thanks for sharing.

    - Chris

  29. G.C. @ LI says:

    To the OP: While it is possible to have many opinions regarding the relative annoyance of dealing with endian-ness, and many measures of how often individual engineers have occasion to deal with differing-endian system interactions, there is really only one absolute measure of “how often do you think about endian-ness?” That is, “every time I write an external interface — to another device, a network, a file system, etc.– but at no other time whatsoever.”

    Seriously, if you are writing software for your own hobby-project or as a quick-and-dirty test fixture, you can write it any way you want. But if you are writing software for use in a product, you should be thinking about endian-ness every time that product communicates multi-byte numeric values or bit-field values to any entity outside the device/sub-system in question — but never at any other time. This is why a decent design has an “interface layer” — to limit the scope of such issues to the first layer receiving such data from an external entity (parsing a file, receiving a serial stream, receiving network packets, receiving data on some arbitrary bus, etc.) and to the last layer before transmitting such data to an external entity (writing a file, sending serial/network/bus data). Once these layers have transposed the external data to/from internal native formats, the other 95% of the code can operate blissfully ignorant of such things as endian-ness — bit-wise or byte-wise.

    Now, if you are smart, you will take a lesson from the folks who standardized networking protocols and choose big-endian format for all such “inter-processor” data (either “live” inter-process communications via wired/wireless links or “indirect” IPC via files). In fact, I suggest such a choice even if you are designing a system that is today built around two little-endian x86 devices (unless processing speed is at a premium high enough to make it not work with the additional endian reversals). The reason I say this is because TOMORROW you never know what kind of platform might be chosen to host one or another part of your simple homogeneous system. It might end up as a big-endian ColdFire talking to a little-endian x86 before you’re finished.

    Why use big-endian? A few reasons. First: one day, your RS-485 connection might be replaced by a TCP/IP connection, and why rewrite more than you have to? Second: if you have to choose an arbitrary standard, why not pick one that’s second nature to so many other engineers? Finally, there are those wonderfully simple, efficient, and widely available functions/macros “ntohs()”, “ntohl()”, “htons()”, and “htonl()” that come for free with your favorite C/C++ compiler and auto-magically convert multi-byte integers from platform-endian to big-endian format (i.e., if your platform is already big-endian, they result in NO code being generated; if it’s little-endian, they generate the appropriate byte-reversal code as required) — why not just use them in all your interface-layer code?
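
    A minimal interface-layer sketch (the struct and function names are invented; htons()/htonl() come from <arpa/inet.h> on POSIX-style toolchains):

    #include <arpa/inet.h>
    #include <stdint.h>

    struct wire_msg {               /* fields held in network (big-endian) order */
        uint16_t type;
        uint32_t payload_len;
    };

    static void msg_pack(struct wire_msg *m, uint16_t type, uint32_t len)
    {
        m->type        = htons(type);   /* compiles to nothing on big-endian hosts */
        m->payload_len = htonl(len);
    }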

    Of course, the bit-field-endian-ness issue is slightly less straight-forward, but frankly no more difficult: just pack/un-pack the bits as necessary to transpose from the external protocol definition to/from your favorite idiosyncratic internal bit-field formats and forget about the issue everywhere else.

    Unless you are writing code for a processor that has just enough CPU bandwidth to handle its required functions and no more, is there really any good reason NOT to provide such a common inter-processor big-endian (i.e., network-endian) format on every system even when it isn’t an up-front requirement? On the x86-to-x86 example I gave above, both devices would “needlessly” convert to/from big-endian for the “link” (real or virtual) between them, but why not build your system prepared for future revisions rather than write it to work until the next guy tries to port it?

  30. L.A. @ LI says:

    I had to fight with this problem a few times, and in most cases I implemented a swap operation somewhere in the system. Could you give us some hints on how to write code (especially in C) that is more independent of this problem? One idea is to use shift operators when data is sent from one system to the other; endianness is completely irrelevant to shift operators.

    And where should the swap operation be placed? In my opinion, the best approach is to implement it in the lowest layer.

  31. B.C. @ LI says:

    I work in the process automation industry, on the embedded side, with a mix of big- and little-endian platforms that need to transmit data between 8-, 16-, and 32-bit embedded targets and to/from the Windows platform (x86). We always have to think about endianness. We try to convert data to a common endianness when it is transmitted on the network (say, Ethernet); then, when a device or PC receives the data, it can convert it to its own endianness without needing to know where it came from.

  32. J.F. @ LI says:

    Any compiler that I have used lays out structures differently depending on the endianness. In big-endian compilations one starts defining a structure at the highest address, as is the case with all big-endian definitions, at least as far as I am aware. Little endian, on the other hand, defines structures starting at the lowest address. Now, if one is defining a structure that mirrors a register bank in your microprocessor, I do not know of any way to do that so it works for both endians.

    Also, one of the comments above stated that none of the micros implement little or big endian completely. I have used Intel products for years, and once you get your head into little endian it seems very natural; I do not know where the implementation is “incomplete”. The nicety of little endian, and I do favor it, is that bit numbers represent the corresponding power of 2, and bytes in a similar manner. Dump routines can be written to do the shuffle to assist in debugging if you have trouble byte swapping in your head.

  33. S.S. @ LI says:

    If performance is not an issue, use ASCII or, these days, UTF-8. In most cases the latency and CPU overhead of the I/O dominate the translation.

    If you feel binary data is required, write marshalling and unmarshalling functions to take a simple byte stream and generate the structures you need.

    I am constantly amazed by the number of people who overlay a struct pointer (in C) on a protocol buffer and then byte-swap the members of the structure. The C compiler is not required to pack structures, and though many compilers can be forced to do this, it is silly to rely on it. The argument that if the local byte order is “correct” then it will be more efficient is also foolish; again, I/O is the limiting factor.

    Oh dear, have I started ranting again…

    -steve

  34. C.R. @ LI says:

    @Jim Fisher: Point of clarification. Structs should be laid out mostly the same regardless of endianness or stack direction. In my experience, all data types (including arrays and structures) always start at the low address. It is the order of the bytes inside the multi-byte POD types (Plain Old Data, or native data types such as ints or floats) where the differences happen.

  35. J.F. @ LI says:

    As I recall, when laying out structures for ARINC data words (each structure packed into 32 bits in its entirety), with little endian one begins with the label, while the label is the last item in big endian. How do I do this so it works in both endians?

  36. J.F. @ LI says:

    To Steve Simon: Good morning, sir. The issue for me is not I/O speed or execution speed or memory consumption. Since estimates of embedded software maintenance costs run to 70% and up, the issue is understandability and ease of making changes. For example, I think that defining ARINC structures and then accessing each field by name is easier to understand than masking and shifting. And if a C or C++ compiler is not capable of packing structures so I can match the hardware, my recommendation is to replace the compiler.

  37. S.S. @ LI says:

    Maintenance is exactly the point: if you write marshal and unmarshal code from a byte stream into a structure in the local CPU’s format, then you write the code once for both big- and little-endian CPUs.

  38. C.S. @ LI says:

    Generally, the compiler will handle endianness within the same environment. It is when you cross the boundaries between systems that you have to be concerned. In our product line, it is possible to have the protocol stack on one system and the application on another. When they communicate across shared memory or dual-port RAM, you have to be really careful.

    As pointed out by others, sometimes the protocols themselves must be handled carefully because they incorporate data types larger than bytes. Identification is the key there, and it can certainly be a nightmare to do that and keep the protocol intact.

  39. R.C. @ LI says:

    The endian debate existed before 1981. I remember dealing with it in the ’70s because IBM was big endian and Intel was little endian. (Endianness matters even with 8-bit processors because you have 16-bit operations.) I was at a company that made IBM-compatible equipment on 8080s.

    ARMs can be big endian, but you pay a speed penalty for using that feature. I think it’s one cycle for every operation other than register-to-register operations.

  40. G.L. @ LI says:

    Jim Fisher wrote:
    > Also one of the comments above stated that all of the micros
    > did not implement little or big endian completely.

    I wrote that, and I’ll try to illustrate with an example from the paper I cited.

    The VAX mini-computer had a type called packed decimal, in which two 4-bit binary-coded decimal digits were stored in a single byte. These were stored with the most significant nibble in the lower address (big-endian), whereas bytes within a word and words within a long word were little endian. So the decimal constant 12345678 wound up being stored in this order:

    +---+---+---+---+---+---+---+---+
    | 7 | 8 | 5 | 6 | 3 | 4 | 1 | 2 |
    +---+---+---+---+---+---+---+---+

    Serial channels carry data 1 bit at a time. If the nodes practice the same endianness, or the data are always the same length, then this is not a problem.

    Confusion starts when incompatible machines send data of differing sizes over the same channels. Then the receiving node has to figure out how long a given datum is before it can unscramble it and store it in its own native format.

  41. A.P. @ LI says:

    @Jim Fisher:
    “Now if one is definiing a structure that mirrors a register bank in your microprocessor, I do not know of any way to do that in a way that works for both endians.”

    But, in this instance, it shouldn’t be an issue: if the register bank is the same, you would be within the same family of micro and the endianness should be the same. I don’t know of any examples which violate that, although there may be some.

  42. A.P. @ LI says:

    @Gary Lynch:

    Ah, the VAX! It, and DEC equipment in general, are laws unto themselves: who else would redefine the byte as 9 bits and pack 5 ASCII characters, left justified, into a (36-bit) word?

  43. J.K. @ LI says:

    Allen Pothoof: “But, in that instance, it shouldn’t be an issue: if the register bank is the same, you would be within the same family of micro and the endianness should be the same.”

    Some processors can run either endian, but have I/O registers that don’t reverse their layout to match. Just one more of life’s joys.

  44. G.C. @ LI says:

    It doesn’t matter which target you use: ARM, MSP430, PowerPC, VAX, etc. — the solution is the same: ensure you include an interface layer to convert native formats to “external” formats (a hardware interface, a file, an Ethernet/RS-232/ARINC/CAN/etc. link). Then 95% of your code doesn’t worry about endian-ness — it’s all “native” data handling. The interface layer is what contains the “marshalling/un-marshalling” routines Steve mentioned. This layer is the only place you have to write code that knows “endian-ness” exists.

    If you are writing in C/C++ on a common compiler, you can write endian-conversion code using ntohs() (“network-to-host-order-short”), ntohl() (“network-to-host-order-long”), and ntohll() (“network-to-host-order-long-long”) and their opposites, htons(), htonl(), and htonll(), to convert 16-, 32-, and 64-bit values, respectively, from your native endian-ness — WHATEVER it may be — to network order (i.e., big-endian), one reason I recommend choosing big-endian for all external formats. For each multi-byte numeric field you read/write, simply call the appropriate function to “correct” the endian-ness of that field. The best part of these functions is that they are ALREADY optimized for each platform targeted by the compiler. No need to ask “what’s the best way to switch endian-ness on the XYZ processor?” — the compiler vendor has already done this analysis (assuming a decent compiler). No need to ask “do I need to convert endian-ness for this processor?”, because the compiler vendor has already arranged to make sure these calls generate NO code at all if the target processor is already big-endian (calls are implemented as macros that equate to the actual functions or to “null” statements, depending on the endian-ness of the target processor).

    Frankly, if your compiler does not have implementations of these functions/macros, I’d suggest writing your own versions, so that a subsequent port to a target with a decent compiler can automatically switch to using the compiler-supplied code without re-writes. This method makes it as painless as possible to deal with endian-ness in a way that supports software maintenance over the life-cycle of a system — your code could be re-targeted from an 8051 to a 68HC16 to a ColdFire to an MSP430 to a VAX and not require a change in a single line of code.

    Something Steve didn’t mention is that the benefit of “interface layer” functions is broader than just “endian-ness”. What if your target considers a “char” 16 bits? If the interface/file expects an 8-bit “char”, you need code to read/write this value while converting between 8-bit and 16-bit data objects — even if all data is the same endian-ness. Or maybe your interface uses EBCDIC :-) Ditto with bit-fields in a protocol: regardless of the byte-wise endian-ness of the system, there can be a bit-wise endian-ness difference, and your interface layer should always make sure to pack/un-pack bit-fields accordingly (you may even have “bool” types in your internal data structures that are mapped to a single bit in the protocol definition).

    The moral of the story is to always provide a thin layer that provides the interface between your internal “native” data structures/types and the external data structures/types used in messages/interfaces to outside entities. The contemporary buzz-word for this is “marshalling/un-marshalling” — we used to just call it “packing/un-packing” or “packing/parsing” before every mundane operation required a fancy label.

    Even if no translations are required today, the targets may change, and the code you write today should automatically build the correct solution when that port occurs. Since this problem needs to be solved once (and then re-used) for any target your company (or client) supports, it is a low cost to build porting support into “v1.0” rather than write the code specific to the original target and leave it to later programmers to figure out what needs fixing.

  45. A.B. @ LI says:

    I once implemented a MAC/IP/UDP stack on a PIC. When working with this kind of microcontroller – with very limited code space – you have to be careful NOT to use the macro versions of htons, htonl, et al.; they take a HUGE number of bytes. You will have to use a function, with all the overhead that implies.
    Anyway, endianness is here to stay and there are not a lot of options. One of my customers used… ASCII in a communication protocol. Of course it incurs a lot of communication overhead, but it works for sure!
    Bit fields are a big “no” in communication protocols – if you check the implementation of TCP, for example, none of the bit fields straddle byte boundaries.

  46. A.B. @ LI says:

    Just another small comment about PCI… PCI devices are required to support different endiannesses.

  47. C.T. @ LI says:

    I work with code that is intended to be portable across various processors and compilers. This code largely includes device drivers and network protocol drivers. For non-CPU specific code, using shift/mask operations has worked quite well.

    #define CAST_8bit(value) ((unsigned char)(value))
    unsigned long binary_value = 0x12345678;
    unsigned char network_data[4];
    network_data[0] = CAST_8bit(binary_value);        /* least significant byte first */
    network_data[1] = CAST_8bit(binary_value >> 8);
    network_data[2] = CAST_8bit(binary_value >> 16);
    network_data[3] = CAST_8bit(binary_value >> 24);

    This puts the network data in little endian order regardless of the endianness of the binary value. Obviously, reconstruction is done in the opposite order.

    For some devices (like SPI EEPROM), the SPI commands are big endian, but you want to store the data in the same endian order as the CPU. For instance, many EEPROM devices auto-increment the address with each byte of data read/written within a single command. In this situation, I use pointer casting (you could also use a union) for the data portion of the read/write commands in order to send the least significant address first.

    My project is also intended to work on C2000 DSPs. I agree with the “nightmare” comment! For that CPU, you may need to add “& 0xFF” to the CAST_8bit macro to ensure that it masks correctly. Unfortunately, our “common code” assumes a byte is 8 bits, so the network_data[] array is twice the size it actually needs to be.

  48. Ronnie says:

    I remember; I think it was 1980. There was a fun game going on. We’d come up with a new computer, and the software teams would come up with a new assembler, compiler, tools, and maybe some apps. Then we’d come up with another computer, and they’d start over. I wondered what would happen if we kept the same platform for a while and let the software flower.

    A pair of guys from Intel showed up and told us about the 8086, the soon-to-follow 8088, and the 80186. They said they’d all run the same code, and they had a roadmap that stretched far into the future – all software compatible. They’ve pretty much kept their word for 31 years, and reaped the benefits. I’ve spent much of my time making sure that the new iterations ran faster and supported more memory, lest I be found in violation of Moore’s law. Took it up to maybe 80 W/processor.

    What software flowers grew?

    Windows.

    Who needs to abstract a toaster?

    You see, as testified above, programmers don’t make code smaller unless it’s too big, and they don’t make it faster unless it’s too slow. No money in it.

    The result is bad code. Remember those old 40 MHz ’486s? They booted Windows of the day just as fast as my Windows 7 box does today. The only good code these days is in multi-media and mobile.

    I remember a professor in 1974 telling me “We gave them 64 bytes! It wasn’t enough! We gave them 128 bytes! It wasn’t enough! … We gave them 64KBytes! It wasn’t enough! It’s never enough!”

    Should have listened. I like tube amps. There’s no software in tube amps.

  49. G.C. @ LI says:

    Actually, in real life bit-fields are all over the place in stream/packet protocols, including bit-fields that straddle byte boundaries. This is relatively unremarkable because many protocols are used over links in which bandwidth requirements are intentionally kept to a minimum, whether it be an RS-232 link or a satellite link. Data is often “packed” pretty much as tight as it can be. If a field has only 5-8 discrete values, it may well be packed into a 3-bit field. You can hardly blame the people who created these protocols for doing so, and it’s a bit late to second-guess them if the protocol is out there. A glance through the myriad of RFCs will find many examples of bit-field-packed packet definitions. And the MPEG stream definition is basically one long series of bit-fields with only haphazard alignment.

    As to implementing the byte-wise shifting method (vs., say, byte swapping), this is fairly standard for hton*() and ntoh*() functions/macros — which, while they may require LOTS of space on some compilers, certainly don’t need to. Many compilers give you simple methods to make these in-line or function calls, and the macros themselves are quite often merely used to choose which code to compile, a la:
    #ifdef BIG_ENDIAN
    #define htons(s) (s)        /* already network order: compiles to nothing */
    #else
    #define htons(s) _htons(s)  /* call the actual byte-reversal routine */
    #endif
    so that it either calls an actual function to reverse the endian-ness or simply passes the value through untouched.

    In the end, the point is simply to
    a) pick an inter-processor endian-ness (big-endian is a good idea unless you have a system designed with no CPU bandwidth margin and all the processors are little-endian — a bad design decision, probably),
    b) implement your big-to-little and little-to-big functions once,
    c) use these every time you use this target,
    d) if you need optimization, use the same function names, but tweak the code for each target to take advantage of any features of the particular processor (e.g., some CPUs can shift 8, 16, and 24 bits in one cycle to perform the shifts shown above, while many older CPUs needed a cycle per bit to shift, but might implement byte- or nibble-swap instructions), and
    e) think about this problem very, very, very rarely after this — just call the macros without regard to whether you need them or not, and let your #ifdefs control whether they turn into code or not, based on pre-processor symbols that are either already defined (for many standard OS environments) or can easily be defined by you in your makefile/project file (e.g., “#define BIG_ENDIAN” for a big-endian target).

    The best engineers are the lazy ones. Laziness is the mother of invention…and optimal re-use. And you don’t end up stressing about things like which endian-ness your CPU uses. I’ve been designing software on mixed-endian-ness multi-processor systems for 35 years, and I can’t tell you how long it’s been since I needed to spend more than 15 minutes considering endian issues. Bit-fields, of course, are a different matter, and marshalling/unmarshalling (packing/unpacking) routines are necessarily “custom” implementations for such protocols, though they typically port just as easily from one CPU to another in C.

  50. S.S. @ LI says:

    I was not advocating pre-processor trickery to make the endianness swap disappear on the “correct” architecture. I always use the technique below (after Rob Pike), and would recommend it to all. You can lose a tiny bit of CPU, but you gain total portability, to any machine — even vaxen!

    -Steve

    ulong
    getlong(void)
    {
        ulong l;
        l  = (ulong)(getchar() & 0xFF) << 24;  /* most significant byte first */
        l |= (ulong)(getchar() & 0xFF) << 16;
        l |= (ulong)(getchar() & 0xFF) << 8;
        l |= (ulong)(getchar() & 0xFF) << 0;
        return l;
    }
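
    The write side is the mirror image; a sketch under the same assumptions (an 8-bit-clean stream and the ulong typedef above):

    void
    putlong(ulong l)
    {
        putchar((int)((l >> 24) & 0xFF));   /* most significant byte first */
        putchar((int)((l >> 16) & 0xFF));
        putchar((int)((l >> 8) & 0xFF));
        putchar((int)(l & 0xFF));
    }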

  51. S.S. @ LI says:

    When scheduling tests on different boards and architectures, I normally run the tests on both big-endian and little-endian boards to make sure that the developers have taken care of endianness in their code :-)

  52. G.C. @ LI says:

    I’m often amused by those who look askance at “pre-processor trickery”; perhaps they are: a) unsure of the purpose of a pre-processor, and/or b) unaware of just how impossible it would be for them to compile any code that utilized standard C libraries without heavy pre-processor involvement.

    Having said that, there are many heinous tricks one could do with a pre-processor that aren’t particularly smart (e.g., hey, why not create your own statement “when” that is a #define for “if” so that you can ask “when (x == 0)…”? yikes!). But, in general, many things described as “trickery” are simply standard, almost ubiquitous uses of pre-processor features, without which most OSes would never compile and most C libraries wouldn’t compile. The point here is not to obfuscate or “trick”, but merely to have code automatically include itself when needed and exclude itself when not needed. This exact kind of construct appears in more code you use every day than perhaps you realize. In fact, every standard implementation of ntoh*() and hton*() use precisely this kind of “trick” — and it’s being used in virtually every piece of commercial software you use over the internet to your benefit.

    In the event, the endian-ness fix is the perfect place to make use of the pre-processor phase of your compiler toolset. If you are reading big-endian values into a big-endian machine, why not do it as efficiently as possible? If doing the same on a little-endian machine, why not do the extra work as transparently as possible? In short, why not do it in a way that you write it once and it works in every machine without changes and at the same time takes advantage of efficiencies when they are available? Of course, if your code is fetching bytes using “getchar()” — a relatively unlikely circumstance — this point is moot, since you can’t really squeeze any efficiency out of a one-char-at-a-time interface mechanism.

    On the other hand, I have no significant problem with implementing one’s “interface layer” using the kind of routine Steve posted above; I just don’t think of it as an “optimal” solution — but it IS a solution, and a pretty darned simple/straight-forward one at that. Which, I suppose, is the whole point of my original response to this inquiry regarding “how often” I think of endian-ness. Whether you use hton*()/ntoh*()-style solutions with pre-processor assistance to implement optimized solutions, or use “getlong()”-style solutions such as Steve proposed, the result is the same in this respect: you rarely have to think of endian-ness in the overwhelming percentage of the code you write — everything just works, even when you move it UNMODIFIED to another platform!

  53. R.S. @ LI says:

    Think about endianness? Only if I have to, because it usually hurts ;-) Communication issues with dissimilar hardware platforms are usually what bring me here, though.

  54. O.K. @ LI says:

    A portable OS usually supplies the necessary facilities for little-/big-endian conversion. For example, the Xilinx kernel/standalone environment provides the following macros: Xil_In16, Xil_In32, Xil_Out16, Xil_Out32, built according to the processor configuration (either little or big endian). See the xil_io.h header for further details.

  55. K.H. @ LI says:

    One area I’ve encountered endian issues is in file formats. For example, some image formats store their data in big-endian format, while others store it in little-endian format.

  56. J.C. @ LI says:

    Hi Kevin:

    Most of the image formats that I have used (bitmap, TIFF, etc.) usually have a bit in the header to tell you whether the file is big or little endian. Of course, there is always an exception to any rule or specification.

  57. RBJ @ LI says:

    I only today became aware of this thread. I have glanced through, but not read every response.

    I cut my teeth on the Motorola MC6800, and Motorola was quite a bit into big endian. It reads nicely, but I found out later that the 6502 had it right.

    For numerical reasons (the act of “carrying”), little endian is better: you add (or subtract) the LSBs before you do so to the more significant bits. The 6502 also had the carry-bit logic for subtraction down better than the 6800.

  58. J.C. @ LI says:

    Hi Robert;

    Nice to hear that somebody remembers the 6800 from Motorola. The 6502 was a cheapened version of the 6800; it had only one accumulator and a reduced instruction set. It was used by some of the first retail computer manufacturers (Commodore, Apple, etc.). Those parts had a 16-bit address space and an 8-bit data bus, so endianness was not such a problem. This came about with the 6809.

    The problem really came about when IBM decided to use the Intel 8088 for the computer system that came to be known as the “PC”, because of its popularity and the place IBM held in the computer market. It didn’t really matter when the CPU was used with its own instruction set and memory; it becomes a problem when used with third-party manufacturers who have to be concerned about the handling of data.

  59. RBJ @ LI says:

    I realize that the 6502 was “cheaper”, but some of the cheapness was more logical. When the 6800 or 6809 had a 16-bit offset added to an index register, it had to first load the MSB and do nothing with it, then load the lower byte and add that in right away, *then* add the higher byte with carry. It took one extra clock pulse compared to a little-endian layout.

    I also thought that using the carry bit as “borrow-not” was logically correct. But the 6502 should also have had ADD and SUB instructions in addition to ADC and SBC. Still, their subtract was simple: the “B” input was just logically complemented with XOR gates, and the same “1” that went into the XOR gates also went into the carry-in of the LSB full-adder logic. Simple and elegant, but, for subtract, the carry bit meant the opposite of what it did on the 6800 or 6809.

  60. K.H. @ LI says:

    @John:

    Yes, most image formats do have a bit to indicate whether fields are big or little endian. That makes writing files simple. But for reading files, every platform has to worry about endianness, since the endian bit may indicate non-native endian fields.
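
    A sketch of the read side (TIFF, for example, really does mark its byte order with “II” or “MM” in the header; the helper below is hypothetical):

    #include <stdio.h>

    /* read one 16-bit field honoring the file's declared byte order
       (no EOF handling, for brevity) */
    static unsigned int read_u16(FILE *fp, int file_is_little_endian)
    {
        int b0 = getc(fp);
        int b1 = getc(fp);
        return file_is_little_endian ? (unsigned int)(b0 | (b1 << 8))
                                     : (unsigned int)((b0 << 8) | b1);
    }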

  61. J.D. @ LI says:

    @Robert:
    >> … For numerical reasons (the act of “carrying”), little endian is better: you add (or subtract) the LSBs before you do so to the more significant bits. The 6502 also had the carry-bit logic for subtraction down better than the 6800.

    You are right. Endianness is a system concern, not only a processor concern. Even systems based on pure 8-bit machines must choose an endianness to do multiprecision math.
    I have written low-level libraries for many 8-bit systems, including the 6502 and 6809. I liked little-endian representations because it was natural to “right-justify” the LSBs of different-length numbers in memory and sign-extend them on the fly.
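
    A minimal sketch of why that order is natural (plain C over little-endian byte arrays; names invented):

    typedef unsigned char u8;

    /* r = a + b over n-byte little-endian arrays; returns the carry out.
       Starting at byte 0 (the LSB) lets the carry propagate upward,
       just as the 6502's ADC does. */
    static u8 mp_add(u8 *r, const u8 *a, const u8 *b, int n)
    {
        unsigned int carry = 0;
        int i;
        for (i = 0; i < n; i++) {
            unsigned int s = a[i] + b[i] + carry;
            r[i]  = (u8)s;
            carry = s >> 8;
        }
        return (u8)carry;
    }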

    - Jonny

  62. G.C. @ LI says:

    Good point on the arbitrary multiple-precision math. The earlier comment that a processor with an 8-bit data bus didn’t have much concern over endian-ness is of course missing the forest for the trees. If that 8-bit processor were reading multi-byte numeric data from files, serial links, etc., it would still potentially have to re-order the bytes into whatever format the programmer of code for that processor used for multiple-precision math functions (in the “good old days”, we were likely writing such functions ourselves, as there weren’t any “standard libraries” floating around).

    However, even in these cases, there is still little reason not to simply convert “external” formats into “internal” ones in an “interface layer” before processing the data without regard to endian-ness within every other function in your system. This is true whether the internal format could strictly be described as “native” — that is, to match the ordering on a 16- or 32-bit processor — or simply represented the standard one assumed when writing their multiple-precision math functions on 8-bit only CPUs (i.e., those with no 16/32-bit registers or any native instructions that operate on multi-byte values in memory).

    An extreme example of such requirements is RSA encryption, which effectively relies on multiple-precision math on numbers that are really quite sincerely long — 128 bytes for a relatively weak 1024-bit key, complete with very long multiplies, divides, and modulo math. Some time back I had to write an encrypt/decrypt module in C, and a single version of this code handled keys of virtually any length on CPUs of any “integer” size using either endian-ness. A simple “rsa_target.h” file was used to configure it for RSA key width, CPU integer width, and endian-ness: “pre-processor trickery” to some, but undeniably useful for portability to multiple devices all sharing encrypted data between them. So, even in this extreme case, I “thought about endian-ness” only very briefly relative to the time spent implementing the RSA math code.
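
    A hypothetical sketch of that style of configuration header (symbol names invented):

    /* rsa_target.h-style configuration (symbol names invented) */
    #define RSA_KEY_BITS       1024   /* key width in bits       */
    #define CPU_INT_BITS       32     /* native integer width    */
    #define TARGET_BIG_ENDIAN  0      /* 1 on big-endian targets */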

  63. J.D. @ LI says:

    @Gary: you are right, of course. In our industrial applications (ARM), we have multiple fieldbus connections to Profibus, DeviceNet, Ethernet, and Modbus RTU, with basically the same data frames. Each protocol has a low-level endianness property that gets handled automatically by the protocol. The endianness can be flipped by the user on the fly, to adjust the data as seen by the PLC.
    When everything is set, nobody notices the endianness barriers working.

    - Jonny

  64. E. @ LI says:

    For high-rate, little-endian (e.g., x86) networking devices that forward IP packets at layer 3, where performance is an issue, I find it much better performance-wise to ntohs()/ntohl() just the local on-stack copies of the few IP header fields needed for packet processing/inspection, rather than convert the full header, which would only have to be converted back to network order a little later, with the checksum recalculated from scratch.

    This is the way many efficient IP stacks for little-endian platforms work, by the way (including the native Linux IP stack).

  65. D.O. @ LI says:

    @Jonny Doin: I must correct you on one point. For 8-bit processors and microcontrollers, this is not a compiler choice but a hardware aspect, at least in the 8-bit parts that I have used; there could be exceptions that I do not know of. The Intel 8080, and therefore the Z80, store little endian, as they have 16-bit operations. The Intel 8051 family of microcontrollers stores big endian. Also, as 8-bit processors have a 16-bit address bus, these will have an endian type fixed by hardware.

  66. C.T. @ LI says:

    I don’t have to deal with endianness often in my work. However, I had to deal with it in one of my previous encryption-based projects, where the security keys had to be stored in non-volatile memory outside the MCU. As most contributors to this post have mentioned, it can be a head-spinning nightmare if you are not careful.

  67. J.D. @ LI says:

    @David Omar: thank you for pointing out that some 8-bit processors have 16-bit data.
    I did not say that this is never the case; I said that on some pure 8-bit machines, the system can choose the endianness.
    Not all 8-bit machines have large operands with hardware-set endianness. Take the 6502, for example, which I was talking about. The 6502 has only 8-bit memory accesses, an 8-bit stack, only 8-bit registers, and all data accesses to memory are just 8-bit accesses. Systems based on the 6502 must choose the system endianness.

    There are other popular pure 8-bit systems; the Microchip PIC16 and PIC12 are some examples. Although they have only a large register file instead of a memory system, the programmer tends to see the registers as memory, and there are no multibyte operations. In all my PIC firmware, my libs do multibyte stores as little endian, but that is a design choice. It would be just as easy to implement a big-endian system.

    >> “Also, as 8-bit processors have a 16-bit address bus, these will have an endian type fixed by hardware.”
    NO. The endianness of the memory system is dictated by multibyte DATA operations and has no relation to the memory size. When you have a larger-than-8-bit data bus, the endianness is set for every memory access. But even when the data bus is 8 bits, you may have multibyte load/store/process operations, and those will set a processor endianness.

    Another related aspect of hardware endianness is DMA in wide-data-bus systems. For example, the ARM DMA controllers can be set up for transfer width and endianness. You can have a DMA controller perform 32-bit transactions that drain a buffer in little endian and fill a destination buffer with big-endian storage, thus reversing the endianness of an array of words at zero CPU-time cost.

    - Jonny

  68. D.R. @ LI says:

    @Jonny and David: Probably David would like to answer you, Jonny, so excuse me, David. If you have to address a 16-bit memory position, you are in fact using multibyte data: the pointers to memory.

  69. G.C. @ LI says:

    Of course this is literally true, but if you are “sharing” pointers between platforms, you are living dangerously. Endian-ness is not particularly remarkable within a single platform.

    If you have a 16-bit pointer in memory (to something else in memory; e.g., a “char *p = mybuffer;”), the manner in which your compiler reads and writes that pointer should be of no consequence. One hopes the compiler vendor will be smart enough to read/load such a pointer in the same order that the same compiler writes/stores it! This should not be something you worry about as a programmer.

    Now, if you are going to write your own hack to modify or test such a pointer byte-by-byte (e.g., test only the high-order byte instead of the whole pointer when testing for NULL, or increment the pointer manually byte-by-byte with carry into the high-order byte instead of simply saying “p++”), well, then, you are asking to be abused by endian-ness.

  70. D.O. @ LI says:

    @Jonny: I could have phrased that last bit a lot better. What I was getting at is situations like this: execute a call instruction on some/most/all 8-bit processors and a 16-bit program counter gets pushed to the stack: two bytes, in a processor-dependent endian order. I guess that compiler writers could ignore this and choose a different endian configuration for other data. There are other pointer-type operations, even if the pointer is embedded in the instruction field.
    The last time I programmed a 6502 in C was at uni, so my memory of the experience is a little shaky. I guess your knowledge of that architecture is better than mine.

  71. G.C. @ LI says:

    as seemed reasonable, this concern would be much ado about nothing — or at least nothing ado about endian-ness :-) of course, the CPU pushes the 16-bit PC onto the stack in a processor-dependent order. fortunately, it pops it back off the stack later in the same order (unless the CPU designers had a sick sense of humor).

    one layer above this, compiler vendors must decide how to push an arbitrary 16-bit pointer (or data value) parameter on the stack (assuming you’re using a compiler that uses the stack for auto vars — e.g., many 8051 compilers eschew the limited 256-byte stack page and just create static memory maps of parameter space), but again, fortunately, these compiler vendors provide access to such parameters without having to know what order the 16-bit pointer/value was pushed to the stack (and, again, with the same potential that a sick sense of humor could theoretically lead to madness).

    now, if you find yourself having to write code that accesses return addresses or other pointers/values directly on the stack for some reason, you would have to know the endian-ness of these push operations, but there is little reason for anyone to do such a thing these days (yes, i admit i’ve done it — a youthful indiscretion, let us say). :-)

  72. J.D. @ LI says:

    @David: I was kind of expecting the PC stacking to be mentioned. In a sense, you’re right.
    The 6502 has a little-endian orientation for memory pointers and the PC push/pop.
    However, as the

  73. M.P. @ LI says:

    After spending some of my early career attaching PC peripherals such as graphics controllers to big-endian 68K based CPUs, it’s something that sticks in your mind. You also tend to remember the mistakes, and the painful memories of not getting the byte ordering correct.

  75. JDR @ LI says:

    My comment about pointers to memory was related to the comment from Jonny: “However, in a 8bit machine with no fundamental multibyte memory access, it is a endianness-agnostic machine, even with a large adrress bus (like a 16bit address bus).”, to point out that it’s not right. Independently of sharing pointers (not my intention!), or the use you make of them, the processor uses multibyte data, and thus there is an endianness concept, a little hidden or not.
    And don’t forget that pointers, the stack, return addresses, or in general any endianness issue can appear to the programmer any time you are debugging or writing low-level code: OS code, debugger code, hook routines, performance routines, etc. No matter if you are working only within a single machine for these kinds of tasks: if you write assembly, or look at your data, or make memory dumps, you have to understand your architecture, and endianness is part of it.

  76. J.D. @ LI says:

    @Julio: Looks like my post got truncated in the middle of a phrase :-)

    I was saying that you and David are essentially right, and I stand corrected.
    That is the case with almost every 8-bit processor with a large memory, like the Z80, the 6502, 6809, and almost every microcontroller around.

    In my slim defense, it might be argued that some processors are off that hook. Such seems to be the case with the Microchip PIC16Cs and PIC12Fs. Those processors have a true Harvard organization, with an implicit return stack, inaccessible from the program, and only implicit vectors into code memory, i.e., no “FAR” jumps. The program memory is fully banked, and the program has access to PCLATH, the high-order bits of the program space for jumps. All data indirect addressing is done through an 8-bit index register, and the data space is also banked.
    The only hint of a hardware-set endianness is for some processors of the family where 16bit SFRs (special function registers) are laid out in data space with low-byte first, thus suggesting little-endianness. HOWEVER, those are fixed addresses, and not data operations. The accesses are always done 8-bits at a time, in totally unordered fashion. THEREFORE, it might be argued that such systems are really open for any endian direction the system designer might fancy.

    If that rescues my comment that “some pure 8-bit machines” can have it either way, I’ll be pleased. Of course my poor example of the 6502 is clearly wrong.

    Also, my comment about the address bus hinting at internal 16bit operations was WRONG, and I thank you both for pointing that out.

    And of course I agree with @Julio about the relevance of machine endianness for anything low-level.

    Ignoring endianness is a luxury for those that never get their hands under the hood.
    Which is practically impossible in any decent embedded design.

    Thank you for keeping me honest!

    - Jonny

  77. D.G. @ LI says:

    To all those who imply that little endian rules the desktop (and that they don’t need to worry about it): all of the network protocols use “network byte order”, which today is still big-endian. All of the Intel platforms do a lot of byte swapping to set up endpoints.

    Many use little endian at the application layer of custom protocols. This makes analyzing data with serial analyzers a bit difficult. The header of the packet is big endian and the payload is little endian.

  78. G.C. @ LI says:

    i don’t think anyone here suggested that we didn’t need to worry about endian-ness. the question was “how often do you think about endian-ness?” and i still contend the answer is “not often”. the rationale behind this answer is simple. it takes little effort to write a set of functions that can pre-process big- and little-endian 16-, 32-, and 64-bit values (and even IEEE floats) on matching- and opposite-endian platforms. once this is accomplished, one simply uses them over and over and over and over and… well, you get the idea. ergo, one needs to “think about endian-ness” only briefly when instrumenting their protocol layer to transmit/receive data to networks, serial links, files, custom hardware interfaces, etc.
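
    as an illustrative sketch (not from any particular library), such helpers are just shift-and-mask functions that never reinterpret memory, so they behave identically on BE and LE hosts:

    #include <stdint.h>

    /* read a 16-bit big-endian value from a byte buffer */
    uint16_t get_be16(const uint8_t *p)
    {
        return (uint16_t)((p[0] << 8) | p[1]);
    }

    /* read a 32-bit little-endian value from a byte buffer */
    uint32_t get_le32(const uint8_t *p)
    {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    /* write side, same idea */
    void put_be32(uint8_t *p, uint32_t v)
    {
        p[0] = (uint8_t)(v >> 24);
        p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >> 8);
        p[3] = (uint8_t)v;
    }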

    look at it this way: how often do you “think about” how to unlock and lock your car door? frankly, you have to perform this task more often than you have to produce protocol layers for software, yet you don’t really think about it much. once you’ve figured out how they work on your new car (e.g., push the left button on your key fob twice) you do it every day without thinking about it at all, because you’ve already solved that “puzzle”.

    anyone who spends any significant time wrestling over endian-ness issues is either: a) a relatively young engineer who is still learning how to architect efficient designs, b) the type of person who simply makes mountains out of molehills in general and makes no exception for complaints about the “perils” of endian-ness, or c) someone who frankly doesn’t have the aptitude to be an engineer. my inclination is to say that the posters who have here made this issue seem more complicated than it is are mostly (a)’s and (b)’s, but there are also a lot of (c)’s out there in the industry, as i’m sure you all have experienced.

    in my experience, it is “laziness” that makes the best engineers, because they have the drive to simplify efficiently and effectively, and to produce solutions that can be re-used effectively — and they are the type to spend little time worrying about endian-ness: they simply write code that works in architecturally sound ways, which reduces endian-ness to a triviality.

  79. Should IEEE have an Embedded System Society?

  80. R.M. @ LI says:

    I have also run into it when bootloading code onto a processor. The endian-ness always needs to be right here, or the opcodes, which in this case are coming in via I2C as bytes, get garbled. Dealing with word-based instructions being reconstructed from bytes brings this to the forefront.

  81. RBJ @ LI says:

    Hi Gary,

    weeks ago you said:

    “You can talk about the endianness a particular architecture assigns to:

    * Bits within a byte,

    * Bytes within a word, or

    * Words within the whole address space.”

    i would point out simply that endianness regarding “Bits within a byte” is not an issue unless you are dealing with serial transfer of data. normal programming of either a big-ass CPU or a microprocessor or microcontroller or a DSP does not deal with bits within a byte unless some other peripheral messed it up. when you fetch or store a word, it goes to or from the CPU register with the MSB going where the MSBs go and the LSB going where the LSBs go. doesn’t matter how you label that portion of memory.

    of course, when words are assembled from bytes, then you have to worry about endianness, particularly when data is transferred from one kind of computer to another. a good example is the difference between WAV files and AIFF.

  82. RBJ @ LI says:

    but, repeating myself from last week, just like the 0-based vs. 1-based array debate, i think it’s clear that both sides are not equivalent, from a logic and hardware POV. 0-based is more correct than 1-based and little-endian is more correct than big-endian. the only reason that 1-based or big-endian is “better” is because of adherence to a human convention, a convention that is not the best convention.

  83. L.M. @ LI says:

    This is a huge issue in massively complex systems such as military aircraft or integrated chassis like VME racks, where you have many types of buses running between many types of processors. Typically on aircraft you have multiple subcontractors for each subsystem (CNI, radar, etc.), and of course they’re competing so they don’t share designs, so the first time it all plugs in together is in a SIL. Then you pass data from a 16-bit big-end processor in the IFF across a 16-bit bus to a 32-bit little-end mission computer, and somehow your F15 looks like a Cessna in the lookup table. Your list, Gary, assumes that the engineers are all on the project from start to finish and that the whole job is being done in house — not the case with massive projects. In some org structures, the engineer doesn’t get the gear until the start of testing, when the errors are already embedded, rather than the start of design.

  84. D.H. @ LI says:

    Personally, I don’t see how little-endian is “more correct” than big-endian (or the other way around, for that matter). There are advantages/disadvantages to both schemes (and what one person considers an advantage somebody else might consider a disadvantage). I just consider them to be different.

  85. G.C. @ LI says:

    Robert:
    I realize you were responding to a different “Gary”, but you bring up a good point.

    First, however, it should be noted that “bit-endian-ness” is in most cases merely a misunderstanding by some programmers. That is, there are processors that “name” their bits in reverse order “D0..D7…D15…D31” versus the far more prevalent “D31…D15…D7…D0”, but this is mere nomenclature: in both cases, the least significant bit is in the same POSITION on both types of processors. In other words, if I try the equivalent of “AND reg, 0x01” or “RRC reg, 1” on either processor, I am targeting the same bit in both cases (I know that last one was an oldie — but a goodie; it rotates the bits in a register through the carry bit, putting the LSB into the C flag), even though one processor manual would CALL that bit “D31” and the other “D0”.

    This is quite apart from what you might call the “endian-ness” of a serial interface (a bad term, I’d argue). Some of these are “least-significant-bit-first” and others are “most-significant-bit-first” — but this would be true regardless of which “bit-endian-ness” naming convention your processor uses. Therefore, I don’t personally think of this as a PROCESSOR endian-ness issue at all. In fact, this situation would be handled by definition entirely within the “interface layer” I mentioned earlier for the simple reason that it could not be handled anywhere else; that is, it has nothing to do with a generic processor-dependent characteristic, and therefore the code would not differ from one processor to the next, but rather only from one serial protocol to the next (which would presumably be encapsulated in the protocol layer code).

  86. RBJ @ LI says:

    it’s because, when you have a multi-byte value or pointer with offset, you add the less-significant bytes *first* before you add (with carry) the more-significant bytes. with big-endian, you end up loading up the more-significant bytes and holding them while you get your less-significant bytes to add. *then* you get to add the more-significant bytes. it costs you an extra clock pulse for a 16-bit operand with an 8-bit ALU to do it big-endian.
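
    a C sketch of the idea (standing in for the assembly being described):

    #include <stdint.h>

    /* add two n-byte integers stored least-significant byte first.
       walking the arrays in increasing address order is exactly the
       order the carry propagates -- the little-endian advantage. */
    void add_le(uint8_t *sum, const uint8_t *a, const uint8_t *b, int n)
    {
        unsigned carry = 0;
        for (int i = 0; i < n; i++) {
            unsigned t = (unsigned)a[i] + b[i] + carry;
            sum[i] = (uint8_t)t;
            carry = t >> 8;
        }
    }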

  87. G.C. @ LI says:

    Laura:
    Don’t know if you are referring to the other Gary’s “list” of endian-ness types, or my “list” of the types of engineers who fret unduly about endian-ness, but the latter seems more likely given your point. However, I think I have merely left you with the wrong impression. I’ve been designing and implementing embedded architectures for 35 years, on systems as small as a single chip and as large as world-wide global networks involving literally thousands of disparate processor and bus architectures. I have no doubt that data can be broken apart and re-assembled hundreds of times from one end of a system to another.

    My only point is that — at any point along the way — you only have to “think about” this issue at your interface layers with the other components in your system, regardless of whether one engineer/company or a thousand are involved in the process. Somewhere, there is a definition of what the interface is, and once you’ve been doing this kind of thing for some time, you’ve pretty much solved all of the various and sundry translations that can exist between two heterogeneous endian-ness, bus-width, etc. architectures. Therefore, a good engineer should have a simple menu of functions (written by them, or by others at their employer/client) that map data from “type A” to “type B”, and just pick the one that’s needed.

    Now, I’m certainly NOT saying that the documentation regarding which kind of interface you are talking to is going to be given to you in a painless, cost-free way — sometimes getting a simple ICD can be insane — but I don’t think of any of that nonsense as part of “thinking about endian-ness”, because this is simply a document I have to get prior to implementation for a laundry list of reasons other than endian-ness. And, once I do get hold of it, the effort to actually deal with whatever it tells me about my endian-ness requirements is something I end up spending a very short time responding to. This is just as true with an F-16, a satellite network, or a DVR containing only a CPU and a media processor.

  88. G.C. @ LI says:

    Robert/Dave:

    Actually, BE only requires “more work” if you start out thinking like a LE programmer. :-)

    That is, while I agree with Robert from a theoretical perspective — and from the perspective of us “old guys” who learned how to do everything on 8-bit processors and in assembly code — the multi-byte addition on a BE processor would not likely be done the way he describes it, but by simply pre-calculating the LSB address based on the “int *” and on the “sizeof(int)”, then DEcrementing the pointer as you fetch each byte (and carry out) rather than fetching from the original pointer and INcrementing with each byte. So, you are looking at only the extra effort to add N to the original address before fetching begins. (And, of course, on 32-bit systems, it’s moot, since the whole mess of bytes is pre-fetched and automatically ordered in the registers to do the math.)

    Some other marginal “efficiencies” in LE-8 (LE on 8-bit processors) v BE-8 include:
    a) you don’t care if the “char” you received from an external source is 8-bit or 16-bit, because the important byte is at the same address either way (i.e., the first byte is the only significant one), and
    b) “small” integers can be read in small or large chunks and get the same result (e.g., 0x34 0x12 is the same number as 0x34 0x12 0x00 0x00).
    However, I admit that these are somewhat contrived “efficiencies” in modern systems. And, a similarly contrived case could be made that BE-8 is more efficient at comparisons; e.g., 0x12 0x34 0x56 0x78 > 0x13 0x34 0x56 0x78 is faster byte-by-byte than is 0x78 0x56 0x34 0x12 > 0x78 0x56 0x34 0x13 (2 compares v. 4).

    So, I also have to agree with Dave that it’s something of a wash, especially when the real method of doing BE-8 math does not require reading a bunch of bytes twice or buffering them somehow, but merely pre-advancing the initial pointer and then reading/processing in DEcrementing rather than INcrementing order.
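
    For what it’s worth, the BE comparison point can be sketched in C: for equal-length unsigned big-endian buffers, lexicographic byte order equals numeric order, so a plain byte-wise compare suffices (illustrative only):

    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    /* True numeric "a < b" for equal-length unsigned big-endian
       buffers; this does NOT hold for little-endian storage. */
    int be_less_than(const uint8_t *a, const uint8_t *b, size_t n)
    {
        return memcmp(a, b, n) < 0;
    }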

  89. RBJ @ LI says:

    Gary, sure, with BE, you can point the index register at the (other) end of a multi-byte binary number and decrement the pointer as you’re adding in the bytes. but the program counter (PC) sometimes called the instruction pointer (IP) *increments*. there is this other addressing mode called “immediate addressing” where it is the PC that is fetching those bytes.

    again, adding (using immediate addressing) a 16-bit operand with a byte-wide machine (8-bit ALU) *must* take an extra clock pulse with BE than it would with LE.

  90. JDR @ LI says:

    @Gary. Just out of curiosity, because I don’t know whether I’m missing something in your numerical example: I don’t see why you would make TWO comparisons (instead of 1) vs 4, but above all, I would never make more than ONE in your example. When you compare numbers, you take the most significant ‘digits’, whatever the base, to start the comparison, and follow with the next most significant ones, etc., stopping as soon as the digits differ. As you have said before, whether BE or LE here only means locating your index appropriately, although in this case it is LE that has to add an offset.

  91. G.C. @ LI says:

    Julio:
    Yes, you’ve caught the fact that I changed the numbers in my example in mid-stream but neglected to go back and change the resulting comparison counts (there’s a reason why you should never make last-second edits!).

    Also, you have just pointed out the reason I called them “contrived” examples, a rationale I mistakenly thought would have been self-evident. That is, just as I noted that the LE-8 addition was “better” than the BE-8 method mainly because of the choice of methods as the most “straight-forward” (some would say “brute force”) one, the opposite advantage in comparing BE-8 vs. LE-8 was also based mainly on the choice of algorithm. In other words, the “contrivance” is the notion that you have to fetch the bytes in the order they are found in memory (rather than in reverse order). I seem to have thought wrong about the obviousness of what I was referring to by calling the different advantages “contrived”, but this is just another reminder of something else not to do on a forum like this (two avoidable errors in one post!).

    Now, having just described this “contrivance” I must concede to Robert that he has a valid point with regard to “immediate” operand addressing modes wherein the PC is used to address the bytes — by nature in auto-incrementing fashion, and in this one constrained case there is more significant efficiency to the LE-8 addition. Of course, this likely reflects something of a different kind of contrived example, since it is quite unlikely that an 8-bit architecture is going to let you create a 32-bit “immediate” operand, for example, so this should be a problem only if you have again written your algorithm in such a way as to almost intentionally work against the native capabilities of your CPU. Certainly, if you are programming in C it is unlikely that you would end up in this situation. If programming in assembly, on the other hand, you would presumably code your math in such a way as to take advantage of your processor’s architecture rather than getting in its way. By following this rule, you should be able to make either the LE-8 or the BE-8 architecture appear “the best” of the two.

  92. MAG @ LI says:

    Recently I was asked to optimise a piece of “bit twiddling” protocol in C.

    I managed a 25x increase in speed, which was declined on the grounds that it wasn’t “endian safe” (even though I provided for every endian possibility), and I even took into account the differences in bit-packed structures.

    But it did point out to me that there’s just as much problem with bit packing rules as there is with endianness.

    8^)

  93. M.S. @ LI says:

    If you write binary data to a file that might be read by another system with different endianness, you have to worry.
    And using bitfields to do bit operations on things like hardware registers is definitely a bad idea! It is extremely non-portable.
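
    A sketch of the portable alternative, using explicit masks and shifts on a volatile access; the register address and bit layout here are made up purely for illustration:

    #include <stdint.h>

    #define CTRL_REG         (*(volatile uint32_t *)0x40001000u) /* hypothetical */
    #define CTRL_EN          (1u << 0)
    #define CTRL_MODE_SHIFT  4
    #define CTRL_MODE_MASK   (0x7u << CTRL_MODE_SHIFT)

    /* Read-modify-write with explicit masks: the bit positions are
       exactly what the datasheet says, on every compiler. */
    static void set_mode(uint32_t mode)
    {
        uint32_t v = CTRL_REG;
        v &= ~CTRL_MODE_MASK;
        v |= (mode << CTRL_MODE_SHIFT) & CTRL_MODE_MASK;
        CTRL_REG = v | CTRL_EN;
    }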

  94. MAG @ LI says:

    Agreed, that’s what we have ntoh and hton for,
    … so why not have bitfields nailed down in the language spec?

    What does the ANSI committee do for a living? (Sorry — maybe a little off topic.)

  95. B.D. @ LI says:

    PowerPC embedded target and Intel PC monitor program using C#. Mostly 16-bit data values, but doubles and floats always require a bit of thought.

  96. C.F. @ LI says:

    If you think few people program in assembler anymore, I guess it depends on what business you are in. I program in C and assembler, but FAR more in assembler, because I do very low-level embedded stuff where every cycle is critical and space is tight.

    Intel and its predecessors used little endian because their CPUs only did pointer increments, and doing basic math always works from least significant toward most, so it made more sense.

    Trivia: Do any of you know where the terms “little endian” and “big endian” come from?

  97. C.F. @ LI says:

    I lifted this from an on-line source:

    The terms big-endian and little-endian were introduced by Danny Cohen in 1980. He borrowed them from Jonathan Swift, who in Gulliver’s Travels (1726) used them to describe the opposing positions of two factions in the nation of Lilliput. The Big-Endians, who broke their boiled eggs at the big end, rebelled against the king, who demanded that his subjects break their eggs at the little end.

  98. J.D. @ LI says:

    @Christopher Fox:
    Amazing. Now I will think much more about Endianness!
    :-]

    - Jonny

  99. RBJ @ LI says:

    thanks Christopher. i was saying something similar a month ago, but someone confirming that helps reduce confusion. while i coded 6502 at some point in my life, i had never coded Intel (except in C on a PC) and knew that they were also little-endian, as manifested in the difference between .WAV and .AIF sound files. i remember having to write code to parse these files, fetching byte-by-byte and assembling samples, even though that would not have been necessary if the format and the machine had matched.

    also, i had some problems with COFF file format for code files stored on a PC. are COFF files big-endian? i just can’t remember.

  100. RBJ @ LI says:

    “Do any of you know where the terms “little endian” and “big endian” come from?”

    i thought it was which end the MSB is. or is it whatever is the byte that comes first? little-endian for LSB first and big-endian for the byte with the MSB first?

    also, i meant to say, for comparison, i did 6502 and a bunch of big-endian Mot processors. endianness and the sense of the carry bit for subtraction seemed curiously odd on the 6502, but i later understood why that was simpler (and, if it’s a matter of convention, simpler is better, IMO).

  101. G.L. @ LI says:

    Mike Smith: Isn’t modifying hardware registers on a specific CPU inherently non-portable, no matter how you do it?

  102. M.B. @ LI says:

    Almost every day. I develop embedded software (PowerPC/VxWorks/gcc) and PC-based software (Visual C++, C#) for systems that talk to each other, and have designed UDP- and TCP-based messaging infrastructures for these systems. Both byte and bit ordering are issues: The PowerPC is big endian, the X86 is little endian, and the gcc and Microsoft compilers assign bit fields in opposite order. On top of that, it is sometimes necessary to handle variable endianness based on a flag in the message header.

    If you can define all the data to be consistent, e.g., all 16- or 32-bits wide, you can swap bytes on output/input. If not, it obviously gets more complicated. One solution I used recently on the PC side was to subclass the .Net BinaryReader and BinaryWriter classes and handle endianness inside the classes. This avoided numerous ‘if’ statements to swap bytes.

    As suggested by others, with regard to compiler-generated bitfields, it’s probably best to avoid them where communications is involved.

  104. T.M. @ LI says:

    Multi-port RAM for inter-processor communications using shared memory for communicating isosynchronous processes! Ah, those were the days!!!
    High-low of low word followed by low-high of high word…stack and heap pointers that push to consecutive LOWER addresses and pop to HIGHER…and even some compilers that have the heap growing UP away from the stack and the stack pushing DOWN away from the heap/stack boundary in physical memory…such fun, and oh so many ways to go wrong…living in the murky fuzzy gray fog between hard-, firm-, and soft-ware, writing Interface Description Specifications, and then having to chase the requirements down blind alleys as they get changed, and changed, and changed again on the fly…life sitting at the debug bench in the Engineering Lab, trying to do “system integration” on systems that are hopelessly unstable…late nights…wee small hours of the morning…do I miss it at all, at all?

    You bet I do.

  105. G.P. @ LI says:

    Most of my problems with endianess have involved communications protocols. As previous comments have explained, it comes down to specifying your interfaces correctly and completely. In order to achieve this you have to be aware of the issue of endianess.

    The majority of issues that I have come across have involved the Modbus protocol and floating point numbers.

  106. J.D. @ LI says:

    As @Greg Philpott mentioned, Modbus drivers are sources of endian headaches.

    The Modbus network protocol is big-endian, so it would be only a matter of using hton() and ntoh(), but what happens is that several devices implement data packing in application-specific ways.

    You will find PLCs that pack data in little endian format into the Modbus line, and others that pack words in big-endian, and dwords in little-endian, i.e., dwords are packed with the lower words before the upper words, but the words are packed in big-endian.

    We ended up leaving the selection of the target PLC’s endianness to the user in our products. We fill the data map with a known data template of word-sized and dword-sized integers and dword-sized IEEE754 floats. The user can select an endianness and check in the PLC for the “right” data mapping.
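
    A simplified sketch of that kind of selectable decode (names illustrative, not our actual product code): Modbus registers arrive as big-endian 16bit words, and a user flag says how a device orders the two words of a dword:

    #include <stdint.h>

    uint32_t modbus_read_u32(const uint8_t *p, int swap_words)
    {
        uint16_t w0 = (uint16_t)((p[0] << 8) | p[1]);   /* first register  */
        uint16_t w1 = (uint16_t)((p[2] << 8) | p[3]);   /* second register */
        return swap_words ? ((uint32_t)w1 << 16) | w0   /* low word first  */
                          : ((uint32_t)w0 << 16) | w1;  /* high word first */
    }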

    - Jonny

  107. R.V. @ LI says:

    For safety critical applications? Definitely.

  108. R.V. @ LI says:

    Now, who still does one’s complement processors?

  109. A.M. @ LI says:

    I’m not sure about processors, but the TCP/IP protocol stack uses one’s complement arithmetic for the TCP and UDP checksums, so everything networked over TCP/IP uses it.
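
    The algorithm (in the style of RFC 1071) is short; this sketch assumes an even byte count and glosses over byte-order details:

    #include <stdint.h>
    #include <stddef.h>

    /* Sum 16-bit words in one's complement arithmetic (carries wrap
       back into the low bits), then return the complement. */
    uint16_t inet_checksum(const uint16_t *data, size_t nwords)
    {
        uint32_t sum = 0;
        while (nwords--)
            sum += *data++;
        while (sum >> 16)                        /* fold the carries */
            sum = (sum & 0xFFFF) + (sum >> 16);
        return (uint16_t)~sum;
    }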

  110. E.K. @ LI says:

    As for PLCs, I would love to see you guys move from words and dwords to int16_t and int32_t, or uint16_t and uint32_t if you prefer. I don’t like those DOS-words etc. AFAIK, Modbus is a byte protocol (a 7-bit ASCII version also exists), so it’s up to the application to decide which byte order to use to represent wider data.

  111. RBJ @ LI says:

    maybe this is getting OT (one’s complement is not about endianess), but long ago (like decades) i had learned something sorta new at a previous employer. i was already quite familiar with assembly language programming on the Mot 68K for the purposes of generating and processing audio signals. sample rate wasn’t very high for full bandwidth audio, but it was plenty fast for varying parameters such as envelopes and modulation.

    anyway, a problem for reducing the worst-case execution time was if the math called for negating a value: negating 0x8000 would have the problem of not really being negated. you would have to test for that value and map both 0x8000 and 0x8001 to 0x7FFF. if you didn’t do that, you could end up hearing the same kind of nasty full-amplitude click you hear from the non-saturating overflow (that wraps around), the mostest horriblest kind of overflow distortion.

    so the solution was to accept the small error of not adding 1 to the one’s complement. in this kind of negation, -1 or 0xFFFF would get mapped to 0 (not +1) and there would be a little bit of zero-crossing distortion. and 0x8000 would get mapped to 0x7FFF, so negation could be done with a single instruction (NOT, i think it was called) and no testing was needed for that one nasty case of 0x8000.
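
    in C terms the whole trick is a single operator (a sketch, not the original 68K code):

    #include <stdint.h>

    /* approximate negation by one's complement only: one operation,
       0x8000 maps to 0x7FFF instead of overflowing, and the price is
       that -1 maps to 0 (a tiny zero-crossing error). */
    static inline int16_t negate_approx(int16_t x)
    {
        return (int16_t)~x;
    }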

  112. V.G. @ LI says:

    Being a Java programmer, I am shielded from such issues as endianness. In Java, everything is big-endian everywhere. That said, whenever I have to exchange information with some other system in a manual way, streaming the bytes myself, not through some framework, endianness definitely comes in the picture.

  113. J.F. @ LI says:

    I may be repeating something already said, as I must confess that I have not read all 107 comments… But
    on the discussion of one’s complement, which really has nothing to do with endianness: one’s complement involves simply complementing each bit. Two’s complement involves one’s complement plus adding 1 to the result. This is usually referred to as negating, and yes, in two’s complement one has to be aware that 0x8000 (the largest negative number) will not negate properly but will remain 0x8000.

  114. R.H. @ LI says:

    Well, Modbus being big-endian doesn’t say much about the data in a message.

    And I haven’t even seen a discussion about bit numbering. Most of us say that bit 0 is the LSB (i.e., “at the right”). Surprise surprise: in the PLC IEC-61131 languages, bit 0 is the MSB. It isn’t as illogical as it sounds, as the IEC follows array ordering. But, I agree, it is entirely different from embedded practice.

    So, if you connect an embedded device via Modbus to a PLC, be prepared to convert some more. And, @Vagelis, if the embedded device uses Java you are not shielded from endianness. No programming language shields you from such issues if you are communicating via networks.

  115. G.C. @ LI says:

    alas, Vagelis, saying that you are “shielded from such issues as endianness” as a Java programmer sounds a bit like “Reason #2346 Why I think Java is Grand” :-) it’s no more meaningful than saying that “as a C programmer, i am shielded from such issues as whether arrays are 0-based or 1-based” — that is, if everything i write is similarly in C, it will always be the former. that says nothing at all in reality, of course, since none of us has control over what kind of systems we are going to interact with at the interface where our own personal code ends. when your Java code reads a file that was produced by someone else’s application (perhaps even a legacy app over which you had no control) and which produced the file using little-endian data formats, your Java had better darn well get used to converting LE data to BE for processing. this is quite literally not a bit different than the experience of all the C programmers out there laboring without the benefit of Java. one might as well say, “because i program exclusively on BE processors, i am shielded from endianness”. it is not C (or any other language) that causes endian-ness issues, but precisely the fact that the world is not composed of homogeneous hardware platform architectures that causes the problem. in that environment, Java is C is Ada is APL is Pascal is Fortran is COBOL is…

  116. R.V. @ LI says:

    Makes me worried about the future. After a huge catastrophic event, many of the people who produce the things we upstream “abstract” users take for granted will be dead. We don’t worry about endianess because somebody downstream does: the tool developers, the I/O library guys, compiler gurus et al., the chip makers, and the people who make the tools to make the chips. They all worry about their own version of endianess in the process or materials. A society that is so specialized had better ENSURE that the basic food chain of things needed to sustain civilization is not monopolized and is geographically diverse. Yet we know about huge concentrations of functionality, from the actual human food chain to the knowledge and equipment to make the parts and assemblies that finally reach the end products. NOT being worried about endianess is a symptom of a mindset that we don’t NEED to worry about things outside our little sphere. I suggest that we all should be concerned about endianess, meaning that we need to organize in such a way that if we want to not have to worry about endianess, we have put measures in place to ensure that’s really true even when the s(^t hits the fan. Mr. Murphy will make sure it eventually does… If you are not worried about endianess, you probably don’t know enough about your application or the environment it may encounter, intended or not.

  117. R.S. @ LI says:

    I used to come across it all the time while working for Intel as an FAE in the early ’80s; Motorola was our largest competitor, along with discrete TTL!

  118. S.W. @ LI says:

    Not only when dealing with communication protocols but also when dealing with peripheral devices should you be quite aware of endianness. For example, the PowerPC is big endian, while the datasheet describes the Universe bridge’s registers in little endian format. If you have not changed the byte order of the Universe registers in a .h file, you should always use the PowerPC load/store with byte-reverse instructions.

    I know that x86 and PowerPC both provide assembly instructions to change the byte order, so inline assembly is an efficient way to swap bytes.

  119. J.A. @ LI says:

    If you don’t pay attention to it, you will know right away when you get weird data on the other side!!! Always specify your endianness on the wire; then it is easy to go back and forth without much effort.
    Do not use bitfields; they are not standardized: by the C standard, the compiler is free to store a bit field pretty much any way it wants. You can never make any assumptions about where the bits are allocated.

  120. R.H. @ LI says:

    Even on a single platform endianness can trouble you when using chips that expect a different endianness than the CPU uses.

    Some argue that you should just communicate with plain ASCII. Although this has no endianness issues, it is CPU-intensive (parsing) and needs more bandwidth; additionally, the conversion back and forth may cause loss of accuracy in floating point data. And, to handle floats correctly (seldom done), be prepared to handle special values such as +Inf, -Inf, NaN, -0, denormalized numbers, etc. correctly (according to IEEE-754).

    Although floats don’t have endianness in the traditional meaning, the bytes must still be in the proper order for the CPU/FPU to recognize them. So swapping may be needed.

    Bitfields are unreliable; always hard-code which bits you want. Also pack all data in a network message yourself; don’t copy ‘structs’ as the compiler may insert invisible fillers for alignment of certain datatypes. Also, I once had a compiler which added fillers at the end of a struct to make its size always a multiple of 4. Then, ‘sizeof’ gives different numbers on different compilers. This is especially troublesome when writing portable code.
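
    A sketch of packing field by field, with a hypothetical message layout and big-endian wire order:

    #include <stdint.h>
    #include <stddef.h>

    struct msg { uint16_t id; uint32_t value; };

    /* Serialize into a byte buffer in a fixed wire order instead of
       copying the struct: immune to padding, alignment, and host
       endianness. Returns the wire size, which is deliberately
       independent of sizeof(struct msg). */
    size_t msg_pack(uint8_t *out, const struct msg *m)
    {
        out[0] = (uint8_t)(m->id >> 8);
        out[1] = (uint8_t)(m->id);
        out[2] = (uint8_t)(m->value >> 24);
        out[3] = (uint8_t)(m->value >> 16);
        out[4] = (uint8_t)(m->value >> 8);
        out[5] = (uint8_t)(m->value);
        return 6;
    }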

  121. VAH @ LI says:

    The proper layout of structs is another complex subject that high level programmers can afford to ignore. Firmware engineers are more likely to pay attention to alignment issues.

  122. J.D. @ LI says:

    Alignment and endian order are really crucial if you are accessing any struct’s object representation (i.e., the struct in memory). That is a dangerous way to do things, though. Unless you are moving data blindly from one memory location to another, you should access the fields by their selectors instead of reading/writing the underlying bytes.
    This is often done when interfacing assembly to C/C++, but it is a red flag, because code can break unexpectedly after a simple recompilation.
    The best way to do it is to have the C compiler form the correct address for the assembly code, as allowed by some toolchains, like ARM RVCT.

    - Jonny

  123. C.F. @ LI says:

    I ran into this problem when I developed a module written in C that implements communication over UDP messages.
    I took these steps:
    1) I developed high-level functions that work with structs.
    2) I implemented a cast from the high-level structure to a byte array (to send over the UDP socket).
    3) I used htons (host-to-network short) and htonl (host-to-network long) on outgoing messages.
    4) I used ntohs (network-to-host short) and ntohl (network-to-host long) on incoming messages.

  124. D.R. @ LI says:

    I don’t know about writing in assembler, but in C there are functions to deal with this issue: htonl() (host-to-network long), ntohl() (network-to-host long), and htons() and ntohs() for short ints, etc.

  125. A.A. @ LI says:

    Yes, I have to think of endianess far more often than I would like to. Network communication between different architectures, accessing hardware with a different endianness than the host CPU, etc.

    This endianness thing is a real nightmare, even for me, having dealt with it for over 15 years. If you work with different hardware, you never feel 100% confident you’re doing things right. Add to that that endianness only affects bytes, not the bits within them, so you (in your mind) have to continuously switch from left-to-right to right-to-left and back. Plus, there is no agreement between CPU and other chip manufacturers on how to count bits within the byte/word/whatever. So in some cases “bit 7” is 128, and sometimes it is 1. That’s totally weird.

  126. R.D. @ LI says:

    As has been mentioned, this is primarily an issue when working with multiple processor systems or when communicating between systems. Within a single processor platform, this comes up only rarely when you need to deconstruct bits or fields within some packet of data. For the majority of development, this is not a daily concern, and therein lies the trap.

    Clearly this is important, based on several high-profile industry failures in interconnecting complex systems that require communication, but since it is not highly significant on a daily basis for an individual platform, there is a tendency among developers to think that the whole world does, or should, communicate exactly the way their platform does. Often the problem is on both ends, with each side of a multi-product team thinking their platform is the defining platform and no team is authorized or assigned to resolve or test the integration. Teams I’ve worked on often list this issue on review checklists and test coverage requirements.

    I understand why the situation exists, but really, the why doesn’t matter much. As engineers, we often have to deal with the world as it exists, not as we would like it to be.

    A lot of the more recent communications and interface standards define a specific endianness or provide a way to specify the method being used, yet too many manufacturers or development teams ignore the definitions, even when they are developing generalized products or need to interface to external systems.

    I think about endianness a lot, but mostly in regards to interface definitions. In multi-layer architectures, where do you implement a required conversion from one endian format to another? If you choose wrong, you either encourage errors or burden a platform with a lot of conversion overhead. It isn’t an easy or simple problem, and it’s not ideologically sophisticated, so it’s probably one of the least sexy things to work on as a developer. That may be why it’s a difficult thing to manage and a significant contributor to system failures. If there’s a simple solution to this that doesn’t involve a rigorous attention to detail, I’d love to hear about it.

  127. K.M. @ LI says:

    Hi,

    I didn’t write endianness-specific code until recently, when my system/software requirements demanded it.
    I have a Linux based embedded system which can be controlled from a host and a remote SDK. The SDK is basically a glorified network communication layer between the Linux system and another Linux / Windows / VxWorks etc. system running on x86 or any other architecture (32/64 bit). Such multi-system, multi-architecture, multi-OS environments demand that endianness be taken care of.

    Thanks,
    Kiran.

  128. J.D. @ LI says:

    @Kiran:
    In that case, where you have TCP based transfers, you should assemble packets using hton() and ntoh() functions. If the granularity is the same in all communicating systems, these network-endian functions will guarantee endianness abstraction.

    - Jonny

  129. Mike says:

    The most recent time I had to think about endianness was when I reverse engineered a file format, without source code or even a detailed spec. We had a vague idea of the types of data stored in the file, and the order in which some of the fields were stored, but no specifics. And we only had two examples of the file, with no ability to get or generate more before we had to produce our analysis. Some fields in the file were big endian, some were little endian, and some were BCD. Some (but not all) multi-byte fields were stored in every other byte of the file, with intervening bytes containing 0. Other fields stored 12-bit data in the middle 12 bits of a 16 bit field, with two 1-bits on top and two 0-bits on the bottom. That was one odd file format!

    Other than that, the main time I think about endianness is during debugging, when I call a generic dump function that dumps bytes in hex and ASCII. If it is a structure that I am dumping, I have to pay attention to both endianness and alignment issues.

  130. K.S. @ LI says:

    Here is a page from the QNX operating system documentation that discusses how to work with endianness issues.

  131. D.A. @ LI says:

    You typically only deal with this when you are crossing system boundaries, e.g. bus, network, serial, etc.

  132. R.H. @ LI says:

    I thought bi-endian RISC and x86 CISC these days made it transparent to coders and applications, but there is a cost for that until the first two operands are hashed and the endianness is determined and stored [set] in the register.

  133. R.D. @ LI says:

    @Richard, I know a number of the newer processors, such as ARM and PowerPC provide boot configurations that can let you set the endianness of the CPU if your project requirements mandate this, but it is not really transparent to the coder and it doesn’t switch on the fly based on the data set. There is a small efficiency cost for every command, not just the first couple of operands. You can use conditional compilation and casts to create adaptable code that adjusts to this setting. That’s basically the path used in the document Keith referenced. The endianness is also almost always defined in the platform headers for the compiler to support network packet functions like ntoh() and hton().

    It’s been a while since I worked on an x86 platform, but last I saw, they used either a fixed little endian or defaulted to little endian with a boot-configurable option for big endian. If they have an on-the-fly adjustment now, I’d like to see the details, but last I was aware, this was also just a boot option. I don’t think an auto-detect scheme would work well since, even if you could detect CPU opcodes, you would still never know the format of external data and there would be no way to auto-detect that. In any case, I would expect this to produce a small efficiency cost on the CPU as well.

    The problem with the boot configuration option is that it tends to just make things more confusing. It’s almost always less efficient for the CPU core and it just adds extra points of potential confusion and failure to the platform development when developers may expect the platform to operate in the native configuration. Most production or architecture teams will never use the option since it creates more complexity than it solves.

    Most standardized software suites have headers and driver interfaces that bury this at the lower levels, so it might be semi-transparent to application layer coders. Applications would rarely need to deal with this unless they were looking directly at a raw data packet that had not been processed for the host format or if a file with raw numeric data is transferred from one host file system to another without adjusting (like between MSWin and NIX systems.)

    Driver developers need to deal with this at peripheral device input or other external data sources. Most external data devices and protocols do not include the endianness data in the data stream or packet, they define it hard-coded in the device data interface spec based on the device family or manufacturer. Some communications protocols, like ethernet, define a fixed format for all standard data on the network side. Others, like CAN or SPI, define endianness in the definitions for messages or individual values.

    Once the data is conditioned for the host, you rarely need to deal with this for internal system processing. As Dennis mentioned above, it’s generally an issue at system boundaries.

  134. QNP @ LI says:

    I have been faced with the big/little endian problem in C. Indeed, when we optimize code by casting a 32-bit-wide bit-field struct into a scalar integer value to compare against a fixed value, it can lead to errors!

  135. B.G. @ LI says:

    @Steve Simon: I am constantly amazed by the number of people who overlay a struct pointer (in C) over a protocol buffer and then byteswap the members of the structure.

    I do this when needed, and see no issue with it.

    @steve: The C compiler is not required to pack structures, and though many compilers can be forced to do this it is silly to rely on it.

    Compilers have some sort of “#pragma pack” for this very issue. I just dealt with it, reading in a structure from an external EPROM on a replaceable item (the EPROM provides info on the item’s use).

    BUT – packing and byte sex are unrelated.

    @steve: The argument that if the local byte order is “correct” then it will be more efficent is also foolish, again I/O is the limiting factor.

    Agreed – but the real issue is you still need to read a struct from something external and turn it into the “native” format – sometimes a packing issue, sometimes a byte sex issue; sometimes both.

    The only reason to create an unpacked version of the struct and laboriously transfer the fields from the original to the copy is that the native version will not support the packed fields, i.e., a byte takes 16 bits, or you cannot access a byte at an odd address.

    @steve: Oh dear, have I started ranting again…

    It’s OK, you are in good company, and the drugs should kick in right about … now :^)

    @OP: If you stay completely within the native CPU/memory/peripherals, you rarely need to worry about byte sex. HOWEVER, the embedded world does not have this luxury. Embedded systems of any complexity *always* deal with protocols and devices on the “outside of the CPU”, so byte sex is one of the issues.

    As noted before, the C preprocessor can easily deal with this:

    #if SWAP_BYTE_SEX
    #define SWAP16( x ) swap16( x )
    #else
    #define SWAP16( x ) ( x )
    #endif

    ptr->field = SWAP16( ptr->field );

    is perfectly OK. I prefer to pass in the address of the field, and have the code swap in place, but that is a style issue:

    swap16( &(ptr->field) );

    Put all of the swap calls in a single function that can be ignored if not needed.
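
    A sketch of the in-place variant (the value-returning flavor used by the macro above would take and return a uint16_t instead):

    #include <stdint.h>

    /* Swap the two bytes of a 16-bit field in place. */
    void swap16(uint16_t *p)
    {
        *p = (uint16_t)((*p << 8) | (*p >> 8));
    }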

  136. J.P. @ LI says:

    Htonl, htons, ntohl, ntohs are your best friends, and both bitfields and struct padding are typically compiler-dependent.

  137. L.H. @ LI says:

    IBM 360 and derivatives are big endian.

    As for structure fields with bit fields, they are reversed.
