endianness - Everything2.com

Most computer architectures let a program address individual bytes in memory, and most OSs let programs read and write individual bytes to files. Most computers can also do arithmetic on 16-bit integers, which require two 8-bit bytes to store them. That raises the following question: when you want to store a number like 0x1234, which byte goes first?

On one hand, if we store the most significant byte at the lower memory address, then memory dumps will look like

... stuff stuff 12 34 stuff stuff ...

which is nice, since that is the way we are used to reading numbers. If we stored the numbers with the least significant byte first, they would appear "swapped around". On the other hand, storing the least significant byte first lets us take a pointer to a 16-bit number and use it as if it pointed to an 8-bit number. On modern computers this is not very interesting because they access the memory in word units anyway, but on old 8-bit microcomputers it might be a worthwhile optimization, and also a "natural" way of extending the computer to 16-bits while remaining backwards compatible.

With so excellent reasons for either choice, the computer industry predictably split about 50-50. Computers that store the least significant byte (the "little end") first are said to be little-endian (also known as Intel format, since this is what x86 does). Computers that store the most significant byte first are called big-endian (also known as Motorola format, or network byte order since TCP/IP uses this order). The concept itself is known as Endianness or byte order. There are even some agnostic architectures, like the PowerPC, which let you switch mode at runtime. (Try booting an iMac into Open Firmware, change the byte order to little-endian, and then watch MacOS crash when it tries to boot).

The terms little- and big-endian are taken from Gulliver's Travels, where the Lilliputians wage bitter war over the proper way to open a boiled egg. (See the big-endian node for the relevant quote). The choice of byte order is about as consequential, and it too leads to unnecessary suffering for programmers that need to deal with several platforms.

Firstly, every communication protocol and file format needs to fix a byte order or provide a mechanism for indicating it, and if the receiving program happens to run on a processor of the wrong endianness it must swap the numbers as it receives them. TCP/IP specifices big-endian order, and programmers can use the hton macros to convert from "host" to "network" order without doing any redundant work -- a good example of how to write portable code despite endianness issues.

Secondly, it is surprisingly easy to write code that depends on a particular byteorder in some subtle way, or with bugs which will only reveal themselves under a certain byteorder. This is a common snag when porting C programs between different architectures. For example, short foo; scanf("%d", &foo); incorrectly passes a pointer to a short to scanf claiming it is a pointer to int (the code should have used "%hd"). But on a little-endian computer this might not be noticed immediatly, because scanf will write some extra zero bytes to the memory location immediatly after foo.

Of course these days computers can work on 32-bit numbers as well. Then not only do we get the obvious orders ABCD and DCBA, but occasionally the "middle-endian" CDAB and BADC crop up in obscure protocols and file formats... Oh well, perhaps we should be happy that the remaining 20 permutations are not in use!

big-endian	little-endian	0xCAFEBABE	URL escape sequences
self-extracting executable	Computer Architecture	Architecture Description Language	byte order mark
middle-endian	Sun Ultra 60	offsetof	byte order
Lifehouse	Open Firmware	Ten Little Indians	Gulliver's Travels
easter egg