

    In computing, endianness is the ordering used to represent some kind of data as a sequence of smaller units. Typical cases are the order in which integers are stored as bytes in computer memory (relative to the memory addressing scheme) and the transmission order over a network or other medium. When specifically talking about bytes in computing, endianness is also referred to simply as byte order, or (less often) as byte sex. The term endianness alludes to the war described in Gulliver's Travels between the two factions of the Big-Endians, who preferred cracking open their soft-boiled eggs from the big end, and the Little-Endians, who preferred the little end.

        Endianness
            Endianness in computers
                Logical and arithmetical description
                    Bit Endianness
                Portability issues
                Example programming caveat
            Endianness in communications
            Endianness of date formats
            Endianness in addresses
            Discussion, background, etymology
            See also


    Endianness in computers
    There seems to be no significant advantage in using one method of endianness over the other, and both have remained common in terms of the number of different architectures that use them. However, because little-endian Intel x86 based processors (and their clones) are used in most personal computers and laptops, the vast majority of desktop computers in the world today are little-endian. This is sometimes called "Intel format". Networks generally use big-endian numbers as addresses; historically, this allowed routing to be decided while a telephone number was still being dialed. Motorola processors have generally used big-endian numbers, and ARM processors gained truly switchable endianness in order to improve the performance of networking devices based on them.

    Generally the byte (octet) is considered an atomic unit from the point of view of storage at all but the lowest levels of network protocols and storage formats. Therefore sequences based around single bytes (e.g. text in ASCII, UTF-8, or one of the ISO 8859 encodings) are not generally affected by endian issues. On the other hand, variable-width text encodings using the byte as their base unit, such as UTF-8, could be considered to have an inbuilt endianness that, at least in all commonly used text encodings, is fixed by the encoding’s design. However, Unicode strings encoded with UTF-16 or UTF-32 are affected by endianness, because each code unit must be further represented as two or four bytes.


    Logical and arithmetical description


    Note: all numerical values in this section presented in code style are in hexadecimal notation.

    Big-endian

    When (some) computers store a 32-bit integer value in memory, based on 1-byte atomic element size and 1-byte address increment (for example 4A3B2C1D at address 100), they store the bytes within the address range 100 through 103 in the order shown in this first example:

        address:   100   101   102   103
        content:    4A    3B    2C    1D

    That is, the most significant byte (also known as the MSB, which is 4A in our example) is stored at the memory location with the lowest address, the next byte in significance, 3B, is stored at the next memory location and so on.
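    The byte-by-byte layout described above can be written down without relying on the host's own byte order. The following is a minimal sketch; the function name store_be32 is made up for this illustration and is not part of any standard library.

```c
#include <stdint.h>

/* Store a 32-bit value most significant byte first (big-endian).
 * out[0] plays the role of the lowest address (100 in the text). */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);  /* most significant byte  */
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;          /* least significant byte */
}
```

    Because it uses only shifts, the function produces the same four bytes on big-endian and little-endian hosts alike.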

    The second example changes the atomic element size. We still consider 32-bit integer values stored in memory, but based on 2-byte atomic element size and 1-byte address increment. For the same example (4A3B2C1D at address 100), the bytes are stored within the address range 100 through 103 in the following order:

        address:   100-101   102-103
        content:     4A3B      2C1D

    The most significant atomic element is now 4A3B in our example, followed by 2C1D at address 102.

    The third example highlights the difference between atomic element size and address increment: here, 32-bit integer values are stored in memory, still based on 2-byte atomic element size, but with 2-byte address increment:

        address:    100    101
        content:   4A3B   2C1D

    The most significant atomic element is still 4A3B in our example, followed by 2C1D located at address 101.

    Architectures that follow this rule are called big-endian (mnemonic: "big end in" - the big end goes in first) and include Motorola 68000, SPARC, PowerPC (which includes Apple's Macintosh line prior to the Intel switch), and System/370.

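    Little-endian

    Other computers store the value 4A3B2C1D in the following order:

        address:   100   101   102   103
        content:    1D    2C    3B    4A

    That is, least significant ("littlest") byte (also known as LSB) first, represented by 1D in our example.

    With 2-byte atomic element size and 1-byte address increment:

        address:   100-101   102-103
        content:     2C1D      4A3B

    The least significant atomic element is now 2C1D in our example, followed by 4A3B at address 102.

    With 2-byte atomic element size and 2-byte address increment:

        address:    100    101
        content:   2C1D   4A3B

    The least significant atomic element is still 2C1D in our example, followed by 4A3B at address 101.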

    Architectures that follow this rule are called little-endian (mnemonic: "little end in" - the little end goes in first) and include the MOS Technology 6502, DEC VAX, and most notably the Intel x86 based series of processors including Intel Pentium based personal computers and laptops.

    In other words, endianness does not denote what the value ends with when stored in memory, but rather which end it begins with.

    Note that the stated mnemonics are not the origin of the terms, see below.
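    Which end a value "begins with" on a given machine can be probed at run time by looking at the byte stored at the lowest address. This is only a sketch: the function name is hypothetical, and a result of 0 means "not little-endian", which covers big-endian as well as any mixed ordering.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte of a
 * 32-bit integer at the lowest address, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x4A3B2C1Du;
    uint8_t first;

    memcpy(&first, &probe, 1);  /* copy the byte at the lowest address */
    return first == 0x1D;
}
```

    Code that serializes values with explicit shifts rarely needs such a check; it is mostly useful for diagnostics.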

    Bi-Endian

    Some architectures can be configured either way; these include ARM, PowerPC (but not the PPC970/G5), DEC Alpha, MIPS, PA-RISC and IA64. The word bi-endian, said of hardware, denotes willingness to compute or pass data in either big-endian or little-endian format (depending, presumably, on a mode bit somewhere). Many of these architectures can be switched via software to default to a specific endian format (usually done when the computer starts up); however, on some architectures the default endianness is selected by some hardware on the motherboard and cannot be changed by software (e.g., the DEC Alpha, which runs only in big-endian mode on the Cray T3E).

    Note that there are some nominally bi-endian machines which are not bi-endian to both the program and to peripheral devices. Most notably, some PowerPC processors in little-endian mode do not act as true-little-endian systems. Although they act little-endian from the point of view of the executing programs, they do not store data in memory in little-endian format (multi-byte values are swapped during memory load/store operations). This can cause problems when memory is transferred to an external device if some part of the software, e.g., a device driver, does not account for the situation.

    Middle-endian

    Still other architectures, called middle-endian (or sometimes mixed-endian), may have a more complicated ordering such that bytes within atomic units composing words are swapped.

    To illustrate this, consider a 32-bit word 4A3B2C1D stored in a middle-endian memory with 2-byte atomic element size and 1-byte address increment:


    or alternatively:



    This example shows that the same middle-endian data can have two distinct representations, one based on a big-endian layout and the other on a little-endian layout.

    To avoid this ambiguity, the term byte-swap, used together with either little-endian or big-endian, is preferred.

    Hence, data 4A3B2C1D can be stored into memory according to the following middle-endian representations:
    1D2C3B4A, big-endian/byte-swap with 2-byte atomic element size and 1-byte address increment (1st diagram).

    3B4A1D2C, little-endian/byte-swap with 2-byte atomic element size and 1-byte address increment (2nd diagram).

    1D2C3B4A, big-endian/byte-swap with 1-byte atomic element size and 1-byte address increment.

    4A3B2C1D, little-endian/byte-swap with 1-byte atomic element size and 1-byte address increment.

    1D2C3B4A, big-endian/byte-swap with 4-byte atomic element size and 1-byte address increment.

    1D2C3B4A, little-endian/byte-swap with 4-byte atomic element size and 1-byte address increment.

    1D2C3B4A, big-endian/byte-swap with 2-byte atomic element size and 2-byte address increment.

    3B4A1D2C, little-endian/byte-swap with 2-byte atomic element size and 2-byte address increment.

    1D2C3B4A, big-endian/byte-swap with 1-byte atomic element size and 2-byte address increment.

    4A3B2C1D, little-endian/byte-swap with 1-byte atomic element size and 2-byte address increment.



    The format for double-precision floating-point numbers on the VAX and ARM is middle-endian. 32-bit words were typically stored in a middle-endian format on the PDP-11, and the term pdp-endian is still sometimes used to refer specifically to this format. In general, these complex orderings are more confusing to work with than consistent big or little endianness.
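    As an illustration of the PDP-11 ordering for 32-bit words, the sketch below stores the more significant 16-bit half first, with each half itself little-endian. The function name store_pdp32 is invented for this example.

```c
#include <stdint.h>

/* Store a 32-bit value in "pdp-endian" order: the more significant
 * 16-bit word comes first, and each 16-bit word is little-endian. */
void store_pdp32(uint8_t out[4], uint32_t value)
{
    uint16_t high = (uint16_t)(value >> 16);
    uint16_t low  = (uint16_t)(value & 0xFFFFu);

    out[0] = (uint8_t)(high & 0xFFu);  /* low byte of the high word  */
    out[1] = (uint8_t)(high >> 8);     /* high byte of the high word */
    out[2] = (uint8_t)(low & 0xFFu);   /* low byte of the low word   */
    out[3] = (uint8_t)(low >> 8);      /* high byte of the low word  */
}
```

    For the running example 4A3B2C1D this yields the byte sequence 3B 4A 1D 2C.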

    There is no guarantee that a platform will use any of these formats, but in practice, there are few if any exceptions.


    Bit Endianness
    The concept of endianness is less important in the numbering of bits within a byte, as computer architectures in general do not support the addressing of individual bits within bytes. Sub-byte addressing is instead accomplished with arithmetic and logical instructions that are well-defined in terms of the significance of the bits, rather than an arbitrary numbering in an address space, and therefore architecture-neutral.
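    A small sketch of such architecture-neutral sub-byte access: the field is selected by bit significance using a shift and a mask, so the result does not depend on any bit numbering convention. The helper name and its parameter limits are assumptions of this example.

```c
#include <stdint.h>

/* Extract `width` bits of `byte`, starting at bit `lsb`, where bit 0
 * is the least significant bit.
 * Assumes 1 <= width <= 8 and lsb + width <= 8. */
uint8_t extract_bits(uint8_t byte, unsigned lsb, unsigned width)
{
    uint8_t mask = (uint8_t)((1u << width) - 1u);

    return (uint8_t)((byte >> lsb) & mask);
}
```

    For instance, extract_bits(0xB4, 4, 4) returns the upper nibble, 0x0B, on every platform.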

    Issues similar to those of byte-endianness can still apply when interpreting bit position as something other than binary significance or when dealing with numbers that do not fill an exact multiple of bytes in a data format. In this case, a decision must be made as to whether the least significant bit is considered to be first or last. In particular, C allows fields in records to be defined with bit level granularity. In this case the assumed bit endianness (which is a compiler level abstraction, not a property of the processor) may be the same as the architecture's byte endianness.

    If a file is read by reading a record into memory, or written by writing the record as a single large block of bytes, consideration must be given to the fact that the fields in the record may not be in the correct byte order. Calls must be inserted to a routine that converts between host byte order and the byte order used in the file. Similar considerations must be made if handling network packets in this way. Bitfields that lie across byte boundaries are likely to make code for reading a format very awkward to port (if they don’t lie across byte boundaries, the order of the bitfields that make up a byte can simply be swapped).
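    One way to sidestep the problem is to decode a record field by field from the raw bytes rather than copying the block over a structure. The record layout below (a 16-bit type followed by a 32-bit length, both big-endian) is purely hypothetical, as are all names in the sketch.

```c
#include <stdint.h>

/* Hypothetical file record: 16-bit type, then 32-bit length,
 * both stored big-endian in the file. */
struct record_header {
    uint16_t type;
    uint32_t length;
};

static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the six raw bytes into host-order fields. */
struct record_header parse_record_header(const uint8_t raw[6])
{
    struct record_header h;

    h.type   = load_be16(raw);
    h.length = load_be32(raw + 2);
    return h;
}
```

    Decoding this way also avoids depending on structure padding, which differs between compilers.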


    Portability issues
    Endianness has implications for software portability. For example, when data stored in a binary format is interpreted by applying a bitmask, the byte order matters, because the same mask yields different results under different byte orders.

    Writing binary data to a common format also requires the proper endianness. For example, the BMP bitmap format requires little-endian integers; if the data are stored using big-endian integers, the file is corrupted because it does not match the format.
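    As a sketch of writing a format-mandated byte order, the code below fills in the start of a BMP file header: the two signature bytes "BM" followed by the 32-bit file size in little-endian order. The function names are invented for this example, and the rest of the header is left out.

```c
#include <stdint.h>

/* Store a 32-bit value least significant byte first (little-endian). */
void put_le32(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Fill the first six bytes of a BMP file header:
 * the "BM" signature and the total file size. */
void bmp_put_signature_and_size(uint8_t header[6], uint32_t file_size)
{
    header[0] = 'B';
    header[1] = 'M';
    put_le32(header + 2, file_size);
}
```

    The output is identical on hosts of either byte order, which is the property a shared file format needs.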

    Software that needs to share information between hosts of different endianness typically uses one of two strategies. Either it can choose a single endianness for sharing data, or it can allow hosts to share data in any endianness that they choose, so long as they mark which one they are using. Both approaches have advantages: on the one hand, choosing a single endianness makes decoding easier, since software only needs to decode one format. On the other hand, allowing multiple endiannesses makes encoding easier, since software doesn’t need to convert data out of its native order; and also enables more efficient communication when the encoder and decoder share a single endianness, since neither needs to change the byte order. Most Internet standards take the first approach, and specify big-endian byte order. Many vendor originated formats simply use the byte order of the platform they originated on. Some other applications, notably X11, take the second approach.

    UTF-16 can be written in big-endian or little-endian order. It permits a Byte Order Mark (BOM) of 2 bytes at the beginning of a string to denote its endianness. A similar 4 byte byte-order mark can be used with the rare encoding UTF-32.
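    Detecting the 2-byte mark might look like the sketch below. In UTF-16 the byte order mark is the code unit FEFF, so the byte pair FE FF indicates big-endian and FF FE indicates little-endian. The enum and function names are assumptions of this example.

```c
#include <stddef.h>
#include <stdint.h>

enum utf16_order { UTF16_UNKNOWN, UTF16_BIG_ENDIAN, UTF16_LITTLE_ENDIAN };

/* Inspect the first two bytes of a buffer for a UTF-16 byte order mark. */
enum utf16_order detect_utf16_bom(const uint8_t *data, size_t len)
{
    if (len >= 2) {
        if (data[0] == 0xFE && data[1] == 0xFF)
            return UTF16_BIG_ENDIAN;
        if (data[0] == 0xFF && data[1] == 0xFE)
            return UTF16_LITTLE_ENDIAN;
    }
    return UTF16_UNKNOWN;  /* no mark: the order must be known by other means */
}
```

    Note that a little-endian UTF-32 mark also begins with FF FE, so a caller that accepts both encodings would need to look at four bytes first.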


    Example programming caveat
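    Below is an example application, written in C, which demonstrates the dangers of endianness-unaware programming:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Record layout inferred from the dumps below: 4 chars, a 4-byte int, 4 chars. */
        struct { char one[4]; int two; char three[4]; } data;
        FILE *fp = fopen("output", "wb");

        strcpy(data.one, "foo");
        data.two = 0x01234567;
        strcpy(data.three, "bar");

        /* Writing the raw structure copies the int in host byte order. */
        if (fp) { fwrite(&data, sizeof data, 1, fp); fclose(fp); }
        return 0;
    }
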

    This code compiles properly on an i386 machine running FreeBSD and a SPARC64 machine running Solaris, but the output is different when examined with the hexdump utility.

    i386 $ hexdump -C output
    00000000 66 6f 6f 00 67 45 23 01 62 61 72 00 |foo.gE#.bar.|
    0000000c

    sparc64 $ hexdump -C output
    00000000 66 6f 6f 00 01 23 45 67 62 61 72 00 |foo..#Egbar.|
    0000000c


    Endianness in communications
    In general, the NUXI problem (also called the endian problem) is the problem of transferring data between computers with differing byte order. For example, the string "UNIX", packed with two bytes per 16-bit integer, might look like "NUXI" to a machine with a different byte order. The problem was first discovered when porting an early version of Unix from the PDP-11 (a middle-endian architecture) to an IBM Series/1 minicomputer (a big-endian architecture); upon startup, the computer printed "NUXI" in place of the string "UNIX".
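    The effect is easy to reproduce: swapping the two bytes inside every 16-bit unit of "UNIX" gives "NUXI". The helper below is a sketch with a made-up name.

```c
#include <stddef.h>

/* Swap the two bytes of every 16-bit unit in place.
 * A trailing odd byte, if any, is left untouched. */
void swap_bytes_in_words16(char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        char tmp = buf[i];

        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

    Applying the function a second time restores the original string, since the swap is its own inverse.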

    The Internet Protocol defines a standard "big-endian" network byte order. This byte order is used for all numeric values in the packet headers and by many higher level protocols and file formats that are designed for use over IP.

    The Berkeley sockets API defines a set of functions to convert 16- and 32-bit integers to and from network byte order: the htonl and htons functions convert 32-bit ("long") and 16-bit ("short") values respectively from host to network order; whereas the ntohl and ntohs functions convert from network to host order.
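    A minimal sketch of these functions in use, assuming a POSIX system where they are declared in arpa/inet.h. The wrapper names put_net32 and get_net32 are invented for this example.

```c
#include <arpa/inet.h>  /* POSIX: htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Write a 32-bit host value into a packet buffer in network byte order. */
void put_net32(uint8_t out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);

    memcpy(out, &net, sizeof net);
}

/* Read a 32-bit network-order value from a packet buffer. */
uint32_t get_net32(const uint8_t in[4])
{
    uint32_t net;

    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

    On a big-endian host htonl and ntohl leave the value unchanged; on a little-endian host they reverse the bytes. Either way the buffer holds the most significant byte first.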

    Serial devices also have bit-endianness: the bits in a byte can be sent little-endian (least significant bit first) or big-endian (most significant bit first).
    This decision is made in the very bottom of the data link layer of the OSI model.
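    The two transmission orders can be sketched as follows, with each array element standing for one bit on the wire. The function names are hypothetical.

```c
#include <stdint.h>

/* Fill bits[0..7] with the bits of `byte` in the order they would be
 * sent when the least significant bit goes first. */
void bits_lsb_first(uint8_t byte, uint8_t bits[8])
{
    for (int i = 0; i < 8; i++)
        bits[i] = (uint8_t)((byte >> i) & 1u);
}

/* Same, but with the most significant bit sent first. */
void bits_msb_first(uint8_t byte, uint8_t bits[8])
{
    for (int i = 0; i < 8; i++)
        bits[i] = (uint8_t)((byte >> (7 - i)) & 1u);
}
```

    For the byte 41 (binary 01000001) the first function yields 1,0,0,0,0,0,1,0 and the second 0,1,0,0,0,0,0,1.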


    Endianness of date formats
    Endianness is simply illustrated by the different manners in which countries format calendar dates.

    In the United States, dates are most commonly formatted as Month; Day; Year (e.g.: "May 24th, 2006", "5/24/2006"). This is a middle-endian order.

    Most of Oceania, South America and Europe (except Sweden and Hungary where ISO 8601 is most common), format dates as Day; Month; Year (e.g.: "24th May, 2006", "24/5/2006", "24/5-2006", "24.5.06"). This is little-endian.

    In many other countries, including China and Japan, use of the ISO 8601 international standard ordering of dates is prevalent: Year; Month; Day (e.g., "2006 May 24th", or, more properly, "2006-05-24"). This is big-endian.

    The ISO 8601 ordering scheme lends itself to straightforward computerised sorting of dates in lexicographical order, or dictionary sort order. This means that sorting algorithms do not need to treat the numeric parts of the date string any differently from a string of non-numeric characters, and the dates will be sorted into chronological order. Note that for this to work, years must always be expressed as four digits, months as two, and days as two. Thus single-digit days and months must be padded with a zero, yielding '01', '02', …, '09'.
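    The sorting property can be sketched with nothing more than the standard qsort and strcmp functions; no date parsing is involved. The function names are made up, and the sketch assumes every string is a zero-padded "YYYY-MM-DD" date.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise string comparison. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Sort zero-padded "YYYY-MM-DD" strings into chronological order. */
void sort_iso_dates(const char **dates, size_t count)
{
    qsort(dates, count, sizeof *dates, compare_strings);
}
```

    The same comparator would misorder dates written day-first or month-first, because their most significant component is not at the start of the string.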


    Endianness in addresses
    Western postal addresses are largely written in little-endian order, starting with the smallest component (the name of the recipient), progressing through house number, street, town, region, and country. In some Asian countries (e.g., Japan), postal addresses are instead written in big-endian order, that is starting with country, region, town, and ending in the recipient's name.

    Internet domain name system and email addresses are little-endian, following the Western tradition of postal addressing. Filesystem pathnames in Unix (and most other contemporary operating systems), are big-endian, listing the highest-level directory first. URLs, which combine both these notations, are mixed-endian, with a little-endian hostname followed by a big-endian pathname, as in
    protocol://organization.region.country/department/subdepartment/person



    Discussion, background, etymology
    Big-endian numbers are easier to read when debugging a program. Some think they are less intuitive because the most significant byte is at the smaller address. Some think they are less confusing because the significance order is the same as the order of normal textual character strings in the computer, just as in non-computer text (see below).

    Little-endian numbers enjoy some slight computational advantages in that variables in memory do not have to be read and manipulated at their full widths. For example, a 32 bit variable in memory such as 00 00 00 4A can be read at the same address as either 8 bit (4A), 16 bit (00 4A), or 32 bit (00 00 00 4A) as long as its value stays within bounds. Big-endian cannot do this because the relative location of the least significant byte(s) change with the overall width of the variable. For example, 00 00 00 4A would become 00 when addressed as an 8 bit variable. Big-endian numbers are always corrupted if addressed as the wrong width unless the address is adjusted.
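    The width argument can be checked with a host-independent sketch that models memory as a byte array. The helper names read_le and read_be are invented here, and both assume a width of at most 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret the first `width` bytes at `p` as a little-endian
 * unsigned integer. Assumes width <= 4. */
uint32_t read_le(const uint8_t *p, size_t width)
{
    uint32_t v = 0;

    for (size_t i = 0; i < width; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Interpret the first `width` bytes at `p` as a big-endian
 * unsigned integer. Assumes width <= 4. */
uint32_t read_be(const uint8_t *p, size_t width)
{
    uint32_t v = 0;

    for (size_t i = 0; i < width; i++)
        v = (v << 8) | p[i];
    return v;
}
```

    With the value 4A stored little-endian as the bytes 4A 00 00 00, read_le returns 4A for widths 1, 2 and 4 at the same address. Stored big-endian as 00 00 00 4A, read_be returns 4A at width 4 but 00 at width 1 unless the address is advanced by three bytes.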

    A person’s preference usually is based both on which convention was studied first, and on which convention the person’s mental models were built. Specifically in the case of people with little background in low-level computing, most spoken languages express most numbers, especially those larger than a hundred, in big-endian manner. In English, for example, one says "three hundred twenty-four", not "four-and-twenty and three hundred". Notable counter-examples are German and Dutch, which use little-endian order for numbers between 21 and 99 and mixed endianness for larger numbers (e.g. vierundzwanzig/vierentwintig (24, literally four-and-twenty) and hundertvierundzwanzig (124, literally hundred four-and-twenty)).

    The Hindu-Arabic numeral system is used worldwide and is such that the most significant digits are always written to the left of the less significant ones. Writing left to right, this system is therefore big-endian. Writing right to left, this numeral system is little-endian. It is worth noting, however, that in quite a few languages the spoken order of numerals is inconsistent with how they appear written and in some languages, such as Hebrew, it is common to interrupt the writing of text (right-to-left) to write a number in the opposite order (left-to-right). German or Dutch speakers, however, do not write small numbers from right to left.

    The choice of big-endian vs. little-endian was as arbitrary as the entire concept is, and has been the subject of flame wars. Emphasizing the futility of this argument, the very terms big-endian and little-endian were taken from the Big-Endians and Little-Endians of Jonathan Swift’s satiric novel Gulliver’s Travels, where in Lilliput and Blefuscu Gulliver finds two factions warring over which end of a boiled egg should be cracked open.

    See the Endian FAQ, including the significant essay "On Holy Wars and a Plea for Peace" by Danny Cohen (1981).

    Little-endian ordering has been used in compiling reverse dictionaries, such as rhyming dictionaries, where the entries begin, for example, with "a, aa, baa,..." and end, for example, with "...buzz, abuzz, fuzz." An actual example is the pronouncing dictionary for Cantonese (ISBN 962-948-509-5) which begins with "a, ba, da, dza,..." and ends with "...tyt, tsyt, m̩, ŋ̩".


    See also
     
    This article is licensed under the GNU Free Documentation License [copyleft]. It uses material from the Wikipedia article "Endianness".