    Unicode reserves 1,114,112 (= 2^20 + 2^16, or 17 × 2^16; hexadecimal 110000) code points.
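    The three forms of this total can be checked directly. A minimal Python sketch (the constant names are illustrative, not from any standard); it also compares the highest code point with Python's own `sys.maxunicode`:

```python
import sys

# Size of the Unicode code space, written three equivalent ways.
PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
NUM_PLANES = 17        # Plane 0 through Plane 16

total = NUM_PLANES * PLANE_SIZE
assert total == 2 ** 20 + 2 ** 16 == 0x110000 == 1_114_112

# Code points are numbered from 0, so the highest one is total - 1.
MAX_CODE_POINT = total - 1
assert sys.maxunicode == MAX_CODE_POINT  # Python 3 strings cover the full range

print(f"{total:,} code points: U+0000 to U+{MAX_CODE_POINT:X}")
```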

    As of Unicode 5.0.0, 101,063 (9.1%) of these code points are assigned, with another 137,468 (12.3%) reserved for private use, leaving 875,441 (78.6%) unassigned. The assigned code points break down as follows:
      98,884 graphemes
      140 formatting characters
      65 control characters
      2,048 surrogate code points
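    The private-use figure can be reproduced from the private-use ranges listed later in this article. The sketch below assumes one detail the text does not spell out: the last two code points of Planes 15 and 16 are noncharacters, not private-use characters, which is why each of those planes contributes 65,534 rather than 65,536.

```python
def range_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1

bmp_pua = range_size(0xE000, 0xF8FF)           # 6,400 in the BMP
# Planes 15 and 16 minus their final two code points (noncharacters).
plane15_pua = range_size(0xF0000, 0xFFFFD)     # 65,534
plane16_pua = range_size(0x100000, 0x10FFFD)   # 65,534

private_use = bmp_pua + plane15_pua + plane16_pua
print(f"{private_use:,} private-use code points ({private_use / 0x110000:.1%})")
```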


    The first 256 code points correspond to those of ISO 8859-1, the most popular 8-bit character encoding in the Western world. Because ISO 8859-1 itself extends ASCII, the first 128 code points are also identical to ASCII.
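    This correspondence means that decoding a byte as ISO 8859-1 simply yields the code point with the same numeric value. A short Python check using the standard `latin-1` and `ascii` codecs:

```python
# Decode every possible byte value as ISO 8859-1 (Latin-1).
latin1_text = bytes(range(256)).decode("latin-1")

# Byte value i becomes code point U+00xx with the same value i.
assert all(ord(ch) == i for i, ch in enumerate(latin1_text))

# The first 128 of those also decode identically as ASCII.
assert bytes(range(128)).decode("ascii") == latin1_text[:128]

print(latin1_text[0xE9], hex(ord(latin1_text[0xE9])))
```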

    The Unicode code space for characters is divided into 17 planes, each with 65,536 (= 2^16) code points, although currently only a few planes are used:
      Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP)
      Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP)
      Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)
      Planes 3 to 13 (30000–DFFFF): unassigned
      Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)
      Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)
      Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
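    Because every plane spans exactly 2^16 code points, the plane number is the code point shifted right by 16 bits. A minimal Python sketch; the `PLANE_NAMES` table and the helper functions are illustrative and reflect the plane assignments listed above as of Unicode 5.0:

```python
# Plane names as listed above (Unicode 5.0); other planes were unassigned.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}

def plane_of(code_point: int) -> int:
    """Plane number: the bits above the low 16 bits of the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16

def describe(code_point: int) -> str:
    plane = plane_of(code_point)
    name = PLANE_NAMES.get(plane, "unassigned (as of Unicode 5.0)")
    return f"U+{code_point:04X}: plane {plane}, {name}"

for cp in (0x0041, 0x1D11E, 0x20000, 0x50000, 0x10FFFF):
    print(describe(cp))
```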

    The cap of 2^20 code points beyond the BMP (Planes 1 to 16) exists in order to maintain compatibility with the UTF-16 encoding, whose surrogate pairs can address only that range. Currently, about ten percent of the Unicode code space is used. Furthermore, ranges of characters have been tentatively blocked out for every known unencoded script, and while Unicode may need another plane for ideographic characters, ten planes would still be available if previously unknown scripts with tens of thousands of characters were discovered. This 20-bit limit is therefore unlikely to be reached in the near future.
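    The 2^20 limit follows from how UTF-16 represents supplementary code points: it subtracts 0x10000, leaving a 20-bit offset, then splits that offset into two 10-bit halves carried by a high surrogate (D800–DBFF) and a low surrogate (DC00–DFFF). A minimal Python sketch of that scheme (the function name is illustrative), cross-checked against Python's built-in `utf-16-be` codec:

```python
def to_utf16_units(code_point: int) -> list[int]:
    """Encode a single code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    if code_point < 0x10000:
        return [code_point]              # BMP: a single 16-bit unit
    offset = code_point - 0x10000        # 0 .. 2**20 - 1, i.e. exactly 20 bits
    high = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> DC00..DFFF
    return [high, low]

# Planes 1-16 hold exactly 2**20 code points, one per possible surrogate pair.
assert 0x10FFFF - 0x10000 + 1 == 2 ** 20 == 1024 * 1024

units = to_utf16_units(0x1D11E)          # a Plane 1 musical symbol
print([f"{u:04X}" for u in units])

# Cross-check against Python's big-endian UTF-16 codec.
encoded = b"".join(u.to_bytes(2, "big") for u in units)
assert encoded == chr(0x1D11E).encode("utf-16-be")
```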



    Basic Multilingual Plane

    The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.



    A visual roadmap of the Basic Multilingual Plane (graphic not reproduced here) uses the following colour key:
      Black = Latin scripts and symbols
      Light Blue = Linguistic scripts
      Blue = Other European scripts
      Orange = Middle Eastern and SW Asian scripts
      Light Orange = African scripts
      Green = South Asian scripts
      Purple = Southeast Asian scripts
      Red = East Asian scripts
      Light Red = Unified CJK Han
      Magenta = Symbols
      Light Grey = UTF-16 surrogates and private use
      Cyan = Miscellaneous characters
      White = Unused

    As of Unicode 5.0, the BMP includes, among others, the following scripts and blocks:
      IPA Extensions (0250–02AF)
      Greek and Coptic (0370–03FF)
      Cyrillic Supplement (0500–052F)
      Arabic Supplement (0750–077F)
      Indic scripts:
      Lao (0E80–0EFF)
      Ethiopic Supplement (1380–139F)
      Philippine scripts:
      Khmer Symbols (19E0–19FF)
      Phonetic Extensions Supplement (1D80–1DBF)
      Greek Extended (1F00–1FFF)
        Superscripts and Subscripts (2070–209F)
        Combining Diacritical Marks for Symbols (20D0–20FF)
        Control Pictures (2400–243F)
        Enclosed Alphanumerics (2460–24FF)
        Block Elements (2580–259F)
        Miscellaneous Mathematical Symbols-A (27C0–27EF)
        Supplemental Arrows-A (27F0–27FF)
        Supplemental Arrows-B (2900–297F)
        Miscellaneous Mathematical Symbols-B (2980–29FF)
        Supplemental Mathematical Operators (2A00–2AFF)
        Miscellaneous Symbols and Arrows (2B00–2BFF)
      Georgian Supplement (2D00–2D2F)
      Ethiopic Extended (2D80–2DDF)
      Supplemental Punctuation (2E00–2E7F)
      Ideographic Description Characters (2FF0–2FFF)
      CJK Symbols and Punctuation (3000–303F)
      Katakana Phonetic Extensions (31F0–31FF)
      Enclosed CJK Letters and Months (3200–32FF)
      CJK Compatibility (3300–33FF)
      CJK Unified Ideographs Extension A (3400–4DBF)
      Yijing Hexagram Symbols (4DC0–4DFF)
      Yi Syllables (A000–A48F)
      Yi Radicals (A490–A4CF)
      Modifier Tone Letters (A700–A71F)
      Hangul Syllables (AC00–D7AF)
      High Surrogates (D800–DB7F)
      High Private Use Surrogates (DB80–DBFF)
      Low Surrogates (DC00–DFFF)
      Private Use Area (E000–F8FF)
      Arabic Presentation Forms-A (FB50–FDFF)
      Variation Selectors (FE00–FE0F)
      Vertical Forms (FE10–FE1F)
      Combining Half Marks (FE20–FE2F)
      CJK Compatibility Forms (FE30–FE4F)
      Small Form Variants (FE50–FE6F)
      Arabic Presentation Forms-B (FE70–FEFF)
      Halfwidth and Fullwidth Forms (FF00–FFEF)
      Specials (FFF0–FFFF)
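    Since blocks are contiguous, non-overlapping ranges, finding the block of a code point is a binary search over the sorted start values. A minimal Python sketch using the standard `bisect` module; `BLOCKS` is a hypothetical hand-copied excerpt of the list above, whereas a real implementation would load the full table from the Unicode Character Database file Blocks.txt:

```python
import bisect
from typing import Optional

# Illustrative excerpt of the BMP block list, sorted by start code point.
BLOCKS = [
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xFF00, 0xFFEF, "Halfwidth and Fullwidth Forms"),
]
STARTS = [start for start, _, _ in BLOCKS]

def block_of(code_point: int) -> Optional[str]:
    """Name of the block containing code_point, or None if not in the excerpt."""
    # Index of the last block whose start is <= code_point.
    i = bisect.bisect_right(STARTS, code_point) - 1
    if i >= 0:
        start, end, name = BLOCKS[i]
        if start <= code_point <= end:
            return name
    return None

for cp in (0x03B1, 0xAC00, 0xF8FF, 0x0041):
    print(f"U+{cp:04X}: {block_of(cp)}")
```

    The extra `start <= code_point <= end` check matters because the gaps between blocks (or, here, between excerpt entries) must not be attributed to the preceding block.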

    Several scripts are expected to be included in the BMP in the next revision of Unicode. These scripts, and their proposed code point ranges, are the following:
      Latin Extended-C (2C60–2C7F)
      Santali (Ol Cemet' / Ol Chiki) (2DE0–2DFF)
      Vai (A500–A61F)
      Latin Extended-D (A720–A7FF)

    Several other scripts are proposed for inclusion in the BMP, including:


    Supplementary Multilingual Plane

    Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

    As of Unicode 5.0, Plane 1 includes, among others, the following scripts and blocks:
      Linear B Ideograms (10080–100FF)
      Aegean Numbers (10100–1013F)
      Ancient Greek Numbers (10140–1018F)
      Cypriot Syllabary (10800–1083F)
      Sumero-Akkadian Cuneiform (12000–1236E and 12400–12473)
      Byzantine Musical Symbols (1D000–1D0FF)
      Musical Symbols (1D100–1D1FF)
      Ancient Greek Musical Notation (1D200–1D24F)
      Tai Xuan Jing Symbols (1D300–1D35F)
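    Characters in these Plane 1 blocks are ordinary code points to most modern software. For example, Python's standard `unicodedata` module reports their character names; the sample code points below are chosen for illustration from three of the listed blocks:

```python
import unicodedata

# One sample code point from each of three Plane 1 blocks listed above.
samples = {
    0x10100: "Aegean Numbers",
    0x12000: "Sumero-Akkadian Cuneiform",
    0x1D11E: "Musical Symbols",
}

for cp, block in samples.items():
    assert cp >> 16 == 1  # all of these lie in Plane 1 (the SMP)
    print(f"U+{cp:05X} ({block}): {unicodedata.name(chr(cp))}")
```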


    Many other scripts are proposed for inclusion in Plane 1, including:
      Old Permic
      Manichaean
      South Arabian


    Private Use Area

    A Private Use Area (PUA) is one of several ranges of code points reserved for private use. The Unicode standard assigns no characters to these ranges; their meaning is left to private agreement between the parties that use them.

    The Basic Multilingual Plane includes a PUA in the range from U+E000 to U+F8FF (57344–63743). Plane 15 (U+F0000 to U+FFFFF) and Plane 16 (U+100000 to U+10FFFF) are reserved for private use as well, apart from the two noncharacters at the end of each plane.
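    Testing whether a code point is private use amounts to checking these three ranges. A minimal Python sketch (`PUA_RANGES` and `is_private_use` are illustrative names); it excludes the final two noncharacters of Planes 15 and 16 and cross-checks the result against the general category `Co` ("Other, private use") that Python's `unicodedata.category` reports:

```python
import unicodedata

# Private-use ranges (inclusive), excluding the noncharacters
# U+FFFFE, U+FFFFF, U+10FFFE and U+10FFFF.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def is_private_use(code_point: int) -> bool:
    return any(lo <= code_point <= hi for lo, hi in PUA_RANGES)

# Cross-check against the Unicode Character Database as shipped with Python.
for cp in (0xE000, 0xF8FF, 0xF0000, 0x10FFFD, 0x0041, 0xFFFFE):
    assert is_private_use(cp) == (unicodedata.category(chr(cp)) == "Co")
    print(f"U+{cp:04X}: private use = {is_private_use(cp)}")
```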

    The PUA is a concept inherited from certain Asian encoding systems, which had private use areas for encoding Japanese gaiji (rare characters, such as those used in personal names) in application-specific ways. Similarly, the ConScript Unicode Registry aims to coordinate the mapping into the PUAs of scripts not yet encoded in, or rejected by, Unicode. The Medieval Unicode Font Initiative uses the PUA to encode various ligatures, precomposed characters, and symbols found in medieval texts.

    One example of Private Use Area usage is Apple Computer's assignment of U+F8FF to the Apple logo.


    Other planes

    Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 rare Chinese characters that are mostly historic, although some are in modern use. Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains some non-recommended language tag characters and some variation selector characters.
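    The contents of Planes 2 and 14 can be inspected with Python's `unicodedata` module. The three code points below are illustrative picks: the first ideograph of Plane 2, the language tag character, and the first variation selector in Plane 14. The printed general categories show that the Plane 14 characters are format characters or combining marks rather than visible symbols.

```python
import unicodedata

for cp in (0x20000, 0xE0001, 0xE0100):
    ch = chr(cp)
    print(f"U+{cp:05X} plane {cp >> 16}: "
          f"{unicodedata.name(ch)} [{unicodedata.category(ch)}]")
```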


    This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Mapping of Unicode characters".