Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Developed in conjunction with the Universal Coded Character Set (UCS) standard and published as The Unicode Standard, the latest version of Unicode contains a repertoire of more than 128,000 characters covering 135 modern and historic scripts, as well as multiple symbol sets. The standard consists of a set of code charts for visual reference, an encoding method and set of standard character encodings, a set of reference data files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew, and left-to-right scripts). As of June 2016, the most recent version is Unicode 9.0. The standard is maintained by the Unicode Consortium.

Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, Java (and other programming languages), and the …

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16, and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character (all ASCII characters have the same code values in UTF-8 and in ASCII) and up to four bytes for other characters. UCS-2 uses a 16-bit code unit (two 8-bit bytes) for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2. It uses one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units (4 × 8 bits) for each of the additional characters.

Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard, which are widely used in various countries of the world but remain largely incompatible with each other. Many traditional character encodings share a common problem: they allow bilingual computer processing (usually Latin characters plus the local script) but not multilingual computer processing (processing of arbitrary scripts mixed with each other).

Unicode, in intent, encodes the underlying characters (graphemes and grapheme-like units) rather than the variant glyphs (renderings) for such characters. In the case of Chinese characters, this sometimes leads to controversies over distinguishing the underlying character from its variant glyphs (see Han unification). In text processing, Unicode takes the role of providing a unique code point (a number, not a glyph) for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font, or style) to other software, such as a web browser or word processor.
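To make the difference between a code point and its encodings concrete, here is a minimal sketch in Python 3.9 or later. It uses only the built-in `ord` and `str.encode`. The helper names `describe` and `utf16_surrogates` are hypothetical and chosen for illustration. The sketch reports how many UTF-8 bytes and UTF-16 code units each character needs and whether it would fit in UCS-2, which covers only U+0000 to U+FFFF. It also splits a supplementary code point into the two 16-bit units that UTF-16 uses for it.

```python
# Sketch: how one abstract code point is stored by UTF-8 and UTF-16,
# and whether it would fit in UCS-2 (which only covers U+0000..U+FFFF).

def describe(ch: str) -> dict:
    cp = ord(ch)                      # the abstract code point (a number)
    utf8 = ch.encode("utf-8")         # 1 to 4 bytes
    utf16 = ch.encode("utf-16-be")    # big-endian, so no byte-order mark is added
    return {
        "code_point": f"U+{cp:04X}",
        "utf8_bytes": len(utf8),
        "utf16_units": len(utf16) // 2,  # number of 16-bit code units
        "fits_ucs2": cp <= 0xFFFF,
    }

def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    v = cp - 0x10000                  # leaves a 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

for ch in ["A", "é", "€", "中", "😀"]:
    print(ch, describe(ch))

high, low = utf16_surrogates(ord("😀"))
print(f"{high:04X} {low:04X}")        # D83D DE00
# Cross-check the manual split against Python's own UTF-16 encoder
assert "😀".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

"A" comes out as a single byte in UTF-8, identical to ASCII. "é", "€", and "中" need two or three UTF-8 bytes but still only one UTF-16 unit. The emoji, U+1F600, lies outside the range UCS-2 can represent, so UTF-16 needs two units for it.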
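The normalization rules and character properties mentioned above can be explored with Python's standard `unicodedata` module. The following is only a rough sketch. The module's tables match the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`), which is probably newer than 9.0. The sketch shows that "é" can be stored either as one precomposed code point or as "e" plus a combining accent, and that NFC/NFD normalization maps one form to the other. It also prints each sample character's bidirectional class, which is the property bidirectional display relies on.

```python
import unicodedata

composed = "\u00e9"        # é as one precomposed code point
decomposed = "e\u0301"     # e followed by COMBINING ACUTE ACCENT

# Same visible character, different code point sequences
print(composed == decomposed)                                   # False
print(unicodedata.normalize("NFC", decomposed) == composed)     # True
print(unicodedata.normalize("NFD", composed) == decomposed)     # True

# Bidirectional class: L = left-to-right, R = right-to-left (Hebrew),
# AL = Arabic letter (right-to-left)
for ch in ["A", "\u05d0", "\u0628"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
```

This is why text comparison usually normalizes strings to a single form first. Otherwise two strings that look identical on screen can compare as unequal.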