Fonts and software resources for the Unicode Character Set

Last updates: Wed May 19 09:15:28 2004; Sat Nov 27 11:05:38 2004; Thu Dec 2 15:20:20 2004; Fri Dec 3 13:44:15 2004; Fri Feb 3 16:38:46 2006; Fri Apr 14 07:46:27 2006; Thu Mar 23 14:23:46 2017

What is Unicode?
Who uses Unicode?
Fonts and software resources for the Unicode character set
Other Unicode resources and Web sites

What is Unicode?

The Unicode character set is intended to represent the writing systems of all of the world's major languages. Although early versions could be represented with 16 bits (65,536 characters), by 1996 at version 2.0, that proved insufficient. The code space now extends from U+0000 to U+10FFFF, so 21 bits suffice to address its 1,114,112 code points.

At Unicode version 2.0, there were 38,885 assigned characters. At version 3.0, there were 49,194 assigned characters. At version 3.2, there were 95,156 assigned characters. At version 4.0, there are 96,382 assigned characters.

Variable-width encoding schemes have been developed to minimize the number of bytes required to store Unicode characters. Files containing only 7-bit ASCII characters are byte-for-byte identical when interpreted in the Unicode UTF-8 encoding, so plain ASCII files are already valid UTF-8 files. With UTF-8, up to four 8-bit bytes may be required to encode a single Unicode character.
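The variable width of UTF-8 can be checked with a few lines of Python. This is only a minimal sketch: the helper name utf8_length and the four sample code points are chosen here for illustration and do not come from the Unicode Standard.

```python
# Minimal sketch: how many bytes UTF-8 needs for code points of
# increasing magnitude. The sample code points are illustrative choices.

def utf8_length(code_point: int) -> int:
    """Return the number of bytes in the UTF-8 encoding of one code point."""
    return len(chr(code_point).encode("utf-8"))

# U+0041 LATIN CAPITAL LETTER A, U+00E9 LATIN SMALL LETTER E WITH ACUTE,
# U+20AC EURO SIGN, U+1D11E MUSICAL SYMBOL G CLEF
sample_code_points = (0x0041, 0x00E9, 0x20AC, 0x1D11E)
lengths = [utf8_length(cp) for cp in sample_code_points]
print(lengths)  # [1, 2, 3, 4]

# A pure 7-bit ASCII string has the same bytes in ASCII and in UTF-8.
ascii_text = "plain ASCII text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```

The second check illustrates why existing ASCII files need no conversion to be read as UTF-8.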

Some early Unicode implementors of programming language compilers, and the designers of the Java programming language, chose 16-bit representations: with the Unicode UTF-16 encoding, the first 63,486 characters are represented in 16 bits, while the remaining 2,048 values (the surrogates) are used in pairs, one of 1,024 high surrogates followed by one of 1,024 low surrogates, to represent another 1,048,544 characters as a pair of 16-bit values. Since 2048 + 63486 = 65534, which is two less than the 65,536 values representable in 16 bits, there are two remaining 16-bit values: U+FFFE and U+FFFF. They are not used to encode characters, but instead are reserved for internal use (U+FFFF as a sentinel, and U+FFFE, the byte-swapped image of the byte-order mark U+FEFF, as an indicator of reversed byte order). Other compiler implementors store Unicode characters in 32-bit integers (the UTF-32 encoding), allowing a simple correspondence of one Unicode character to one integer.
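The UTF-16 arithmetic above can be reproduced in Python. This is a sketch under stated assumptions: the function names are invented for illustration, and subtracting 32 reserved values (two per supplementary plane, mirroring U+FFFE and U+FFFF) is one plausible reading of how the figure 1,048,544 arises, not something the text states.

```python
# Sketch of the UTF-16 surrogate-pair mechanism and the counts quoted above.
# Function names are illustrative, not taken from any standard API.

HIGH_START = 0xD800   # first of 1,024 high (leading) surrogates
LOW_START = 0xDC00    # first of 1,024 low (trailing) surrogates

def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)

def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)

# Counts from the paragraph above.
surrogate_values = 2 * 1024                      # 2,048
single_unit_chars = 65536 - surrogate_values - 2 # excludes U+FFFE, U+FFFF
pair_combinations = 1024 * 1024                  # 1,048,576
# Assumption: two reserved values in each of the 16 supplementary planes.
pair_chars = pair_combinations - 2 * 16

# Cross-check one pair against Python's built-in UTF-16 codec.
pair = to_surrogates(0x1D11E)
encoded = chr(0x1D11E).encode("utf-16-be")
codec_units = (int.from_bytes(encoded[:2], "big"),
               int.from_bytes(encoded[2:], "big"))
print([hex(u) for u in pair])  # ['0xd834', '0xdd1e']
```

The 20-bit offset is split into two 10-bit halves, which is why exactly 1,024 values are needed in each surrogate range.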

The large number of characters in this set naturally poses a severe problem for a font vendor, and also for storage resources on systems that use Unicode. Thus, although the Unicode work has been underway since 1990, font support for Unicode has taken, and will continue to take, years of work, and the available font repertoire is still rather limited. By comparison, tens of thousands of fonts are available for 8-bit character sets: for a sampling, visit this list of font names by vendor. More information about Unicode fonts is given below.

The Unicode Standard is defined in this printed book:

@String{pub-AW                  = "Ad{\-d}i{\-s}on-Wes{\-l}ey"}

@String{pub-AW:adr              = "Reading, MA, USA"}

@Book{Unicode:2003:USV,
  author =       "{The Unicode Consortium}",
  title =        "The Unicode Standard, Version 4.0",
  publisher =    pub-AW,
  address =      pub-AW:adr,
  pages =        "xxxviii + 1462",
  year =         "2003",
  ISBN =         "0-321-18578-1",
  LCCN =         "QA268 .U545 2004",
  bibdate =      "Tue Oct 21 17:47:30 2003",
  note =         "Includes CD-ROM.",
  URL =          "http://www.unicode.org/versions/Unicode4.0.0/",
  acknowledgement = ack-nhfb,
}

Earlier editions of the Unicode Standard were 1.0 (1991/1992), 1.1 (1993/1995), 2.0 (1996), 2.1 (1998), and 3.0 (2000) [consistent with ISO/IEC 10646-1:2000]. The current version is Unicode version 4.0.

The relation between the Unicode and ISO/IEC 10646 Standards is discussed in Unicode and ISO 10646: although the character codes are synchronized, there are still important differences.

The Unicode Consortium maintains a World-Wide Web site at http://www.unicode.org/

An extensive bibliography of publications about Unicode is available at http://www.math.utah.edu/pub/tex/bib/index-table-u.html#unicode

Who uses Unicode?

Unicode is used in at least these operating systems:

Apple NewtonOS
AT&T/Lucent Technologies Bell Labs Inferno
AT&T/Lucent Technologies Bell Labs Plan 9
BeOS
Java OS
Metaphor OS
Microsoft Windows NT
Microsoft Windows CE
NeXT OS

and at least these programming languages:

AT&T/Lucent Technologies Bell Labs Limbo
Java

The 1989 ANSI/ISO Standard C multibyte and wide character data types can also offer limited support for Unicode. However, most conventional programming languages are not equipped to deal with Unicode characters because they have deeply ingrained assumptions about the storage size of characters.
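The storage-size assumption can be made concrete with a short Python sketch. The sample string is an arbitrary choice for illustration. It shows that the number of characters and the number of bytes differ in every Unicode encoding, so code that equates one character with one byte gives wrong results.

```python
# Sketch: one character is not one byte. The sample string is arbitrary:
# "naive" with a diaeresis (U+00EF), a space, and U+1D11E (musical G clef).
text = "na\u00efve \U0001D11E"

char_count = len(text)  # Python counts code points, not bytes

# Byte counts in three encodings (little-endian forms, so no byte-order
# mark is added to the totals).
byte_counts = {enc: len(text.encode(enc))
               for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(char_count)   # 7
print(byte_counts)  # {'utf-8': 11, 'utf-16-le': 16, 'utf-32-le': 28}
```

Only UTF-32 keeps a fixed ratio of bytes to characters, at the cost of four bytes for each one.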

The Omega typesetting system developed by Yannis Haralambous and John Plaice is an extension of the widely used TeX typesetting system to support and use Unicode.

Most Unix operating system vendors have begun development work to support Unicode in future releases.

Fonts for the Unicode character set

Limited Unicode font support is available from:

Bigelow and Holmes: Lucida Sans Unicode
Bittext Unicode math fonts resembling Times, but defined entirely as Metafont programs.
Free UCS (Universal Character Set) Outline Fonts and Free UCS Outline Fonts archive
Victor Gaultney's Gentium [The font family can be used freely by anyone, but there are some basic restrictions: essentially, no modification, and redistribution only of the complete set.]
Michael Everson's EversonMono, a monospaced font [7000+ glyphs currently implemented] that will ultimately support all non-Han characters in the Basic Multilingual Plane of ISO/IEC 10646-1 (BMP == Unicode).
Yannis Haralambous: OmegaTimes and OmegaHelvetica
James Kass' Code2001 font (in development in early 2003, but available for download)
Mark Leisher: 3100-character Unicode font collection in X Window System BDF format, and the XmBDFEd Motif-based BDF Font Editor
Microsoft
Monotype News of mid-November 1999 told of a complete Unicode-3.0 font named Andale; no further information is yet available.
Edward H. Trager maintains a list of Unicode font resources.
Free UCS outline fonts: list of free font resources for various ISO-8859-n code pages and portions of the Unicode set.
Andrew West maintains a list of general Unicode resources and Unicode font resources.
Alan Wood maintains a Unicode fonts site with pointers to several font sources, and a list of Unicode resources.
SC UniPad: The Unicode text editor: comes with a built-in collection of 52,000+ bitmap characters for fast screen display [Windows platforms only]

Other Unicode resources and Web sites

More information on Unicode fonts can be found at http://www.truetype.demon.co.uk/unicode.htm

The new OpenType font specification, jointly developed by Adobe and Microsoft, provides for future support of Unicode. OpenType is based on a merger of Adobe Type 1 and Apple/Microsoft TrueType font formats, and OpenType systems will support those older fonts as well. Plans for OpenType support have been announced by several major font and operating system vendors.

Roman Czyborra has developed prototype Unicode fonts for the X Window System: see http://czyborra.com/unifont/ for details.

Frank da Cruz maintains useful character set tables in the Kermit Project for verifying correct display of fonts: http://www.columbia.edu/kermit/csettables.html

Markus Kuhn has developed prototype ISO10646-1 (Unicode) fonts for the X Window System; see http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html for details. He has also prepared a comprehensive tutorial on UTF-8 and Unicode: see http://www.cl.cam.ac.uk/~mgk25/unicode.html.

Microsoft maintains a comprehensive Web site on Unicode-related issues at http://www.microsoft.com/globaldev/

There is an open source initiative to develop C/C++ software for Unicode support: International Components for Unicode (ICU): http://oss.software.ibm.com/icu/

Interview with font designer Victor Gaultney on the design of the Gentium font for Unicode.

James Kass maintains a Web site with pointers to Unicode tables and other resources at http://home.att.net/~jameskass/

OpenI18N WG of the Free Standards Group: Common Locale Data Repository V1.0

The Script Encoding Initiative at the Department of Linguistics, University of California, Berkeley http://www.linguistics.berkeley.edu/~dwanders/ works on encoding of minority scripts for eventual inclusion in Unicode.

A Quick Primer On Unicode and Software Internationalization Under Linux and UNIX: http://eyegene.ophthy.med.umich.edu/unicode/

Finally, I maintain an extensive, and frequently updated, bibliography of publications about Unicode.
