Unicode

In the beginning was ASCII, and ASCII was limited: 128 characters weren’t enough. So Microsoft and other vendors extended it to 256, which was still not enough. True, you could now access “foreign-language” and other special characters by using “code pages” with different fonts in Microsoft Word. If you’ve clicked Insert > Symbol and then changed the font on the drop-down list in the Symbol dialog, you’ve seen how this works: the same character “position” (or number) often displays a different character in different fonts.
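If you’d like to see the code-page problem outside of Word, here is a minimal Python sketch. The byte value (E9) and the three code pages are simply examples I picked, and the helper name decode_in_each is made up for illustration; the point is only that one number can mean three different characters.

```python
import unicodedata

# One byte value, three legacy code pages, three different characters.
# The byte (0xE9) and these particular code pages are illustrative choices.
BYTE = bytes([0xE9])

CODE_PAGES = {
    "cp1252": "Windows Western European",
    "cp1251": "Windows Cyrillic",
    "iso8859_7": "ISO Greek",
}

def decode_in_each(byte: bytes, code_pages: dict) -> dict:
    """Return the character that each code page assigns to the same byte."""
    return {name: byte.decode(name) for name in code_pages}

decoded = decode_in_each(BYTE, CODE_PAGES)

# Print the Unicode number and official name of each result.
for name, char in decoded.items():
    print(f"{name:10} U+{ord(char):04X}  {unicodedata.name(char)}")
```

Run it and the same byte comes back as a Latin letter, a Cyrillic letter, and a Greek letter, depending on which code page does the decoding.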

But what if you want to use special characters–*any* special characters–in the *same* font as your regular text? That’s what Unicode is all about. As the Unicode Web site explains, “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” How many characters? Potentially more than a million. So whether you’re working with Greek or Gothic, Klingon or Korean, Unicode is for you.
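To make “a unique number for every character” concrete, here is a small Python sketch. The three sample characters and their labels are my own picks, not anything from the Unicode site; Python’s built-in ord() and chr() do the converting.

```python
import sys

# Every character has one number (its "code point"), whatever the font or platform.
# The sample characters below are illustrative choices.
samples = {
    "Greek small alpha": "\u03b1",
    "Gothic letter ahsa": "\U00010330",
    "Hangul syllable ga (Korean)": "\uac00",
}

# ord() goes from character to number.
code_points = {label: ord(char) for label, char in samples.items()}
for label, number in code_points.items():
    print(f"{label}: U+{number:04X}")

# chr() goes the other way, from number back to character.
round_trip_ok = all(chr(ord(char)) == char for char in samples.values())

# Code points run from 0 through sys.maxunicode (hexadecimal 10FFFF).
total_code_points = sys.maxunicode + 1
print(total_code_points)
```

The last number printed is the total count of available code points, which is where “more than a million” comes from.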

Unicode also includes special typographical characters, such as hair spaces, thin spaces, and zero-width spaces, which we made by hand in last week’s newsletter. But now you don’t have to make them; using Unicode, you can get the real thing.

Of course, there is a catch. Using Unicode requires three things:

1. An operating system that supports it.

2. A program (application) that supports it.

3. A Unicode font that includes the characters you need (not every font does, and no single font covers all of Unicode).

There’s a list of such items here:

http://www.unicode.org/unicode/onlinedat/products.html

But I’ll make it easy for you:

1. Common operating systems include Microsoft Windows 2000, NT, and XP, and Macintosh OS 9.2, X, 10.1, and X Server.

2. Versions of Microsoft Word include 97, 2000, and 2002 for Windows, and 98, 2001, and X for Macintosh. However, the Mac versions (and operating systems) may require a “Language Kit,” which you can learn more about here:

http://www.hclrss.demon.co.uk/unicode/utilities_fonts.html#apple

3. Unicode fonts are rapidly becoming available. There’s a great list here, and many of the fonts are free:

http://www.hclrss.demon.co.uk/unicode/fonts.html#general

Once you’ve installed a Unicode font, you can insert its special characters with the good old Insert > Symbol menu (be sure to select the Unicode font in the dropdown Font list).

You can also insert a character with the keyboard (in Word 2002 and higher) if you know its Unicode number. To do so, be sure a Unicode font is selected (Format > Font); then type the number into your document and press ALT + X. For example, let’s say we need a zero-width space in Word 2002. The Unicode number for such a space is 200B (the number is hexadecimal, so the B is a digit). So all we have to do is type 200B into our document and press ALT + X. Presto!
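The conversion behind that keystroke is plain hexadecimal arithmetic. Here is a minimal Python sketch of the same idea; it is not how Word does it internally, and the function names from_hex and to_hex are made up for this example.

```python
def from_hex(code: str) -> str:
    """Turn a hexadecimal Unicode number such as '200B' into its character."""
    return chr(int(code, 16))

def to_hex(char: str) -> str:
    """Turn a single character back into its hex number, padded to four digits."""
    return f"{ord(char):04X}"

zero_width_space = from_hex("200B")

# The character itself is invisible, so print its escaped form and its number.
print(repr(zero_width_space))
print(to_hex(zero_width_space))
```

Going both directions is handy when you find an odd character in a manuscript and want to know what it actually is.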

You can learn more about using Unicode characters in Word here:

http://www.hclrss.demon.co.uk/unicode/utilities_editors.html#word97

For additional information on Word 2000 and 2002, scroll down past the Word 97 information (which is also relevant for the later versions).

If you need to look up the number of a Unicode character, you can do so here:

http://www.hclrss.demon.co.uk/unicode/search.html
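If you happen to have Python installed, its standard unicodedata module can do the same kind of lookup offline. This is only a sketch: the function names number_for and name_for are mine, and the lookup needs the official Unicode character name, spelled exactly.

```python
import unicodedata

def number_for(name: str) -> str:
    """Look up a character by its official Unicode name; return its hex number."""
    return f"{ord(unicodedata.lookup(name)):04X}"

def name_for(code: str) -> str:
    """Go the other way: from a hex number to the official character name."""
    return unicodedata.name(chr(int(code, 16)))

print(number_for("THIN SPACE"))
print(name_for("200A"))
```

An unknown name raises a KeyError, so this works best when you already have a good guess at what the character is called.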

If you just want to insert typographic spaces, here are the Unicode numbers you need:

Nonbreaking space: 00A0

En space: 2002

Em space: 2003

Three-per-em space: 2004

Four-per-em space: 2005

Six-per-em space: 2006

Figure space: 2007

Punctuation space: 2008

Thin space: 2009

Hair space: 200A

Zero-width space: 200B
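The list above can be checked against Python’s built-in Unicode data and then put to work. In this sketch the table uses the official Unicode names, which differ slightly from my labels (“NO-BREAK SPACE” for nonbreaking space, “ZERO WIDTH SPACE” without the hyphen); the names SPACES and verify and the “12 pt” sample are made up for illustration.

```python
import unicodedata

# Official Unicode names paired with the hex numbers from the list above.
SPACES = {
    "NO-BREAK SPACE": "00A0",
    "EN SPACE": "2002",
    "EM SPACE": "2003",
    "THREE-PER-EM SPACE": "2004",
    "FOUR-PER-EM SPACE": "2005",
    "SIX-PER-EM SPACE": "2006",
    "FIGURE SPACE": "2007",
    "PUNCTUATION SPACE": "2008",
    "THIN SPACE": "2009",
    "HAIR SPACE": "200A",
    "ZERO WIDTH SPACE": "200B",
}

def verify(table: dict) -> list:
    """Return the names whose listed number disagrees with Python's Unicode data."""
    return [
        name
        for name, code in table.items()
        if unicodedata.lookup(name) != chr(int(code, 16))
    ]

mismatches = verify(SPACES)
print(mismatches)

# Put a thin space between a number and its unit.
thin = chr(int(SPACES["THIN SPACE"], 16))
spaced = f"12{thin}pt"
print(repr(spaced))
```

An empty list of mismatches means every number in the table agrees with the Unicode data that ships with Python.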

And you’ll find additional information on spaces here:

http://www.microsoft.com/typography/developers/fdsspec/spaces.htm

With Unicode, the world (or at least its scripts) is your oyster.

_________________________________________

RESOURCES

For a dazzling array of Unicode information, see Alan Wood’s Unicode Resources site:

http://www.hclrss.demon.co.uk/unicode/index.html

Check out the official Unicode site here:

http://www.unicode.org

For online samples of interesting characters, see this page:

http://home.att.net/~jameskass/scriptlinks.htm

For a free Word add-in program to help you insert Unicode characters, go here:

http://hem.fyristorg.com/dahloe/uniqoder/

For information on artificial scripts, go here:

http://www.evertype.com/standards/csur/index.html

If you’re a Tolkien fan, you might be interested in the Tengwar encoding proposal:

http://www.evertype.com/standards/csur/tengwar.html

and in Tolkien fonts (but not necessarily Unicode):

http://www.geocities.com/TimesSquare/4948/

http://babel.uoregon.edu/yamada/fonts/tolkien.html

and in the Resources for Tolkien Linguistics site:

http://www.elvish.org/resources.html

And if you’re actually interested in Klingon, here’s the scoop:

http://www.evertype.com/standards/csur/klingon.html


  • The Fine Print

    Thanks for reading Editorium Update (ISSN 1534-1283), published by:

    The EDITORIUM, LLC
    http://www.editorium.com

    Articles © on date of publication by the Editorium. All rights reserved. Editorium Update and Editorium are trademarks of the Editorium.

    You may forward copies of Editorium Update to others (but not charge for it) and print or store it for your personal use. Any other broadcast, publication, retransmission, copying, or storage, without written permission from the Editorium, is strictly prohibited. If you’re interested in reprinting one of our articles, please send an email message to editor@editorium.com

    Editorium Update is provided for informational purposes only and without a warranty of any kind, either express or implied, including but not limited to implied warranties of merchantability, fitness for a particular purpose, and freedom from infringement. The user (you) assumes the entire risk as to the accuracy and use of this document.

    The Editorium is not affiliated with Microsoft Corporation or any other entity.

    We do not sell, rent, or give our subscriber list to anyone.