Converting Unicode Characters

In our last newsletter, I explained how to find and replace Unicode characters, which I'm seeing more and more in electronic manuscripts that come into my hands for editing. The problem is that our shop does typesetting in QuarkXPress, which, at least as of version 5.0, won't import Unicode characters. (This is also true of several other typesetting programs.) For manuscripts using lots of Hebrew, Greek, or other special characters, this is a real problem.

Until a couple of weeks ago, I had no solution. Then, late one night, I was thinking about how to create a Word add-in that would search for formatting and replace it with user-defined tags. That's when it struck me: You can't search for all Unicode characters at once and replace them with something else, since there are thousands of them. But if you know which Unicode characters are being used in a document, you can certainly find and replace them with a combination of characters and tags that are meaningful in QuarkXPress.

To understand this, you have to know how special characters, such as Greek, are handled in QuarkXPress. They're just regular alphanumeric characters formatted in a special font. For example, to get alpha, beta, and gamma in QuarkXPress, you might type a, b, and c and then format those characters with a Greek font (the exact keys depend on the font you use):

a produces alpha

b produces beta

c produces gamma

So what you have to do in Microsoft Word is find a Unicode alpha and replace it with the letter a, tagged with an XPress Tag that indicates a Greek character style sheet in QuarkXPress. Here's how:

1. In Word, click the "Edit" menu.

2. Click "Replace."

3. In the "Find What" box, enter the following string, which tells Word to search for the Unicode character alpha:

^u945

You can learn more about finding and replacing Unicode characters here:

http://www.topica.com/lists/editorium/read/message.html?mid=1710421080

4. In the "Replace With" box, enter the character (a for alpha) and the surrounding XPress Tags you'll use to tell QuarkXPress to format the character with Greek:

<@Greek>a<@$p>

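By the way, that's the standard format for XPress Tags that apply a character style sheet in QuarkXPress: <@Greek> switches to the character style sheet named "Greek," and <@$p> switches back to the character attributes of the paragraph's own style sheet. The name "Greek" is arbitrary; call the style sheet whatever you'd like.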
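If it helps to see the replacement outside of Word, here is a minimal Python sketch of the same idea. It is only an illustration and not part of Word or QuarkXPress; the function names tag_char and replace_alpha are made up for this example, and the style sheet name "Greek" is the same arbitrary name used above.

```python
# Illustrative only: mimics the Word replacement of ^u945 with <@Greek>a<@$p>.
# tag_char and replace_alpha are hypothetical names, not part of any product.

def tag_char(plain_char: str, style: str = "Greek") -> str:
    """Wrap an ordinary character in XPress Tags for a character style sheet."""
    return f"<@{style}>{plain_char}<@$p>"


def replace_alpha(text: str) -> str:
    """Replace every Unicode alpha (decimal 945) with a tagged letter a."""
    # chr(945) is the character that Word finds with ^u945.
    return text.replace(chr(945), tag_char("a"))


if __name__ == "__main__":
    sample = "The letter \u03b1 comes first."
    print(replace_alpha(sample))
```

Running the script prints the sample sentence with the alpha replaced by the tagged letter a, which is the text you would save and import into QuarkXPress.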

Now, to import the file into QuarkXPress:

1. In Word, save the file as a text document.

2. Open QuarkXPress and create a new file.

3. Click in the text box to make it active.

4. Click the "File" menu.

5. Click "Get Text."

6. Navigate to your text document.

7. Put a check in the box labeled "Include Style Sheets."

8. Click the "Open" button.

The file will be imported into QuarkXPress, and the XPress Tags you used will be imported as a character style sheet named "Greek." Now, in QuarkXPress, edit the character style sheet to use your Greek font. Presto! The character that used to be a Unicode alpha in Word will once again become an alpha in QuarkXPress.

For this to work, you have to know three things:

1. The Unicode numbers for the characters you want to convert. You can look up such numbers here:

http://www.hclrss.demon.co.uk/unicode/search.html

2. The font (such as Greek) you'll be using to produce special characters in QuarkXPress.

3. The "ordinary" character (such as "a") that the font uses to produce each special character (such as alpha).
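For the first item, a short script can also report which special characters a piece of text contains, along with their decimal Unicode numbers. The following Python sketch is one possible way to do it, assuming you have the text available as a Python string; the function name list_special_characters is hypothetical, and treating everything above 127 as "special" is a simplification (it will also report curly quotes and dashes).

```python
import unicodedata

# Illustrative only: inventory the non-ASCII characters in a string so you
# know which Unicode numbers to search for with ^u in Word.


def list_special_characters(text: str) -> list[tuple[str, int, str]]:
    """Return (character, decimal number, Unicode name) for each distinct
    non-ASCII character, sorted by code point."""
    found = sorted({ch for ch in text if ord(ch) > 127})
    return [(ch, ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in found]


if __name__ == "__main__":
    sample = "In Greek, \u03b1 and \u03b2 come before \u03b3."
    for _, number, name in list_special_characters(sample):
        # The decimal number is what follows ^u in Word's "Find What" box.
        print(f"^u{number}  {name}")
```

Each reported number can then be paired with the ordinary character that your font uses for that glyph.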

With that information, you can record a macro that finds each Unicode character and replaces it with the ordinary character surrounded by the XPress Tags. The next time you need to convert a bunch of Greek or Hebrew, just run the macro.

Of course, recording such a macro--or a series of them for different languages--is error-prone and tedious. A better solution is to use our MegaReplacer program, for which you can create a script that looks like this, with the Unicode search codes to the left of the pipe symbols and the tagged replacement characters to the right:

^u945|<@Greek>a<@$p>

^u946|<@Greek>b<@$p>

^u947|<@Greek>c<@$p>

MegaReplacer also has the advantage of batch processing, so you can run the script on a whole folder full of documents. And, of course, the scripts are easy to change as needed.
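Typing a long script by hand invites typos, so you may prefer to generate the lines from a table. The Python sketch below is one way to do that, under stated assumptions: the GREEK_KEYS table is a made-up mapping for a hypothetical Greek font, and the apply_script function is only a rough stand-in for checking the result. It understands nothing but ^u codes and is not how MegaReplacer itself works.

```python
# Illustrative only: build pipe-delimited script lines from a table and
# try them out on a string. GREEK_KEYS is an assumed mapping for a
# hypothetical font; check your own font before relying on it.

GREEK_KEYS = {945: "a", 946: "b", 947: "c"}


def build_script(keys: dict[int, str], style: str = "Greek") -> list[str]:
    """Return one 'find|replace' line per Unicode number, in numeric order."""
    return [f"^u{number}|<@{style}>{key}<@$p>"
            for number, key in sorted(keys.items())]


def apply_script(text: str, script_lines: list[str]) -> str:
    """Apply each 'find|replace' line to text. Only ^uNNN find codes are
    supported in this sketch."""
    for line in script_lines:
        find, replacement = line.split("|", 1)
        if not find.startswith("^u"):
            raise ValueError(f"unsupported find code: {find}")
        text = text.replace(chr(int(find[2:])), replacement)
    return text


if __name__ == "__main__":
    script = build_script(GREEK_KEYS)
    print("\n".join(script))
    print(apply_script("\u03b1\u03b2\u03b3", script))
```

The first part of the output is the script itself, ready to paste into a script file; the last line shows what the three Greek letters look like after tagging.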

You can learn more about MegaReplacer here:

http://www.editorium.com/14843.htm

You might also want to use our QuarkConverter program to automatically insert additional XPress Tags for style and character formatting:

http://www.editorium.com/14846.htm

However you decide to work, you now have a way to convert Unicode characters to special characters for QuarkXPress or any other typesetting program that uses tags.

_________________________________________

READERS WRITE

Mark Pool (mark913@earthlink.net) wrote:

"I think you and your readers might be interested to know that Merriam-Webster now has a free toolbar. To read all about it and/or download it go to http://www.m-w.com/tools/toolbar/."

Thanks, Mark. Macintosh users should note that the toolbar works only with the Windows operating system. Sorry. But Windows users should find this to be a wonderful tool.

Mary C. Eberle wrote:

"Do you know if there is a way to convert the specialized keys on a typical Microsoft-friendly keyboard to do something useful? For example, I never launch programs from the start menu, at least not more than once. And if I needed to use the start menu, there is the trusty little mouse. Thus the start menu key is useless and even bothersome to me. But it would make a dandy key to run macros if I could redefine it. Do you know any tricks to make that key available to run macros in Word 97?"

I responded:

"Look in the Readers Write column (scroll down a ways) here for some possibilities: http://www.topica.com/lists/editorium/read/message.html?mid=1708382808."

Mary continued:

"Here is a mechanical hint that may be helpful to some readers: I put an aluminum cap over the CAPS LOCK key to make it nonoperational. The aluminum cap is made from the open-and-close spout on a box of dishwasher soap. The triangular sides slip down on either side of the key. The pointed ends need to be cut off a little bit at a time until the right height is achieved so that when one accidentally keys the cap on top of the CAPS LOCK key, the key doesn't press down. I glued the aluminum cap on with heavy-duty double-sided sticky tape, but if one sometimes needs CAPS LOCK, the gluing is not necessary.

"One other hint: I have written so many macros to use in my editing that many had to be assigned to hard-to-type key combinations. I recently purchased an X-keys auxiliary keyboard to which macros can be assigned. It has doubled my macro use and increased my productivity. Readers could check this product out at www.xkeys.com. I have even put the comma and colon on my X-keys keyboard because they often need to be inserted and are a pain because in using the regular keyboard for them, I have to take my hands off my mouse."

Thanks, Mary. Please note that the X-keys keyboard will work with both PC and Macintosh, as will Mary's aluminum cap. 🙂

_________________________________________

RESOURCES

If you've spent much time on email discussion lists that deal with Microsoft Word, Help authoring, or technical writing, you've probably noticed the brilliant, idiosyncratic posts of Steve Hudson, who has also contributed much to this newsletter. What you may not know is that now you can talk with Steve--live!--to get answers to your advanced questions about Microsoft Word, writing, document design, macros, templates, lists, master documents, documentation hierarchies, policies, standards, processes, graphics terminology, and much more. For more information, visit the Word Heretic's Church, here:

http://www.keen.com/memberpub/homepage.asp?user=The+Word+Heretic

Steve's time ain't cheap, but then, how cheap is it to spend hours of your time fighting a problem that Steve could probably fix in minutes? When you're having serious troubles with Word, it's nice to have a real expert available.

You may also want to check out Steve's blog (Web log), which features useful information about advanced Word topics, VBA, Help authoring, and Steve's customized macros and templates. Steve's colorful language is not always for the faint of heart, but there's lots of valuable information here:

blog.tdfa.com


  • The Fine Print

    Thanks for reading Editorium Update (ISSN 1534-1283), published by:

    The EDITORIUM, LLC
    http://www.editorium.com

    Articles © on date of publication by the Editorium. All rights reserved. Editorium Update and Editorium are trademarks of the Editorium.

    You may forward copies of Editorium Update to others (but not charge for it) and print or store it for your personal use. Any other broadcast, publication, retransmission, copying, or storage, without written permission from the Editorium, is strictly prohibited. If you’re interested in reprinting one of our articles, please send an email message to editor@editorium.com

    Editorium Update is provided for informational purposes only and without a warranty of any kind, either express or implied, including but not limited to implied warranties of merchantability, fitness for a particular purpose, and freedom from infringement. The user (you) assumes the entire risk as to the accuracy and use of this document.

    The Editorium is not affiliated with Microsoft Corporation or any other entity.

    We do not sell, rent, or give our subscriber list to anyone. Period.

    If you’d like to subscribe, please enter your name and email address below. We publish the newsletter once a week, and on rare occasions we may send an important announcement. We never, ever send spam. Thank you for signing up!