Converting Unicode Characters

In our last newsletter, I explained how to find and replace Unicode characters, which I'm seeing more and more in electronic manuscripts that come into my hands for editing. The problem is that our shop does typesetting in QuarkXPress, which, at least as of version 5.0, won't import Unicode characters. (This is also true of several other typesetting programs.) For manuscripts using lots of Hebrew, Greek, or other special characters, this is a real problem.

Until a couple of weeks ago, I had no solution. Then, late one night, I was thinking about how to create a Word add-in that would search for formatting and replace it with user-defined tags. That's when it struck me: You can't search for all Unicode characters at once and replace them with something else, since there are thousands of them. But if you know which Unicode characters are being used in a document, you can certainly find and replace them with a combination of characters and tags that are meaningful in QuarkXPress.

To understand this, you have to know how special characters, such as Greek, are handled in QuarkXPress. They're just regular alphanumeric characters formatted in a special font. For example, to get alpha, beta, and gamma in QuarkXPress, you'd typically type a, b, and c and then format those characters with a Greek font:

a produces alpha

b produces beta

c produces gamma

So what you have to do in Microsoft Word is find a Unicode alpha and replace it with the letter a, tagged with an XPress Tag that indicates a Greek character style sheet in QuarkXPress. Here's how:

1. In Word, click the "Edit" menu.

2. Click "Replace."

3. In the "Find What" box, enter the following string, which tells Word to search for the Unicode character alpha:

^u945

You can learn more about finding and replacing Unicode characters here:

http://www.topica.com/lists/editorium/read/message.html?mid=1710421080

4. In the "Replace With" box, enter the character (a for alpha) and the surrounding XPress Tags you'll use to tell QuarkXPress to format the character with Greek:

<@Greek>a<@$p>

By the way, that's the standard format for XPress Tags that will create a character style sheet in QuarkXPress. The name "Greek" is arbitrary; call the style sheet whatever you'd like.
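
If you'd rather script this replacement outside Word, the same idea can be sketched in Python. This is only an illustration, not a feature of Word or QuarkXPress: the function name tag_greek is hypothetical, "Greek" is the arbitrary style sheet name used above, and mapping alpha to "a" assumes a Greek font that works that way.

```python
# Hypothetical helper: swap a Unicode character for the ordinary character
# your Greek font expects, wrapped in XPress Tags.
ALPHA = chr(945)  # the same character that ^u945 finds in Word


def tag_greek(text, unicode_char=ALPHA, plain_char="a", style="Greek"):
    """Replace one Unicode character with a tagged ordinary character."""
    tagged = "<@" + style + ">" + plain_char + "<@$p>"
    return text.replace(unicode_char, tagged)


print(tag_greek("The letter " + ALPHA + " comes first."))
```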

Now, to import the file into QuarkXPress:

1. In Word, save the file as a text document.

2. Open QuarkXPress and create a new file.

3. Click in the text box to make it active.

4. Click the "File" menu.

5. Click "Get Text."

6. Navigate to your text document.

7. Put a check in the box labeled "Include Style Sheets."

8. Click the "Open" button.

The file will be imported into QuarkXPress, and the XPress Tags you used will be imported as a character style sheet named "Greek." Now, in QuarkXPress, edit the character style sheet to use your Greek font. Presto! The character that used to be a Unicode alpha in Word will once again become an alpha in QuarkXPress.

For this to work, you have to know three things:

1. The Unicode numbers for the characters you want to convert. You can look up such numbers here:

http://www.hclrss.demon.co.uk/unicode/search.html

2. The font (such as Greek) you'll be using to produce special characters in QuarkXPress.

3. The "ordinary" character (such as "a") that the font uses to produce each special character (such as alpha).

With that information, you can record a macro in which you find and replace each Unicode character with the ordinary character surrounded by the XPress Tags. The next time you need to convert a bunch of Greek or Hebrew, just run the macro.

Of course, recording such a macro--or a series of them for different languages--is error-prone and tedious. A better solution is to use our MegaReplacer program, for which you can create a script that looks like this, with the Unicode numbers on the left (of the pipe symbols) and the XPress Tags and characters on the right:

^u945|<@Greek>a<@$p>

^u946|<@Greek>b<@$p>

^u947|<@Greek>c<@$p>

MegaReplacer also has the advantage of batch processing, so you can run the script on a whole folder full of documents. And, of course, the scripts are easy to change as needed.
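
To make the logic of such a script concrete, here is a minimal Python sketch that reads lines in the same find|replace layout and applies them to a string. It is not MegaReplacer and makes no claim to match how that program works internally; the function names parse_script and apply_script are hypothetical, and only ^u codes are handled.

```python
SCRIPT = """\
^u945|<@Greek>a<@$p>
^u946|<@Greek>b<@$p>
^u947|<@Greek>c<@$p>
"""


def parse_script(script):
    """Turn 'find|replace' lines into (character, replacement) pairs."""
    pairs = []
    for line in script.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        find, replace = line.split("|", 1)
        if not find.startswith("^u"):
            raise ValueError("only ^u codes are supported in this sketch: " + find)
        # The digits after ^u are the decimal Unicode number.
        pairs.append((chr(int(find[2:])), replace))
    return pairs


def apply_script(text, pairs):
    """Apply each replacement in turn, like a series of Replace All passes."""
    for char, replacement in pairs:
        text = text.replace(char, replacement)
    return text


sample = chr(945) + chr(946) + chr(947)
print(apply_script(sample, parse_script(SCRIPT)))
```

Each Greek letter gets its own pair of tags here, just as it would if you replaced the characters one at a time in Word.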

You can learn more about MegaReplacer here:

http://www.editorium.com/14843.htm

You might also want to use our QuarkConverter program to automatically insert additional XPress Tags for style and character formatting:

http://www.editorium.com/14846.htm

However you decide to work, you now have a way to convert Unicode characters to special characters for QuarkXPress or any other typesetting program that uses tags.

_________________________________________

READERS WRITE

Mark Pool (mark913@earthlink.net) wrote:

"I think you and your readers might be interested to know that Merriam-Webster now has a free toolbar. To read all about it and/or download it go to http://www.m-w.com/tools/toolbar/."

Thanks, Mark. Macintosh users should note that the toolbar works only with the Windows operating system. Sorry. But Windows users should find this to be a wonderful tool.

Mary C. Eberle wrote:

"Do you know if there is a way to convert the specialized keys on a typical Microsoft-friendly keyboard to do something useful? For example, I never launch programs from the start menu, at least not more than once. And if I needed to use the start menu, there is the trusty little mouse. Thus the start menu key is useless and even bothersome to me. But it would make a dandy key to run macros if I could redefine it. Do you know any tricks to make that key available to run macros in Word 97?"

I responded:

"Look in the Readers Write column (scroll down a ways) here for some possibilities: http://www.topica.com/lists/editorium/read/message.html?mid=1708382808."

Mary continued:

"Here is a mechanical hint that may be helpful to some readers: I put an aluminum cap over the CAPS LOCK key to make it nonoperational. The aluminum cap is made from the open-and-close spout on a box of dishwasher soap. The triangular sides slip down on either side of the key. The pointed ends need to be cut off a little bit at a time until the right height is achieved so that when one accidentally keys the cap on top of the CAPS LOCK key, the key doesn't press down. I glued the aluminum cap on with heavy-duty double-sided sticky tape, but if one sometimes needs CAPS LOCK, the gluing is not necessary.

"One other hint: I have written so many macros to use in my editing that many had to be assigned to hard-to-type key combinations. I recently purchased an X-keys auxiliary keyboard to which macros can be assigned. It has doubled my macro use and increased my productivity. Readers could check this product out at www.xkeys.com. I have even put the comma and colon on my X-keys keyboard because they often need to be inserted and are a pain because in using the regular keyboard for them, I have to take my hands off my mouse."

Thanks, Mary. Please note that the X-keys keyboard will work with both PC and Macintosh, as will Mary's aluminum cap. 🙂

_________________________________________

RESOURCES

If you've spent much time on email discussion lists that deal with Microsoft Word, Help authoring, or technical writing, you've probably noticed the brilliant, idiosyncratic posts of Steve Hudson, who has also contributed much to this newsletter. What you may not know is that now you can talk with Steve--live!--to get answers to your advanced questions about Microsoft Word, writing, document design, macros, templates, lists, master documents, documentation hierarchies, policies, standards, processes, graphics terminology, and much more. For more information, visit the Word Heretic's Church, here:

http://www.keen.com/memberpub/homepage.asp?user=The+Word+Heretic

Steve's time ain't cheap, but then, how cheap is it to spend hours of your time fighting a problem that Steve could probably fix in minutes? When you're having serious troubles with Word, it's nice to have a real expert available.

You may also want to check out Steve's blog (Web log), which features useful information about advanced Word topics, VBA, Help authoring, and Steve's customized macros and templates. Steve's colorful language is not always for the faint of heart, but there's lots of valuable information here:

blog.tdfa.com

Finding and Replacing Unicode Characters

I'm seeing more and more documents that use Unicode characters for all kinds of things--fractions, Greek, Hebrew--since these characters are so easy to use in Word 2000 and 2002. You can learn more about Unicode here:

http://www.topica.com/lists/editorium/read/message.html?mid=1709529895

Sometimes I need to find and replace these characters with something else. How to do so isn't readily apparent, but there are actually two different methods that will work.

Method 1: Unicode number.

You're probably aware that you can find ASCII and ANSI characters using numeric codes. For example, to find an e with an acute accent, you could do this:

1. Click the "Edit" menu.

2. Click "Find."

3. In the "Find What" box, enter ^0233 (on a PC) or ^0142 (on a Mac).

4. Click the "Find Next" button.

You can learn more about this here:

http://www.topica.com/lists/editorium/read/message.html?mid=1704081834
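
The PC and Mac numbers in step 3 differ because the two platforms use different 8-bit character sets. As a quick check outside Word, Python can report the code in each. This sketch assumes the PC value comes from the Windows-1252 code page and the Mac value from Mac Roman; the function name ansi_code is hypothetical.

```python
def ansi_code(char, encoding):
    """Return the single-byte code for char in the given 8-bit encoding."""
    return char.encode(encoding)[0]


e_acute = chr(233)  # e with an acute accent

# Format each code the way it would be typed in the "Find What" box.
print("PC:  ^0%03d" % ansi_code(e_acute, "cp1252"))
print("Mac: ^0%03d" % ansi_code(e_acute, "mac_roman"))
```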

The procedure for finding Unicode characters is similar, but you'd use a "u" instead of a "0" in front of the number, and of course you'd need to know the Unicode decimal number for the character. You can look up Unicode numbers at Alan Wood's Unicode Resources site here:

http://www.hclrss.demon.co.uk/unicode/search.html

For example, to find a small Greek alpha in Microsoft Word, you'd search for ^u945.
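
If you don't know which Unicode characters a document contains, a short script can list them along with the codes to search for. The following Python sketch is only an illustration: it assumes you have already saved or pasted the document's text somewhere Python can read it, and the function name unicode_inventory is hypothetical.

```python
import unicodedata


def unicode_inventory(text):
    """List each non-ASCII character once with its Word search code and name."""
    rows = []
    for char in sorted(set(text)):
        if ord(char) > 127:
            # ord() gives the decimal Unicode number that follows ^u in Word.
            name = unicodedata.name(char, "UNKNOWN")
            rows.append(("^u%d" % ord(char), char, name))
    return rows


for code, char, name in unicode_inventory("a, " + chr(945) + " and " + chr(946)):
    print(code, char, name)
```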

Method 2: Copy and paste.

If you can see an example of the character in your document (or insert one), you can actually copy the character and then paste it into the "Find What" box. Then just search as usual.

Replacing Text with Unicode Characters

Replacing text with Unicode characters can be a little trickier than finding them, as Word won't let you use a numeric code (like ^u945) in the Replace dialog's "Replace With" box. I've usually had success, however, in pasting the character into the "Replace With" box. If you can't do that with a certain character, you may be able to follow this procedure instead:

1. Find an example of the character in your document (or insert one).

2. Copy the character.

3. Click the "Edit" menu.

4. Click "Replace."

5. In the "Find What" box, enter the text you want to find.

6. In the "Replace With" box, enter ^c to tell Word you want to replace with the contents of the Clipboard--in other words, with the Unicode character you copied.

7. Click the "Replace All" button.

If you need to work with Unicode characters on a Macintosh, things get much tougher, but you'll find information about doing so here:

http://www.hclrss.demon.co.uk/unicode/utilities_fonts_mac.html#apple

http://www.hclrss.demon.co.uk/unicode/utilities_editors_mac.html

http://www.hclrss.demon.co.uk/unicode/utilities_editors_macosx.html

_________________________________________

READERS WRITE

Last week's newsletter bewailed the state of comments and revision tracking in Word 2002. Responding to my complaint that there is no way to make comments print as they did in earlier versions of Word, Erika Buky wrote, "It's not much of a workaround for people with only one computer and the current version of Word, but I understand (from the vendor who supports my organization's Word macro package) that you can import files with comments into a previous version of Word (97 or 2000), and the comments will print in the old, rational way. Still using W97/98 myself, I haven't been able to verify this."

I tried this, and it works just as Erika said.

After all of my grumbling, Meg Cox offered an alternative point of view:

Don't take my balloons!

I love the balloons. I used to have a terrible time working with tracked changes showing. It was too hard to follow the final version in the middle of all that mess. But if I didn't show changes, I would forget to toggle track changes back on when I needed to, and I'd wind up with untracked paragraphs. Everything's much easier with the balloons, and I think much clearer for the reader--even the comments as long as they stay on the same page as the text.

I agree that the balloons become less useful when the changes become denser. Word should indeed provide an easy-to-find alternative.

Nancyann Ropke (ropke.nancyann@leg.state.fl.us) wrote:

Woody's Office Watch has had several articles about comments and tracking in Word 2002.

Go to http://www.woodyswatch.com/office/archives.asp and search for "balloons"

Here are two of the articles I found.

http://www.woodyswatch.com/wowmm/archtemplate.asp?v3-n06

http://www.woodyswatch.com/wowmm/archtemplate.asp?v3-n02

Thanks to all for their comments and suggestions.

_________________________________________

RESOURCES

Titivillus Tools for Copy Editors and Those Who Employ Them is a Web site operated by Timothy DeVinney of Titivillus Editorial Services. The site has many helpful resources, including:

* A business plan for a freelance copy editor

* A checklist for copyediting agreements

* Style checklists

http://www.titivillus-editorial.com/

Check it out! You'll be glad you did.

Comments and Tracking in Word 2002

If you've started using Microsoft Word 2002, you've probably seen the little "balloons" that display your comments and tracked changes. In my opinion, these are pretty much useless in a professional environment. For example, if you get many deletions on a page, Word will abbreviate the balloon messages, so printing these for an author to review is of little help. Yes, you can print the changes separately (File > Print > Print what: > List of markup), but trying to compare this list with the document is cumbersome.

Online review isn't much better. An author can use the Reviewing toolbar to go from change to change or comment to comment in the Reviewing Pane, but that's not how real people read. I want to see the corrections and comments clearly marked inline--just as they were in previous versions of Word.

Good news: After mucking around in the bowels of the program, I've discovered a fix for revision tracking:

1. Click the "Tools" menu.

2. Click "Options."

3. Click the "Track Changes" tab.

4. Under "Balloons," uncheck the box labeled "Use balloons in Print and Web Layout."

Wow, what an improvement! No more balloons, and revision tracking is handled inline the way it used to be. To print your document showing tracked changes, do this:

1. Click the "File" menu.

2. Click "Print."

3. Under "Print what:" select "Document showing markup."

Now for the bad news: There is no fix for comments--at least not that I can find. In previous versions of Word, each comment had an inline reference (like "[JML3]") and a corresponding reference at the beginning of the comment. That was a good system, easy to use and understand.

With Word 2002, these references have gone away, so it's now difficult to figure out what part of the text a comment refers to. You can move from comment to comment using the browse arrows at the bottom right of your screen, but that's a poor substitute. Even worse, there seems to be no way to print comments at all without enabling those stupid balloons. Microsoft, are you listening?

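If you know of a way around this problem, please let me know. If not, you can always resort to typing coded inline comments [[like this one]] that can later be deleted with a wildcard Find and Replace. Put a check in the "Use wildcards" box, and note that the backslashes tell Word to treat the square brackets as literal characters:

Find what: \[\[*\]\]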

Replace with: [nothing]
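
The same cleanup can be sketched with a regular expression in Python, for text that has been saved outside Word. Square brackets are special characters in regular expressions, so they are escaped with backslashes. The function name strip_inline_comments is hypothetical, and this is an illustration rather than a replacement for doing the job in Word.

```python
import re

# Match [[ ... ]] non-greedily so that each comment is removed on its own
# instead of everything between the first [[ and the last ]].
INLINE_COMMENT = re.compile(r"\[\[.*?\]\]", re.DOTALL)


def strip_inline_comments(text):
    """Delete every double-bracketed inline comment from the text."""
    return INLINE_COMMENT.sub("", text)


print(strip_inline_comments("Keep this [[query for author]] and this [[check date]] too."))
```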

Maybe if we all wrote to Microsoft about this, they'd stop gumming up a perfectly useful word processor. Maybe we should send balloons.

_________________________________________

READERS WRITE

Brad Hurley (bhurley@sover.net) wrote:

I usually use bibliographic software for references (EndNote, which integrates nicely with Word), but occasionally I have to edit documents that use Word's endnotes and footnotes. Is there any way to insert footnotes or endnotes into text boxes? We frequently prepare documents with sidebars, which we create with text boxes, but there doesn't seem to be any way to add footnotes to them if we need to cite a reference. Maybe there's a better solution for creating sidebars than using text boxes?

I replied:

As you've already learned, text boxes don't support footnotes or endnotes. However, frames do.

So if you can use frames rather than text boxes, that should solve your problem. To get a frame in Word 2000, you have to click Tools > Macro > Macros and then select "Word commands" in the "Macros in:" dropdown list. Then click "InsertFrame" in the "Macro name:" box. Then click the "Run" button. Finally, use your mouse to draw the frame in your document.

Please note that this kind of frame is not to be confused with the Format > Frames command, which creates HTML frames for use in Web pages.

Wordmaster Steve Hudson wrote:

So, you went and installed Word 2000 and set the templates and wizards to Run All from My Computer. Now when you go File > New there is a plethora of tabs and templates, none of which you use anyway. So now you want to clean them out.

Step 1 - Uninstall the templates. Otherwise Word will keep on replacing them when you delete them! Change Start > Settings > Control Panel > Add / Remove Programs > Microsoft Office > Change > Add / Remove Features > Microsoft Word for Windows > Wizards and Templates to Not Available. Update Now > OK > Close.

Step 2 - Dump the following lines into a new text file and rename it Killer.bat. Double-click it to run it. It gets rid of the last few problem children for you.

%HomeDrive%

cd "%ProgramFiles%\Microsoft Office\Office"

rmdir Broadcast /s /q

cd 1033

del Feedback.htm

del Thankyou.htm

Thanks to Brad for his question and to Steve for his cleanup procedure.

_________________________________________

RESOURCES

Expertise Publications features "articles, tip sheets, white papers to guide you through all aspects of using Microsoft Word." Especially if you're migrating from WordPerfect, you'll find some useful information here:

http://www.microsystems.com/publications.htm