Lyonizing Word: Using the “Find What Expression” Wildcard

by Jack Lyon

Rich Adin recently sent me an interesting challenge. He was using his EditTools Journal feature to mark journal titles in references. The power behind that useful tool comes from lists of incorrectly styled references with corresponding correctly styled references. He creates a separate list for each reference style. The list he sent me was for AMA style, in which the reference uses the PubMed abbreviation followed by a period. It looks something like this:

A Gesamte Exp Med, | cyan -> Z Gesamte Exp Med.
A Gesamte Exp Med. | cyan -> Z Gesamte Exp Med.
A JR | cyan -> AJR Am J Roentgenol.
A M A Arch Ind Hyg Occup Med. | green
A of LTC | cyan -> Ann Longterm Care.
A of LTC, | cyan -> Ann Longterm Care.
A of LTC. | cyan -> Ann Longterm Care.
A&D | cyan -> Aging Dis.
A&D, | cyan -> Aging Dis.
A&D. | cyan -> Aging Dis.
A. M. A. Arch. Derm | cyan -> AMA Arch Derm.
A. M. A. Arch. Derm, | cyan -> AMA Arch Derm.
A. M. A. Arch. Derm. | cyan -> AMA Arch Derm.

The text to the left of the pipe (|) is how the entry might (incorrectly) appear in the references supplied by the author; the entry to the right is how it should appear. Each entry includes a color, either cyan or green, which tells the program to use that color in highlighting the reference.

Rich knew that some of the entries included duplicates, like this:

Arch Intern Med. | cyan -> Arch Intern Med.

In other words, the item on the left was identical to the item on the right, which meant that it shouldn’t be marked. That also meant the entry didn’t need to be on the list at all. But the real problem was that Rich’s reference list included more than 117,000 entries!

Rich’s challenge? Use wildcard find and replace to remove such entries, thus shortening the list and preventing unnecessary marking.

First, let’s look at that entry again to see what we might need to do:

Arch Intern Med. | cyan -> Arch Intern Med.

There’s a pipe symbol (|) in the middle, which gives us something to differentiate the left side of the entry from the right side of the entry. So we might set up the first part of our wildcard string to look like this:

([!^013]@) |

That tells Word to find any character except a carriage return, an unspecified number of times, until it comes to a space followed by a pipe symbol.

The wildcard for a carriage return is:

^013

The wildcard for “except” is:

!

And we have to put both of those in square brackets so Word knows that’s a set of characters. (After all, [!^013] finds any character, no matter what it is, unless it’s a carriage return.)

The wildcard for “an unspecified number of times” is:

@

Finally, we have to put all of that into a “group” by enclosing it with parentheses. And that’s important. You’ll see why in a minute.

Testing that part of our search string, we see that, yes, indeed, it finds the following:

Arch Intern Med. |

In fact, it finds the beginning of each entry, which is just what we want.
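
If you are more at home with regular expressions than with Word's wildcards, the same idea can be approximated outside Word. What follows is only a minimal Python sketch, not Word's own syntax. It assumes Python's `re` module, writes "any character except a carriage return" as `[^\r\n]`, uses `+?` where Word uses `@`, and escapes the pipe, which is special in Python but not in Word.

```python
import re

# Rough Python counterpart of Word's ([!^013]@) |
# Assumptions: [^\r\n]+? stands in for [!^013]@, and the pipe is escaped
# because "|" means alternation in Python regular expressions.
LEFT_SIDE = re.compile(r"([^\r\n]+?) \|")

entry = "Arch Intern Med. | cyan -> Arch Intern Med."
match = LEFT_SIDE.match(entry)

print(match.group(0))  # the whole match, up to and including the pipe
print(match.group(1))  # only the text captured by the parentheses
```

The second `print` shows what the parentheses are for: the group remembers just the journal name, without the space and the pipe.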

Now let’s look at the right side of our entry:

 cyan -> Arch Intern Med.

You can’t see it here, but there’s a space in front of “cyan” — the space that follows the pipe symbol. So we need to include that space in our search string, along with the word “cyan” (in the following examples, I use [space] to represent a space so you can see it; [space] should not actually be entered; use a real space created by pressing the space bar):

[space]cyan

There’s also a space after cyan, so we’ll need to include that as well.

[space]cyan[space]

That needs to be followed by a hyphen, a right angle bracket, and yet another space, like this:

[space]cyan[space]-\>[space]

But now you may be wondering why I put a backslash in front of the angle bracket. It’s because the angle bracket is itself a wildcard (a subject for another day), so we need to tell Word we’re using it as an actual character, which is what the backslash does.

Finally, the rest of our search string looks like this:

\1^013

This part of the string —

\1

— is the “Find What Expression” wildcard, which is what this article is about, and it certainly took us a long time to get to it!

Remember back when we grouped the very first part of our search string in parentheses?

([!^013]@)

That “group” is the “expression” that the \1 wildcard represents. In algebraic terms:

\1 = ([!^013]@)

And that means \1 will find whatever is found by the ([!^013]@) expression, which, my friend, is extremely cool, because it will allow us to weed out the duplicate entries on our reference list, entries like this:

Arch Intern Med. | cyan -> Arch Intern Med.

Now, for the first time, let’s look at our entire search string:

([!^013]@) | cyan -\> \1^013

By now, you probably understand this quite well. The string finds any characters except a carriage return until it comes to a space and a pipe symbol; then it finds a space, the word “cyan,” and another space, followed by a hyphen, a right angle bracket, and a space. Finally (and most importantly), it finds whatever was found by the parenthetical group, followed by a carriage return.
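
To see the "whatever was found by the group" idea at work outside Word, here is a small Python sketch of the complete string. It is an approximation under stated assumptions, not a translation Word itself would accept: Python's `\1` plays the role of the "Find What Expression" wildcard, `\r?\n` stands in for `^013`, and the function name `is_duplicate` is my own invention for illustration.

```python
import re

# Approximate Python counterpart of Word's ([!^013]@) | cyan -\> \1^013
# Assumptions: "^" with re.MULTILINE anchors the match to the start of a line
# (the Word string has no such anchor), and \r?\n stands in for ^013.
DUPLICATE = re.compile(r"^([^\r\n]+) \| cyan -> \1\r?\n", re.MULTILINE)

def is_duplicate(entry: str) -> bool:
    """Return True when the left side of a cyan entry repeats on the right."""
    return DUPLICATE.match(entry + "\n") is not None

print(is_duplicate("Arch Intern Med. | cyan -> Arch Intern Med."))  # True
print(is_duplicate("A&D | cyan -> Aging Dis."))                     # False
```

The line-start anchor is a choice made for this sketch. It keeps the pattern from matching only the tail end of a longer left-hand entry, something the unanchored string could do.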

Now we simply need to make sure that Word’s “Replace with” box is empty and click “Replace All.” All of those unnecessary entries will be deleted. (We’ll need to repeat with “green” for the entries that don’t include “cyan.”)
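
For a list that lives in a plain text file rather than a Word document, the same "Replace All" with an empty replacement might be sketched in Python as follows. Treat it as an illustration only: the function name `remove_self_mappings` and the three-line sample are hypothetical, and the second pass assumes that the "green" entries to be removed follow the same "color -> name" layout as the "cyan" ones.

```python
import re

def remove_self_mappings(text: str, color: str) -> str:
    """Delete entries whose left side is identical to the right side."""
    pattern = re.compile(
        r"^([^\r\n]+) \| " + re.escape(color) + r" -> \1\r?\n",
        re.MULTILINE,
    )
    # An empty replacement mirrors an empty "Replace with" box.
    return pattern.sub("", text)

# Hypothetical sample; only the middle entry maps a name to itself.
sample = (
    "A&D | cyan -> Aging Dis.\n"
    "Arch Intern Med. | cyan -> Arch Intern Med.\n"
    "A of LTC | cyan -> Ann Longterm Care.\n"
)

cleaned = remove_self_mappings(sample, "cyan")
cleaned = remove_self_mappings(cleaned, "green")  # second pass for the other color
print(cleaned)
```

Only the "Arch Intern Med." line is removed; the two entries that really do correct something are left alone.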

Which would you rather do: Find and delete such entries manually (with just 117,000 to look through) or have Word do it automatically?

That’s the power of the “Find What Expression” wildcard. In future articles, I’ll show you more uses for this wonderful tool, along with other Word wildcards.

Jack Lyon (editor@editorium.com) owns and operates the Editorium, which provides macros and information to help editors and publishers do mundane tasks quickly and efficiently. He is the author of Microsoft Word for Publishing Professionals, Wildcard Cookbook for Microsoft Word, and Macro Cookbook for Microsoft Word. These books will help you learn more about macros and how to use them.