We Can Do This the Easy Way,
or We Can Do This the Hard Way
by Jack Lyon
American Editor Rich Adin called me recently with a puzzle. He was editing a list of citations that looked like this:
Lyon J, Adin R, Poole L, Brenner E, et al: blah blah blah.
But his client wanted the citations to look like this:
Lyon J, Adin R, Poole L, et al: blah blah blah.
In other words, many of the citations included one author name too many; the client wanted a limit of three rather than four. And there were hundreds of citations. Rich really didn’t want to remove the superfluous names by hand; it would have taken hours to do, and hours are money. And so, Rich queried, “Is there a way to remove the fourth name automatically?”
There’s nearly always a way. Rich had already tried using a wildcard search, but without success. Microsoft Word kept telling him, “The Find What pattern contains a Pattern Match expression which is too complex.”
The Too-Complex Find What
I’m not sure what wildcard search Rich tried to use, but it might have looked like this:
Find what:
([A-Z][a-z]@ [A-Z], )([A-Z][a-z]@ [A-Z], )([A-Z][a-z]@ [A-Z], )([A-Z][a-z]@ [A-Z], )(et al:)
Replace with:
1235
That’s definitely too complex for Word to handle. Here’s what it means:
Find a capital letter ([A-Z])
followed by a lowercase letter ([a-z])
repeated any number of times (@)
followed by a space
followed by a capital letter ([A-Z])
followed by a comma
followed by a space
with all of that in parentheses to form a “group.”
All of that is repeated three more times, then followed by “et al:” in parentheses to form a group.
The “Replace with” string tells Word to replace what it finds with the contents of groups 1, 2, 3, and 5 — in other words, with the first three names followed by “et al:”.
What’s the Handle?
If Word could handle it, that should work. But Word can’t handle it, so we’ll need to simplify. So we ask ourselves, “What, besides letters, do all of the names have in common?” In other words, “What’s the handle? What can we grab onto?” Well, that’s easy — each name is followed by a comma and a space. That’s our handle!
(For more on this, please see my article "What’s Your Handle?" (2003) at the Editorium Update.)
The Find That Works
The handle means we can simplify our wildcard search string to something like this:
Find what:
([!^013]@, [!^013]@, [!^013]@, )[!^013]@, (et al:)
Replace with:
12
Here’s what that means:
Find any characters except a carriage return ([!^013])
repeated any number of times (@)
followed by a comma
followed by a space
with all of that repeated three times
and enclosed in parentheses to form a “group.”
Then it’s repeated one more time, ungrouped
and followed by “et al:” in parentheses to form a group.
The “Replace with” string tells Word to replace what it finds with the contents of groups 1 and 2 — in other words, with the first three names (group 1) followed by “et al:” (group 2). The fourth name is simply ignored.
To Group or Not to Group Using Parens
Rich ran the new find and replace, then replied, “Thanks, Jack, that works like a charm. Why isn’t the second ‘group’ grouped, that is, in parentheses? I thought that was necessary.”
I replied, “No, it’s not necessary. You group only the items that you want to reference (by 1, 2, etc.) in the ‘Replace with’ box. You could group the other item, in which case you would use ‘13’ in the ‘Replace with’ box. But there’s no need to do so.”
Note that this method of finding the names offers another advantage. Not only will it find names that look like this:
Lyon J,
it will also find names that look like this:
Lyon JM,
or even this:
Lyon JMQ
It will even find names like this:
Thaler-Carter Ruth,
or this:
Harrison G.B.H.,
In fact, it will find anything (except a carriage return) followed by a comma and a space.
Why the Carriage Return?
“Why,” you may be wondering, “specify anything but a carriage return? Why not specify letters instead?” Well, we could have done that, using something like this:
Find what:
([A-z ]@, [A-z ]@, [A-z ]@, )[A-z ]@, (et al:)
Replace with:
12
That means:
any capital or lowercase letter or space ([A-z ])
repeated any number of times (@)
followed by a comma
followed by a space
And so on.
Such a wildcard string would find names like this:
Lyon J,
but not this:
Thaler-Carter R,
Yes, we could add a hyphen to our string, but then we start to wonder about other characters we might need to include, and then things get complicated again. And besides, it’s true that we don’t want to include carriage returns in our search, so it makes sense to exclude them. If we tried to simplify too far, we might use this:
Find what:
(*, *, *, )*, (et al:)
Replace with:
12
The problem with using the asterisk wildcard (*) is that it finds any character any number of times, including tabs, spaces, carriage returns, and everything else you can think of. Sometimes that’s useful, but more often it just leads to confusion. We want to keep things simple but not too simple.
Why Wildcard
To return to our original problem: Rich could have removed all those extra names one at a time, by hand, which is doing it the hard way and eats into the profit line — remember that time is money. Microsoft Word includes powerful tools for doing things the easy way, so why not learn them and use them? If you’ve read this far, you’re doing that, so congratulations.
If you’d like to learn more about how to use wildcard searches, you can download my free paper “Advanced Find and Replace in Microsoft Word.” Working through the paper requires some thought and effort, but the payoff is huge.
Coming Soon
I hope you’ll watch for my forthcoming Wildcard Cookbook for Microsoft Word. I’m still trying to find more real-life examples for the book, so if you have some particularly sticky problems that might be solved using a wildcard search, I hope you’ll send them my way. Maybe I can save you some work and at the same time figure out solutions that will help others in the future. Thanks for your help!
For EditTools Users
If you are a user of EditTools, you can manually create the find and replace strings in the Wildcard Find & Replace macro and then save the macro for future use. However, to do so you need to enter the Find string slightly differently:
Find Field #1: [!^013]@, [!^013]@, [!^013]@,
Find Field #2: [!^013]@,
Find Field #3: et al:
Note that you omit the parens for grouping because EditTools automatically inserts them, which means that you break the string into its group components. (IMPORTANT: Be sure to include in Find Fields 1 and 2 the ending space, i.e., the space following the final comma, which is not visible above.)
Because EditTools treats each of the three fields as a group, your Replace string is:
Replace Field #1: 1
Replace Field #2: 3
After manually entering the information in each of the fields, click Add to WFR Dataset and save this macro for future use. Next time you need it, just click Retrieve from WFR Dataset, retrieve this string, and run it. That is one of the advantages to using EditTools' Wildcard Find & Replace — you can write a wildcard macro once and reuse it as many times as you need without having to recreate the macro each time.
Jack Lyon (editor@editorium.com) owns and operates the Editorium, which provides macros and information to help editors and publishers do mundane tasks quickly and efficiently. He is the author of Microsoft Word for Publishing Professionals and of Macro Cookbook for Microsoft Word. Both books will help you learn more about macros and how to use them.
Looking for a Deal?
You can buy EditTools in a package with PerfectIt and Editor's Toolkit at a special savings of $78 off the price if bought individually. To purchase the package at the special deal price, click Editor's Toolkit Ultimate.