Lyonizing Word: From Easy to Impossible — Three Variations on a Theme

by Jack Lyon

Rich Adin just keeps on escalating the difficulty of his requests. That’s okay, because I appreciate a good challenge. Here’s his latest:

Okay, Jack, you solved the problem of reducing the number of authors from more than three down to three.

To see what Rich is talking about, please see my previous posts here: “Lyonizing Word: We Can Do This the Easy Way, or . . .” and “Lyonizing Word: The Easy Way, Not So Easy.”

Rich continues:

But there is a caveat: the list of names needs to end with “et al:”. So let me pose three more variations.

Three?! Oh, all right. Here we go:

Variation 1

How do I handle instances where the ending is punctuation other than “et al:”? For example, it could be a different punctuation mark than the colon or it could end with an author name and not “et al” (e.g., “Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W:” or “Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W, Hoffnagle TTP.”)

How do we handle instances where the ending is punctuation other than “et al:”? Here are Rich’s examples, all laid out for our inspection:

Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W:

Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W, Hoffnagle TTP.

As usual, the key is to find the “handle,” the unique elements we can grab to carry out our search. (For more on this, please see my article “What’s Your Handle?” [2003] at the Editorium Update.)

In Rich’s examples, the “handles” would have to be the colon that ends the first entry and the period that ends the second. Let’s try modifying the wildcard string from the previous post for Lyonizing Word:

([!^013]@, [!^013]@, [!^013]@, )[!^013]@([:.])

Here’s what that means:

Find any characters except a carriage return: [!^013]
repeated any number of times: @
followed by a comma
followed by a space
repeated three times
and enclosed in parentheses to form a “group.”
Then find any character except a carriage return: [!^013]
repeated any number of times: @
followed by [:.] (specifying a colon or a period) in parentheses to form a group.

And we can use the following in the “Replace With” box:

\1\2

Here’s what that means:

Replace everything that was found
with the text represented by group 1: \1
followed by the text represented by group 2: \2

But does that actually work? Well, sort of. Here’s what we get:

Lyon J, Adin R, Carter TO, :
Lyon J, Adin R, Carter TO, .

Maybe that’s close enough, as it would now be an easy matter to search for comma space colon and replace it with a colon, and to search for comma space period and replace it with a period. But if we want to refine our search string even further, we could use this:

([!^013]@, [!^013]@, [!^013]@), [!^013]@([:.])

Here, we’ve placed the comma and space following the third name outside the parenthetical group, so they’re not included when the group is replaced by \1. That actually solves the problem, if you want to get precise, giving us a result like this:

Lyon J, Adin R, Carter TO:
Lyon J, Adin R, Carter TO.
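
For readers who want to check the logic outside Word, here is a minimal Python sketch of both search strings. It is only an approximation: Python regular expressions are not Word wildcards, and the sketch assumes that `[^\r\n]+?` (one or more characters other than a line break, matched as briefly as possible) behaves like Word's `[!^013]@` on these examples. The names `NAME`, `FIRST_TRY`, `REFINED`, and `trim_authors` are invented for illustration.

```python
import re

# Assumption: [^\r\n]+? stands in for Word's [!^013]@ (lazy, one or more).
NAME = r"[^\r\n]+?"

# First attempt: the comma and space after the third name sit inside group 1.
FIRST_TRY = re.compile(rf"({NAME}, {NAME}, {NAME}, ){NAME}([:.])")

# Refined version: the comma and space are moved outside group 1.
REFINED = re.compile(rf"({NAME}, {NAME}, {NAME}), {NAME}([:.])")


def trim_authors(pattern: re.Pattern, line: str) -> str:
    """Keep the first three names (group 1) and the closing mark (group 2)."""
    return pattern.sub(r"\1\2", line)


samples = [
    "Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W:",
    "Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W, Hoffnagle TTP.",
]

for sample in samples:
    print(trim_authors(FIRST_TRY, sample))  # leaves a stray comma and space
    print(trim_authors(REFINED, sample))    # clean ending
```

Running the loop prints the stray-comma result and the clean result side by side for each sample, which makes it easy to see what moving the comma and space outside the group changes.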

Variation 2

Rich wrote:

How can I revise the string to work even if there is no consistency in punctuation of names? For example, suppose the names are: “Lyon, J, Adin R, Carter T.O., Jackson TT, Doe, J.; Smith K; Winger, W; Hoffnagle TTP.”

As given, this can’t be done. Why? Because we’ve lost the uniqueness of the comma “handles” that separate the names. For example, instead of this —

Lyon J,

— we have this:

Lyon, J,

And instead of this —

Smith K,

— we have this:

Smith K;

So again, as given, we can’t fulfill Rich’s request. But can we change the “as given”? Why, yes, we can!

We can search for a lowercase letter followed by a comma (at the end of a last name) and replace it with just the lowercase letter (and no comma):

Find what: ([a-z]),
Replace with: \1

We can search for a semicolon (which sometimes follows initials) and replace it with a comma:

Find what: ;
Replace with: ,

Then we can use the same wildcard string we used earlier to fulfill Rich’s request:

Find what: ([!^013]@, [!^013]@, [!^013]@), [!^013]@([:.])
Replace with: \1\2
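
The two cleanup passes can be sketched in Python to see what they do to Rich's sample. This is an illustration on plain text, not a substitute for running the searches in Word, and the function name `normalize_separators` is made up for the example.

```python
import re


def normalize_separators(authors: str) -> str:
    """Apply the two cleanup passes described above, in order."""
    # Pass 1: drop the comma that follows a lowercase letter (end of a surname).
    authors = re.sub(r"([a-z]),", r"\1", authors)
    # Pass 2: turn every semicolon into a comma.
    return authors.replace(";", ",")


messy = "Lyon, J, Adin R, Carter T.O., Jackson TT, Doe, J.; Smith K; Winger, W; Hoffnagle TTP."
print(normalize_separators(messy))
# Lyon J, Adin R, Carter T.O., Jackson TT, Doe J., Smith K, Winger W, Hoffnagle TTP.
```

Note that these two passes leave the periods inside initials ("T.O.", "J.") alone. Because the period is also one of the ending handles, entries like these may need one more cleanup pass before the trimming search gives the expected result.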

You may be wondering if these wildcard strings will affect the article titles and journal names and not just the author names. The answer is, it depends. I’m assuming, for example, that the article titles and journal names don’t include commas (just for purposes of illustration). But if they do, you may have to get creative. Let’s take this as an example:

Levy, D, Ehret G, Rice K, Verwoert G, Launer L, Dehghan A, Glazer N, Morrison A, Johnson A, Aspelund T, Ganesh S, Chasman D: Genome-wide association study of blood pressure, stress, and hypertension. Nature 2009, 41(6): 677-687.

See that comma after “Levy”? Above, we got rid of it with the following strings:

Find what: ([a-z]),
Replace with: \1

But notice that this will also remove the commas after “pressure” and “stress” in the article title, which we don’t want to do. The solution, again, comes down to handles. What do we have that sets off the article title and journal name? In this example, they’re preceded by the colon after the author names (“Chasman D:”) and followed by a carriage return (at the end of the citation). So here’s a rather sneaky solution: Search for a colon followed by anything that isn’t a carriage return until you come to a carriage return. Then replace whatever was found with itself (^&) formatted as Hidden:

Find what: :[!^013]@^013
Replace with (use Hidden formatting): ^&

If you don’t know how to replace using formatting, here’s the secret:

1. Put your cursor in the “Replace with” box.
2. Click the “More” button if it’s showing.
3. Click the “Format” button on the bottom left.
4. Click “Font.”
5. Put a check in the box labeled “Hidden.”
6. Click the “OK” button.

Notice that you can replace with all kinds of formatting: styles, paragraph alignment, and so on. You can also use formatting in the “Find what” box! This is really powerful stuff, and if you didn’t know about it before, now you can add it to your bag of tricks.

At any rate, with the article titles and journal names formatted as Hidden, you can make sure they actually are hidden by clicking the “Show/Hide” button (with the pilcrow icon: ¶) on Word’s “Home” tab. Then run your find and replace to remove commas from last names:

Find what: ([a-z]),
Replace with: \1

Finally, unhide the article titles and journal names (after using “Show/Hide” to display them):

Find what: (Hidden formatting)
Replace with: (Not Hidden formatting)

At that point, the commas will be gone from the authors’ last names but preserved in the article titles and journal names.
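
The hide, replace, unhide sequence amounts to protecting everything from the first colon onward while the comma cleanup runs on the author names alone. A minimal Python sketch of that idea follows. It works on plain text and has no notion of Word's Hidden formatting; splitting at the first colon is an assumption that mirrors the colon handle in this example, and `strip_surname_commas` is a hypothetical name.

```python
import re


def strip_surname_commas(citation: str) -> str:
    """Remove commas after surnames, but only before the first colon."""
    # Everything from the first colon onward plays the role of the hidden text.
    authors, colon, protected = citation.partition(":")
    authors = re.sub(r"([a-z]),", r"\1", authors)
    return authors + colon + protected


citation = (
    "Levy, D, Ehret G, Chasman D: Genome-wide association study of "
    "blood pressure, stress, and hypertension. Nature 2009, 41(6): 677-687."
)
print(strip_surname_commas(citation))
```

The comma after “Levy” is removed, while the commas after “pressure” and “stress” survive because they sit in the protected part of the line.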

By the way, if you’re working on a Macintosh, you’ll find that Word doesn’t recognize the standard code for a carriage return (^013) while searching with wildcards. But never fear: you can still do what you need by “escaping” the code with a backslash and treating it as a range using square brackets. In other words, use this:

[\^013]

To specify not a carriage return, use the following:

[!\^013]

Variation 3

Rich wrote:

How can I adapt the wildcard string to delete those in excess of a certain number? For example, I have one client who wants up to ten author names listed and “et al” used only for names eleven and following. I would like to specify how many names I want retained and replace the excess with “et al.” For example, if there are fifteen names, delete the last five if ten are okay and replace them with “et al.”

Theoretically, we could do that as long as there’s a “handle” that marks the end of the names. Let’s take this example:

Levy D, Ehret G, Rice K, Verwoert G, Launer L, Dehghan A, Glazer N, Morrison A, Johnson A, Aspelund T, Ganesh S, Chasman D: Genome-wide association study of blood pressure and hypertension. Nature 2009, 41(6): 677-687.

There are actually twelve names there, so we want to keep the first ten and replace the last two with “et al.” What’s our handle? The colon after the last name (“Chasman D:”) and before the article’s title. So let’s try an expansion of the wildcard search string we used in the previous post for Lyonizing Word. Instead of grouping three comma-separated names, we’ll group ten:

Find what: ([!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@,)[!^013]@(:)
Replace with: \1 et al.\2

That would work if Word could handle it. But if you try it, Word will complain: “The Find What text contains a Pattern Match expression which is too complex.” So now what? Honestly, I’m not sure. I tried several other possibilities, none of which were successful. So if you, Gentle Reader, have any ideas about how to accomplish this seemingly impossible feat, I’d love to hear them.
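
To make the intended result concrete, here is a small Python sketch of the "keep the first N names" logic. It does not solve the problem inside Word's Find and Replace; it only shows, on plain text, what the ten-name search string is meant to produce. It assumes the names are separated by a comma and a space and that the first colon marks the end of the author list. The function name `limit_authors` and its `keep` parameter are hypothetical.

```python
def limit_authors(citation: str, keep: int = 10) -> str:
    """Keep the first `keep` names and replace the rest with 'et al.'."""
    # Assumption: the first colon is the handle that ends the author list.
    authors, colon, rest = citation.partition(":")
    if not colon:
        return citation  # no handle found, so leave the line untouched
    names = authors.split(", ")
    if len(names) <= keep:
        return citation  # nothing to trim
    return ", ".join(names[:keep]) + ", et al." + colon + rest


citation = (
    "Levy D, Ehret G, Rice K, Verwoert G, Launer L, Dehghan A, Glazer N, "
    "Morrison A, Johnson A, Aspelund T, Ganesh S, Chasman D: Genome-wide "
    "association study of blood pressure and hypertension. "
    "Nature 2009, 41(6): 677-687."
)
print(limit_authors(citation, keep=10))
```

With `keep=10`, the twelve-name example ends in “Aspelund T, et al.:” followed by the unchanged title and journal details, which is the same shape the Replace With string above is aiming for.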

Wildcard searching can’t do everything, but it can do an awful lot. As I’ve said before, after all these years of editing, wildcard searching is the tool I rely on the most. I encourage you to invest the time needed to learn to use this tool, which will repay your efforts many times over. A good place to start is my free paper “Advanced Find and Replace in Microsoft Word.”

I hope you’ll also watch for my forthcoming Wildcard Cookbook for Microsoft Word. I’m still trying to find more real-life examples for the book, so if you have some particularly sticky problems that might be solved using a wildcard search, I hope you’ll send them my way. Maybe I can save you some work and at the same time figure out solutions that will help others in the future. Thanks for your help!

Jack Lyon (editor@editorium.com) owns and operates the Editorium, which provides macros and information to help editors and publishers do mundane tasks quickly and efficiently. He is the author of Microsoft Word for Publishing Professionals and of Macro Cookbook for Microsoft Word. Both books will help you learn more about macros and how to use them.

For other Lyonizing Word essays at An American Editor, see Lyonizing Word at AAE.