Lyonizing Word: From Easy to Impossible — Three Variations on a Theme

by Jack Lyon

Rich Adin just keeps on escalating the difficulty of his requests. That’s okay, because I appreciate a good challenge. Here’s his latest:

Okay, Jack, you solved the problem of reducing the number of authors from more than three down to three.

To see what Rich is talking about, please see my previous posts here: Lyonizing Word: We Can Do This the Easy Way, or . . . and Lyonizing Word: The Easy Way, Not So Easy.

Rich continues:

But there is a caveat: the list of names needs to end with “et al:”. So let me pose three more variations.

Three?! Oh, all right. Here we go:

Variation 1

How do I handle instances where the ending is punctuation other than “et al:”? For example, it could be a different punctuation mark than the colon or it could end with an author name and not “et al” (e.g., “Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W:” or “Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W, Hoffnagle TTP.”)

How do we handle instances where the ending is punctuation other than “et al:”? Here are Rich’s examples, all laid out for our inspection:

Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W:

Lyon J, Adin R, Carter TO, Jackson TT, Doe J, Smith K, Winger W, Hoffnagle TTP.

As usual, the key is to find the “handle,” the unique elements we can grab to carry out our search. (For more on this, please see my article “What’s Your Handle?” [2003] at the Editorium Update.)

In Rich’s examples, the “handles” would have to be the colon that ends the first entry and the period that ends the second. Let’s try modifying the wildcard string from the previous post for Lyonizing Word:

([!^013]@, [!^013]@, [!^013]@, )[!^013]@([:.])

Here’s what that means:

Find any characters except a carriage return: [!^013]
repeated any number of times: @
followed by a comma
followed by a space
repeated three times
and enclosed in parentheses to form a “group.”
Then find any character except a carriage return: [!^013]
repeated any number of times: @
followed by [:.] (specifying a colon or a period) in parentheses to form a group.

And we can use the following in the “Replace With” box:

12

Here’s what that means:

Replace everything that was found
with the text represented by group 1: 1
followed by the text represented by group 2: 2

But does that actually work? Well, sort of, Here’s what we get:

Lyon J, Adin R, Carter TO, :
Lyon J, Adin R, Carter TO, .

Maybe that’s close enough, as it would now be an easy matter to search for comma space colon and replace it with a colon, and to search for comma space period and replace it with a period. But if we want to refine our search string even further, we could use this:

([!^013]@, [!^013]@, [!^013]@), [!^013]@([:.])

Here, we’ve placed the comma and space following the third name outside the parenthetical group, so they’re not included when the group is replaced by /1. That actually solves the problem, if you want to get precise, giving us a result like this:

Lyon J, Adin R, Carter TO:
Lyon J, Adin R, Carter TO.

Variation 2

Rich wrote:

How can I revise the string to work even if there is no consistency in punctuation of names? For example, suppose the names are: “Lyon, J, Adin R, Carter T.O., Jackson TT, Doe, J.; Smith K; Winger, W; Hoffnagle TTP.”

As given, this can’t be done. Why? Because we’ve lost the uniqueness of the comma “handles” that separate the names. For example, instead of this —

Lyon J,

— we have this:

Lyon, J,

And instead of this —

Smith K,

— we have this:

Smith K;

So again, as given, we can’t fulfill Rich’s request. But can we change the “as given”? Why, yes, we can!

We can search for a lowercase letter followed by a comma (at the end of a last name) and replace it with just the lowercase letter (and no comma):

Find what: ([a-z]),
Replace with: 1

We can search for a semicolon (which sometimes follows initials) and replace it with a comma:

Find what: ;
Replace with: ,

Then we can use the same wildcard string we used earlier to fulfill Rich’s request:

Find what: ([!^013]@, [!^013]@, [!^013]@), [!^013]@([:.])
Replace with: 12

You may be wondering if these wildcard strings will affect the article titles and journal names and not just the author names. The answer is, it depends. I’m assuming, for example, that the article titles and journal names don’t include commas (just for purposes of illustration). But if they do, you may have to get creative. Let’s take this as an example:

Levy, D, Ehret G, Rice K, Verwoert G, Launer L, Dehghan A, Glazer N, Morrison A, Johnson A, Aspelund T, Ganesh S, Chasman D: Genome-wide association study of blood pressure, stress, and hypertension. Nature 2009, 41(6): 677-687.

See that comma after “Levy”? Above, we got rid of it with the following strings:

Find what: ([a-z]),
Replace with: 1

But notice that this will also remove the commas after “pressure” and “stress” in the article title, which we don’t want to do. The solution, again, comes down to handles. What do we have that sets off the article title and journal name? In this example, they’re preceded by the colon after the author names (“Chasman D:”) and followed by a carriage return (at the end of the citation). So here’s a rather sneaky solution: Search for a colon followed by anything that isn’t a carriage return until you come to a carriage return. Then replace whatever was found with itself (^&) formatted as Hidden:

Find what: :[!^013]@^013
Replace with (use Hidden formatting): ^&

If you don’t know how to replace using formatting, here’s the secret:

1. Put your cursor in the “Replace with” box.
2. Click the “More” button if it’s showing.
3. Click the “Format” button on the bottom left.
4. Click “Font.”
5. Put a check in box labeled “Hidden.”
6. Click the “OK” button.

Notice that you can replace with all kinds of formatting: styles, paragraph alignment, and so on. You can also use formatting in the “Find what” box! This is really powerful stuff, and if you didn’t know about it before, now you can add it to your bag of tricks.

At any rate, with the article titles and journal names formatted as Hidden, you can make sure they actually are hidden by clicking the “Show/Hide” button (with the pilcrow icon: ¶) on Word’s “Home” tab. Then run your find and replace to remove commas from last names:

Find what: ([a-z]),
Replace with: 1

Finally, unhide the article titles and journal names (after using “Show/Hide” to display them):

Find what: (Hidden formatting)
Replace with: (Not Hidden formatting)

At that point, the commas will be gone from the authors’ last names but preserved in the article titles and journal names.

By the way, if you’re working on a Macintosh, you’ll find that Word doesn’t recognize the standard code for a carriage return (^013) while searching with wildcards. But never fear: you can still do what you need by “escaping” the code with a backslash and treating it as a range using square brackets. In other words, use this:

[ˆ013]

To specify not a carriage return, use the following:

[!ˆ013]

Variation 3

Rich wrote:

How can I adapt the wildcard string to delete those in excess of a certain number? For example, I have one client who wants up to ten author names listed and “et al” used only for names eleven and following. I would like to specify how many names I want retained and replace the excess with “et al.” For example, if there are fifteen names, delete the last five if ten are okay and replace them with “et al.”

Theoretically, we could do that as long as there’s a “handle” that marks the end of the names. Let’s take this example:

Levy D, Ehret G, Rice K, Verwoert G, Launer L, Dehghan A, Glazer N, Morrison A, Johnson A, Aspelund T, Ganesh S, Chasman D: Genome-wide association study of blood pressure and hypertension. Nature 2009, 41(6): 677-687.

There are actually twelve names there, so we want to keep the first ten and replace the last two with “et al.” What’s our handle? The colon after the last name (“Chasman D:”) and before the article’s title. So let’s try an expansion of the wildcard search string we used in the previous post for Lyonizing Word. Instead of grouping three comma-separated names, we’ll group ten:

Find what: ([!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@, [!^013]@,)[!^013]@(:)
Replace with: 1 et al.2

That would work if Word could handle it. But if you try it, Word will complain: “The Find What text contains a Pattern Match expression which is too complex.” So now what? Honestly, I’m not sure. I tried several other possibilities, none of which were successful. So if you, Gentle Reader, have any ideas about how to accomplish this seemingly impossible feat, I’d love to hear them.

Wildcard searching can’t do everything, but it can do an awful lot. As I’ve said before, after all these years of editing, wildcard searching is the tool I rely on the most. I encourage you to invest the time needed to learn to use this tool, which will repay your efforts many times over. A good place to start is my free paper “Advanced Find and Replace in Microsoft Word.”

I hope you’ll also watch for my forthcoming Wildcard Cookbook for Microsoft Word. I’m still trying to find more real-life examples for the book, so if you have some particularly sticky problems that might be solved using a wildcard search, I hope you’ll send them my way. Maybe I can save you some work and at the same time figure out solutions that will help others in the future. Thanks for your help!

Jack Lyon (editor@editorium.com) owns and operates the Editorium, which provides macros and information to help editors and publishers do mundane tasks quickly and efficiently. He is the author of Microsoft Word for Publishing Professionals and of Macro Cookbook for Microsoft Word. Both books will help you learn more about macros and how to use them.

For other Lyonizing Word essays at An American Editor, Lyonizing Word at AAE.

The Business of Editing: Wildcarding for Dollars

Freelancers often lack mastery of tools that are available to us. This is especially true of wildcarding. This lack of mastery results in our either not using the tools at all or using them to less than their full potential. These are tools that could save us time, increase accuracy, and, most importantly, make us money. Although we have discussed wildcard macros before (see, e.g., The Only Thing We Have to Fear: Wildcard Macros, The Business of Editing: Wildcard Macros and Money, and Macro Power: Wildcard Find & Replace; also see the various Lyonizing Word articles), after recent conversations with colleagues, I think it is time to revisit wildcarding.

Although wildcards can be used for many things, the best examples of their power, I think, are references. And that is what we will use here. But remember this: I am showing you one example out of a universe of examples. Just because you do not face the particular problem used here to illustrate wildcarding does not mean wildcarding is not usable by you. If you edit, you can use wildcarding.

Identifying the What Needs to Be Wildcarded

We begin by identifying what needs a wildcard solution. The image below shows the first 3 references in a received references file. This was a short references file (relatively speaking; I commonly receive references files with 500 to 1,000 references), only 104 entries, but all done in this fashion.

references as received

references as received

The problems are marked (in this essay, numbers in parens correspond to numbers in the images): (1) refers to the author names and the inclusion of punctuation; (2) shows the nonitalic journal name followed by punctuation; and (3) shows the use of and in the author names. The following image shows what my client wants the references to look like.

references after wildcarding

references after wildcarding

Compare the numbered items in the two images: (1) the excess punctuation is gone; (2) the journal title is italicized and punctuation free; and (3) the and is gone.

It is true that I could have fixed each reference manually, one-by-one, and taken a lot of time to do so. Even if I were being paid by the hour (which I'm not; I prefer per-page or project fees), would I want to make these corrections manually? I wouldn't. Not only is it tedious, mind-numbing work, but it doesn't meet my definition of what constitutes editing. Yes, it is part of the editing job, but I like to think that removing punctuation doesn't reflect my skills as a wordsmith and isn't the skill for which I was hired.

I will admit that in the past, in the normal course, if the reference list were only 20 items long, I would have done the job manually. But that was before EditTools and its Wildcard macro, which enables me to write the wildcard string once and then save it so I can reuse it without rewriting it in the future. In other words, I can invest time and effort now and get a reoccurring return on that investment for years to come. A no-brainer investment in the business world.

The Wildcard Find

CAUTION: Wildcard macros are very powerful. Consequently, it is recommended that you have a backup copy of your document that reflects the state of the document before running wildcard macros as a just-in-case option. If using wildcard macros on a portion of a document that can be temporarily moved to its own document, it is recommended that you move the material. Whenever using any macro, use caution.

Clicking Wildcard in EditTools brings up the dialog shown below, which gives you options. If you manually create Find and Replace strings, you can save them to a wildcard dataset (1) for future recall and reuse. If you already have strings that might work, you can retrieve them (2) from an existing wildcard dataset. And if you have taken the next step with Wildcards in EditTools and created a script, you can retrieve the script (3) and run it. (A script is simply a master macro that includes more than 1 string. Instead of retrieving and running each string individually, you retrieve a script that contains multiple strings and run the script. The script will go through each string it contains automatically in the order you have entered the strings.)

Wildcard Interface

Wildcard Interface

As an example, if I click Retrieve from WFR dataset (#2 above), the dialog shown below opens. In this instance, I have already created several strings (1) and I can choose which string I want to run from the dropdown. Although you can't see it, this particular dataset has 40 strings from which I can choose. After choosing the string I want to run, it appears in the Criteria screens (2 and 3), divided into the Find portion of the string and the Replace portion. I can then either Select (4) the strings to be placed in primary dialog box (see Wildcard Interface above) or I can Edit (5) the strings if they need a bit of tweaking.

Wildcard Dataset Dialog

Wildcard Dataset Dialog

If I click Select (4 above), the strings appear in the primary Wildcard dialog as shown below (1 and 2). Because it can be hard to visualize what the strings really look like when each part is separated, you can see the strings as they will appear to Microsoft Word (3). In addition, you know which string you chose because it is identified above the criteria fields (purple arrow). Now you have choices to make. You can choose to run a Test to be sure the criteria work as expected (4), or if you know the criteria work, as would be true here, you can choose to Find and Replace one at a time or Replace All (5).

The Effect of Clicking Select

The Effect of Clicking Select

I know that many readers are saying to themselves, "All well and good but I don't know how to write the strings, so the capability of saving and retrieving the strings isn't of much use to me." Even if you have never written a wildcard string before, you can do so quickly and easily with EditTools.

Creating Our String

Let's begin with the first reference shown in the References as Received image above. We need to tackle this item by item. Here is what the author names look like as received:

Kondo, M., Wagers, A. J., Manz, M. G., Prohaska, S. S., Scherer, D. C., Beilhack, G. F. et al.:

What we have for the first name in the list is

[MIXED case multiletter surname][comma][space][single UPPERCASE letter][period][comma]

which makes up a unit. That is, a unit is the group of items that need to be addressed as a single entity. In this example, each complete author name will constitute a unit.

This first unit has 6 parts to it (1 part = 1 bracketed item) and we have identified what each part is (e.g., [MIXED case multiletter surname]). To find that first part we go to the Wildcard dialog, shown below, click the * (1) next to the blank field in line 1. Clicking the * brings up the Select Wildcard menu (2) from which we choose we choose Character Menu (3). In the Character Menu we choose Mixed Case (4) because that is the first part of the unit that we need to find.

Wildcard First Steps

Wildcard First Steps

When we choose Mixed Case (4 above), the Quantity dialog below appears. Here you tell the macro whether there is a limit to the number of characters that fit the description for this part. Because we are dealing with names, just leave the default of no limit. However, if we knew we only wanted names that were, for example, 5 letters or fewer in length, we would decheck No Limit and change the number in the Maximum field to 5.

How many letters?

How many letters?

Clicking OK in the Quantity results in entry of the first portion of our string in the Wildcard dialog (1, below). This tells the macro to find any grouping of letters — ABCd, Abcde, bCdaefTg, Ab, etc. — of any length, from 1 letter to 100 or more letters. Thus we have the criteria for the first part of our Find unit even though we did not know how to write wildcardese. In the dialog, you can see how the portion of the string really looks to Microsoft Word (2) and how, if you were to manually write this part using Microsoft Word's Find & Replace, it would need to be written.

How this part looks in wildcardese

How this part looks in wildcardese

The next step is to address the next part, which can be either [comma] alone or [comma][space]. What we need to be careful about is that we remember that we will need the [space] in the Replace string. If we do [comma][space] and if we do not have just a [space] entry, we will need to provide it. For this example, I will combine them.

Because these are simple things, I enter the [comma][space] directly in the dialog as shown below. With my cursor in the second blank field (1), I simply type a comma and hit the spacebar. You can verify this by looking below in the Find line of wildcardese (2), where you can see (, ):

Manually adding the next part

Manually adding the next part

The remaining parts to do are [single UPPERCASE letter][period][comma]. They would be done using the same techniques as the prior parts. Again, we would have to decide whether the [period] and [comma] need to go on separate lines or together on a single line. Why? Because we want to eliminate the [period] but keep the [comma]. If they are done together as we did [comma][space], we will manually enter the [comma] in the Replace.

For the [single UPPERCASE letter], we would follow the steps in Wildcard First Steps above except that instead of Mixed Case, we would select UPPER CASE, as shown here:

Selecting UPPER CASE from the Characters Menu

Selecting UPPER CASE from the Characters Menu

This brings up the Quantity dialog where we decheck No Limit and, because we know it is a single letter we want found, use the default Minimum 1 and Maximum 1, as shown here:

A Quantity of 1

A Quantity of 1

Clicking OK takes us to the main Wildcard dialog where the criteria to find the [single UPPERCASE letter] has been entered (1, below). Looking at the image below, you can see it in the string (2). For convenience, the image also shows that I manually entered the [period][comma] on line 4 (3 and 4).

The rest of the Find criteria

The rest of the Find criteria

The Wildcard Replace

The next step is to create the Replace part of the string. Once again, we need to analyze our Find criteria.

We have divided the Find criteria into these 4 parts, which together make up the Find portion of the string:

  1. [MIXED case multiletter surname]
  2. [comma][space]
  3. [single UPPERCASE letter]
  4. [period][comma]

The numbers represent the numbers of the fields that are found in the primary dialog shown above (The Rest of the Find Criteria). What we need to do is determine which fields we want to replace and in what order. In this example, what we want to do is remove unneeded punctuation, so the Replace order is the same as the Find order. We want to end up with this:

  1. [MIXED case multiletter surname]
  2. [space]
  3. [single UPPERCASE letter]
  4. [comma]

The way we do so is by filling in the Replace fields. The [space] and the [comma] we can enter manually. You can either enter every item manually or you can let the macro give you a hand. Next to each field in the Replace column is an *. Clicking on the * brings up the Select Wildcard dialog:

Select Wildcard

Select Wildcard

Because what we need is available in the Find Criteria, we click on Find Criteria. However, the Select Wildcard dialog also gives us options to insert other items that aren't so easy to write in wildcardese, such as a symbol. When we click Find Criteria, the Use Find Criteria dialog, shown below, appears. It lists everything that is found in the Find criteria by line.

Use Find Criteria dialog

Use Find Criteria dialog

Double-clicking the first entry (yellow highlighted) places it in the first line of the Replace, but by a shortcut — 1 — as shown in the image below (1). If we wanted to reverse the order (i.e., instead of ending up with Kondo M, we want to end up with M Kondo,), we would select the line 3 entry in the Use Find Criteria Dialog above, and double-click it. Then 3 would appear in the first line of Replace instead of 1.

The completed wildcard macro

The completed wildcard macro

For convenience, I have filled the Replace criteria (1-4) as The Completed Wildcard Macro image above shows. The [space] (2) and the [comma] (4) I entered manually using the keyboard. The completed Replace portion of the string can be seen at (5).

The next decision to be made is how to run the string — TEST (6) or manual Find/Replace (7) or auto Replace All (8). If you have not previously tried the string or have any doubts, use the TEST (6). It lets you test and undo; just follow the instructions that appear. Otherwise, I recommend doing a manual Find and Replace (7) at least one time so you can be certain the string works as you intend. If it does work as intended, click Replace All (8).

You will be asked whether you want to save your criteria; you can preempt being asked by clicking Add to WFR dataset (9). You can either save to an existing dataset or create a new dataset. And if you look at the Wildcard Dataset dialog above (near the beginning of this essay), you will see that you can not only name the string you are saving, but you can provide both a short and a detailed description to act as reminders the next time you are looking for a string to accomplish a task.

Spend a Little Time Now, Save Lots of Time Later

Running the string we created using Replace All on the file we started with, will result in every instance of text that meets the Find criteria being replaced. I grant that the time you spend to create the string and test it will take longer than the second and subsequent times you retrieve the string and run it, but that is the idea: spend a little time now to save lots of time later.

I can tell you from the project I am working on now that wildcarding has saved me more than 30 hours of toiling so far. I have already had several chapters with 400 or more references that were similar to the example above (and a couple that were even worse). Wildcarding let me clean up author names, as here, and let me change cites from 1988;52(11):343-45 to 52:343, 1988 in minutes.

As you can see from this exercise, wildcarding need not be difficult. Whether you are an experienced wildcarder or new to wildcarding, you can harness the power of wildcarding using EditTools' WildCard Find & Replace. Let EditTools' WildCard Find & Replace macro system help you. Combine wildcarding with EditTools' Journals macro and references become quicker and easier.

Richard Adin, An American Editor

Related An American Editor essays are:

_________________________

Looking for a Deal?

You can buy EditTools in a package with PerfectIt and Editor's Toolkit at a special savings of $78 off the price if bought individually. To purchase the package at the special deal price, click Editor's Toolkit Ultimate.