Jack Lyon

Two-Step Searching

While editing in Microsoft Word, I often need to find something that's *partially* formatted and replace it with something else. For example, let's say a manuscript has a bunch of superscript note numbers preceded by a space that's *not* in superscript. Here's an example (with carets indicating superscript):

Lorem ipsum dolor sit amet. ^1^

I'd like to have Word find all such spaces and replace them with nothing (in other words, delete them), but that doesn't seem possible. I can open Word's Replace dialog (Edit > Replace) and set the "Find What" box to superscript, but the space isn't superscript, and the manuscript has thousands of spaces that *don't* precede a superscript number. It also has numbers that aren't superscript (like 2001), so I can't just find spaces preceding numbers. What's an editor to do?

Find and replace the spaces in two steps rather than one:

1. Mark the superscript with codes.

2. Delete the spaces and codes.

STEP 1

To mark the superscript with codes, do this:

1. Open Word's Replace dialog by clicking the "Edit" menu and then "Replace."

2. Put your cursor in the "Find What" box and make sure the box is empty.

3. Click the "Format" button. (You may need to click the "More" button first.)

4. Click "Font."

5. Put a checkmark in the "Superscript" box.

6. Click the "OK" button. The "Find What" box should now be set to superscript.

7. Put your cursor in the "Replace With" box.

8. Type the following string in the "Replace With" box:

<sup>^&

9. Click "Replace All."

All of your superscript numbers will be replaced with themselves, preceded by <sup>, which is a code I just made up to indicate superscript. In other words, your sentences will now look like this:

Lorem ipsum dolor sit amet. <sup>^1^

Feel free to make up your own codes for whatever you need (italic, bold, paragraph styles, and so on).

The other code in the "Replace With" box, ^&, is Microsoft Word's "Find What Text" code, which represents the text that was found (the superscript numbers). You can learn about it here:

http://www.topica.com/lists/editorium/read/message.html?mid=1703525514

STEP 2

To delete the spaces and codes, do this:

1. Open Word's Replace dialog by clicking the "Edit" menu and then "Replace."

2. Put your cursor in the "Find What" box by clicking it.

3. Type the following string in the "Find What" box:

 <sup>

(You can't see it very well in this newsletter, but there's a space in front of that code, and it needs to be there.)

4. Click the "No Formatting" button so you're no longer finding superscript, which is now represented by the code.

5. Put your cursor in the "Replace With" box and make sure the box is empty.

6. Click "Replace All."

All of the spaces in front of the codes (and thus in front of the superscript numbers) will be deleted, as will the codes themselves, leaving your sentences looking like this:

Lorem ipsum dolor sit amet.^1^

You can use this little two-step trick any time you need to find and replace partially formatted text. Now that you know how, that will probably be quite often.
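If you'd like to see why the two steps work, here's a minimal Python sketch of the logic outside Word. It is not Word's object model: it assumes a document represented as a list of (text, is_superscript) runs, and the helper names (render, mark_superscript, delete_space_and_marker) are hypothetical. The marker is a made-up code, here <sup>.

```python
# Minimal sketch of the two-step trick, outside Word.
# Assumption: a document is a list of (text, is_superscript) runs,
# a hypothetical stand-in for Word's character formatting.
Run = tuple[str, bool]

MARKER = "<sup>"  # made-up code; any string not otherwise in the text will do


def render(runs: list[Run]) -> str:
    """Flatten runs to plain text, showing superscript between carets."""
    return "".join(f"^{text}^" if sup else text for text, sup in runs)


def mark_superscript(runs: list[Run]) -> list[Run]:
    """Step 1: find superscript and replace it with MARKER plus the
    found text (Word's ^&). The marker itself is plain, unformatted text."""
    marked: list[Run] = []
    for text, sup in runs:
        if sup:
            marked.append((MARKER, False))
        marked.append((text, sup))
    return marked


def delete_space_and_marker(text: str) -> str:
    """Step 2: an ordinary, formatting-blind find of ' ' + MARKER,
    replaced with nothing. A marker with no space before it is left alone."""
    return text.replace(" " + MARKER, "")


runs = [("Lorem ipsum dolor sit amet. ", False), ("1", True),
        (" It was 2001. ", False), ("2", True)]
step1 = render(mark_superscript(runs))
print(step1)                           # Lorem ipsum dolor sit amet. <sup>^1^ It was 2001. <sup>^2^
print(delete_space_and_marker(step1))  # Lorem ipsum dolor sit amet.^1^ It was 2001.^2^
```

Notice that the space before "2001" survives: step 2 only has to match ordinary characters, because the marker carries the formatting information across from step 1.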

ERRATA

In the April 18, 2001, issue of Editorium Update, I gave the following find-and-replace pattern for putting last name first in a list of names:

Pattern: G. B. Harrison, Ph.D.

Find What: ^013([A-z].) ([A-Z].) ([A-z]@,) (*)^013

Replace With: ^p\3 \1 \2, \4^p

That first [A-z] wildcard range should have been given as [A-Z] (with a capital Z) to indicate a capital letter. [A-z] (with a lowercase z) will work, but it doesn't make the example as clear as it should have been.
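Here's one reason [A-Z] is the clearer choice, illustrated with Python's regular expressions rather than Word. Python compares range endpoints by character code, and six punctuation characters sit between Z and a, so [A-z] quietly admits them too. I'm assuming Word's wildcard ranges behave the same way; that's worth testing on junk text before relying on it.

```python
import re

# Characters whose codes fall between "Z" (90) and "a" (97)
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

# [A-z] accepts all six of them; [A-Z] accepts none
print([ch for ch in between if re.fullmatch("[A-z]", ch)])  # the same six characters
print([ch for ch in between if re.fullmatch("[A-Z]", ch)])  # []
```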

_________________________________________

READERS WRITE

After reading last week's newsletter, subscriber Nancy Adess (naedit@earthlink.net) wrote, "Why would there be periods at all at the end of references in parens in the text? Why not just (Thoreau, Walden, p. 10)?"

I responded:

I realize I'm at odds with the Chicago Manual of Style (10.77) on this, but I think Chicago is wrong. Chicago style is like this, with no period at the end of the quotation and a period after the source citation:

"The improvements of ages have had but little influence on the essential laws of man's existence" (Thoreau, Walden [New York: Time Reading Program, 1962], p. 10).

To me, the period is *part* of the quotation--but we've just put it after the citation. However, if the sentence ends with a question or exclamation mark, Chicago keeps it with the quotation where it belongs:

"What is the nature of the luxury which enervates and destroys nations?" (Thoreau, Walden [New York: Time Reading Program, 1962], p. 13).

The placement of the question mark reveals the faulty reasoning behind moving the period--we didn't move the question mark, right? Also, we now have another problem: Since we're not going to move the question mark, how do we punctuate our citation? Chicago does it by leaving that period there--but in this case the period was never part of the sentence to begin with. This makes no sense at all--and besides, the period looks stupid hanging out there by itself. I think the sentence and the citation should be punctuated independently, like this:

"The improvements of ages have had but little influence on the essential laws of man's existence." (Thoreau, Walden [New York: Time Reading Program, 1962], p. 10.)

"What is the nature of the luxury which enervates and destroys nations?" (Thoreau, Walden [New York: Time Reading Program, 1962], p. 13.)

Simple. Sensible. Neat. Consistent. And not ugly.

And besides, I was trained by a marvelous, independent-thinking editor, and that's the way she did it. 🙂

In addition, using this style makes electronic manipulation simple because the sentence and the citation are both self-contained. For example, it's now an easy matter to write a macro that will turn parenthetical source citations into footnotes--or vice versa. If we take our first sentence, punctuated like this--

"The improvements of ages have had but little influence on the essential laws of man's existence." (Thoreau, Walden [New York: Time Reading Program, 1962], p. 10.)

--we can use a macro to:

1. Delete the space before the citation.

2. Delete the opening parenthesis.

3. Cut to the closing parenthesis.

4. Delete the closing parenthesis.

5. Create a footnote.

6. Paste the cut citation into the footnote.

7. Close the footnote.

That leaves our sentence looking like this (with carets indicating superscript):

"The improvements of ages have had but little influence on the essential laws of man's existence."^1^

And our note looking like this:

^1^Thoreau, Walden [New York: Time Reading Program, 1962], p. 10.

We could also use the macro successfully on our second sentence (the one with the question mark). But if we had followed Chicago style, we'd have to create separate macros for each kind of sentence and citation, and they'd be more complicated, too. (Our NoteStripper program includes macros that do this kind of stuff.)
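To show why independent punctuation makes the job easy, here's a minimal Python sketch of the seven macro steps. It isn't NoteStripper or a Word macro; the function name citations_to_notes is hypothetical, and it assumes citations punctuated the way I recommend above, with note numbers shown between carets.

```python
import re

# Assumption: the sentence keeps its own mark inside the closing quote,
# followed by a space and a self-contained "(citation.)".
CITATION = re.compile(r'(?<=[.?!]") \(([^()]+)\)')


def citations_to_notes(text: str) -> tuple[str, list[str]]:
    """Turn parenthetical citations into numbered notes.
    Returns the new text (note numbers between carets) and the notes."""
    notes: list[str] = []

    def to_note(match: re.Match) -> str:
        # Steps 3 and 6: cut the citation and "paste" it into a note
        notes.append(match.group(1))
        # Step 5: insert the note reference; the match already covers the
        # space and both parentheses (steps 1, 2, 4), so they disappear
        return f"^{len(notes)}^"

    return CITATION.sub(to_note, text), notes


sentence = ('"The improvements of ages have had but little influence on the '
            'essential laws of man\'s existence." (Thoreau, Walden [New York: '
            'Time Reading Program, 1962], p. 10.)')
body, notes = citations_to_notes(sentence)
print(body)   # ...essential laws of man's existence."^1^
print(notes)  # ['Thoreau, Walden [New York: Time Reading Program, 1962], p. 10.']
```

The same pattern handles the question-mark sentence unchanged. A Chicago-style sentence ("...existence" (Thoreau, ...).) doesn't match at all, because the sentence's own period has moved outside the citation; handling it would take a second, fussier pattern.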

Wildcards in the Real World

I hope you've enjoyed the wildcard "tutorial" articles in Editorium Update over the past few weeks. If you haven't read them, I'd recommend that you do so in order to understand this week's article. You can review the whole series here, starting with the March 20, 2001, issue:

http://www.editorium.com/euindex.htm/

This week I thought you might be interested in seeing some of the wildcard combinations I've used recently in an actual editing project. Maybe you'll find them useful too.

EXAMPLE 1

The manuscript I've been working on has lots of parenthetical references like this:

(Thoreau, Walden, p 10.)

You'll notice that there's no period after the p. To fix these references, I used the following string in Microsoft Word's "Find What" box in the Replace dialog (Edit > Replace), with "Use Wildcards" (or "Use Pattern Matching") turned on:

p ([0-9]@.\))

That's an odd-looking thing with its double parentheses, but its meaning becomes clear when you consider that the first closing parenthesis represents the closing parenthesis of the reference. The backslash in front of it tells Word to treat it as a character rather than the end of a group "expression." So the whole string says this:

1. Find a p followed by a space.

2. Find, as a group, one or more digits followed by a period followed by a closing parenthesis.

I put this in the "Replace With" box:

p. \1

And that string says this:

1. Replace the p followed by a space with p followed by a period and a space.

2. Replace the rest of the "Find What" string (the group in parentheses) with itself.

When I was finished finding and replacing, the references looked like this:

(Thoreau, Walden, p. 10.)
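For comparison, here's the same fix written as a Python regular expression. This is an illustration, not something Word runs, and the translation is my own: Word's @ becomes +, and the period has to be escaped as \. because it's a wildcard in Python, whereas Word treats a plain period literally.

```python
import re

# Word: Find "p ([0-9]@.\))"   Replace "p. \1"
PAGE = re.compile(r"p ([0-9]+\.\))")


def fix_page_abbrev(text: str) -> str:
    """Add the missing period after "p" in references like (..., p 10.)."""
    return PAGE.sub(r"p. \1", text)


print(fix_page_abbrev("(Thoreau, Walden, p 10.)"))  # (Thoreau, Walden, p. 10.)
```

Like the Word version, this would also fire on a word that merely ends in p (say, "map 10.)"). In Python you could guard against that with \b before the p; Word's < (beginning of word) plays a similar role.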

EXAMPLE 2

Here's another example from the manuscript I've been working on:

(Genesis 8:26)

You'll notice that there's no period before the closing parenthesis. Wanting to fix these, I put this string in the "Find What" box:

([0-9]@:[0-9]@)\)

It says:

1. Find, as a group, one or more digits followed by a colon followed by one or more digits.

2. Find a closing parenthesis character.

I put this in the "Replace With" box:

\1.)

And that string says:

1. Replace the group with itself.

2. Replace the closing parenthesis with a period and a closing parenthesis.

When I was finished finding and replacing, the references looked like this:

(Genesis 8:26.)
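Here's the same pattern as a Python regular expression, again my own translation for illustration. It shows that the other parenthetical item is left alone because it has no chapter-and-verse digits in front of its closing parenthesis.

```python
import re

# Word: Find "([0-9]@:[0-9]@)\)"   Replace "\1.)"
VERSE = re.compile(r"([0-9]+:[0-9]+)\)")


def add_period_to_verse(text: str) -> str:
    """Put a period before the closing parenthesis of chapter:verse references."""
    return VERSE.sub(r"\1.)", text)


print(add_period_to_verse("(Genesis 8:26) and (like this one)"))
# (Genesis 8:26.) and (like this one)
```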

"Why," you may be wondering, "did you have to use wildcards? Why didn't you just find a closing parenthesis and replace it with a closing parenthesis and a period, like this:

Find What:

)

Replace With:

.)

I couldn't do that because the manuscript had other parenthetical items (like this one) that didn't need a period. Using wildcards makes it possible to find exactly the items you want and ignore those you don't.

EXAMPLE 3

The manuscript had Bible references that looked like this:

II Corinthians

II John

II Kings

I wanted them to look like this:

2 Corinthians

2 John

2 Kings

I put this in the "Find What" box:

II ([A-Z])

The string says:

1. Find I followed by I followed by a space.

2. Find any capital letter.

And I put this in the "Replace With" box:

2 \1

That string says:

1. Replace the II with a 2.

2. Replace the capital letter with itself.

Worked like a charm.

"Why," you ask, "didn't you just replace II with 2 throughout the manuscript rather than use wildcards?" Well, I could have. But I was also thinking about other entries like these:

I Corinthians

I John

I Kings

Obviously, I couldn't just replace I with 1 throughout the manuscript, so I used this string in the "Find What" box:

I ([A-Z])

And I used this string in the "Replace With" box:

1 \1

And that took care of the problem.
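One thing worth making explicit is the order of the two passes, which this Python sketch (my own translation, with a hypothetical function name) demonstrates. If the single-I pass ran first, it would match the second I in "II Kings" and leave you with "I1 Kings." Also note that either pass, in Word or in Python, would catch things like "WWII Memorial"; in Python, a \b in front of the numeral would guard against that.

```python
import re

# Run the "II" pass before the "I" pass
ROMAN_PASSES = [
    (re.compile(r"II ([A-Z])"), r"2 \1"),
    (re.compile(r"I ([A-Z])"), r"1 \1"),
]


def arabic_book_numbers(text: str) -> str:
    """Replace II/I before a capitalized book name with 2/1."""
    for pattern, replacement in ROMAN_PASSES:
        text = pattern.sub(replacement, text)
    return text


print(arabic_book_numbers("II Kings; I John; II Corinthians"))
# 2 Kings; 1 John; 2 Corinthians

# Wrong order: the single-I pass mangles "II Kings"
print(re.sub(r"I ([A-Z])", r"1 \1", "II Kings"))  # I1 Kings
```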

I hope you're beginning to see how powerful wildcards can be and how much time they can save while you're editing a manuscript. Using wildcards, you can quickly fix repetitive problems that would take hours to correct by hand. I highly encourage you to try them, but I also urge you to back up your documents and experiment on some junk text before using wildcards in the "real world." Also, try finding and replacing items individually before replacing all of them globally. Then you'll know that the wildcards you're using actually do what you need to have done.

Using the "Find What Expression" Wildcard

For the past few weeks we've been talking about using wildcards to find and replace text in Microsoft Word. Last week I introduced the "Find What Expression" wildcard (\n) and promised to show you how to use it to move things around.

Let's say you've got a list of authors, like this:

Emily Dickinson

Ezra Pound

Willa Cather

Ernest Hemingway

and you need to put last names first, like this:

Dickinson, Emily

Pound, Ezra

Cather, Willa

Hemingway, Ernest

You can use the "Find What Expression" wildcard to do this in a snap.

Start the Replace dialog (Edit > Replace) and put a check in the "Use wildcards" or "Use Pattern Matching" box (you may need to click the "More" button before this is available). Then, in the "Find What" box, enter this:

^013([A-z]@) ([A-z]@)^013

If you've been reading Editorium Update, you'll probably understand these codes and wildcards:

^013 represents a paragraph mark.

[A-z] represents any single alphabetic character, from uppercase A to lowercase z.

@ represents any additional occurrences of the previous character--in this case, any single alphabetic character, from uppercase A to lowercase z.

() groups [A-z]@ together as an "expression" representing an author's first name. (This grouping is the key to using the "Find What Expression" wildcard in the "Replace With" box.)

The space after the first ([A-z]@) expression represents the space between first name and last name.

The next ([A-z]@) group represents the author's last name.

The final ^013 represents the paragraph mark after the name.

Now, in the "Replace With" box, enter this:

^p\2, \1^p

The ^p codes represent paragraph marks. "Wait a minute," you say. "You just used ^013 for a paragraph mark. Why the change?"

Excellent question. The answer has two parts:

1. If we could use ^p in the "Find What" box, we would. But since Word won't let us do that when using wildcards (it displays an error message), we have to resort to the ANSI code, ^013, instead. You can learn more about this here:

http://www.topica.com/lists/editorium/read/message.html?mid=1703875043

2. If we use ^p in the "Replace With" box, Word retains the formatting stored in the paragraph mark (a good thing). If we use ^013, Word loses the formatting for the paragraph (a bad thing). In a list of author names, this probably doesn't matter, but you'll need to know this when finding and replacing with codes in more complicated settings.

Continuing with our example, ^p\2, \1^p:

\2 is the "Find What Expression" wildcard for our *second* expression (hence the 2) in the "Find What" box--in other words, it represents the last name of an author in our list.

The comma follows this wildcard because we want a comma to follow the author's last name.

A space follows the comma because we don't want the last and first names mashed together, like this: "Pound,Ezra."

\1 is the "Find What Expression" wildcard for our *first* expression (hence the 1) in the "Find What" box--in other words, it represents the first name of an author in our list.

Now click the "Replace All" button. The authors' names will be transposed:

Dickinson, Emily

Pound, Ezra

Cather, Willa

Hemingway, Ernest
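Here's the same transposition as a Python regular expression, for readers who like to see the mechanics. The translation is my own: ^ and $ with MULTILINE stand in for the paragraph marks, and [A-Za-z] replaces [A-z] so stray punctuation can't sneak in. Python's ^ and $ match at line boundaries without consuming the newline; the Word pattern, by contrast, uses up a paragraph mark at each end, so when names sit in back-to-back paragraphs it's worth checking that none were skipped.

```python
import re

# Word: Find "^013([A-z]@) ([A-z]@)^013"   Replace "^p\2, \1^p"
FIRST_LAST = re.compile(r"^([A-Za-z]+) ([A-Za-z]+)$", re.MULTILINE)


def last_name_first(text: str) -> str:
    """Turn 'First Last' lines into 'Last, First'."""
    return FIRST_LAST.sub(r"\2, \1", text)


authors = "Emily Dickinson\nEzra Pound\nWilla Cather\nErnest Hemingway"
print(last_name_first(authors))
```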

You've always wondered how to do that, right? But now you're wondering about middle initials. And middle names. And Ph.D.s.

All of those make things more complicated. But here, in a nutshell, are the Find and Replace strings you'll need for some common name patterns (first last, first middle last, first initial last, and so on). First comes the name pattern, then the Find string, and finally the Replace string, like this:

NAME PATTERN

FIND WHAT

REPLACE WITH

William Shakespeare

^013([A-z]@) ([A-z]@)^013

^p\2, \1^p

Alfred North Whitehead

^013([A-z]@) ([A-z]@) ([A-z]@)^013

^p\3, \1 \2^p

Philip K. Dick

^013([A-z]@) ([A-Z].) ([A-z]@)^013

^p\3, \1 \2^p

L. Frank Baum

^013([A-Z].) ([A-z]@) ([A-z]@)^013

^p\3, \1 \2^p

G. B. Harrison, Ph.D.

^013([A-z].) ([A-Z].) ([A-z]@,) (*)^013

^p\3 \1 \2, \4^p

J.R.R. Tolkien

^013([A-Z].)([A-Z].)([A-Z].) ([A-z]@)^013

^p\4, \1\2\3^p

That list doesn't show every pattern you'll encounter, but it should provide enough examples so you'll understand how to create new patterns on your own--which is the whole point of this article. Once you've created all of the patterns you need, you could record all of that finding and replacing in a single macro that you could run whenever you need to transpose names in a list.
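To make the "single macro" idea concrete, here's a Python sketch that runs every row of the table in turn. The translations are my own assumptions: @ becomes +, literal periods are escaped, (*) becomes (.+), and [A-Za-z] or [A-Z] stands in for [A-z]. The patterns don't overlap, so a name that's already been transposed isn't touched again by a later pass.

```python
import re

# One (pattern, replacement) pair per row of the table above
NAME_PATTERNS = [
    (r"^([A-Za-z]+) ([A-Za-z]+)$", r"\2, \1"),                      # William Shakespeare
    (r"^([A-Za-z]+) ([A-Za-z]+) ([A-Za-z]+)$", r"\3, \1 \2"),       # Alfred North Whitehead
    (r"^([A-Za-z]+) ([A-Z]\.) ([A-Za-z]+)$", r"\3, \1 \2"),         # Philip K. Dick
    (r"^([A-Z]\.) ([A-Za-z]+) ([A-Za-z]+)$", r"\3, \1 \2"),         # L. Frank Baum
    (r"^([A-Z]\.) ([A-Z]\.) ([A-Za-z]+,) (.+)$", r"\3 \1 \2, \4"),  # G. B. Harrison, Ph.D.
    (r"^([A-Z]\.)([A-Z]\.)([A-Z]\.) ([A-Za-z]+)$", r"\4, \1\2\3"),  # J.R.R. Tolkien
]


def transpose_names(text: str) -> str:
    """Apply every pattern in turn, like a recorded series of Replace All passes."""
    for pattern, replacement in NAME_PATTERNS:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


names = "\n".join(["William Shakespeare", "Alfred North Whitehead", "Philip K. Dick",
                   "L. Frank Baum", "G. B. Harrison, Ph.D.", "J.R.R. Tolkien"])
print(transpose_names(names))
```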

_________________________________________

READERS WRITE

After reading last week's newsletter, Mary L. Tod (mtod@earthlink.net) wrote:

In your Editorium Update for today, is it necessary to enclose the space in parentheses? Since it isn't being replaced by itself, can't the expression in the Find box be reduced to

(^013[0-9]@.)

(with just the space entered after the first expression)?

Mary is absolutely right about this. I put the space in parentheses because I wanted to briefly introduce the idea that you could have more than one "Find What Expression" wildcard--in this case, \2. For that to work, the space has to be in parentheses so it's recognized as an expression. But I didn't actually *use* the \2 in the example, so a simple space would have worked just fine.

Mary continued:

In a related question, does the @ symbol in the wildcard field also allow for no repeats of the previous character? Otherwise, it would start the list at 10, wouldn't it?

2. followed by a number ([0-9])

3. followed by one or more numbers (@)

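Again, this is right on the mark. The @ picks up additional digits *if there are any.* The [0-9] finds the first digit, and the @ lets Word keep going through any digits that follow, so a more technical way to put it is that [0-9] is "followed by *zero* or more numbers." Taken together, [0-9]@ finds one or more digits, which is why a single-digit number like 1 qualifies.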
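A quick way to convince yourself: Word's [0-9]@ behaves like the regular-expression [0-9]+, and this Python sketch (an illustration with a made-up sample list) shows that single-digit and two-digit list numbers both qualify.

```python
import re

# [0-9]+ : one digit, then zero or more additional digits
LIST_NUMBER = re.compile(r"^[0-9]+\. ", re.MULTILINE)

sample = "1. Lorem ipsum\n9. Dolor sit\n10. Amet consectetur"
print(LIST_NUMBER.findall(sample))  # ['1. ', '9. ', '10. ']
```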

Thanks to Mary for her astute comments.

Wildcard Grouping

For the past few weeks we've been talking about using wildcards to find and replace text in Microsoft Word. This week we'll discuss wildcard grouping, which is simply a way of telling Word that you want certain wildcards to be used together as a unit.

Continuing with our example from last week, let's say that you're editing a document with lots of numbered lists, like this:

1. Lorem ipsum dolor sit amet.

2. Ut wisi enim ad minim veniam.

3. Duis autem vel eum iriure dolor.

Now let's say that you want to replace the space after each number and period with a tab. After calling up the Replace dialog (Edit > Replace) and putting a check in the "Use wildcards" or "Use Pattern Matching" box, you could enter the following string of characters into the "Find What" box:

^013[0-9]@.

(You can't see it, but there's a space on the end of that string, and it needs to be included.) As you probably recall from the past few weeks, this tells Microsoft Word to do the following:

1. Find a paragraph mark (^013)

2. followed by a number ([0-9])

3. followed by one or more numbers (@)

4. followed by a period (.)

5. followed by a space ( ).

But that still won't let us replace that space with a tab. Why? Because there's no way to replace the space independently of the rest of the string--whatever the string finds *includes* the space.

So let's try this:

(^013[0-9]@.)( )

Notice that we've grouped the wildcards and other characters together with parentheses. (In case you can't tell, that's our uncooperative space between the last two parentheses.) Such groups, for reasons known only to the mathematically minded, are called "expressions," and in this case there are two of them:

1. (^013[0-9]@.)

2. ( )

Grouping things together like this makes it possible to refer to each group independently in the "Replace With" box--a wonderful thing! So in the "Replace With" box, we'll enter this string:

\1^t

That "\1" is an example of the little-known "Find What Expression" wildcard, which lives deep in the wilds of Redmond, Washington, and only comes out at night. It's a backslash followed by the number one, and it tells Word to replace whatever is found by the first expression--

(^013[0-9]@.)

--with whatever the first expression finds. (Yes, you read that correctly.) In other words, Word replaces whatever the first expression finds with *itself.* That seems strange, but it means we can treat the second expression--

( )

--as an independent unit, which is exactly what we need to do. (By the way, "Find What Expression" wildcards are the only wildcards that can be used in the "Replace With" box. They are simply a backslash followed by a number.)

The ^t, of course, is the code for a tab, as explained in the November 14, 2000, issue of Editorium Update:

http://www.topica.com/lists/editorium/read/message.html?mid=1703968584

You'll notice that we haven't included a "\2" code, which would replace something with whatever is found by our *second* expression, the space in the parentheses. Since we haven't included that code, the space will be replaced by nothing--in other words, it will be *deleted* during the Find and Replace. So the relationship between the wildcards in the "Find What" string and the "Replace With" string is something like this:

FIND WHAT: REPLACE WITH:

(^013[0-9]@.) > \1 (followed by a tab: ^t)

( ) > [nothing]

Now let's try using them:

1. Start the Replace dialog (Edit > Replace).

2. Put a check in the "Use wildcards" or "Use Pattern Matching" box (you may need to click the "More" button before this is available).

3. In the "Find What" box, enter this:

(^013[0-9]@.)( )

4. In the "Replace With" box, enter this:

\1^t

5. Click the "Replace All" button.

Presto! All of the spaces after your numbers will be replaced with tabs, and your list will now look like this (the tabs may not show up in this newsletter):

1.Lorem ipsum dolor sit amet.

2.Ut wisi enim ad minim veniam.

3.Duis autem vel eum iriure dolor.
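If you'd like to see the grouping idea in another setting, here's the same Find and Replace as a Python regular expression (my own translation, for illustration). Group 1 is written back with \1, and group 2, the space, is simply never used, so it disappears. One difference: Python's ^ also matches at the very start of the text, whereas Word's ^013 needs a paragraph mark in front of the first number.

```python
import re

# Word: Find "(^013[0-9]@.)( )"   Replace "\1^t"
NUMBER_SPACE = re.compile(r"(^[0-9]+\.)( )", re.MULTILINE)


def space_to_tab(text: str) -> str:
    """Replace the space after each list number with a tab."""
    return NUMBER_SPACE.sub(r"\1\t", text)


items = "1. Lorem ipsum dolor sit amet.\n2. Ut wisi enim ad minim veniam."
print(repr(space_to_tab(items)))
# '1.\tLorem ipsum dolor sit amet.\n2.\tUt wisi enim ad minim veniam.'
```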

To me, this is like magic, and it comes in handy more often than you might think. I hope you'll find it useful! In the future, I'll try to provide other examples that you can apply in your day-to-day work. Next week I'll show you how to use "Find What Expression" codes to move things around.

_________________________________________

READERS WRITE

After reading our past few newsletters on wildcard searching, a subscriber wrote, "Use Pattern Matching does not appear to be an option in my Word program."

I apologize for not explaining this. In Microsoft Word 6 and 95, "Use Pattern Matching" is an option in the Find and Replace dialogs, and selecting this option tells Word that you're going to use wildcards. In Word 97 and later, this option is simply called "Use Wildcards." To see this option, you may need to click the "More" button in the Find and Replace dialogs.

Wildcard Ranges

Last week we discussed using wildcard combinations to find text in a Microsoft Word document. You can read last week's newsletter here:

http://www.topica.com/lists/editorium/read/message.html?mid=1706069286

This week we'll talk about wildcard ranges, which you'll probably use a lot.

Wildcard ranges are fairly simple. You just use the [-] wildcard to tell Microsoft Word what to find. Let's continue with our example from last week:

b?[td]

As you probably recall, this tells Word to find the letter b followed by any single character followed by either t or d. In other words, it will find "bet," "but," "bit," "bat," "bed," "bud," "bid," "bad," and so on.

But what if we wanted to find "bat," "bad," "bet," and "bed" but NOT "bit," "bid," "bud," and "but"? After bringing up the Find dialog (Edit > Find) and turning on "Use Pattern Matching" (you may need to click the "More" button before this is available), we could use this wildcard combination in the "Find What" box:

b[a-e][td]

This tells Word to find the letter b followed by any letter from a to e (in other words, a, b, c, d, or e) followed by t or d. (The range *must* be in ascending order--in other words, from a "lower" letter [such as a] to a "higher" letter [such as z].)

Here's another way to approach this:

b[!f-z][td]

Notice the exclamation mark at the front of the "range" wildcard. The exclamation mark tells Word to find every character *except* those specified--in this case, the letters f through z. This wildcard combination, too, will find "bat," "bad," "bet," and "bed" but not "bit," "bid," "bud," and "but."
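Here are both range strings translated into Python regular expressions, as an illustration (regular expressions write Word's [!f-z] as [^f-z]). The last line shows that the two aren't quite identical: "everything except f through z" also admits capitals, digits, spaces, and punctuation, which is worth remembering before you click "Replace All."

```python
import re

words = "bat bad bet bed bit bid bud but"

print(re.findall(r"b[a-e][td]", words))   # ['bat', 'bad', 'bet', 'bed']
print(re.findall(r"b[^f-z][td]", words))  # ['bat', 'bad', 'bet', 'bed']

# The "except" form is broader than it looks
print(re.findall(r"b[^f-z][td]", "bAt b4d"))  # ['bAt', 'b4d']
print(re.findall(r"b[a-e][td]", "bAt b4d"))   # []
```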

Here's a range that I use all the time:

[0-9]

This little beauty finds any occurrence of a digit. What's that good for? Let's say you're editing a document with lots of numbered lists, like this:

1. Lorem ipsum dolor sit amet.

2 Ut wisi enim ad minim veniam.

3. Duis autem vel eum iriure dolor.

Did you notice that the number 2 has no period? Good! You must have "the eye." But if you have several long lists, you might want to let Word find these problem numbers for you. To do so, try this wildcard string:

^013[0-9]@[!.]

Pretty cryptic. But if you've been reading Editorium Update, you can probably figure this out:

^013 is the numeric code for a carriage return.

[0-9] represents any digit.

@ tells Word to find one or more occurrences of the previous expression (in this case, any digit). This is necessary in case you have lists with two-digit (or longer) numbers.

[!.] tells Word to find any character *except* a period.

Piece of cake.
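Here's that search translated into Python, with a list that adds a hypothetical item 10. Python's regular-expression engine can back up, letting [0-9]@ take just the 1 of "10" so that [!.] matches the 0, which produces a false alarm. Keeping digits out of the final range fixes that. I haven't confirmed whether Word's own engine backs up the same way, so it's worth testing your Word string on a list that runs past 9.

```python
import re

lists = ("1. Lorem ipsum dolor sit amet.\n"
         "2 Ut wisi enim ad minim veniam.\n"
         "10. Duis autem vel eum iriure dolor.")

# Literal translation of ^013[0-9]@[!.]
naive = re.compile(r"^[0-9]+[^.]", re.MULTILINE)
print(naive.findall(lists))   # ['2 ', '10']  ("10." is a false alarm)

# Excluding digits from the last range stops the engine from backing up
strict = re.compile(r"^[0-9]+[^.0-9]", re.MULTILINE)
print(strict.findall(lists))  # ['2 ']
```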

Here are two other wildcard ranges you might find useful:

[a-z] represents any occurrence of a lowercase letter.

[A-Z] represents any occurrence of an uppercase letter.

Remember, too, that you can use the [] wildcard (without a hyphen) to specify a whole group of characters *without* using a range. For example, this wildcard will find various kinds of punctuation:

[.,;:\?\!]

You may be wondering about the backslash (\) in front of the question and exclamation marks. The backslash tells Word to treat the following character *as* a character and not as a wildcard. (Remember, ? is the wildcard for a single character, and ! is the wildcard for "except.")

Don't be afraid to try all of these wildcard combinations and ranges for yourself (on some junk text, of course). As you experiment, you'll better understand what works and what doesn't. Then, when the need to use wildcards arises (which it will), you'll be ready.

Next week, we'll look at expression grouping and the little-known "Replace With" wildcard.

You can learn more about using numeric codes (such as that ^013 representing the carriage return) here:

http://www.topica.com/lists/editorium/read/message.html?mid=1704081834

And you can learn more about using junk text (such as "Lorem ipsum dolor sit amet") here:

http://www.topica.com/lists/editorium/read/message.html?mid=1705763701

_________________________________________

READERS WRITE

Our last newsletter used misspellings of the name "Gandhi" as an example, noting that it would be possible to use the wildcard string G[andh][andh][andh][andh]i to find the misspellings "Ghandi," "Gahndi," and "Ganhdi" all in one pass. Subscriber Glade Lyon (my dad!) wrote:

"It seems to me that your string should be G[andh][hand][ahnd][anhd]i."

Thinking that other readers might see this the same way, I'm including my response here:

I see what you're thinking--that each set of bracketed letters is an alternative spelling. No, *each set* of bracketed letters represents *one* letter in the word. [andh] will find either an "a," an "n," a "d," or an "h," whichever it comes to first. So, G[andh] will find:

Ga

Gn

Gd

or Gh

G[andh][andh] will find:

Gaa

Gan

Gad

Gah

Gna

Gnn

Gnd

Gnh

Gda

Gdn

Gdd

Gdh

Gha

Ghn

Ghd

or Ghh

And so on. So the point of using G[andh][andh][andh][andh]i is to find every possible four-letter combination of a, n, d, and h. That way, no matter *how* many ways our author has misspelled "Gandhi," we'll catch them all.

In other words, the order of the characters inside the brackets doesn't matter. The strings you suggested--

[andh]

[hand]

[ahnd]

and [anhd]

--are all functionally identical. Each one tells Word to find either an "a," an "n," a "d," or an "h."
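If you'd like to check this reasoning, here's a short Python sketch (Python regular expressions, not Word). It generates every two-letter sequence that G[andh][andh] can match, which reproduces the sixteen combinations listed above in the same order, and it shows that scrambling the letters inside each set of brackets finds exactly the same spellings.

```python
import re
from itertools import product

# 4 x 4 = 16 two-letter sequences, from "Gaa" through "Ghh"
pairs = ["G" + a + b for a, b in product("andh", repeat=2)]
print(len(pairs), pairs[:4], pairs[-1])  # 16 ['Gaa', 'Gan', 'Gad', 'Gah'] Ghh

spellings = ["Gandhi", "Ghandi", "Gahndi", "Ganhdi", "Gandi"]
for pattern in ("G[andh][andh][andh][andh]i", "G[andh][hand][ahnd][anhd]i"):
    print([s for s in spellings if re.fullmatch(pattern, s)])
# both print ['Gandhi', 'Ghandi', 'Gahndi', 'Ganhdi']
```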

Wildcard Combinations

Last week we discussed the basics of using wildcards to find text in a Microsoft Word document. You can read last week's newsletter here:

http://www.topica.com/lists/editorium/read/message.html?mid=1705963026

This week we'll talk about how to combine wildcards, which will let you get pretty fancy about the stuff you want to find. Basically, you just need to know that you *can* combine wildcards. Then you can get as crazy as you like.

Last week we used the "?" wildcard to find every three-letter combination starting with b and ending with t--"bet," "but," "bit," "bat," and so on--by searching for "b?t" with "Use Pattern Matching" turned on in the Find dialog box.

Now let's say we wanted to find the same characters but add others as well. For example, we might want to find every three-letter combination starting with b and ending with d--"bed," "bud," "bid," "bad," and so on--in *addition* to the combinations ending in t. Can we really do that? Sure!

After bringing up the Find dialog (Edit > Find) and turning on "Use Pattern Matching," we'll start by entering the letter b into the "Find What" box, telling Microsoft Word to find that letter.

Next, we'll enter the ? wildcard, which tells Microsoft Word to find any single character.

Finally, we'll enter a new wildcard: [td]. Microsoft Word will find any *one* of the characters specified in the brackets.

Altogether, the string of characters looks like this--

b?[td]

--and there we are, doing wildcard combinations! This particular combination tells Microsoft Word to find the letter b followed by any other single character followed by t or d.
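For comparison, here's the same combination in Python's regular expressions, which is just an illustration outside Word. Word's ? is written as a period there, and [td] means the same thing in both.

```python
import re

text = "bet bud bit bad bag bin"
print(re.findall(r"b.[td]", text))  # ['bet', 'bud', 'bit', 'bad']
```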

How will something like this help you in editing? Suppose you're working on a manuscript in which the author has misspelled a name in nearly every way possible. You could comb through the manuscript over and over, hoping to catch all the variations. Or, you could be *sure* to catch them all by searching with wildcards. For example, let's say your manuscript is a book about India and the name in question is Gandhi. Your author has misspelled it as "Ghandi," "Gahndi," and "Ganhdi." (Not possible? Hah!) You can find every last one of them with the following string:

G[andh][andh][andh][andh]i

Then, if you've put the correct spelling, "Gandhi," in the "Replace With" box, you can find and replace each wrong spelling with the right one in a single pass, which is much more efficient than finding and replacing each variation separately.

You may be wondering why you couldn't just use the * wildcard to represent the whole string of letters, like this:

G*i

You could. But remember, the * wildcard represents *any* string of characters--including spaces. It's not limited to characters within a word (and neither are other wildcards). That means, in addition to finding the misspelled names, it will find the first 14 characters of the following phrase: "Go to the officer's hall." So be careful, especially if you're planning to use "Replace All" rather than finding and replacing one item at a time.

There is a way to simplify the wildcard combination, however. Consider this string:

G[andh]{4}i

It's functionally the same as G[andh][andh][andh][andh]i. The {4} tells Word to find exactly four occurrences of the previous "expression," which is [andh].

But now a complication: Suppose that our slapdash author has also spelled Gandhi's name as "Gandi." Uh-oh. Our original string won't catch that, because this new misspelling is one character shorter than our string specifies. But consider this:

G[andh]{3,4}i

The {3,4} tells Word to find from 3 to 4 occurrences of the previous expression, so this string will catch all of our misspelled variations so far.

What if we want to allow for more or fewer characters, being particularly unsure of our author? We can use this string:

G[andh]@i

The @ wildcard tells Microsoft Word to find *one or more* occurrences of the previous expression. That ought to cover nearly anything our author throws at us. If we want to get a little more specific, we can use {2,}, which tells Word to look for *at least* two occurrences of the previous expression.
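Here's the counted version and the * caution translated into Python's regular expressions, for illustration only. The count in curly braces covers every occurrence of [andh], including the first, so three to four letters between G and i catches both "Gandi" and the four-letter misspellings. Word's * grabs as little as possible, so the closest Python equivalent is the "lazy" .*? rather than a plain .*.

```python
import re

text = "Ghandi, Gahndi, Ganhdi, and Gandi all mean Gandhi."

# Word: G[andh]{3,4}i
print(re.sub(r"G[andh]{3,4}i", "Gandhi", text))
# Gandhi, Gandhi, Gandhi, and Gandhi all mean Gandhi.

# Word: G*i  (shortest match from G to the next i)
print(re.search(r"G.*?i", "Go to the officer's hall.").group())  # Go to the offi
```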

By this time you've probably noticed a pattern to these wildcards, but if not, I'll summarize:

A question mark ? finds any single character.

An asterisk * finds any string of characters.

Square brackets [] specify the characters to find.

Curly braces {} specify how many occurrences of the characters to find.

{n} finds an exact number (such as 2) of the preceding character or expression.

{n,} finds at least n occurrences (such as 3) of the preceding character or expression.

{n,m} finds from n to m occurrences (such as 3 to 5) of the preceding character or expression.

@ finds one or more occurrences of the preceding character or expression.

Here's a parting tip: What would happen if we put a lowercase rather than a capital G at the beginning of our string? Word wouldn't find the misspelled names. Why? Because with "Use Pattern Matching" turned on, Word automatically matches case--a useful thing to know.

That brings us to the subject of finding a range of characters--something we'll talk about next week.

Using Wildcards–the Basics

Subscriber Allene Goforth (agoforth@aros.net) wrote:

"I use your 'Searching with Microsoft Word's Built-In Codes' list all the time, but Word's restrictions on what codes can be used in the 'Replace with' box are a pain. I'd love to see an issue of Editorium Update that deals with wildcard searching."

Thanks for the suggestion, Allene. Here goes:

When I was in the fifth grade in wintry Idaho, rather than venturing out into the cold, some fellow students and I often spent recess playing poker. (Did our teacher know about this? I can't remember.) Being *extremely* sophisticated players, we often designated jokers, deuces, *and* one-eyed jacks as wild cards--that is, they could represent any card in the deck. With the help of these wild cards, we had plenty of royal flushes, hands with five aces, and so on. Now that was poker!

Microsoft Word, too, has a bunch of "wild cards" (which Microsoft spells as one word) that you can use to find various combinations of characters in a document. Wildcards can get pretty complicated, but this week we'll cover just the basics.

The simplest wildcard is the question mark (?), which represents any single character. If you want to see how it works, try this:

1. Open a document with some text that you can play around with.

2. Click the "Edit" menu.

3. Click "Find."

4. In the "Find What" box, enter a question mark (?).

5. Put a checkmark in the "Use Pattern Matching" box. (You may need to click the "More" button first.) Checking this box tells Microsoft Word that you're going to use a wildcard. If you didn't check the box, Microsoft Word would assume you were trying to find a question mark.

6. Click the "Find" button.

Microsoft Word will find the first character after your cursor position. Click the "Find" button again. Microsoft Word will find the next character. And so on.

That doesn't seem very useful, but let's suppose you're editing a document that was scanned from a magazine article and is riddled with typos. You notice that the word "but" shows up in various ways, including "bat" and "bet." Let's say that this is a technical article with no references to baseball, winged mammals, or games of chance, so you decide to use the ? wildcard to find "bat" and "bet" and replace them in a single pass. Here's the procedure:

1. Click the "Edit" menu.

2. Click "Replace."

3. Enter "b?t" in the "Find What" box.

4. Enter "but" in the "Replace With" box.

5. Put a checkmark in the "Use Pattern Matching" box.

6. Click the "Replace All" button.

Both "bat" and "bet" will be replaced with "but." The problem is, so will "bit." And, unfortunately, since you can't specify "Find Whole Words Only" when the "Use Pattern Matching" box is checked, Microsoft Word will replace "better" with "butter," "combat" with "combut," and who knows what else. So, instead of clicking the "Replace All" button, you should click the "Replace" button for each individual item as needed.
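Before trusting "Replace All," it can help to try a pattern on a scrap of junk text. Here's a minimal sketch using Python's re module as a stand-in for Word (the sample sentence is made up). The regex . does the job of Word's ?, and the first substitution reproduces the damage described above. The second is one safer alternative, assuming the only typos you want to fix are "bat" and "bet": it lists just those vowels in brackets and uses \b to require whole words. In Word, the < and > wildcards should do roughly the same job as \b, as in <b[ae]t>.

```python
import re

# Made-up scanned text with "bat" and "bet" typos for "but"
text = ("It works, bat slowly; the results are better, bet not perfect. "
        "A bit of combat testing helps.")

# b.t is the regex equivalent of Word's b?t: it changes every b-any-t,
# including the ones inside "better", "bit", and "combat".
careless = re.sub(r"b.t", "but", text)

# Safer: name only the typo vowels and require whole words (\b).
safer = re.sub(r"\bb[ae]t\b", "but", text)

print(careless)
print(safer)
```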

Now you begin to see the power--and the danger--of using wildcards. Like cutthroat poker, they are not for the faint of heart. But if you know what you're doing, they can be very useful. Unfortunately, they won't help much in the "Replace With" box. In fact, you can't use them there at all. Why? Because Word has no way of knowing what you want them to represent.

Let's say you want to find "but" and replace it with either "bet" or "bat," so you put "b?t" in the "Replace With" box and click the "Replace All" button. Word doesn't know whether you want to replace "but" with "bet" or "bat," so it just replaces it with the actual text "b?t." So, basically, the only thing you can use in the "Replace With" box is actual text or certain built-in codes, mentioned earlier. You can get the list of codes here:

http://www.topica.com/lists/editorium/read/message.html?mid=1703968584

Next week I'll explain wildcard searching in more depth. Until then, here's a list of wildcards for you to play with (on some junk text--don't use a real document):

? Finds any single character:

"c?t" finds "cat," "cut," and "cot."

* Finds any string of characters:

"b*d" finds "bad," "bread," and "bewildered."

[ ] Finds *one* of the specified characters:

"b[ai]t" finds "bat" and "bit" but not "bet."

[-] Finds any single character in the specified range (which must be in ascending order):

"[l-r]ight" finds "light," "might," "night," and "right" (and "oight," "pight," and "qight," if they exist).

[!] Finds any single character *except* those specified:

"m[!u]st" finds "mist" and "most" but not "must."

"t[!ou]ck" finds "tack" and "tick" but not "tock" or "tuck."

[!x-z] Finds any single character *except* those in the specified range:

"t[!a-m]ck" finds "tock" and "tuck" but not "tack" or "tick."

{n} Finds *exactly* n occurrences of the previous character or expression:

"re{2}d" finds "reed" but not "red."

{n,} Finds *at least* n occurrences of the previous character or expression:

"re{1,}d" finds "red" and "reed."

{n,m} Finds from n to m occurrences of the previous character or expression:

"10{1,3}" finds "10," "100," and "1000."

@ Finds one or more occurrences of the previous character or expression:

"me@t" finds "met" and "meet."

< Finds the beginning of a word:

"<in" finds "in" and "inspiring" but not "main."

> Finds the end of a word:

"in>" finds "in" and "main" but not "inspiring."

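The wildcards in this list map fairly directly onto ordinary regular expressions, which can make them easier to remember (or to test on junk text outside Word). Here's a minimal Python sketch of that mapping. The function names word_wildcard_to_regex and finds are hypothetical, not part of Word or Python. The translation covers only the wildcards listed here; Python's \b is only an approximation of how Word decides where a word begins and ends; and treating * as the shortest possible match is an assumption about Word's behavior.

```python
import re


def word_wildcard_to_regex(pattern: str) -> str:
    """Translate a Word wildcard string into a Python regex string.

    Hypothetical helper: it handles only the wildcards in this list,
    plus ( ) for grouping and a backslash to make the next character
    literal. Treating * as the shortest possible match is an assumption.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            out.append(".")                    # any single character
        elif ch == "*":
            out.append(".*?")                  # any string of characters
        elif ch == "@":
            out.append("+")                    # one or more of the previous item
        elif ch == "<":
            out.append(r"\b(?=\w)")            # beginning of a word (approximate)
        elif ch == ">":
            out.append(r"(?<=\w)\b")           # end of a word (approximate)
        elif ch in "()":
            out.append(ch)                     # grouping carries over unchanged
        elif ch == "{":
            end = pattern.index("}", i)        # ValueError if the brace is unclosed
            out.append(pattern[i:end + 1])     # {n}, {n,}, {n,m} carry over unchanged
            i = end
        elif ch == "[":
            end = pattern.index("]", i + 1)    # ValueError if the bracket is unclosed
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]          # [!...] means "any character except"
            out.append("[" + body + "]")
            i = end
        elif ch == "\\" and i + 1 < len(pattern):
            i += 1
            out.append(re.escape(pattern[i]))  # \? is a literal question mark
        else:
            out.append(re.escape(ch))          # everything else is literal text
        i += 1
    return "".join(out)


def finds(wildcard: str, text: str) -> bool:
    """True if the translated pattern occurs anywhere in text, as Find would."""
    return re.search(word_wildcard_to_regex(wildcard), text) is not None


for wildcard, word in [("b[ai]t", "bet"), ("t[!a-m]ck", "tuck"), ("in>", "main")]:
    print(wildcard, word, finds(wildcard, word))
```

Most of the work is renaming: Word's [!...] becomes Python's [^...], and Word's @ becomes +. The curly-brace counts need no translation at all.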
Sample Text in AutoText

Last week I explained how to use Word's Rand feature to create sample text ("The quick brown fox jumps over the lazy dog") that you can use for various purposes. You can read last week's newsletter here:

http://www.topica.com/lists/editorium/read/message.html?mid=1705763701

I neglected to mention that for the Rand feature to work, "Replace text as you type" must be turned on under Tools > AutoCorrect. If you tried using Rand but nothing happened, you probably don't have it turned on. Of course, you may not *want* it turned on because then Word automatically makes certain "corrections" that you may not want. If you're editing in Word, that can be a disaster. For more information on how to prevent such problems, see "When Word Gets in the Way" in the very first issue of Editorium Update:

http://www.topica.com/lists/editorium/read/message.html?mid=1700237543

If you turn off "Replace text as you type," you can still use the traditional "Lorem ipsum dolor sit amet" sample text included in last week's newsletter. Subscriber Karen L. Bojda of Bojda Editorial & Writing Services (http://www.bojda.f2s.com) sent this helpful suggestion for doing so:

"Depending on your layout, repeating the 'quick brown fox' creates columns of words and rivers instead of a nice sample layout. So I just made an AutoText entry for the 'Lorem' text, which works whether AutoCorrect is on or not."

Thinking that this was a great idea, I immediately followed suit. Now, whenever I need some sample text to work with, I just type the word "lorem" into my document and press the F3 key. Presto! If you'd like to do this, here's how to set it up:

1. Copy and paste the "Lorem" text into a Word document (I've included a nice, long version at the end of this article).

2. Select the "Lorem" text.

3. In Word 97 or later, click the "Insert" menu at the top of your Word screen. In Word 95 or earlier, click the "Edit" menu.

4. Click "AutoText."

5. In Word 97 or later, click "New."

6. In the box labeled "Please name your AutoText entry" (just "Name" in Word 95 or earlier), type "lorem."

7. In Word 95 or earlier, make sure the box labeled "Make AutoText Entry Available To" shows "All Documents (Normal.dot)."

8. In Word 97 or later, click the "OK" button. In Word 95 or earlier, click the "Add" button.

Now, when you need some sample text, do this:

1. Type "lorem" into your document.

2. Press the F3 key.

The "Lorem" text will be inserted into your document.

Karen also sent this caution: "If you're going to address AutoText entries in an upcoming newsletter, I found the way Word files them by style to be at first baffling and then annoying, and I think a heads-up about that would be worthwhile. I avoid adding AutoText entries casually. Instead, I first create a style that has a meaningful name, such as 'sample text' or 'math symbols.' Then I format the text I want to add using that style, so that the AutoText entry gets filed under a heading that is more meaningful than 'Normal' or 'Body Text.' The style itself can then be deleted."

Many thanks to Karen for this useful information.

Here's a three-paragraph version of the "Lorem" text that you can use to create an AutoText entry (after deleting the extraneous email carriage returns at the ends of the lines):

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.

Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

Sample Text

Working in Microsoft Word, I often need some "junk" text to play around with, for various reasons:

* I'm designing a document and don't want to get bogged down in what the text actually says.

* I'm creating a template with various paragraph styles and need to see what they will look like.

* I'm creating a macro and need some text for testing purposes.

* I'm trying to learn more about some feature of Microsoft Word and don't want to practice on a real document.

Microsoft Word 97, 98, 2000, and 2001 include an undocumented feature that generates all of the sample text I need. Maybe you'll find it helpful too. To use it, type the following line into a Word document and press the ENTER key:

=Rand(1,1)

Word will insert the following text into your document:

The quick brown fox jumps over the lazy dog.

(As you probably know, this sentence includes every letter in the alphabet and is sometimes used for typing practice.)

Need more than one sentence? You can specify how many sentences you need by changing the last number in the Rand statement. For example, if you needed five sentences, you could type this--

=Rand(1,5)

--which would produce this:

The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.

Need more than one paragraph? You can specify how many paragraphs you need by changing the first number in the Rand statement. For example, if you needed two paragraphs (with five sentences in each one), you could type this--

=Rand(2,5)

--which would produce this:

The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.

The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.

In other words, the first number specifies the number of paragraphs you want to insert; the second number specifies the number of sentences you want to include in those paragraphs.
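To make the two numbers concrete, here's a minimal Python sketch of what =Rand(p,s) produces. The function name rand_text is hypothetical; it only imitates the output described above and says nothing about how Word actually generates the text. It also checks the claim that the sentence uses every letter of the alphabet.

```python
import string

PANGRAM = "The quick brown fox jumps over the lazy dog."


def rand_text(paragraphs: int, sentences: int) -> str:
    """Imitate =Rand(paragraphs, sentences): repeat the sentence
    `sentences` times per paragraph, `paragraphs` times over."""
    paragraph = " ".join([PANGRAM] * sentences)
    # One line per paragraph, standing in for Word's paragraph marks
    return "\n".join([paragraph] * paragraphs)


def is_pangram(sentence: str) -> bool:
    """True if the sentence contains every letter from a to z."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


print(rand_text(2, 5))
print(is_pangram(PANGRAM))
```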

If you're using Word 95 or lower (or if you're tired of that quick brown fox), you can use the traditional Latin "Lorem ipsum dolor . . . ," which has been used as placeholder text for centuries:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.

If you're curious about this, it's a garbled quotation from Cicero's De Finibus Bonorum et Malorum (On the Ends of Good and Bad), book 1, paragraph 32, which reads, "Neque porro quisquam est, qui dolorem ipsum, quia dolor sit, amet, consectetur, adipisci velit," meaning, "There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain." The book was popular during the Renaissance, when the passage was used in a book of type samples for that wonderful new technology, printing.

If your Latin is good enough (unlike mine), you can read Cicero's complete text (or just get a whole bunch of great sample text) here:

http://patriot.net/~lillard/cp/cic.fin.html

If you want to see a beautiful collection of classic type samples, check out Giambattista Bodoni's Typographic Manual at Octavo:

http://www.octavo.com/collection/bodtip.html

And for more information on sample text, see Jacci Howard Bear's article at About.com:

http://www.desktoppub.about.com/compute/desktoppub/library/weekly/aa051199.htm

Quark to Word

This week subscriber Doug Clapp, proprietor of PocketPCpress (http://www.pocketpcpress.com/), wrote with an interesting question. He'd received a book that had been typeset in QuarkXPress (Doug didn't have QuarkXPress) and sent to him as a "stuffed" (.sit) Macintosh file (Doug didn't have the StuffIt program or a Macintosh). What Doug *needed* was an unstuffed Microsoft Word document that he could use on his PC.

If you're ever in the same predicament, there *is* a way out. Even better, it's (relatively) easy, and it's free!

First, you'll need StuffIt for Windows, which will "unstuff" that stuffed file. (StuffIt is a file compression program similar to WinZip.) You can download a trial version here:

http://www.aladdinsys.com/stuffitwin/index.html

When you install the program, it will ask if you have a "serial number" password, but you can click "No" to install in "demo mode." Then you can use the program free of charge for 30 days. (After that, you can register the program for a reasonable price if you want to keep using it.)

To unstuff the file, simply drag and drop it onto the "Aladdin Expander" icon on your Windows desktop. The unstuffed file will then appear on your desktop as well.

Next, you'll need to convert the unstuffed file from QuarkXPress to Word. That means you'll need the QuarkXPress 4.1 Demo program for Windows, which you can download here:

http://www.quark.com/products/xpress/demos.html

The name of the program to download is "QuarkXPress and QuarkXPress Passport 4.1 Demo (Win)." The download page explains that the "Save" function of the demo program has been disabled, but don't worry about that. To download and install the program, read and follow the instructions here:

http://www.quark.com/support/downloads/instructions.html

After you've installed the QuarkXPress demo, follow this procedure:

1. Start the QuarkXPress demo.

2. Click the "File" menu.

3. Click "Open."

4. Find and open the unstuffed file that you want to convert to a Word document.

5. Click the "File" menu.

6. Click "Save Text" (which is different from the disabled "Save").

7. Save the text as a Word document, which will preserve styles and other formatting.

And there you have it! Now you can open the file in Microsoft Word and do what you need to do.

The downside to getting the QuarkXPress demo is that it's 23 megs. If you have fast Internet access, no problem. On a slow modem, though, the download may take several hours. An alternative is to request a demo CD from Quark, which you can do at their Web site. The QuarkXPress demo will run forever, but you can't use it to save QuarkXPress documents. You *can,* however, use it as a wonderful Quark-to-Word converter whenever the need arises.

_________________________________________

READERS WRITE

Subscriber Dwight Purdy sent information about a program that you may find useful if, like me, you're prone to hitting certain keys accidentally:

"While reviewing some of our long-ago discussions, I decided to go back to www.Phoebusnet.com to see if there was anything happening to their sMaRTcaPs program. As it turns out, there are some things which they have done with it, including branching out to your personal nemesis, the Insert key. The price for this gem is now $5.00. I couldn't resist that, so I downloaded it. If you hit the insert key, it tells you so! Ditto for Caps Lock and Num Lock, and all of them also respond audibly to holding them down for a moment. I haven't had time to explore what other little extras might be there, but this is a 'must have'."

Thanks for the tip, Dwight.