Wildcard Combinations

Last week we discussed the basics of using wildcards to find text in a Microsoft Word document. You can read last week's newsletter here:

http://www.topica.com/lists/editorium/read/message.html?mid=1705963026

This week we'll talk about how to combine wildcards, which will let you get pretty fancy about the stuff you want to find. Basically, you just need to know that you *can* combine wildcards. Then you can get as crazy as you like.

Last week we used the "?" wildcard to find every three-letter combination starting with b and ending with t--"bet," "but," "bit," "bat," and so on--by searching for "b?t" with "Use Pattern Matching" turned on in the Find dialog box.

Now let's say we wanted to find the same characters but add others as well. For example, we might want to find every three-letter combination starting with b and ending with d--"bed," "bud," "bid," "bad," and so on--in *addition* to the combinations ending in t. Can we really do that? Sure!

After bringing up the Find dialog (Edit > Find) and turning on "Use Pattern Matching," we'll start by entering the letter b into the "Find What" box, telling Microsoft Word to find that letter.

Next, we'll enter the ? wildcard, which tells Microsoft Word to find any single character.

Finally, we'll enter a new wildcard: [td]. Microsoft Word will find any *one* of the characters specified in the brackets.

Altogether, the string of characters looks like this--

b?[td]

--and there we are, doing wildcard combinations! This particular combination tells Microsoft Word to find the letter b followed by any single character followed by t or d.
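If you happen to think in regular expressions, the same combination can be sketched in Python. This is only an analogy, not a description of how Word works internally: Python's re module writes the any-single-character wildcard as a period rather than a question mark, while the bracket syntax is the same. The sample sentence is made up for illustration.

```python
import re

# Word's b?[td] corresponds roughly to the Python pattern b.[td]:
# the letter b, then any single character, then either t or d.
pattern = re.compile(r"b.[td]")

# Made-up sample text, not from any real manuscript.
sample = "The bat bit the bud, but the bed was bad."

matches = pattern.findall(sample)
print(matches)
```

Every three-letter combination in the sample that starts with b and ends in t or d ends up in the list.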

How will something like this help you in editing? Suppose you're working on a manuscript in which the author has misspelled a name in nearly every way possible. You could comb through the manuscript over and over, hoping to catch all the variations. Or, you could be *sure* to catch them all by searching with wildcards. For example, let's say your manuscript is a book about India and the name in question is Gandhi. Your author has misspelled it as "Ghandi," "Gahndi," and "Ganhdi." (Not possible? Hah!) You can find every last one of them with the following string:

G[andh][andh][andh][andh]i

Then, if you've put the correct spelling, "Gandhi," in the "Replace With" box, you can find and replace each wrong spelling with the right one in a single pass, which is much more efficient than finding and replacing each variation separately.

You may be wondering why you couldn't just use the * wildcard to represent the whole string of letters, like this:

G*i

You could. But remember, the * wildcard represents *any* string of characters--including spaces. It's not limited to characters within a word (and neither are other wildcards). That means, in addition to finding the misspelled names, it will find the first 14 characters of the following phrase: "Go to the officer's hall." So be careful, especially if you're planning to use "Replace All" rather than finding and replacing one item at a time.
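Here is a rough Python sketch of that danger. It assumes that Word's * stops at the first place where the rest of the string can match, which Python's re module writes as the non-greedy .*? (the period standing in for Word's question mark). Treat it as an illustration rather than an exact model of Word.

```python
import re

phrase = "Go to the officer's hall."

# Assumed equivalent of Word's G*i: G, then as few characters as
# possible (spaces included), then i.
found = re.search(r"G.*?i", phrase).group()
print(repr(found), len(found))

# The bracketed string only accepts the letters a, n, d, and h
# between G and i, so it leaves this phrase alone.
safe = re.search(r"G[andh][andh][andh][andh]i", phrase)
print(safe)
```

The first search grabs "Go to the offi," spaces and all; the second finds nothing.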

There is a way to simplify the wildcard combination, however. Consider this string:

G[andh]{4}i

It's functionally the same as G[andh][andh][andh][andh]i. The {4} tells Word to find exactly four occurrences of the previous "expression," which is [andh].

But now a complication: Suppose that our slapdash author has also spelled Gandhi's name as "Gandi." Uh-oh. Our original string won't catch that, because this new misspelling is one character shorter than our string specifies. But consider this:

G[andh]{3,4}i

The {3,4} tells Word to find from 3 to 4 occurrences of the previous expression, so this string will catch all of our misspelled variations so far.

What if we want to allow for more or fewer characters, being particularly unsure of our author? We can use this string:

G[andh]@i

The @ wildcard tells Microsoft Word to find *one or more* occurrences of the previous expression. That ought to cover nearly anything our author throws at us. If we want to get a little more specific, we can use {2,}, which tells Word to look for *at least* two occurrences of the previous expression.
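To see how the counting works out, here is a small Python sketch. Python's re module happens to use the same curly-brace syntax for repetition, and its + plays roughly the role of Word's @. The pattern labels and the helper function caught are my own inventions for illustration.

```python
import re

# Spellings mentioned in this article, plus the correct one.
names = ["Gandhi", "Ghandi", "Gahndi", "Ganhdi", "Gandi"]

# Python counterparts of the Word strings (assumed equivalents).
patterns = {
    "exactly four": r"G[andh]{4}i",
    "three or four": r"G[andh]{3,4}i",
    "one or more (Word's @)": r"G[andh]+i",
    "at least two": r"G[andh]{2,}i",
}

def caught(pattern):
    """Return the names that the pattern matches in full."""
    return [name for name in names if re.fullmatch(pattern, name)]

for label, pattern in patterns.items():
    print(label, "->", caught(pattern))
```

Only the "exactly four" pattern misses "Gandi," which has just three letters between the G and the i.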

By this time you've probably noticed a pattern to these wildcards, but if not, I'll summarize:

A question mark ? finds any single character.

An asterisk * finds any string of characters.

Square brackets [] specify the characters to find.

Curly braces {} specify how many occurrences of the characters to find.

{n} finds an exact number (such as 2) of the preceding character or expression.

{n,} finds at least n occurrences (such as 3) of the preceding character or expression.

{n,m} finds from n to m occurrences (such as 3 to 5) of the preceding character or expression.

@ finds one or more occurrences of the preceding character or expression.

Here's a parting tip: What would happen if we put a lowercase rather than a capital G at the beginning of our string? Word wouldn't find the misspelled names. Why? Because with "Use Pattern Matching" turned on, Word automatically matches case--a useful thing to know.
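Python's regular expressions behave the same way by default, so the tip can be sketched like this. The re.IGNORECASE flag is a Python convenience only; I'm not suggesting Word has a matching option.

```python
import re

name = "Ghandi"

# Like Word's pattern matching, Python patterns match case by default.
lower = re.search(r"g[andh]+i", name)
upper = re.search(r"G[andh]+i", name)

# Python-only convenience: ask explicitly for case to be ignored.
relaxed = re.search(r"g[andh]+i", name, re.IGNORECASE)

print(lower, upper.group(), relaxed.group())
```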

That brings us to the subject of finding a range of characters--something we'll talk about next week.

Using Wildcards--the Basics

Subscriber Allene Goforth (agoforth@aros.net) wrote:

"I use your 'Searching with Microsoft Word's Built-In Codes' list all the time, but Word's restrictions on what codes can be used in the 'Replace with' box are a pain. I'd love to see an issue of Editorium Update that deals with wildcard searching."

Thanks for the suggestion, Allene. Here goes:

When I was in the fifth grade in wintry Idaho, rather than venturing out into the cold, some fellow students and I often spent recess playing poker. (Did our teacher know about this? I can't remember.) Being *extremely* sophisticated players, we often designated jokers, deuces, *and* one-eyed jacks as wild cards--that is, they could represent any card in the deck. With the help of these wild cards, we had plenty of royal flushes, hands with five aces, and so on. Now that was poker!

Microsoft Word, too, has a bunch of "wild cards" (which Microsoft spells as one word) that you can use to find various combinations of characters in a document. Wildcards can get pretty complicated, but this week we'll cover just the basics.

The simplest wildcard is the question mark (?), which represents any single character. If you want to see how it works, try this:

1. Open a document with some text that you can play around with.

2. Click the "Edit" menu.

3. Click "Find."

4. In the "Find What" box, enter a question mark (?).

5. Put a checkmark in the "Use Pattern Matching" box. (You may need to click the "More" button first.) Checking this box tells Microsoft Word that you're going to use a wildcard. If you didn't check the box, Microsoft Word would assume you were trying to find a question mark.

6. Click the "Find" button.

Microsoft Word will find the first character after your cursor position. Click the "Find" button again. Microsoft Word will find the next character. And so on.

That doesn't seem very useful, but let's suppose you're editing a document that was scanned from a magazine article and is riddled with typos. You notice that the word "but" shows up in various ways, including "bat" and "bet." Let's say that this is a technical article with no references to baseball, winged mammals, or games of chance, so you decide to use the ? wildcard to find "bat" and "bet" and replace them in a single pass. Here's the procedure:

1. Click the "Edit" menu.

2. Click "Replace."

3. Enter "b?t" in the "Find What" box.

4. Enter "but" in the "Replace With" box.

5. Put a checkmark in the "Use Pattern Matching" box.

6. Click the "Replace All" button.

Both "bat" and "bet" will be replaced with "but." The problem is, so will "bit." And, unfortunately, since you can't specify "Find Whole Words Only" when the "Use Pattern Matching" box is checked, Microsoft Word will replace "better" with "butter," "combat" with "combut," and who knows what else. So, instead of clicking the "Replace All" button, you should click the "Replace" button for each individual item as needed.
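The same accident is easy to reproduce in Python, where the period stands in for Word's question mark. This is a sketch on a made-up sentence, not a model of Word itself. The \b markers in the last step are Python's word boundaries, which correspond roughly to the < and > wildcards in the list at the end of this article.

```python
import re

# Made-up sentence containing the troublemakers mentioned above.
text = "A better bat is a bit of a combat aid."

# The equivalent of "Replace All" with b?t: every hit is changed.
everything = re.sub(r"b.t", "but", text)
print(everything)

# The equivalent of stepping through with "Find": list the hits
# so that each one can be judged individually.
hits = [m.group() for m in re.finditer(r"b.t", text)]
print(hits)

# Restricting the pattern to whole words spares "better" and "combat"
# (though "bit" still gets changed).
whole_words = re.sub(r"\bb.t\b", "but", text)
print(whole_words)
```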

Now you begin to see the power--and the danger--of using wildcards. Like cutthroat poker, they are not for the faint of heart. But if you know what you're doing, they can be very useful. Unfortunately, they won't help much in the "Replace With" box. In fact, you can't use them there at all. Why? Because Word has no way of knowing what you want them to represent.

Let's say you want to find "but" and replace it with either "bet" or "bat," so you put "b?t" in the "Replace With" box and click the "Replace All" button. Word doesn't know whether you want to replace "but" with "bet" or "bat," so it just replaces it with the actual text "b?t." So, basically, the only thing you can use in the "Replace With" box is actual text or certain built-in codes, mentioned earlier. You can get the list of codes here:

http://www.topica.com/lists/editorium/read/message.html?mid=1703968584

Next week I'll explain wildcard searching in more depth. Until then, here's a list of wildcards for you to play with (on some junk text--don't use a real document):

? Finds any single character:

"c?t" finds "cat," "cut," and "cot."

* Finds any string of characters:

"b*d" finds "bad," "bread," and "bewildered."

[ ] Finds *one* of the specified characters:

"b[ai]t" finds "bat" and "bit" but not "bet."

[-] Finds any single character in the specified range (which must be in ascending order):

"[l-r]ight" finds "light," "might," "night," and "right" (and "oight," "pight," and "qight," if they exist).

[!] Finds any single character *except* those specified:

"m[!u]st" finds "mist" and "most" but not "must."

"t[!ou]ck" finds "tack" and "tick" but not "tock" or "tuck."

[!x-z] Finds any single character *except* those in the specified range:

"t[!a-m]ck" finds "tock" and "tuck" but not "tack" or "tick."

{n} Finds *exactly* n occurrences of the previous character or expression:

"re{2}d" finds "reed" but not "red."

{n,} Finds *at least* n occurrences of the previous character or expression:

"re{1,}d" finds "red" and "reed."

{n,m} Finds from n to m occurrences of the previous character or expression:

"10{1,3}" finds "10," "100," and "1000."

@ Finds one or more occurrences of the previous character or expression:

"me@t" finds "met" and "meet."

< Finds the beginning of a word:

"<in" finds "in" and "inspiring" but not "main."

> Finds the end of a word:

"in>" finds "in" and "main" but not "inspiring."
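If you'd like to check several of these claims without opening Word, here is a Python sketch that pairs each Word wildcard with what I take to be its closest regular-expression equivalent (the period for ?, the caret for !, the plus sign for @, and \b for a word boundary). The pairings and the helper function check are assumptions for illustration; the test words come from the list above.

```python
import re

# (Word wildcard, assumed Python equivalent, should find, should not find)
examples = [
    ("c?t",       r"c.t",       ["cat", "cut", "cot"],                []),
    ("b[ai]t",    r"b[ai]t",    ["bat", "bit"],                       ["bet"]),
    ("[l-r]ight", r"[l-r]ight", ["light", "might", "night", "right"], []),
    ("m[!u]st",   r"m[^u]st",   ["mist", "most"],                     ["must"]),
    ("t[!a-m]ck", r"t[^a-m]ck", ["tock", "tuck"],                     ["tack", "tick"]),
    ("re{2}d",    r"re{2}d",    ["reed"],                             ["red"]),
    ("re{1,}d",   r"re{1,}d",   ["red", "reed"],                      []),
    ("10{1,3}",   r"10{1,3}",   ["10", "100", "1000"],                []),
    ("me@t",      r"me+t",      ["met", "meet"],                      []),
    ("in>",       r"in\b",      ["in", "main"],                       ["inspiring"]),
]

def check(regex, word):
    """Search for boundary patterns; otherwise require a whole-word match."""
    if r"\b" in regex:
        return re.search(regex, word) is not None
    return re.fullmatch(regex, word) is not None

results = {
    wildcard: all(check(rx, w) for w in yes) and not any(check(rx, w) for w in no)
    for wildcard, rx, yes, no in examples
}

for wildcard, ok in results.items():
    print(wildcard, "behaves as described" if ok else "needs a closer look")
```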

Sample Text in AutoText

Last week I explained how to use Word's Rand feature to create sample text ("The quick brown fox jumps over the lazy dog") that you can use for various purposes. You can read last week's newsletter here:

http://www.topica.com/lists/editorium/read/message.html?mid=1705763701

I neglected to mention that for the Rand feature to work, "Replace text as you type" must be turned on under Tools > AutoCorrect. If you tried using Rand but nothing happened, you don't have it turned on. Of course, you may not *want* it turned on because then Word automatically makes certain "corrections" that you may not want. If you're editing in Word, that can be a disaster. For more information on how to prevent such problems, see "When Word Gets in the Way" in the very first issue of Editorium Update:

http://www.topica.com/lists/editorium/read/message.html?mid=1700237543

If you turn off "Replace text as you type," you can still use the traditional "Lorem ipsum dolor sit amet" sample text included in last week's newsletter. Subscriber Karen L. Bojda of Bojda Editorial & Writing Services (http://www.bojda.f2s.com) sent this helpful suggestion for doing so:

"Depending on your layout, repeating the 'quick brown fox' creates columns of words and rivers instead of a nice sample layout. So I just made an AutoText entry for the 'Lorem' text, which works whether AutoCorrect is on or not."

Thinking that this was a great idea, I immediately followed suit. Now, whenever I need some sample text to work with, I just type the word "lorem" into my document and press the F3 key. Presto! If you'd like to do this, here's how to set it up:

1. Copy and paste the "Lorem" text into a Word document (I've included a nice, long version at the end of this article).

2. Select the "Lorem" text.

3. In Word 97 or later, click the "Insert" menu at the top of your Word screen. In Word 95 or earlier, click the "Edit" menu.

4. Click "AutoText."

5. In Word 97 or later, click "New."

6. In the box labeled "Please name your AutoText entry" (just "Name" in Word 95 or earlier), type "lorem."

7. In Word 95 or earlier, make sure the box labeled "Make AutoText Entry Available To" shows "All Documents (Normal.dot)."

8. In Word 97 or later, click the "OK" button. In Word 95 or earlier, click the "Add" button.

Now, when you need some sample text, do this:

1. Type "lorem" into your document.

2. Press the F3 key.

The "Lorem" text will be inserted into your document.

Karen also sent this caution: "If you're going to address AutoText entries in an upcoming newsletter, I found the way Word files them by style to be at first baffling and then annoying, and I think a heads-up about that would be worthwhile. I avoid adding AutoText entries casually. Instead, I first create a style that has a meaningful name, such as 'sample text' or 'math symbols.' Then I format the text I want to add using that style, so that the AutoText entry gets filed under a heading that is more meaningful than 'Normal' or 'Body Text.' The style itself can then be deleted."

Many thanks to Karen for this useful information.

Here's a three-paragraph version of the "Lorem" text that you can use to create an AutoText entry (after deleting the extraneous email carriage returns at the ends of the lines):

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.

Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.

Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

Sample Text

Working in Microsoft Word, I often need some "junk" text to play around with, for various reasons:

* I'm designing a document and don't want to get bogged down in what the text actually says.

* I'm creating a template with various paragraph styles and need to see what they will look like.

* I'm creating a macro and need some text for testing purposes.

* I'm trying to learn more about some feature of Microsoft Word and don't want to practice on a real document.

Microsoft Word 97, 98, 2000, and 2001 include an undocumented feature that generates all of the sample text I need. Maybe you'll find it helpful too. To use it, type the following line into a Word document and press the ENTER key:

=Rand(1,1)

Word will insert the following text into your document:

The quick brown fox jumps over the lazy dog.

(As you probably know, this sentence includes every letter in the alphabet and is sometimes used for typing practice.)

Need more than one sentence? You can specify how many sentences you need by changing the last number in the Rand statement. For example, if you needed five sentences, you could type this--

=Rand(1,5)

--which would produce this:

The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.

Need more than one paragraph? You can specify how many paragraphs you need by changing the first number in the Rand statement. For example, if you needed two paragraphs (with five sentences in each one), you could type this--

=Rand(2,5)

--which would produce this:

The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.

The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.

In other words, the first number specifies the number of paragraphs you want to insert; the second number specifies the number of sentences you want to include in those paragraphs.
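The way the two numbers work can be imitated in a few lines of Python. This is only a sketch of the behavior described above, not Word's actual code, and the function name rand_text is my own invention.

```python
SENTENCE = "The quick brown fox jumps over the lazy dog."

def rand_text(paragraphs, sentences):
    """Imitate =Rand(paragraphs, sentences) as described in this article."""
    # Each paragraph is the sentence repeated, separated by single spaces.
    paragraph = " ".join([SENTENCE] * sentences)
    # Paragraphs are separated by a blank line, as in the samples above.
    return "\n\n".join([paragraph] * paragraphs)

print(rand_text(2, 5))
```

Calling rand_text(1, 1) gives the single sentence, and rand_text(2, 5) gives two paragraphs of five sentences each.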

If you're using Word 95 or earlier (or if you're tired of that quick brown fox), you can use the traditional Latin "Lorem ipsum dolor . . . ," which has been used as placeholder text for centuries:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.

If you're curious about this, it's a garbled quotation from Cicero's De Finibus Bonorum et Malorum (On the Ends of Good and Bad), book 1, paragraph 32, which reads, "Neque porro quisquam est, qui dolorem ipsum, quia dolor sit, amet, consectetur, adipisci velit," meaning, "There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain." The book was popular during the Renaissance, when the passage was used in a book of type samples for that wonderful new technology, printing.

If your Latin is good enough (unlike mine), you can read Cicero's complete text (or just get a whole bunch of great sample text) here:

http://patriot.net/~lillard/cp/cic.fin.html

If you want to see a beautiful collection of classic type samples, check out Giambattista Bodoni's Typographic Manual at Octavo:

http://www.octavo.com/collection/bodtip.html

And for more information on sample text, see Jacci Howard Bear's article at About.com:

http://www.desktoppub.about.com/compute/desktoppub/library/weekly/aa051199.htm