Last week we discussed using wildcard combinations to find text in a Microsoft Word document. You can read last week's newsletter here:
http://www.topica.com/lists/editorium/read/message.html?mid=1706069286
This week we'll talk about wildcard ranges, which you'll probably use a lot.
Wildcard ranges are fairly simple. You just use the [-] wildcard to tell Microsoft Word what to find. Let's continue with our example from last week:
b?[td]
As you probably recall, this tells Word to find the letter b followed by any single character followed by either t or d. In other words, it will find "bet," "but," "bit," "bat," "bed," "bud," "bid," "bad," and so on.
But what if we wanted to find "bat," "bad," "bet," and "bed" but NOT "bit," "bid," "bud," and "but"? After bringing up the Find dialog (Edit > Find) and turning on "Use Pattern Matching" (you may need to click the "More" button before this is available), we could use this wildcard combination in the "Find What" box:
b[a-e][td]
This tells Word to find the letter b followed by any letter from a to e (in other words, a, b, c, d, or e) followed by t or d. (The range *must* be in ascending order--in other words, from a "lower" letter [such as a] to a "higher" letter [such as z].)
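If you like to try patterns outside Word, here is a minimal sketch of the same range using Python's re module. Treat it as an analogy only: Python's regular expressions are not Word's wildcard engine, and the word list is made up for illustration. It also shows that Python, like Word, refuses a range written in descending order.

```python
import re

# Python analog of the Word wildcard b[a-e][td]; the syntax happens to be the same here.
pattern = re.compile(r"b[a-e][td]")

# Hypothetical sample words, mirroring the examples in the text.
words = ["bat", "bad", "bet", "bed", "bit", "bid", "bud", "but"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['bat', 'bad', 'bet', 'bed']

# A range written "backward" (e to a) is rejected when the pattern is compiled.
try:
    re.compile(r"b[e-a][td]")
    descending_ok = True
except re.error:
    descending_ok = False
print(descending_ok)  # False
```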
Here's another way to approach this:
b[!f-z][td]
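Notice the exclamation mark at the front of the "range" wildcard. The exclamation mark tells Word to find every character *except* those specified--in this case, the letters f through z. This wildcard combination, too, will find "bat," "bad," "bet," and "bed" but not "bit," "bid," "bud," and "but." (Keep in mind that "every character except f through z" also takes in digits, capital letters, spaces, and punctuation, so this version casts a slightly wider net than b[a-e][td].)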
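To compare the two approaches side by side, here is a rough Python sketch. It assumes Python's [^...] as a stand-in for Word's [!...]; the sample strings are invented. On the eight sample words the two patterns agree, but the "except" version also accepts anything that simply isn't a letter from f to z.

```python
import re

in_range = re.compile(r"b[a-e][td]")    # stands in for Word's b[a-e][td]
not_range = re.compile(r"b[^f-z][td]")  # stands in for Word's b[!f-z][td]

# Hypothetical sample words from the discussion above.
words = ["bat", "bad", "bet", "bed", "bit", "bid", "bud", "but"]
same_on_words = all(
    bool(in_range.fullmatch(w)) == bool(not_range.fullmatch(w)) for w in words
)
print(same_on_words)  # True

# Made-up strings whose middle character is a digit, a capital, or a space.
extras = ["b2d", "bAt", "b d"]
only_negated = [
    s for s in extras if not_range.fullmatch(s) and not in_range.fullmatch(s)
]
print(only_negated)  # ['b2d', 'bAt', 'b d']
```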
Here's a range that I use all the time:
[0-9]
This little beauty finds any occurrence of a digit. What's that good for? Let's say you're editing a document with lots of numbered lists, like this:
1. Lorem ipsum dolor sit amet.
2 Ut wisi enim ad minim veniam.
3. Duis autem vel eum iriure dolor.
Did you notice that the number 2 has no period? Good! You must have "the eye." But if you have several long lists, you might want to let Word find these problem numbers for you. To do so, try this wildcard string:
^013[0-9]@[!.]
Pretty cryptic. But if you've been reading Editorium Update, you can probably figure this out:
^013 is the numeric code for a carriage return.
[0-9] represents any digit.
@ tells Word to find one or more occurrences of the previous expression (in this case, any digit). This is necessary in case you have lists with two-digit (or longer) numbers.
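[!.] tells Word to find any character *except* a period. (A digit is also "not a period," so a correctly punctuated number of two or more digits, such as "10.", may turn up as well. Using [!.0-9] instead should avoid that.)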
Piece of cake.
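Here is a small Python sketch of the same check, for anyone who wants to see it run on junk text. The assumptions: \r stands in for ^013, + stands in for @, [^.] stands in for [!.], and the sample list (including the made-up item 10) is invented. The second pattern, which also excludes digits, is a variation of my own for comparison. In this Python version, at least, the first pattern stops on "10" even though its period is present; whether Word behaves identically is worth testing for yourself.

```python
import re

# Python analog of ^013[0-9]@[!.]
loose = re.compile(r"\r[0-9]+[^.]")
# Variation (an assumption, not from the newsletter): also exclude digits.
strict = re.compile(r"\r[0-9]+[^.0-9]")

# Hypothetical document text; \r marks the end of each paragraph.
text = (
    "Intro paragraph.\r"
    "1. Lorem ipsum dolor sit amet.\r"
    "2 Ut wisi enim ad minim veniam.\r"
    "3. Duis autem vel eum iriure dolor.\r"
    "10. Nam liber tempor cum soluta.\r"
)

loose_hits = [m.group(0) for m in loose.finditer(text)]
strict_hits = [m.group(0) for m in strict.finditer(text)]

print(loose_hits)   # ['\r2 ', '\r10']  (the 0 in 10 counts as "not a period")
print(strict_hits)  # ['\r2 ']
```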
Here are two other wildcard ranges you might find useful:
[a-z] represents any occurrence of a lowercase letter.
[A-Z] represents any occurrence of an uppercase letter.
Remember, too, that you can use the [] wildcard (without a hyphen) to specify a whole group of characters *without* using a range. For example, this wildcard will find various kinds of punctuation:
[.,;:\?\!]
You may be wondering about the backslash (\) in front of the question and exclamation marks. The backslash tells Word to treat the following character *as* a character and not as a wildcard. (Remember, ? is the wildcard for a single character, and ! is the wildcard for "except.")
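To pull the pieces together, here is a rough sketch of a translator from the wildcards used in this newsletter to Python regular expressions. The function word_wildcard_to_regex is a hypothetical helper of my own, not part of Word or Python, and it covers only ?, @, brackets, the ! "except" marker, backslash escapes, and ^013. Inside brackets, Python does not actually need ? or ! escaped, but escaping them is harmless.

```python
import re

def word_wildcard_to_regex(wildcard: str) -> str:
    """Translate a small subset of Word wildcard syntax (hypothetical helper)."""
    out = []
    i = 0
    in_class = False  # True while we are between [ and ]
    while i < len(wildcard):
        ch = wildcard[i]
        if ch == "\\" and i + 1 < len(wildcard):
            # Backslash: treat the next character as a plain character.
            out.append(re.escape(wildcard[i + 1]))
            i += 2
        elif not in_class and wildcard.startswith("^013", i):
            out.append("\\r")  # carriage return
            i += 4
        elif not in_class and ch == "[":
            in_class = True
            if wildcard.startswith("!", i + 1):
                out.append("[^")  # Word's "except" marker
                i += 2
            else:
                out.append("[")
                i += 1
        elif in_class and ch == "]":
            in_class = False
            out.append("]")
            i += 1
        elif in_class:
            # Keep the hyphen so ranges such as a-e survive; escape the rest.
            out.append(ch if ch == "-" else re.escape(ch))
            i += 1
        elif ch == "?":
            out.append(".")  # any single character
            i += 1
        elif ch == "@":
            out.append("+")  # one or more of the previous expression
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

print(word_wildcard_to_regex("b?[td]"))       # b.[td]
print(word_wildcard_to_regex("b[!f-z][td]"))  # b[^f-z][td]

punct = re.compile(word_wildcard_to_regex(r"[.,;:\?\!]"))
found = punct.findall("Wait, what? Yes!")
print(found)  # [',', '?', '!']
```

One caveat on this sketch: Python's "." does not match a line break by default, so the translation of ? is only approximate.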
Don't be afraid to try all of these wildcard combinations and ranges for yourself (on some junk text, of course). As you experiment, you'll better understand what works and what doesn't. Then, when the need to use wildcards arises (which it will), you'll be ready.
Next week, we'll look at expression grouping and the little-known "Replace With" wildcard.
You can learn more about using numeric codes (such as that ^013 representing the carriage return) here:
http://www.topica.com/lists/editorium/read/message.html?mid=1704081834
And you can learn more about using junk text (such as "Lorem ipsum dolor sit amet") here:
http://www.topica.com/lists/editorium/read/message.html?mid=1705763701
_________________________________________
READERS WRITE
Our last newsletter used misspellings of the name "Gandhi" as an example, noting that it would be possible to use the wildcard string G[andh][andh][andh][andh]i to find the misspellings "Ghandi," "Gahndi," and "Ganhdi" all in one pass. Subscriber Glade Lyon (my dad!) wrote:
"It seems to me that your string should be G[andh][hand][ahnd][anhd]i."
Thinking that other readers might see this the same way, I'm including my response here:
I see what you're thinking--that each set of bracketed letters is an alternative spelling. No, *each set* of bracketed letters represents *one* letter in the word. [andh] will find either an "a," an "n," a "d," or an "h," whichever it comes to first. So, G[andh] will find:
Ga
Gn
Gd
or Gh
G[andh][andh] will find:
Gaa
Gan
Gad
Gah
Gna
Gnn
Gnd
Gnh
Gda
Gdn
Gdd
Gdh
Gha
Ghn
Ghd
or Ghh
And so on. So the point of using G[andh][andh][andh][andh]i is to find every possible four-letter combination of a, n, d, and h. That way, no matter *how* many ways our author has misspelled "Gandhi," we'll catch them all.
In other words, the order of the characters inside the brackets doesn't matter. The strings you suggested--
[andh]
[hand]
[ahnd]
and [anhd]
--are all functionally identical. Each one tells Word to find either an "a," an "n," a "d," or an "h."
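If you'd like to check this for yourself, here is a short Python sketch. It assumes Python's bracket syntax as a stand-in for Word's, builds every four-letter combination of a, n, d, and h, and confirms that both the original string and the reordered one accept all of them.

```python
import re
from itertools import product

letters = "andh"

# Every string that G[andh][andh][andh][andh]i can match: 4 * 4 * 4 * 4 of them.
candidates = ["G" + "".join(combo) + "i" for combo in product(letters, repeat=4)]
print(len(candidates))  # 256

original = re.compile(r"G[andh][andh][andh][andh]i")
reordered = re.compile(r"G[andh][hand][ahnd][anhd]i")  # the suggested string

# Both patterns accept every candidate, so the order inside the brackets is irrelevant.
original_all = all(original.fullmatch(c) for c in candidates)
reordered_all = all(reordered.fullmatch(c) for c in candidates)
print(original_all, reordered_all)  # True True

# The three misspellings from the newsletter are among the candidates.
misspellings = ["Ghandi", "Gahndi", "Ganhdi"]
caught = [w for w in misspellings if w in candidates]
print(caught)  # ['Ghandi', 'Gahndi', 'Ganhdi']
```

Note that the correct spelling, "Gandhi," is one of the 256 combinations too, so the search will stop on it as well.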