Wildcard Ranges

Last week we discussed using wildcard combinations to find text in a Microsoft Word document. You can read last week's newsletter here:

http://www.topica.com/lists/editorium/read/message.html?mid=1706069286

This week we'll talk about wildcard ranges, which you'll probably use a lot.

Wildcard ranges are fairly simple. You just use the [-] wildcard to tell Microsoft Word what to find. Let's continue with our example from last week:

b?[td]

As you probably recall, this tells Word to find the letter b followed by any single character followed by either t or d. In other words, it will find "bet," "but," "bit," "bat," "bed," "bud," "bid," "bad," and so on.

But what if we wanted to find "bat," "bad," "bet," and "bed" but NOT "bit," "bid," "bud," and "but"? After bringing up the Find dialog (Edit > Find) and turning on "Use Pattern Matching" (labeled "Use wildcards" in later versions of Word; you may need to click the "More" button before this is available), we could use this wildcard combination in the "Find What" box:

b[a-e][td]

This tells Word to find the letter b followed by any letter from a to e (in other words, a, b, c, d, or e) followed by t or d. (The range *must* be in ascending order--in other words, from a "lower" letter [such as a] to a "higher" letter [such as z].)
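If you'd like to try a range outside of Word, here is a minimal sketch using Python's re module, whose bracketed ranges behave much like Word's. It is only an analogy, not Word's own search engine, and the sample sentence and variable names are made up for illustration.

```python
import re

# Made-up junk text for experimenting.
sample = "The bat and the bet were in the bed, but a bit of bud was bad."

# b, then any one letter from a to e, then t or d.
pattern = re.compile(r"b[a-e][td]")

matches = pattern.findall(sample)
print(matches)  # ['bat', 'bet', 'bed', 'bad']
```

Notice that "but," "bit," and "bud" are skipped, because i and u fall outside the range a to e.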

Here's another way to approach this:

b[!f-z][td]

Notice the exclamation mark at the front of the "range" wildcard. The exclamation mark tells Word to find every character *except* those specified--in this case, the letters f through z. This wildcard combination, too, will find "bat," "bad," "bet," and "bed" but not "bit," "bid," "bud," and "but."
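The two approaches agree on ordinary words, but they are not quite the same thing: "everything except f through z" also lets in digits, punctuation, and capital letters. The sketch below shows the difference using Python's re module as a stand-in for Word (Python writes "except" as ^ rather than !). The word list, including the oddballs "b1t" and "b-d," is made up for illustration.

```python
import re

words = ["bat", "bad", "bet", "bed", "bit", "bid", "bud", "but", "b1t", "b-d"]

by_range_pattern = re.compile(r"b[a-e][td]")       # Word: b[a-e][td]
by_exclusion_pattern = re.compile(r"b[^f-z][td]")  # Word: b[!f-z][td]

by_range = [w for w in words if by_range_pattern.fullmatch(w)]
by_exclusion = [w for w in words if by_exclusion_pattern.fullmatch(w)]

print(by_range)      # ['bat', 'bad', 'bet', 'bed']
print(by_exclusion)  # ['bat', 'bad', 'bet', 'bed', 'b1t', 'b-d']
```

In everyday editing the extra matches rarely matter, but they're worth knowing about if your text contains codes or part numbers.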

Here's a range that I use all the time:

[0-9]

This little beauty finds any occurrence of a digit. What's that good for? Let's say you're editing a document with lots of numbered lists, like this:

1. Lorem ipsum dolor sit amet.

2 Ut wisi enim ad minim veniam.

3. Duis autem vel eum iriure dolor.

Did you notice that the number 2 has no period? Good! You must have "the eye." But if you have several long lists, you might want to let Word find these problem numbers for you. To do so, try this wildcard string:

^013[0-9]@[!.]

Pretty cryptic. But if you've been reading Editorium Update, you can probably figure this out:

^013 is the numeric code for a carriage return.

[0-9] represents any digit.

@ tells Word to find one or more occurrences of the previous expression (in this case, any digit). This is necessary in case you have lists with two-digit (or longer) numbers.

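[!.] tells Word to find any character *except* a period. (A digit is also "not a period," so on a correctly punctuated two-digit number such as "10." the string can still match the "1" followed by the "0." Using [!.0-9] in place of [!.] should avoid those false hits.)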

Piece of cake.
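To see the pieces working together, here is a rough Python equivalent, offered as a sketch rather than a description of Word's internals. It assumes \r stands in for Word's ^013 and + for Word's @. The list text is made up, and the second pattern, which also excludes digits after the number, is my own variation for comparison.

```python
import re

# Made-up list; "\r" plays the role of the carriage return (^013 in Word).
text = "\r1. Lorem ipsum\r2 Ut wisi\r3. Duis autem\r10. Nam liber\r11 Option congue"

as_written = re.compile(r"\r[0-9]+[^.]")    # Word: ^013[0-9]@[!.]
stricter = re.compile(r"\r[0-9]+[^.0-9]")   # Word: ^013[0-9]@[!.0-9]

# strip() drops the carriage return and any trailing space from each hit.
flagged_as_written = [m.group().strip() for m in as_written.finditer(text)]
flagged_stricter = [m.group().strip() for m in stricter.finditer(text)]

print(flagged_as_written)  # ['2', '10', '11']
print(flagged_stricter)    # ['2', '11']
```

In this analogue the first pattern also stops on "10." because the 0 counts as "not a period"; the second pattern flags only the two numbers that are really missing their periods.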

Here are two other wildcard ranges you might find useful:

[a-z] represents any occurrence of a lowercase letter.

[A-Z] represents any occurrence of an uppercase letter.
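These ranges are case-sensitive, which is easy to check with a few lines of Python. Again, this is just an illustration using Python's re module with a made-up sample string, not Word itself.

```python
import re

sample = "Lorem Ipsum 42"

lowercase = "".join(re.findall(r"[a-z]", sample))
uppercase = "".join(re.findall(r"[A-Z]", sample))
digits = "".join(re.findall(r"[0-9]", sample))

print(lowercase)  # orempsum
print(uppercase)  # LI
print(digits)     # 42
```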

Remember, too, that you can use the [] wildcard (without a hyphen) to specify a whole group of characters *without* using a range. For example, this wildcard will find various kinds of punctuation:

[.,;:\?\!]

You may be wondering about the backslash (\) in front of the question and exclamation marks. The backslash tells Word to treat the following character *as* a character and not as a wildcard. (Remember, ? is the wildcard for a single character, and ! is the wildcard for "except.")
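As it happens, Python's re module accepts backslash-escaped punctuation inside brackets as well, so a punctuation group with escaped question and exclamation marks can be tried there unchanged. This is a small sketch with a made-up sample sentence; Python's rules for when an escape is *required* differ from Word's, so treat it only as a way to see the group in action.

```python
import re

sample = "Wait! What? Yes: no; maybe, ok."

# A group of punctuation characters, with ? and ! escaped by backslashes.
punctuation = re.compile(r"[.,;:\?\!]")

found = punctuation.findall(sample)
print(found)  # ['!', '?', ':', ';', ',', '.']
```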

Don't be afraid to try all of these wildcard combinations and ranges for yourself (on some junk text, of course). As you experiment, you'll better understand what works and what doesn't. Then, when the need to use wildcards arises (which it will), you'll be ready.

Next week, we'll look at expression grouping and the little-known "Replace With" wildcard.

You can learn more about using numeric codes (such as that ^013 representing the carriage return) here:

http://www.topica.com/lists/editorium/read/message.html?mid=1704081834

And you can learn more about using junk text (such as "Lorem ipsum dolor sit amet") here:

http://www.topica.com/lists/editorium/read/message.html?mid=1705763701

_________________________________________

READERS WRITE

Our last newsletter used misspellings of the name "Gandhi" as an example, noting that it would be possible to use the wildcard string G[andh][andh][andh][andh]i to find the misspellings "Ghandi," "Gahndi," and "Ganhdi" all in one pass. Subscriber Glade Lyon (my dad!) wrote:

"It seems to me that your string should be G[andh][hand][ahnd][anhd]i."

Thinking that other readers might see this the same way, I'm including my response here:

I see what you're thinking--that each set of bracketed letters is an alternative spelling. No, *each set* of bracketed letters represents *one* letter in the word. [andh] will find either an "a," an "n," a "d," or an "h," whichever it comes to first. So, G[andh] will find:

Ga

Gn

Gd

or Gh

G[andh][andh] will find:

Gaa

Gan

Gad

Gah

Gna

Gnn

Gnd

Gnh

Gda

Gdn

Gdd

Gdh

Gha

Ghn

Ghd

or Ghh

And so on. So the point of using G[andh][andh][andh][andh]i is to find every possible four-letter combination of a, n, d, and h. That way, no matter *how* many ways our author has misspelled "Gandhi," we'll catch them all.

In other words, the order of the characters inside the brackets doesn't matter. The strings you suggested--

[andh]

[hand]

[ahnd]

and [anhd]

--are all functionally identical. Each one tells Word to find either an "a," an "n," a "d," or an "h."
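If you'd like to verify this for yourself, the following Python sketch lists everything G[andh][andh] can match and then compares the two longer strings. It uses Python's re and itertools modules as a stand-in for Word; the list of spellings (including the too-short "Gandi") is made up for the test.

```python
import re
from itertools import product

letters = "andh"

# Every string G[andh][andh] can match: 4 x 4 = 16 combinations.
two_slots = ["G" + "".join(pair) for pair in product(letters, repeat=2)]
print(len(two_slots))  # 16
print(two_slots[:4])   # ['Gaa', 'Gan', 'Gad', 'Gah']

same_order = re.compile(r"G[andh][andh][andh][andh]i")
shuffled = re.compile(r"G[andh][hand][ahnd][anhd]i")

spellings = ["Gandhi", "Ghandi", "Gahndi", "Ganhdi", "Gandi"]
found_same_order = [s for s in spellings if same_order.fullmatch(s)]
found_shuffled = [s for s in spellings if shuffled.fullmatch(s)]

print(found_same_order)                    # ['Gandhi', 'Ghandi', 'Gahndi', 'Ganhdi']
print(found_same_order == found_shuffled)  # True
```

Both strings catch the same four spellings, and both skip "Gandi," which has only three letters between the G and the i.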


  • The Fine Print

    Thanks for reading Editorium Update (ISSN 1534-1283), published by:

    The EDITORIUM, LLC
    http://www.editorium.com

    Articles © on date of publication by the Editorium. All rights reserved. Editorium Update and Editorium are trademarks of the Editorium.

    You may forward copies of Editorium Update to others (but not charge for it) and print or store it for your personal use. Any other broadcast, publication, retransmission, copying, or storage, without written permission from the Editorium, is strictly prohibited. If you’re interested in reprinting one of our articles, please send an email message to editor@editorium.com

    Editorium Update is provided for informational purposes only and without a warranty of any kind, either express or implied, including but not limited to implied warranties of merchantability, fitness for a particular purpose, and freedom from infringement. The user (you) assumes the entire risk as to the accuracy and use of this document.

    The Editorium is not affiliated with Microsoft Corporation or any other entity.

    We do not sell, rent, or give our subscriber list to anyone. Period.
