Wildcard Searching and Multiple Paragraph Breaks

In trying to explain how Word's Find and Replace (FnR) wildcard mechanism works, I'll also present a practical solution to the multitude of problems encountered by the seemingly innocuous ^p^p to ^p, whose usual objective is to remove unnecessary blank lines. In doing so, we shall traverse the width of Word's pitfalls that never fail to trip up a traveller.

First up, the Word Help system has some excellent help on wildcards. It is a complete pain to access, but you can find something. In Word 2000: F1 - help > Answer Wizard | Index > Search on: wildcard. The second topic down is the master list of all FnR stuff. Select it. Pick the Wildcard Characters topic down that list. Now select the *type a wildcard* hyperlink. Hooray! Print it out, fast, and use it as a guide from now on. You have just found the first excellent Quick Reference in the Help system.

The very last two paragraphs are the key to what I am attempting here.

To replace double paragraph breaks with a single one, we would think that finding ^p^p and replacing with ^p would do the job, right?

Well, not really. If you do it via VBA, you find yourself stalling forever if your document is terminated by a blank paragraph, as you have to perform it iteratively until you get a Not Found condition. Why does it fail to replace the last paragraph mark? Well, you *can't* delete the last paragraph mark--ever. When you start a brand new virgin document and turn on View Formatting, that paragraph mark you see is the End Of Document paragraph mark. As the document does exist and thus has a finite end point, that magic pilcrow (backward P) has to appear. It is also the marker point in memory to place the nasty little objects we infest our nice clean ASCII text with: style definitions, table formatting, list templates, graphical objects, and the list goes on. See Alt + F11 > F2 > Enter for more information.

So, to get around the VBA problem, we simply pre-process the final paragraph. If it is blank (just a paragraph mark), then kill the second-to-last character--which must be the penultimate paragraph mark. Manually, press CTRL + END and use the backspace key as often as required.
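The pre-processing step can be modelled outside Word. The Python below is only a sketch, assuming the document text is a plain string in which every paragraph ends with character 13; the function name is hypothetical and nothing here talks to Word itself.

```python
PARA = "\r"  # character 13, which Word shows as a paragraph mark


def trim_trailing_blank_paragraphs(text: str) -> str:
    """Remove blank paragraphs at the end, keeping the final paragraph mark."""
    # If the text ends in two marks, the last paragraph is blank.
    # Kill the second-to-last character and check again.
    while text.endswith(PARA + PARA):
        text = text[:-2] + PARA
    return text


print(repr(trim_trailing_blank_paragraphs("One\rTwo\r\r\r")))  # 'One\rTwo\r'
```

The loop stops as soon as the last paragraph has content, so the end-of-document mark is never touched.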

The main problem with the simple FnR replace postulation is similar. If you just delete a paragraph mark, you lose the style for that paragraph. So, we can get around that by ensuring it is always the trailing paragraph that gets deleted. It won't do the final blank paragraph in a document, but that is solved above.

First, we need to understand how the brackets work, and the Help topic explains that nicely. So let us put the guide to good use. (^p)^p means that we have marked the first paragraph mark as our first "text chunk." If we use \1 in the replace string, it means to leave the first text chunk--the paragraph mark with the holy styling applied--in place. Unfortunately for us, we still haven't got there yet.

Why? We get an error: We can't use ^p if we are using wildcards. ($#%*! Microsoft.) So we have to use ^013 instead. Herein lies our next problem--paragraph marks that aren't! Oh, yes, kiddies, just because you see a pilcrow does not mean you are looking at a paragraph mark. Oh no. Not with Paste Special and even weirder applications handing in Clipboard data streams without thought. Word dutifully displays a pilcrow when it encounters an ASCII 013, but the background machinery may not have resolved into a paragraph object to be kept dynamically updated.

How do I know it is ASCII 013? Well, I cheat. I select the paragraph mark, or whatever character I need to know, and use VBA: Alt + F11 (VB Editor or the VBE), CTRL + G (Immediate Window), Enter: ? ASCW(Selection)

I use ASCW() rather than ASC() because I want the full Unicode value. For ASCII characters, the Unicode value is the same. Go ahead, work out the wildcards' ASCII numbers and write them on your guide.
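If you want to check the numbers away from Word, here is a rough equivalent in Python, where ord() plays the part of ASCW(). The helper name is hypothetical, and the code-page behaviour of ASC() is not modelled.

```python
def code_point(ch: str) -> int:
    """Return the Unicode value of a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ord(ch)


for name, ch in [("paragraph mark", "\r"), ("bra", "("), ("ket", ")"), ("colon", ":")]:
    print(name, code_point(ch))
```

This prints 13, 40, 41 and 58, the same numbers used in the ^013, ^040, ^041 and ^058 codes in this article.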

So, if we are going to replace (^013)^013 with ^013, we have to make sure every ASCII 13 is a bona fide paragraph mark. Without wildcards on, find ^013 and replace it with ^p. Honest paragraphs will see no change; fake paragraphs get converted to your will on the spot.

Now you can get serious and stick your wildcard search on. Replace (^013)^013 with \1 and we're in the clear. Done.
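For the curious, the logic of that replace can be imitated with a regular expression. This is a sketch only: Word's wildcards are not regular expressions, the function names are hypothetical, and it is an assumption that a Replace All in Word pairs up the marks the same way Python does.

```python
import re


def replace_pairs_once(text: str) -> str:
    """Model of finding (^013)^013 and keeping only the first mark of each pair."""
    return re.sub(r"(\r)\r", r"\1", text)


def replace_pairs_until_stable(text: str) -> str:
    """Repeat the pair replacement until a pass changes nothing."""
    while True:
        new_text = replace_pairs_once(text)
        if new_text == text:
            return text
        text = new_text


sample = "A\r\r\r\rB\r"
print(repr(replace_pairs_once(sample)))          # 'A\r\rB\r'
print(repr(replace_pairs_until_stable(sample)))  # 'A\rB\r'
```

In this model a run of four marks needs two passes, because each pass only halves the run. That is the iteration the VBA people end up writing.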

In a similar fashion, the much simpler exercise of replacing a colon that occurs after a ket--a ")" character--without destroying the ket itself, would be to use wildcards and replace (^041)^058 with \1.
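The same keep-the-first-chunk idea, again imitated in Python rather than run in Word. The function name is hypothetical, and the regular expression is a stand-in for the wildcard pattern, not a translation Word itself would accept.

```python
import re


def drop_colon_after_ket(text: str) -> str:
    """Delete a colon that directly follows ")", leaving the ")" in place."""
    # Group 1 is the ket; the colon sits outside the group and is dropped.
    return re.sub(r"(\)):", r"\1", text)


print(drop_colon_after_ket("(a): first (b) second: third"))
# (a) first (b) second: third
```

Colons that do not follow a ket are left alone, since the pattern only matches the two characters together.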

However, if we were searching for a bra--a "(" character--we would run into another peculiar little Word problem with managing RTF strings. If you insert a symbol from the Wingdings range, or many other non-Unicode graphical fonts, Word actually stores a marker there instead, and then it stores the actual font character off beyond the end of the section mark. That marker is ASCII 40, our unfortunate bra. So an ^040^058 sequence could very well be *any* symbol followed by a colon.

If we were using two blank paragraphs before every heading and no space before to ensure that our new pages always start at the very top, no matter the method used to page break, and we wanted to get rid of scads of three or more blank paragraphs in excess of a single hit (are we listening, VBA people?), we could do something evil and wicked like this: find (^013{2,2})(^013)@ and replace it with \1. That leaves us with a maximum of two following blank paragraphs anywhere in the document, even at the end--in one single Find operation.
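A regular-expression model of that pattern, as a sketch only. It assumes {2,2} means exactly two and @ means one or more, as the Help topic describes; the function name is hypothetical and the code runs on a plain string, not on a Word document.

```python
import re


def cap_paragraph_mark_runs(text: str) -> str:
    """Cut any run of three or more paragraph marks back to the first two."""
    # Group 1 holds exactly two marks; the extra marks after it are dropped.
    return re.sub(r"(\r{2})\r+", r"\1", text)


print(repr(cap_paragraph_mark_runs("Text\r\r\r\r\rHeading\r")))  # 'Text\r\rHeading\r'
```

Runs of one or two marks never match, so correctly spaced headings are left untouched.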

Interestingly enough, for those still able to follow, (^013{2,2})^013{1,} fails with an invalid pattern. I forced it with the brackets for the above solution.

That brings us to the final solution for editors and writers seeking to mass destroy all blank lines. It has taken a while, but boy haven't we learned a lot of useless stuff about Word on the way! Find (^013)(^013)@ and replace it with \1 to kill all blank paragraphs in a single pass, with the exception of the first paragraph (there is no start-of-document paragraph mark to give us a two-in-a-row target) and the last paragraph mark (which is excluded from the Find range).
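Here is the final pattern modelled in Python, including the two exceptions. It is a sketch under stated assumptions: the text is a plain string, the last character is the end-of-document mark, and that mark is simply held out of the search to imitate Word excluding it from the Find range. The function name is hypothetical.

```python
import re


def kill_blank_paragraphs(text: str) -> str:
    """Collapse every run of paragraph marks to one, sparing the final mark."""
    # Assumption: the last character is out of reach of the search.
    body, final_mark = text[:-1], text[-1:]
    # Group 1 keeps the first mark of a run; the rest of the run is dropped.
    return re.sub(r"(\r)\r+", r"\1", body) + final_mark


print(repr(kill_blank_paragraphs("\rOne\r\r\rTwo\r\r")))  # '\rOne\rTwo\r\r'
```

The blank first paragraph survives because its single mark has no partner before it, and a blank last paragraph survives because its mark is outside the searched text.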

Copyright 2005 by Steve Hudson. All rights reserved.

Word Heretic, Sydney, Australia

Tricky stuff with Word or words for you.

www.wordheretic.com

ABN: 86 453 419 554

"Qualified Good Tech Writer Dude"

Free Association of Words Without prejudice

_________________________________________

READERS WRITE

In the August 19 newsletter, Pru Harrison asked about how to make sure commas following italicized text are not themselves italicized. The newsletter included one solution, but reader Jeanne Pinault sent an additional one:

"Replace [any letter] [comma] [italic] with [^&,] [not italic] and you get two not italic commas for every italic comma you started with. Then replace the double commas with single commas and run through and fix the relatively few that need to stay italic. I got a bunch of tildes when I tried it your way, but I have Windows XP, and it won't let FileCleaner replace hyphens in number ranges in live notes, where I need it most, either. (My cure for that is to replace all the hyphens with en dashes and go back and fix the few places that need hyphens.)"

Many thanks to Jeanne!

_________________________________________

RESOURCES

Speaking of wildcard searching, if you haven't yet downloaded and read my free paper "Advanced Find and Replace in Microsoft Word," you owe it to yourself to do so. The techniques it explains will save you from having to make thousands of tedious, repetitive changes by hand. Understanding these techniques is, in my opinion, the most important thing you can do if you want to work more efficiently in Microsoft Word. You can download the paper by clicking here:

http://www.editorium.com/ftp/advancedfind.zip