Editing by Concordance

Our previous newsletter mentioned our WordCounter program, which can now tell you how many times each word has been used in a document--and I promised to show you how that might be useful for editing. The newsletter also featured a macro that will create a concordance, or list of all words used, from a Word document. Next week, I'll explain a very sneaky way to use that in editing, so stay tuned.

Let's say you've run WordCounter's concordance feature on a document, including word frequency, so you've now got a report in a table that looks like this:

1,639 and

1,453 the

1,330 of

Notice that the table is sorted by word frequency, with the most frequently used words at the top. That doesn't seem very useful; who cares how many times "and" and "of" appear? On the other hand, it may give you an idea of your author's general verbosity and other faults. Lots of prepositions? As you edit, watch for strings of prepositional phrases. Lots of "is," "was," and "were"? The author's verbs may need strengthening, and you may need to root out the passive voice. Lots of capitalized "And" and "But"? Does it bother you to start a sentence with a conjunction? If not, has the author simply overdone it? How many times is "very" used? Fifty occurrences of "paradigm"? Good grief!

What else can you think of? Please let me know; I'll share your thoughts in the next newsletter: editor [at symbol] editorium.com

Now let's go to the bottom of the table:

3 manger

2 managment

Hmmm. In this business book, we've got "managment" appearing twice, and "manger" three times. The spell checker would have caught "managment" but not "manger." We now know that we should search for "manger" and replace it with "manager." And we might as well take care of "managment" while we're at it. You'll probably find some pretty strange fish in this end of the net, but without WordCounter, they might have gotten away. Find and replace as needed. If you have lots of them, I recommend fixing them en masse with MegaReplacer:

http://www.editorium.com/14843.htm

Now let's sort the table alphabetically by word. No, no, wait. First, select all those frequently used words at the top of the table and delete them. That will get them out of the way for what we want to do next. Here's how:

1. Select a whole bunch of words and numbers you want to get rid of.

2. Click Table > Select > Row.

3. Click Table > Delete > Rows.

Okay, *now* let's sort the table alphabetically by word:

1. Put your cursor in the table.

2. Click Table > Select > Table.

3. Click Table > Sort > Column 2, Text, Ascending.

4. Click OK.

Excellent. Now start looking through your list. What do you see? Multiple spellings for "realize/realise"? How about "President" and "president"? Sorting the table by word puts such variations near each other in the list so you can spot them easily. Then, in your main document, you can find and replace as needed.

Knowing how many times each word appears may also help in your decisions about editorial style. If both styles are acceptable, why not go with the one that already appears most often, so you have the fewest instances to fix? Whatever your decision, using a word frequency list can alert you to editorial problems before you ever start editing, and it can help you achieve the editorial consistency you desire.
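For readers who like to see the idea in macro form, here is a minimal sketch of how a word frequency list can be built. It is not WordCounter's code. The names CountWords, TrimPunctuation, and ShowCountExample are made up for illustration, and the sketch assumes a Windows machine where the Scripting.Dictionary object is available. It splits text on spaces only and recognizes only unaccented letters and digits, so treat it as a rough illustration rather than a real concordance tool.

```vba
' Hypothetical helper: count how often each word occurs in a string.
' Returns a Scripting.Dictionary keyed by word, with the count as the item.
Function CountWords(ByVal sourceText As String) As Object
    Dim counts As Object
    Dim tokens() As String
    Dim token As String
    Dim i As Long
    Set counts = CreateObject("Scripting.Dictionary")
    ' Case-sensitive keys keep "President" and "president" apart
    counts.CompareMode = vbBinaryCompare
    tokens = Split(sourceText, " ")
    For i = LBound(tokens) To UBound(tokens)
        token = TrimPunctuation(tokens(i))
        ' Reading a missing key yields Empty, so Empty + 1 starts the count at 1
        If Len(token) > 0 Then counts(token) = counts(token) + 1
    Next i
    Set CountWords = counts
End Function

' Hypothetical helper: strip leading and trailing characters
' that are not unaccented letters or digits.
Private Function TrimPunctuation(ByVal token As String) As String
    Do While Len(token) > 0
        If Left$(token, 1) Like "[A-Za-z0-9]" Then Exit Do
        token = Mid$(token, 2)
    Loop
    Do While Len(token) > 0
        If Right$(token, 1) Like "[A-Za-z0-9]" Then Exit Do
        token = Left$(token, Len(token) - 1)
    Loop
    TrimPunctuation = token
End Function

Sub ShowCountExample()
    Dim counts As Object
    Set counts = CountWords("The manger met the manager. The manger left.")
    Debug.Print counts("manger")   ' prints 2
End Sub
```

Because the keys are case sensitive, "The" and "the" are counted separately. That is the same property that lets a sorted frequency list show "President" and "president" as two entries, each with its own count.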

You can download WordCounter here:

http://www.editorium.com/counter.htm

_________________________________________

READERS WRITE

Judy Stein wrote:

Eric Fletcher writes, "What is particularly useful about this approach is that you can then later collect all of the flagged items in a single step--either for separate review or for use in a style guide. (This method only works for Word 10+.)"

What's Word 10+? I assume it's something beyond Word 2000, because he goes on to talk about a "Highlight all items found in" box--but I don't have one of those.

I replied:

Word 10 is the same as Word 2002 is the same as Word XP. "Word 10+" means Word 10 and anything higher, which is currently Word 11, comprising Word 2003 (PC) and Word 2004 (Mac). Back in the good old days, Word was numbered with, well, numbers rather than years. So we had Word 2, Word 5, and Word 6. With Word 95, however, Microsoft decided to get fancy, but a lot of folks still referred to it as Word 7. Word 97 (and 98) is thus Word 8, Word 2000 (and 2001) is Word 9, and so on.
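If a macro needs to know whether it is running in "Word 10+," one possible approach is to check Word's own version number. The sketch below relies on the Application.Version property, which returns a string such as "10.0". The function names WordVersionName and IsWord10OrLater are hypothetical, and the name table covers only the versions mentioned above.

```vba
' Hypothetical helper: translate Word's internal version number
' into the product names listed above.
Function WordVersionName(ByVal majorVersion As Long) As String
    Select Case majorVersion
        Case 7: WordVersionName = "Word 95"
        Case 8: WordVersionName = "Word 97 (and 98)"
        Case 9: WordVersionName = "Word 2000 (and 2001)"
        Case 10: WordVersionName = "Word 2002 (Word XP)"
        Case 11: WordVersionName = "Word 2003 (PC) or Word 2004 (Mac)"
        Case Else: WordVersionName = "Word " & majorVersion
    End Select
End Function

' Hypothetical helper: True when the running copy of Word is "Word 10+".
Function IsWord10OrLater() As Boolean
    ' Val reads the leading number of a string such as "10.0"
    IsWord10OrLater = (Val(Application.Version) >= 10)
End Function
```

A macro that depends on a Word 10 feature could call IsWord10OrLater first and show a message instead of failing on older versions.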

Meg Cox wrote:

Thanks Eric Fletcher. That's some good stuff that I will wade through when I have the mental energy (it's very complicated!).

Meantime, I have solved my problem of viewing style sheet items in alphabetical order so I can spot near misses as I go along without having to scroll to the proper place each time to insert the new item. (I believe Eric's method would have this happening at the end of the chapter or project rather than all along.)

I also index books, so I have SKY indexing software. I knew this software would solve my problem, but I was stuck because every time I tried to shrink its window so I could tuck it in a corner of my screen, I would get an error message. Well, I decided to just shrink the window bit by bit, ignoring the recurring, and, as it turns out, benign, error message, until I had a nice compact little window to stick in the corner. Now the windows are sharing space nicely.

Now I can type or paste new entries in and immediately see them in context alphabetically next to other entries of the same category--personal name, foreign term, whatever. If I'm typing instead of pasting, autocomplete will let me know right away that the term has been encountered already (perhaps in a previous file if I'm using the color-coding method).

Now I'm wondering: I don't think a Word macro can open a window in another program and order a paste there. That would be very helpful.

If you know of a solution for Meg, please send it to hints [at symbol] editorium.com.

Pat LaCosse wrote:

As an editor I use VBA to script and extend Word nearly every day. I'm delighted to have found your newsletter.

In "Numbers by Chicago, Part 2" [June 9, 2004], you provided a link to two scripts one might use to eliminate duplicates in a list. Although I'm not too familiar with WordBasic commands, I noticed that your examples were able to handle only duplicates that are adjacent to one another in the list. No problem if you've sorted the list, but what if sorting the list is not necessary or desirable? (There are times, for example, when preserving the order of occurrence is desirable.)

I thought I'd share a technique I've grown to prefer, which eliminates duplicates no matter where they are found in the list. It utilizes VB's dictionary object and it is fast. I've run scripts similar to the one below on files that are 11 MB big, and the difference in speed as a result of using the dictionary object (as opposed to recursively iterating through each paragraph) is remarkable. The dictionary object's comparemode property provides a convenient way for the filtering to be case sensitive if need be. One can read more about the dictionary object's properties and methods in Word's VBA help file. I should mention that I've used the dictionary object only on Windows machines running Word 2000 and 2002. I don't know how available the dictionary object is for other platforms and versions, but those who have access to it will find it quite useful for a variety of situations. I use it to create concordances, audit documents for special characters, etc. all the time.

Here is an example with comments. Normally I try to be much more modular in my programming. For example, I would usually put the core functionality here into a sub or function to which I could pass a range object (allowing me to pass it the range of an entire document or merely that of a selection within a document). And I'd make the comparemode an optional argument to pass. Because the purpose here is simply to show the dictionary object in action, I've adapted some code to be a situation-specific script, which allows it to be tested easily on a document. With that disclaimer, here it is:


Sub ListEliminateDuplicates()
'Pat LaCosse
'Adapted from my ConcordanceTools template
'and submitted to the Editorium newsletter
'on June 17, 2004.
Dim para As Paragraph
Dim dict
'Create an instance of the dictionary object
Set dict = CreateObject("Scripting.Dictionary")
'Set comparemode; use vbBinaryCompare
'for case-sensitive filtering
dict.comparemode = vbTextCompare
'Iterate through all the paragraphs in the doc.
For Each para In ActiveDocument.Paragraphs
'If we've already encountered this item,
'then delete the paragraph.
If dict.Exists(para.Range.Text) Then
para.Range.Delete
Else
'If we haven't already encountered this item,
'then add it to the dictionary's keys.
dict.Add para.Range.Text, ""
End If
Next para
Set dict = Nothing
MsgBox "Done!"
End Sub

If you don't know how to use such macros, you can find out here.
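Pat mentions that he would normally put the core of the macro into a function that accepts a range and an optional compare mode. Here is one way that might look. This is a sketch based on his description, not his actual ConcordanceTools code; the names RemoveDuplicateParagraphs and RemoveDuplicatesInSelection are hypothetical. It uses the same loop as the macro above, so the same platform caveats about the dictionary object apply.

```vba
' Hypothetical refactoring of ListEliminateDuplicates:
' the range to process and the compare mode are passed in.
' Returns the number of paragraphs deleted.
Function RemoveDuplicateParagraphs(ByVal Scope As Range, _
        Optional ByVal CompareMode As Long = vbTextCompare) As Long
    Dim para As Paragraph
    Dim dict As Object
    Dim removed As Long
    Set dict = CreateObject("Scripting.Dictionary")
    ' vbTextCompare ignores case; vbBinaryCompare is case sensitive
    dict.CompareMode = CompareMode
    For Each para In Scope.Paragraphs
        If dict.Exists(para.Range.Text) Then
            ' Seen before: delete this later occurrence
            para.Range.Delete
            removed = removed + 1
        Else
            ' First occurrence: remember it
            dict.Add para.Range.Text, ""
        End If
    Next para
    Set dict = Nothing
    RemoveDuplicateParagraphs = removed
End Function

' Example caller: case-sensitive filtering of the current selection only
Sub RemoveDuplicatesInSelection()
    MsgBox RemoveDuplicateParagraphs(Selection.Range, vbBinaryCompare) & _
        " duplicate paragraph(s) removed."
End Sub
```

To process a whole document instead, pass ActiveDocument.Content as the first argument. As always, try it on a backup copy first.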

Linda DeVore and Leo Wong wrote to say that the lines in last week's MakeConcordance macro broke incorrectly in their email and so wouldn't run correctly. Here's a version in which the lines are shorter, which should solve the problem:


Sub MakeConcordance()
'Courtesy of the Editorium
'http://www.editorium.com
'Mark an index entry for each word in the document:
Dim myWord
For Each myWord In ActiveDocument.Words
ActiveDocument.Indexes.MarkEntry _
Range:=Selection.Range, Entry:=myWord
Next myWord
'Go to the end of the document:
Selection.EndKey Unit:=wdStory
'Mark place with a bookmark:
ActiveDocument.Bookmarks.Add _
Range:=Selection.Range, Name:="IndexStartsHere"
'Generate an index based on the entries marked earlier:
With ActiveDocument
.Indexes.Add Range:=Selection.Range, _
HeadingSeparator:=wdHeadingSeparatorNone, _
Type:=wdIndexIndent, RightAlignPageNumbers:= _
False, NumberOfColumns:=1, _
IndexLanguage:=wdEnglishUS
.Indexes(1).TabLeader = wdTabLeaderDots
End With
'Go back to the bookmark:
Selection.GoTo What:=wdGoToBookmark, _
Name:="IndexStartsHere"
'Select the index, from the bookmark
'to the end of the document:
Selection.EndKey Unit:=wdStory, Extend:=wdExtend
'Turn the index "field" into actual text:
Selection.Fields.Unlink
'Get rid of the page numbers after the index entries:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ", [0-9]@[^013]"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
'Go back to the bookmark:
Selection.GoTo What:=wdGoToBookmark, _
Name:="IndexStartsHere"
End Sub

Many thanks to Judy, Meg, Pat, Linda, and Leo for their excellent tips and comments.

_________________________________________

RESOURCES

If you want to get very serious about concordance software, you might want to look at the explanations and resources here:

http://www.uni-giessen.de/~ga1007/ComputerLab/concordance.htm

Making a Concordance

Have you ever needed to make a list of every word in a document? If so, here's a macro that will do it for you automatically. Basically, the macro marks an index entry for every word in your document, generates the index, and removes the page numbers, leaving you with an alphabetical list of words used (at the end of the document). It's sometimes interesting to see what Microsoft Word considers a "word"; periods, commas, and other unlikely items will be included.

To use the macro, open a document for which you need to make a concordance. (Be sure to keep a backup, just in case.) Then, run the following macro on the document (I've included comments to explain how it works):


Sub MakeConcordance()
'Courtesy of the Editorium
'http://www.editorium.com
'Mark an index entry for each word in the document:
Dim myWord
For Each myWord In ActiveDocument.Words
ActiveDocument.Indexes.MarkEntry Range:=Selection.Range, Entry:=myWord
Next myWord
'Go to the end of the document:
Selection.EndKey Unit:=wdStory
'Mark place with a bookmark:
ActiveDocument.Bookmarks.Add Range:=Selection.Range, Name:="IndexStartsHere"
'Generate an index based on the entries marked earlier:
With ActiveDocument
.Indexes.Add Range:=Selection.Range, HeadingSeparator:= _
wdHeadingSeparatorNone, Type:=wdIndexIndent, RightAlignPageNumbers:= _
False, NumberOfColumns:=1, IndexLanguage:=wdEnglishUS
.Indexes(1).TabLeader = wdTabLeaderDots
End With
'Go back to the bookmark:
Selection.GoTo What:=wdGoToBookmark, Name:="IndexStartsHere"
'Select the index, from the bookmark to the end of the document:
Selection.EndKey Unit:=wdStory, Extend:=wdExtend
'Turn the index "field" into actual text:
Selection.Fields.Unlink
'Get rid of the page numbers after the index entries:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ", [0-9]@[^013]"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll
'Go back to the bookmark:
Selection.GoTo What:=wdGoToBookmark, Name:="IndexStartsHere"
End Sub

If you don't know how to use such macros, you can find out here.

If you want to keep the page numbers, just leave out these lines:

'Get rid of the page numbers after the index entries:
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = ", [0-9]@[^013]"
.Replacement.Text = "^p"
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchAllWordForms = False
.MatchSoundsLike = False
.MatchWildcards = True
End With
Selection.Find.Execute Replace:=wdReplaceAll

For certain kinds of projects (catalogs, for example), you may be able to use a concordance to create an index of your document. You can learn more here:

http://lists.topica.com/lists/editorium/read/message.html?mid=1714146574

Need to create a concordance for a whole bunch of documents at once? Use our MultiMacro program to run the macro above on all documents in a folder:

http://www.editorium.com/14844.htm

Or, you could use our WordCounter program, which now includes a concordance feature with a frequency count of the words in a document or documents. (This is a free upgrade for registered users.)

http://www.editorium.com/counter.htm

In other words, WordCounter can now tell you *how many times* each word has been used. How might that be useful for editing? Stay tuned. Over the next few weeks, I'll reveal all.

_________________________________________

READERS WRITE

Neman Syed wrote:

A colleague of mine recently asked me if there was a keyboard shortcut to modify styles (I love keyboard shortcuts 🙂), and I showed him how to assign shortcut keys to Word commands and deal with the Task Pane. Office XP has very poor native keyboard alternatives for the Task Pane, to the point I hardly ever use them, with just a couple of exceptions:

* F6 toggles between the document and the Task Pane. You can then use up/down/tab/shift-tab/enter/alt-down (to open drop-down lists), etc.

* CTRL-TAB cycles through the Task Pane, toolbars, and menu, when you're in one of the Task Pane, toolbars, or menu. If you're in the document, it puts a tab.

Realizing that making these shortcuts may be useful for others (and triggered by Pamela's observation in the 2004-05-12 broadcast that one person's obvious is another person's stunner 🙂), here's my method for solving "Joe's Problem": Assign the keyboard shortcut of ALT-M to the FormatStyleModify command.

Here's how:

1. Tools > Customize.

2. Commands > Keyboard.

3. From the Format category on the left, choose FormatStyleModify on the right.

4. Assign ALT-M or whatever keyboard shortcut you want in the "Press new shortcut key" box. If you choose something that already has an assignation, you'll see it noted in an unobtrusive manner below.

********************************

4a. If desired, change the template from Normal.dot to whatever document/template you want this to apply to. Obviously this selection influences what machines this keyboard shortcut is available on; in Windows 2000 and above, this may only be for the current user. In my case I keep all my modifications, macros, etc. in a file called custom.dot which I automatically throw into Word's Startup folder whenever I'm on a new machine. It's portable and powerful. Note: Only open documents and their templates are eligible candidates here, so to use my personal approach you'll need to manually open whatever startup template/add-in you use to store all your customizations.

********************************

5. OK your way back to your doc.

Your ALT-M shortcut now modifies whatever style your cursor is sitting in. (Not surprisingly, character takes precedence over paragraph.) It certainly saves me time and trouble, and I hope it will for your readers, too!

Many thanks to Neman for this terrific tip.
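If you would rather set up the shortcut with a macro than through the Customize dialog, something like the following sketch should do the same job. It uses Word's CustomizationContext property and the KeyBindings.Add and BuildKeyCode methods. The macro name is hypothetical, and the sketch does not check whether ALT-M is already assigned to something else, so look at your existing shortcuts first.

```vba
' Hypothetical macro: assign ALT+M to the FormatStyleModify command.
Sub AssignAltMToModifyStyle()
    ' Store the shortcut in the Normal template. Step 4a above
    ' discusses storing it in a different template instead.
    CustomizationContext = NormalTemplate
    ' Bind ALT+M to the built-in Word command
    KeyBindings.Add KeyCategory:=wdKeyCategoryCommand, _
        Command:="FormatStyleModify", _
        KeyCode:=BuildKeyCode(wdKeyAlt, wdKeyM)
End Sub
```

To keep the shortcut in another template, set CustomizationContext to that template (it must be open, as Neman notes) before calling KeyBindings.Add.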

_________________________________________

RESOURCES

Not to toot my own horn, but I was recently looking through the back issues of this newsletter, available here:

http://www.editorium.com/euindex.htm

Wow, there's an awful lot of good information there! If you didn't know about this archive, now you do. I hope you find it useful.

Numbers by Chicago, Part 2

Our previous article outlined a fairly lengthy Find and Replace routine to make sure inclusive (elided) numbers follow the style outlined in the Chicago Manual. Astute reader Andrew Lockton responded with a technique that is so important, it deserves a second article. Andrew suggested taking the "Find What Expression" wildcard, which takes the form \1, \2, and so on, and putting it not in the Replace With box, where it is ordinarily used, but in the *Find What* box--something I did not know was possible. Hats off to you, Andrew.

Instructions for using the "Find What Expression" wildcard can be found here:

http://lists.topica.com/lists/editorium/read/message.html?mid=1706365638

Andrew's discovery opens up all kinds of possibilities for various problems I've previously been unable to solve, but let's look specifically at getting numbers by Chicago. The previous method required 18 separate searches. Andrew's brilliant methodology requires only three. Here's the explanation:

1. Numbers that take the form 104-105 need to be converted to 104-5:

Find What:

([1-9])0([1-9])-\10([1-9])

Replace With:

\10\2-\3

What's going on there is that the first number grouping, ([1-9]), is being referred to by the \1 that follows the hyphen--in the Find What string. See it? Just before the 0 there? That tells Word to find (again) whatever was found by the first number grouping. For example, when the search hits something like "203-205," it says, "Hey, my first number group finds 2 [the first number in 203]. Let's see, is there also a 2 after the hyphen? Yes, there is!" Slicker than snake shoes, as expert word whacker Hilary Powers is fond of saying.

2. Numbers that take the form 104-110 need to be converted to 104-10:

Find What:

([1-9])0([1-9])-\1([1-9])([0-9])

Replace With:

\10\2-\3\4

3. Numbers that take the form 111-112 or 119-120 need to be converted to 111-12 or 119-20:

Find What:

([1-9])([1-9])([0-9])-\1([1-9])([0-9])

Replace With:

\1\2\3-\4\5

At first I thought it might be possible to combine 2 and 3:

([1-9])([0-9])([0-9])-\1([1-9])([0-9])

But that would also find even hundreds (100, 200), which need to be ignored (100-114 rather than 100-14).

Many thanks to Andrew for the terrific tip.
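If you'd rather not type the three searches by hand, they can be strung together in a macro. The following is a minimal sketch, not a tested production tool: the macro name is hypothetical, and the search strings are simply the three Find What and Replace With pairs from the steps above, with the \1 back-reference in the Find What string. Like the searches themselves, it handles only three-digit numbers joined by a hyphen, and it does not check what comes before or after a match, so run it on a backup copy first.

```vba
' Hypothetical macro: run the three wildcard searches in order.
Sub ElideNumbersByChicagoSketch()
    Dim findWhat As Variant
    Dim replaceWith As Variant
    Dim i As Long
    ' Find What strings for steps 1, 2, and 3
    findWhat = Array( _
        "([1-9])0([1-9])-\10([1-9])", _
        "([1-9])0([1-9])-\1([1-9])([0-9])", _
        "([1-9])([1-9])([0-9])-\1([1-9])([0-9])")
    ' Matching Replace With strings
    replaceWith = Array( _
        "\10\2-\3", _
        "\10\2-\3\4", _
        "\1\2\3-\4\5")
    For i = LBound(findWhat) To UBound(findWhat)
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = findWhat(i)
            .Replacement.Text = replaceWith(i)
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
    Next i
End Sub
```

The macro searches the main text of the active document only. Footnotes and endnotes live in separate parts of the document, so they would need the same loop run over their own ranges.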

_________________________________________

READERS WRITE

Alan Seiden wrote:

Here's another caps lock fixer. The program is called AntiCapsLock. It is free to try, but costs $10 to have the program remember one's settings.

http://www.orionsoft.cz/anticapslock.asp

We've set it up so that caps lock only toggles on or off when SHIFT is pressed along with the CAPS LOCK key. It works very well for a fussy computer user.

----------------------------

After reading the article "Numbers by Chicago," Jeanne Pinault wrote:

What I do with elided numbers is just replace all the hyphens with en dashes and then fix whatever comes up wrong when I edit the notes. That's because every set of endnotes I see is wrong in a slightly different way from every other set of endnotes I ever saw, so I have to read every character anyway. I can see that your marvelous find and replace would be a godsend with consistently formatted and voluminous endnotes produced on a regular basis, though. Are en dashes in there someplace?

I responded:

In the Find string, use ^150 (the en-dash code) instead of the hyphen.

----------------------------

Margaret Berson wrote:

I just was looking at your sequential replacement operation for page numbers. Why would you not use a macro that would go through and use the string position functions to evaluate the first digit of the first page number against the first digit of the second page number, deleting the unneeded first digit of the second number if it's the same, and leaving it alone if it's higher?

Wordmeister Steve Hudson sent in a macro that takes things even further. He wrote:

The following solution was designed to not just satisfy the English world with its 0-9 numerics. Use it to reduce hexadecimal addresses, Japanese, or anything. Even if the numbers aren't sequential, like hex, we just use ranges for the find such as "0-9A-F".

It is as simple as possible whilst being as generic as possible. Simpler solutions cannot work for non-English solutions as we cannot guarantee ASCII status. It is fully commented and written for clarity and education rather than speed. It will still run like greased lightning but 🙂


Public Sub NumberCruncher()
'Link this one to your toolbar
'Change any parms as needed from here
NumberCrunch ActiveDocument.Content
End Sub
Public Function NumberCrunch( _
Scope As Range, _
Optional NumberSeparator As String = "-", _
Optional Numbers As String = "0-9" _
) As String
'Another document solution from WordHeretic.com
'Produces short form number ranges anywhere in the provided
'document range. Eg 309-310 into 309-10 and 307-308 into 307-8
'You can use Unicode nnnn by using "^nnnn"
'NumberRange and architecture is for true I18N
'Known Issues: n-n will end up being n-. Eg 300-300 to 300-
'___________________
'Declare
'___________________
Const EndOfWord As String = ">"
Dim NumberRange As Range
Dim FirstNumber As Range
Dim SecondNumber As Range
Dim Separator As Range
Dim AnyNumber As String
Dim LenFirst As Long
Dim LenSecond As Long
'___________________
'Initialise
'___________________
Set NumberRange = Scope.Duplicate
Set FirstNumber = Scope.Duplicate
Set SecondNumber = Scope.Duplicate
'___________________
'Clarity
'___________________
AnyNumber = "[" & Numbers & "]@"
With NumberRange.Find
.Text = AnyNumber & NumberSeparator & AnyNumber & EndOfWord
.MatchWildcards = True
End With
'___________________
'Main program loop
'___________________
While NumberRange.Find.Execute(Replace:=wdReplaceNone)
Set Separator = NumberRange.Duplicate
With Separator.Find
.Text = NumberSeparator
.Execute Replace:=wdReplaceNone
End With
'So now we have the entire number range AND
'the separator range, we can calc the numbers
FirstNumber.Start = NumberRange.Start
FirstNumber.End = Separator.Start
SecondNumber.Start = Separator.End
SecondNumber.End = NumberRange.End
'Counting chars is NOT the same as an offset
LenFirst = FirstNumber.Characters.Count
LenSecond = SecondNumber.Characters.Count
'Now lets work out what's the same
'First up, if the second number is shorter than
'the first, it's already been done or is irrelevant.
'Eg 200-7
'If the second number is longer we cannot find common ground
'Eg 97-101
'Thus, we can ONLY operate on equal length numbers.
'Then, test for the number being a dynamic field
'as we can't really change those
If LenFirst = LenSecond And NumberRange.Fields.Count = 0 Then
'Now we need to match every character or finish
'We will shrink our FirstNumber range as we go,
'and delete the secondnumber range as we go
'Char comparisons DO use unicode
While FirstNumber.Characters(1) = SecondNumber.Characters(1)
FirstNumber.MoveStart
SecondNumber.Characters(1).Delete
Wend
End If
Wend
'___________________
'Destroy all objects
'___________________
Set FirstNumber = Nothing
Set SecondNumber = Nothing
Set Separator = Nothing
Set NumberRange = Nothing
End Function

Steve later added:

We may also want to include something like this if the user wants to run the macro on a range of text:

Public Sub NumberCruncherSelection()
NumberCrunch Selection.Range
End Sub

If you don't know how to use macros like that one, you can find out here:

----------------------------

The consistently brilliant Eric Fletcher wrote:

I was interested to see the tip from Meg Cox and Joy Freeman in the last Editorium posting about highlighting all instances of an item. In a job some time ago, some very foreign names were being used throughout. I knew they would cause problems later in the spell check but unless I was careful, a slightly different spelling of the same name would easily slip past. For example, "Mkandawire" might also be "Mkandewire"... I wanted to avoid the tedium of clicking the Ignore button during spell check but still have a way to check the items.

So, in order to both flag a word as already seen and turn off proofing, I created the little macro below. To use it, I select the word (or words) and click the button associated with it. All identical instances (note the MatchCase) are set in green color with no proofing. The resultant green color shows that the word has already been encountered (as noted in your reader's tip).

However, what is particularly useful about this approach is that you can then later collect all of the flagged items in a single step -- either for separate review or for use in a style guide. (This method only works for Word 10+.)

1. In the Find box, leave Find What empty but use Format to select the color (Green in my case).

2. Click the "Highlight all items found in:" box and choose Main Document. The Find button changes to Find All, and when you click it, all instances of the color green will be highlighted.

3. Now for the fun part: close the F&R dialog and choose Copy (Ctrl-C); open a new document and paste (Ctrl-V).

What you get is a list with each found item on a line of its own. You can then sort it and more easily review the list since all identical instances of the same item sort together. (...and I'm sure someone out there will even have a VBA script that could eliminate all duplicates in the sorted list!)

[Editor's note: You'll find such a script here: http://lists.topica.com/lists/editorium/read/message.html?mid=1702467672]

Here's my macro:


Sub FlagThis()
' FlagThis Macro
' Flags current selection as green with no proofing throughout the
' document. E Fletcher 2003-10-23
'
Dim flagit As String
flagit = Selection
Selection.MoveLeft Unit:=wdCharacter, Count:=1
With ActiveDocument.Content.Find
.ClearFormatting
.Text = flagit
.MatchCase = True
With .Replacement
.Text = "^&"
.ClearFormatting
'-- colour and no proofing options for replace
.Font.Color = wdColorGreen
.NoProofing = True
End With
.Execute Format:=True, Replace:=wdReplaceAll
End With
Selection.MoveRight Unit:=wdWord, Count:=1
End Sub

Note that I have it set up so the cursor ends up at the end of the first word in the selection. If users want to just add color and not set the proofing off, the ".NoProofing = True" statement should be removed.

I also use a slightly modified version of this method to flag words set in a different language. My Quebec flag button sets the selection in my custom "French" character style [French (Canada) language and font color blue] so I modified the FlagThis macro to set all instances of the selection to the French style. The spell check switches languages on the fly so it checks properly in multiple languages. Then, before I print or release the final version of the file, I modify the style definition(s) to change the language color(s) to automatic.

Many thanks to Alan, Jeanne, Margaret, Steve, and Eric for their terrific tips and comments.

Numbers by Chicago

I recently worked on a manuscript with lots of source citations, many of which had page numbers formatted like this:

122-123

I prefer the shorter style recommended in the Chicago Manual of Style (8.69):

122-23

And besides, the manuscript was inconsistent, sometimes using one style, sometimes the other. Not wanting to fix all of these by hand, I decided to put the old wildcard search to work. You can learn about searching with wildcards in my free paper "Advanced Find and Replace in Microsoft Word":

http://www.editorium.com/ftp/advancedfind.zip

The first thing I needed to do was simplify things. Consider the style for even hundreds:

100-109

100-119

100-201

In all such cases, the numbers were already in the correct style, so I decided to just get them out of the way, like this:

Find What:

00-

Replace With:

~~-

(Those tildes are just arbitrary placeholders to be turned back to zeroes later.)

With that taken care of, I originally thought I could change all the other numbers like this:

Find What:

([0-9]{3}-)[0-9]([0-9]{2})

Replace With:

\1\2

That "Find What" string finds any set of three {3} numbers [0-9] followed by a hyphen, followed by a single number [0-9], followed by any set of two {2} numbers [0-9]. The items in parentheses are treated as a group.

The "Replace With" string replaces the first (\1) parenthetical group with itself and the second (\2) parenthetical group with itself, leaving out any number [0-9] that was not grouped in parentheses.

That will definitely change 122-123 to 122-23, but it will also change 308-309 to 308-09, so we'll need to get a little fancier. How about this?

Find What:

([0-9]{3}-)[0-9]([1-9]{2})

Replace With:

\1\2

Notice that I've changed that last number range to [1-9] rather than [0-9]. That means numbers like 308-309 will not be found but numbers like 308-319 will. (Come to think of it, that single number in the middle could probably be [1-9] as well, since there shouldn't be any page numbers like 308-019. Of course, you never know.) Now, does that solve the problem?

Well, no. We still need to deal with numbers like this:

398-415

We certainly don't want that changing to 398-15. And what about this?

247-517

Unlikely, I'll admit, but still possible.

And that means we can't do our find and replace all in one shot. Instead, we'll have to do 18 specific searches:

(1[0-9]{2}-)1([1-9][0-9])

(2[0-9]{2}-)2([1-9][0-9])

(3[0-9]{2}-)3([1-9][0-9])

(4[0-9]{2}-)4([1-9][0-9])

(5[0-9]{2}-)5([1-9][0-9])

(6[0-9]{2}-)6([1-9][0-9])

(7[0-9]{2}-)7([1-9][0-9])

(8[0-9]{2}-)8([1-9][0-9])

(9[0-9]{2}-)9([1-9][0-9])

(10[1-9]-)10([1-9])

(20[1-9]-)20([1-9])

(30[1-9]-)30([1-9])

(40[1-9]-)40([1-9])

(50[1-9]-)50([1-9])

(60[1-9]-)60([1-9])

(70[1-9]-)70([1-9])

(80[1-9]-)80([1-9])

(90[1-9]-)90([1-9])

At least that's how it looks to me. If you have a better way, I'd love to hear about it.

You can do the searches by hand if you like. You've got 20 chapters, all in separate files? Let's see--20 x 18 = 360 separate searches. Ouch! Of course, you could use my MegaReplacer program to do them all at once, freeing up your time for something more interesting:

http://www.editorium.com/14843.htm

Don't forget, we still need to turn those tildes back into zeroes:

Find What:

~~

Replace With:

00

Now all of those page numbers should be in Chicago style. How beautiful!
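For anyone who would rather see the logic in one place, the rule those searches encode for three-digit numbers can be written as a small function. This is only a sketch of the rule as described in this article, not a full implementation of the Chicago Manual's guidance. The function name ElideThreeDigitRange is hypothetical, and anything that isn't a pair of three-digit numbers is handed back unchanged.

```vba
' Hypothetical helper: shorten the second of two three-digit page
' numbers following the rule described in this article.
Function ElideThreeDigitRange(ByVal firstNum As String, _
        ByVal secondNum As String) As String
    ' Default: leave the pair exactly as it came in
    ElideThreeDigitRange = firstNum & "-" & secondNum
    ' Only three-digit numbers that do not start with zero are handled
    If Not (firstNum Like "[1-9]##") Then Exit Function
    If Not (secondNum Like "[1-9]##") Then Exit Function
    ' Even hundreds (100-109, 100-201) are already correct
    If Right$(firstNum, 2) = "00" Then Exit Function
    ' Different hundreds (398-415, 247-517) must not be touched
    If Left$(firstNum, 1) <> Left$(secondNum, 1) Then Exit Function
    If Mid$(secondNum, 2, 1) <> "0" Then
        ' 122-123 becomes 122-23; 308-319 becomes 308-19
        ElideThreeDigitRange = firstNum & "-" & Right$(secondNum, 2)
    ElseIf Mid$(firstNum, 2, 1) = "0" And Right$(secondNum, 1) <> "0" Then
        ' 308-309 becomes 308-9
        ElideThreeDigitRange = firstNum & "-" & Right$(secondNum, 1)
    End If
End Function

Sub ShowElideExamples()
    Debug.Print ElideThreeDigitRange("122", "123")   ' prints 122-23
    Debug.Print ElideThreeDigitRange("308", "309")   ' prints 308-9
    Debug.Print ElideThreeDigitRange("100", "119")   ' prints 100-119
    Debug.Print ElideThreeDigitRange("398", "415")   ' prints 398-415
End Sub
```

The two branches at the end correspond to the two families of nine searches above: the first to the searches that keep two digits, the second to the searches that keep only one.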

"What about four-digit numbers?" you ask. I leave it as an exercise for you to work out.

If you'd like this whole thing ready to run in MegaReplacer, here it is:

00-|~~-

(1[0-9]{2}-)1([1-9][0-9])|\1\2+m

(2[0-9]{2}-)2([1-9][0-9])|\1\2+m

(3[0-9]{2}-)3([1-9][0-9])|\1\2+m

(4[0-9]{2}-)4([1-9][0-9])|\1\2+m

(5[0-9]{2}-)5([1-9][0-9])|\1\2+m

(6[0-9]{2}-)6([1-9][0-9])|\1\2+m

(7[0-9]{2}-)7([1-9][0-9])|\1\2+m

(8[0-9]{2}-)8([1-9][0-9])|\1\2+m

(9[0-9]{2}-)9([1-9][0-9])|\1\2+m

(10[1-9]-)10([1-9])|\1\2+m

(20[1-9]-)20([1-9])|\1\2+m

(30[1-9]-)30([1-9])|\1\2+m

(40[1-9]-)40([1-9])|\1\2+m

(50[1-9]-)50([1-9])|\1\2+m

(60[1-9]-)60([1-9])|\1\2+m

(70[1-9]-)70([1-9])|\1\2+m

(80[1-9]-)80([1-9])|\1\2+m

(90[1-9]-)90([1-9])|\1\2+m

~~|00

_________________________________________

READERS WRITE

Mary Russell wrote:

I'm working on a revision of an encyclopedia on world religions that already has a 108-page word list and a 1,000-page index of terms I need to check *everything* against. I'm using your style sheet macro to slap each term I want to check into the style sheet as I go and then doing a separate pass to check them all--and having them alphabetized saves me a lot of scrolling around in those files. By the way, I *love* your macro. I run the style sheet minimized and don't even have to switch back to my original document. You should really be selling this one. I'm usually more restrained, but this really is a great idea.

--------------------------------

Meg Cox wrote:

Joy Freeman on Freelance suggested a new approach that I think will alleviate the style sheet challenge considerably. I haven't tried it yet, but I'm going to on the next chapter I start. She gave her permission to repeat the approach here:

With each occurrence of a new name, search for the same and replace it with itself in a different color (say, blue). Then you know you've already encountered it and don't need to check it against the style sheet. That way you only have to take action with variations and first occurrences. If it's blue, move on through!

I suspect this approach will come in very handy the next time I have a manuscript with hundreds of unfamiliar personal, place, and organizational names, and it will help in simpler projects as well.

Another way it will help: Sometimes in a long chapter it's hard to remember whether the full name of a person or organization has appeared yet (my clients routinely ask for full version on first occurrence in each chapter, then shortened version thereafter). If the changing to blue is done chapter by chapter (and I think it could be handled quickly--I need to think macro on this), blue will mean the full version has already occurred and an abbreviation or last-name-only may be called for. Could be useful for long sets of notes too so I know when it's time to go with a short citation! (Lately I'm seeing plenty of chapters with 70 or more notes.) Oh, and good for parenthetical citations too, so I know what I've already checked against bibliography.

[Editor's note: Our RazzmaTag program would be very useful for this kind of thing: http://www.editorium.com/razzmatag.htm.]

--------------------------------

Brad Hurley wrote:

Steve Hudson wrote:

I'd like to advise you and your readers to avoid Outlook 2003. It has more bugs than the NSW locust plague here in Australia at the moment. I could fill an article with simple features that cause immediate failures.

This is almost the opposite of my experience. I've been using Outlook 2003 daily on my Windows 2000 machine since last October, and it has never crashed. I have encountered several bugs and design flaws, but overall my experience has been positive. The much-improved spam filtering and the new three-pane design make it a far better program than previous versions. The upgrade from Outlook 2000 went flawlessly and it handles my large e-mail archive files (300-600 megabytes each) without a complaint. The only serious problems I've noticed so far are:

1. Editing a message in your outbox makes it impossible to send; you have to transfer it to a different folder and send it from there.

2. E-mail address auto-complete doesn't work if your contact's address book entry also has a fax number listed (this is a very frustrating flaw because Outlook should be smart enough to know you're not trying to send a fax when you've composed an e-mail message to someone).

3. Hitting the return button to start a search only works once a session; after that you have to use your mouse to click the "find now" button.

Other than that, I'm satisfied with Outlook 2003; in fact it's the only element in the Office suite that I've found worth upgrading from the 2000 versions.

Many thanks to Mary, Meg, and Brad for their helpful tips and comments.

_________________________________________

RESOURCES

Bruce Koehler wrote:

Another approach to handling accidental press of the Insert key and other keys (e.g. Caps Lock) is a small freeware program called FirstCap. It allows you to set up these potentially problematic keys in various ways such as:

* Disable

* Disable--but with a work-around to re-enable once or continuously

* Sound an alert when pressed

Terrific little program. Just Google search on "FirstCap" for many sites that offer it.

And while I'm at it, here are a few more that would be useful to Word users:

MemoKeys: "MemoKeys can help you to fill forms faster, to execute repetitive tasks without having to type every time the same text or keystrokes. The principle is simple: MemoKeys creates associations between key combinations on your keyboard and some predetermined texts or system actions . . . " Uses a Function key (F12, for example) plus an alphanumeric key. Works in all programs.

MinMax extender: adds icons by the "-" and "X" icons at the upper right of a window to allow rollup, expand window to full screen width or height (and undo), etc.

FileEx (shareware--not free): can expand most dialog boxes--a great help in Word--also allows a different default Save destination (great when you're opening a lot of files in one folder but want to save them to another location).

Many thanks to Bruce for suggesting these programs. Bruce also suggested asking newsletter readers if they know of other editing or Word-related shareware and freeware. If you do, please let us know, and we'll list them in the next newsletter: mailto:resources [at symbol] editorium.com