Editing by Concordance

Our previous newsletter mentioned our WordCounter program, which can now tell you how many times each word has been used in a document--and I promised to show you how that might be useful for editing. The newsletter also featured a macro that will create a concordance, or list of all words used, from a Word document. Next week, I'll explain a very sneaky way to use that in editing, so stay tuned.

Let's say you've run WordCounter's concordance feature on a document, including word frequency, so you've now got a report in a table that looks like this:

1,639 and

1,453 the

1,330 of

Notice that the table is sorted by word frequency, with the most frequently used words at the top. That doesn't seem very useful; who cares how many times "and" and "of" appear? On the other hand, it may give you an idea of your author's general verbosity and other faults. Lots of prepositions? As you edit, watch for strings of prepositional phrases. Lots of "is," "was," and "were"? The author's verbs may need strengthening, and you may need to root out the passive voice. Lots of capitalized "And" and "But"? Does it bother you to start a sentence with a conjunction? If not, has the author simply overdone it? How many times is "very" used? Fifty occurrences of "paradigm"? Good grief!
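If you're curious how a frequency report like this could be produced outside Word, here is a minimal sketch in Python. It is not WordCounter's actual code; the function names and the sample sentence are made up for illustration, and the rule for what counts as a "word" (letters, with optional internal apostrophes or hyphens) is an assumption. Case is kept on purpose, so "And" and "and" are counted separately, as in the discussion above.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count each distinct word; case is kept so "And" and "and" stay separate."""
    # Assumption: a word is a run of letters, optionally joined by ' or -
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    return Counter(words)

def frequency_report(text):
    """Return (count, word) rows, most frequent first, ties in alphabetical order."""
    counts = word_frequencies(text)
    rows = [(n, word) for word, n in counts.items()]
    return sorted(rows, key=lambda row: (-row[0], row[1]))

# Hypothetical sample text, just to exercise the functions
sample = "And the cat and the dog and the bird. But the cat was very, very tired."
for count, word in frequency_report(sample)[:3]:
    print(f"{count:,} {word}")
```

From the same counts you could look up any single word you're worried about, for example word_frequencies(sample)["very"].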

What else can you think of? Please let me know; I'll share your thoughts in the next newsletter: mailto:editor [at symbol] editorium.com

Now let's go to the bottom of the table:

3 manger

2 managment

Hmmm. In this business book, we've got "managment" appearing twice, and "manger" three times. The spell checker would have caught "managment" but not "manger." We now know that we should search for "manger" and replace it with "manager." And we might as well take care of "managment" while we're at it. You'll probably find some pretty strange fish in this end of the net, but without WordCounter, they might have gotten away. Find and replace as needed. If you have lots of them, I recommend fixing them en masse with MegaReplacer:

http://www.editorium.com/14843.htm
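The "strange fish" idea can also be partly automated. The following Python sketch is only an illustration, not something WordCounter or MegaReplacer does: it flags rarely used words that closely resemble frequently used ones, using the standard library's difflib.get_close_matches. The thresholds (rare_max, common_min, cutoff) and the sample counts are assumptions you would want to tune.

```python
import difflib

def suspicious_rare_words(counts, rare_max=3, common_min=10, cutoff=0.8):
    """Map each rare word to the common words it closely resembles.

    counts: dict of word -> number of occurrences.
    rare_max, common_min, cutoff: hypothetical thresholds, chosen for illustration.
    """
    common = [word for word, n in counts.items() if n >= common_min]
    suspects = {}
    for word, n in counts.items():
        if n <= rare_max:
            close = difflib.get_close_matches(word, common, n=3, cutoff=cutoff)
            if close:
                suspects[word] = close
    return suspects

# Hypothetical counts modeled on the business-book example above
counts = {"manager": 120, "management": 85, "the": 1453,
          "manger": 3, "managment": 2, "zebra": 1}
for word, candidates in suspicious_rare_words(counts).items():
    print(f"{word}: did the author mean {', '.join(candidates)}?")
```

A list like this only suggests candidates. "Manger" is a perfectly good word in a Christmas story, so the final call still belongs to the editor.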

Now let's sort the table alphabetically by word. No, no, wait. First, select all those frequently used words at the top of the table and delete them. That will get them out of the way for what we want to do next. Here's how:

1. Select a whole bunch of words and numbers you want to get rid of.

2. Click Table > Select > Row.

3. Click Table > Delete > Rows.

Okay, *now* let's sort the table alphabetically by word:

1. Put your cursor in the table.

2. Click Table > Select > Table.

3. Click Table > Sort > Column 2, Text, Ascending.

4. Click OK.

Excellent. Now start looking through your list. What do you see? Multiple spellings for "realize/realise"? How about "President" and "president"? Sorting the table by word puts such variations near each other in the list so you can spot them easily. Then, in your main document, you can find and replace as needed.
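The two table steps above (delete the most frequent rows, then sort by word) can be sketched in Python as follows. The function name and sample rows are hypothetical, and the case-insensitive ordering is an assumption about how a default text sort behaves.

```python
def trim_and_sort(rows, drop_top=0):
    """Drop the first drop_top rows, then sort the rest alphabetically by word.

    rows: (count, word) pairs, most frequent first.
    Sorting ignores case first (an assumption), so "President" and
    "president" end up next to each other.
    """
    remaining = rows[drop_top:]
    return sorted(remaining, key=lambda row: (row[1].casefold(), row[1]))

# Hypothetical report rows, most frequent first
rows = [(1639, "and"), (1453, "the"), (1330, "of"),
        (21, "president"), (14, "realize"), (9, "President"), (3, "realise")]
for count, word in trim_and_sort(rows, drop_top=3):
    print(f"{count} {word}")
```

After the sort, the two spellings of "realize" and the two capitalizations of "president" sit on adjacent lines, which is exactly what makes them easy to spot.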

Knowing how many times each word appears may also help in your decisions about editorial style. If both styles are acceptable, why not go with the one you'll have to fix the fewest times? Whatever your decision, using a word frequency list can alert you to editorial problems before you ever start editing, and it can help you achieve the editorial consistency you desire.
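Here is a rough Python sketch of that "fewest fixes" arithmetic. Everything in it is an assumption made for illustration: the function names, the sample counts, and especially style_key, which treats words as variants only if they differ in capitalization or in a final -ise/-ize pattern. A real style decision needs more judgment than this (a capitalized "President" may be correct in front of a name, for instance).

```python
import re
from collections import defaultdict

def style_key(word):
    """Crude key under which spelling variants collide (an assumption)."""
    key = word.casefold()
    return re.sub(r"is(e|es|ed|ing|ation|ations)$", r"iz\1", key)

def variant_groups(counts):
    """Return only the keys that have more than one spelling in the list."""
    groups = defaultdict(dict)
    for word, n in counts.items():
        groups[style_key(word)][word] = n
    return {key: forms for key, forms in sorted(groups.items()) if len(forms) > 1}

def cheapest_style(forms):
    """Pick the most frequent form and count the occurrences left to change."""
    keep = max(forms, key=lambda w: (forms[w], w))
    fixes = sum(n for w, n in forms.items() if w != keep)
    return keep, fixes

# Hypothetical counts
counts = {"realize": 14, "realise": 3, "President": 9, "president": 21, "manager": 120}
for key, forms in variant_groups(counts).items():
    keep, fixes = cheapest_style(forms)
    print(f"{key}: keep {keep!r}, change {fixes} occurrence(s)")
```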

You can download WordCounter here:

http://www.editorium.com/counter.htm

_________________________________________

READERS WRITE

Judy Stein wrote:

Eric Fletcher writes, "What is particularly useful about this approach is that you can then later collect all of the flagged items in a single step--either for separate review or for use in a style guide. (This method only works for Word 10+.)"

What's Word 10+? I assume it's something beyond Word 2000, because he goes on to talk about a "Highlight all items found in" box--but I don't have one of those.

I replied:

Word 10 is the same as Word 2002 is the same as Word XP. "Word 10+" means Word 10 and anything higher, which is currently Word 11, comprising Word 2003 (PC) and Word 2004 (Mac). Back in the good old days, Word was numbered with, well, numbers rather than years. So we had Word 2, Word 5, and Word 6. With Word 95, however, Microsoft decided to get fancy, but a lot of folks still referred to it as Word 7. Word 97 (and 98) is thus Word 8, Word 2000 (and 2001) is Word 9, and so on.
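For anyone who likes to see such things as a lookup table, the version numbers above can be written down like this in Python. The table contains only the versions named in my reply; the function name is hypothetical.

```python
# Marketing name -> internal version number, as listed in the reply above
WORD_VERSIONS = {
    "Word 95": 7,
    "Word 97": 8, "Word 98": 8,
    "Word 2000": 9, "Word 2001": 9,
    "Word 2002": 10, "Word XP": 10,
    "Word 2003": 11, "Word 2004": 11,
}

def meets_minimum(product, minimum=10):
    """True if the named product is 'Word <minimum>+' (Word 10+ by default)."""
    return WORD_VERSIONS[product] >= minimum

print(meets_minimum("Word 2000"))  # Judy's version
print(meets_minimum("Word 2003"))
```

Word 2000 is version 9, so it falls short of "Word 10+," which is why Judy doesn't see the "Highlight all items found in" box.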

Meg Cox wrote:

Thanks Eric Fletcher. That's some good stuff that I will wade through when I have the mental energy (it's very complicated!).

Meantime, I have solved my problem of viewing style sheet items in alphabetical order so I can spot near misses as I go along without having to scroll to the proper place each time to insert the new item. (I believe Eric's method would have this happening at the end of the chapter or project rather than all along.)

I also index books, so I have SKY indexing software. I knew this software would solve my problem, but I was stuck because every time I tried to shrink its window so I could tuck it in a corner of my screen, I would get an error message. Well, I decided to just shrink the window bit by bit, ignoring the recurring, and, as it turns out, benign, error message, until I had a nice compact little window to stick in the corner. Now the windows are sharing space nicely.

Now I can type or paste new entries in and immediately see them in context alphabetically next to other entries of the same category--personal name, foreign term, whatever. If I'm typing instead of pasting, autocomplete will let me know right away that the term has been encountered already (perhaps in a previous file if I'm using the color-coding method).

Now I'm wondering: I don't think a Word macro can open a window in another program and order a paste there. That would be very helpful.

If you know of a solution for Meg, please send it to hints [at symbol] editorium.com.

Pat LaCosse wrote:

As an editor I use VBA to script and extend Word nearly every day. I'm delighted to have found your newsletter.

In "Numbers by Chicago, Part 2" [June 9, 2004], you provided a link to two scripts one might use to eliminate duplicates in a list. Although I'm not too familiar with WordBasic commands, I noticed that your examples were able to handle only duplicates that are adjacent to one another in the list. No problem if you've sorted the list, but what if sorting the list is not necessary or desirable? (There are times, for example, when preserving the order of occurrence is desirable.)

I thought I'd share a technique I've grown to prefer, which eliminates duplicates no matter where they are found in the list. It utilizes VB's dictionary object and it is fast. I've run scripts similar to the one below on files as large as 11 MB, and the difference in speed as a result of using the dictionary object (as opposed to repeatedly iterating through each paragraph) is remarkable. The dictionary object's comparemode property provides a convenient way for the filtering to be case sensitive if need be. One can read more about the dictionary object's properties and methods in Word's VBA help file. I should mention that I've used the dictionary object only on Windows machines running Word 2000 and 2002. I don't know how available the dictionary object is for other platforms and versions, but those who have access to it will find it quite useful for a variety of situations. I use it to create concordances, audit documents for special characters, etc., all the time.

Here is an example with comments. Normally I try to be much more modular in my programming. For example, I would usually put the core functionality here into a sub or function to which I could pass a range object (allowing me to pass it the range of an entire document or merely that of a selection within a document). And I'd make the comparemode an optional argument to pass. Because the purpose here is simply to show the dictionary object in action, I've adapted some code to be a situation-specific script, which allows it to be tested easily on a document. With that disclaimer, here it is:


Sub ListEliminateDuplicates()
    'Pat LaCosse
    'Adapted from my ConcordanceTools template
    'and submitted to the Editorium newsletter
    'on June 17, 2004.
    Dim para As Paragraph
    Dim dict As Object
    'Create an instance of the dictionary object
    Set dict = CreateObject("Scripting.Dictionary")
    'Set comparemode; use vbBinaryCompare
    'for case-sensitive filtering
    dict.CompareMode = vbTextCompare
    'Iterate through all the paragraphs in the doc.
    For Each para In ActiveDocument.Paragraphs
        'If we've already encountered this item,
        'then delete the paragraph.
        If dict.Exists(para.Range.Text) Then
            para.Range.Delete
        Else
            'If we haven't already encountered this item,
            'then add it to the dictionary's keys.
            dict.Add para.Range.Text, ""
        End If
    Next para
    Set dict = Nothing
    MsgBox "Done!"
End Sub

If you don't know how to use such macros, you can find out here.
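For readers who want to try Pat's idea without opening Word, here is a minimal Python sketch of the same technique: remember every item already seen, and keep only first occurrences, wherever the duplicates fall in the list. It is not a translation of Pat's macro. The function name is hypothetical, a Python set plays the role of the dictionary's keys, and the case_sensitive flag stands in for the comparemode setting.

```python
def remove_duplicates(items, case_sensitive=False):
    """Keep the first occurrence of each item, preserving the original order."""
    seen = set()   # plays the role of the dictionary's keys
    kept = []
    for item in items:
        # Compare without regard to case unless asked otherwise
        key = item if case_sensitive else item.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept

# Hypothetical unsorted list with non-adjacent duplicates
entries = ["pear", "Apple", "pear", "apple", "fig"]
print(remove_duplicates(entries))
print(remove_duplicates(entries, case_sensitive=True))
```

Because each lookup in the set takes roughly the same time no matter how long the list has grown, the whole job is done in a single pass, which is the source of the speed Pat describes.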

Linda DeVore and Leo Wong wrote to say that the lines in last week's concordance macro broke incorrectly in their email, so the macro wouldn't run. Here's a version in which the lines are shorter, which should solve the problem:


Sub MakeConcordance()
    'Courtesy of the Editorium
    'http://www.editorium.com
    'Mark an index entry for each word in the document:
    Dim myWord
    For Each myWord In ActiveDocument.Words
        ActiveDocument.Indexes.MarkEntry _
            Range:=Selection.Range, Entry:=myWord
    Next myWord
    'Go to the end of the document:
    Selection.EndKey Unit:=wdStory
    'Mark place with a bookmark:
    ActiveDocument.Bookmarks.Add _
        Range:=Selection.Range, Name:="IndexStartsHere"
    'Generate an index based on the entries marked earlier:
    With ActiveDocument
        .Indexes.Add Range:=Selection.Range, _
            HeadingSeparator:=wdHeadingSeparatorNone, _
            Type:=wdIndexIndent, RightAlignPageNumbers:= _
            False, NumberOfColumns:=1, _
            IndexLanguage:=wdEnglishUS
        .Indexes(1).TabLeader = wdTabLeaderDots
    End With
    'Go back to the bookmark:
    Selection.GoTo What:=wdGoToBookmark, _
        Name:="IndexStartsHere"
    'Select the index, from the bookmark
    'to the end of the document:
    Selection.EndKey Unit:=wdStory, Extend:=wdExtend
    'Turn the index "field" into actual text:
    Selection.Fields.Unlink
    'Get rid of the page numbers after the index entries:
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = ", [0-9]@[^013]"
        .Replacement.Text = "^p"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    'Go back to the bookmark:
    Selection.GoTo What:=wdGoToBookmark, _
        Name:="IndexStartsHere"
End Sub
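The wildcard search near the end of the macro may look cryptic. In Word's wildcard syntax, ", [0-9]@[^013]" means a comma, a space, one or more digits, and then a paragraph mark, and the replacement puts back just the paragraph mark. A rough Python equivalent, offered only to illustrate the pattern, looks like this. The function name and sample text are hypothetical, and the newline character stands in for Word's paragraph mark.

```python
import re

def strip_page_numbers(index_text):
    """Remove a trailing ', <digits>' from each line of generated index text."""
    # ", " + one or more digits + end of paragraph, replaced by the paragraph end alone
    return re.sub(r", [0-9]+\n", "\n", index_text)

# Hypothetical index text: each entry is followed by a page number
index_text = "and, 3\nof, 3\nthe, 12\n"
print(strip_page_numbers(index_text))
```

As with the Word pattern, a page number is removed only when it is followed by the end of the paragraph, so a final line with no line ending after it would be left alone.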

Many thanks to Judy, Meg, Pat, Linda, and Leo for their excellent tips and comments.

_________________________________________

RESOURCES

If you want to get very serious about concordance software, you might want to look at the explanations and resources here:

http://www.uni-giessen.de/~ga1007/ComputerLab/concordance.htm