by Jack Lyon
I need your help, Gentle Reader. I need your ideas. Back in 1996, when I started selling Microsoft Word add-ins at the Editorium, getting a Word document into QuarkXPress was tricky: Quark was prone to crashes and didn’t handle footnotes at all. To solve these problems, I created QuarkConverter and NoteStripper. A few years later, when people started switching to InDesign, I created InDesignConverter.
In the past several years, however, both QuarkXPress and InDesign have become much better at importing Word documents directly, without the need for a converter. The crashes are mostly gone, and footnotes come right on in. Nevertheless, I’m wondering what else might be done to a Word document to save time and trouble when importing into a layout program — and I’d greatly appreciate your thoughts about that. Here are some examples of the kind of thing I have in mind:
- Add nonbreaking spaces to dates and initials.
For example, if the text includes a date like “August 17, 2016,” most typesetters want “August” and “17” to stay together; adding a nonbreaking space between the two elements does the trick. Similarly, if a name like “C. S. Lewis” shows up, it’s nice to keep the “C.” and the “S.” together. (To add a nonbreaking space in Word [Windows] 2007 and newer, hold down the CTRL and SHIFT keys as you press the spacebar. For Word [Mac], press the Option key as you press the spacebar.)
- Remove formatting “overrides.”
Typesetters typically want to handle formatting with styles, so that changing a style attribute in InDesign automatically changes formatting throughout the document. If an author or editor has applied styles in a Word document, those styles can be imported and used in InDesign. But if an author or editor has applied direct formatting using various fonts, that formatting will be imported as “overrides” on the text, which can be a bit of a pain to clean up.
In its Styles pane, Microsoft Word offers to “Clear All” formatting and styles from selected text.
The problem is, “Clear All” really does mean “Clear All,” including not just font overrides but also such local formatting as bold and italic, which needs to remain intact. InDesign’s “Clear Overrides” feature has the same problem. Do you really want to remove italic formatting from the hundreds of journal titles in that giant manuscript you’re editing? If you’re proofreading or setting type, do you really want to put all that formatting back in again by hand? My FileCleaner add-in includes an often-overlooked feature (“standardize font formats”) that removes font overrides but leaves bold, italic, and other local formatting intact, which is exactly what’s needed.
- Turn straight quotation marks into curly ones.
InDesign can do this—sort of. But it can’t handle things like “’Twas the night before Christmas” or “A miner, ’49er” (dreadful sorry, Clementine). FileCleaner does a much better job of dealing with this; it properly handles ’til, ’tis, ’tisn’t, ’twas, ’twasn’t, ’twould, ’twouldn’t, and ’em, as well as single quotation marks in front of numbers, all of which then come into InDesign correctly. If you have other items that should be included in this list, I’d love to know what they are.
- Remove multiple spaces between sentences.
In the 1800s many books were set with extra space between sentences.
But, frankly, the 1800s were not exactly the golden age of typesetting.
Modern books include just one space between sentences. Still, many authors continue to use two, following the instructions they were given by their high-school typing teacher back in the twentieth century. And that means the double spaces need to be removed at some point. InDesign has built-in find-and-replace routines that will fix this and a few similar items.
FileCleaner, however, fixes many such things. And the version that’s included with Editor’s ToolKit Plus 2014 fixes many more.
- Change italic and bold formatting to character styles.
Using character styles in InDesign provides much more stability and flexibility than local bold and italic formatting. It would be nice to have these styles already applied in Word before the document is imported into InDesign. My tools don’t currently do this, but they probably should.
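As a rough illustration of how a few of the text-level fixes in the list above could be automated, here is a minimal Python sketch that works on plain text rather than on a Word document. It is not how FileCleaner or any other Editorium add-in is implemented; the function names, the month list, and the regular expressions are all assumptions made for the example, and the apostrophe rule covers only the words named above plus anything that starts with a digit.

```python
import re

NBSP = "\u00a0"        # nonbreaking space
APOSTROPHE = "\u2019"  # right single quotation mark, used as an apostrophe

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Words that begin with an apostrophe (hypothetical starter list,
# written here with straight apostrophes as they appear before cleanup).
ELIDED_WORDS = ("til", "tis", "tisn't", "twas", "twasn't",
                "twould", "twouldn't", "em")


def collapse_multiple_spaces(text: str) -> str:
    """Replace runs of two or more ordinary spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


def add_nonbreaking_spaces(text: str) -> str:
    """Keep 'August 17' and 'C. S.' together with nonbreaking spaces."""
    # Month name followed by a one- or two-digit day.
    text = re.sub(rf"\b({MONTHS}) (\d{{1,2}})\b", rf"\1{NBSP}\2", text)
    # An initial followed by another initial.
    text = re.sub(r"\b([A-Z]\.) (?=[A-Z]\.)", rf"\1{NBSP}", text)
    return text


def curl_leading_apostrophes(text: str) -> str:
    """Turn a straight leading apostrophe into a curly one.

    Only the first character is changed, and only before a listed
    word or a digit; apostrophes inside words are left alone.
    """
    words = "|".join(ELIDED_WORDS)
    pattern = rf"(?<!\w)'(?=(?:{words})\b|\d)"
    return re.sub(pattern, APOSTROPHE, text, flags=re.IGNORECASE)


def clean_text(text: str) -> str:
    """Apply the three fixes in order."""
    text = collapse_multiple_spaces(text)
    text = add_nonbreaking_spaces(text)
    return curl_leading_apostrophes(text)
```

Calling `clean_text` on a paragraph returns the cleaned paragraph. A real tool would also have to preserve Word's formatting and handle ordinary opening single quotation marks that happen to precede a number, which this sketch does not attempt.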
QuarkConverter and InDesignConverter include some other useful fixes.
Nevertheless, I can’t help thinking that there must be things I’ve overlooked. I’m an editor, not a typesetter, so I don’t really know all of the things that typesetters have to fix that they really shouldn’t have to deal with. (This probably includes the most common errors that proofreaders mark.) So if you do typesetting or proofreading, would you help me out? I’d really like to know what I’m missing — things that could be cleaned up in an automated way in Microsoft Word before a document is ever imported into InDesign. What problems do you routinely encounter that you wish would go away? If you’ll let me know, I’ll try to come up with an add-in designed specifically to fix such things. Your suggestions for this would be most welcome.
Of course, typesetters and proofreaders aren’t the only ones who can benefit from this kind of cleanup. It’s also valuable to editors, allowing them to focus on words, structure, and meaning rather than deal with these tiny but pervasive problems. Little things like double spaces and straight quotation marks may not seem all that bothersome, but like pebbles in your shoe, they create subliminal annoyance that really adds up, making editing much more difficult than it should be. At least that’s my experience. What do you think?
Jack Lyon (editor@editorium.com) owns and operates the Editorium, which provides macros and information to help editors and publishers do mundane tasks quickly and efficiently. He is the author of Microsoft Word for Publishing Professionals, Wildcard Cookbook for Microsoft Word, and Macro Cookbook for Microsoft Word. These books will help you learn more about macros and how to use them.
Jack, it would save more time if the split doc function exported as .docx rather than .doc. Thank you for a great and versatile set of tools.
Thank you! I’ll get this fixed.
Bonnie, it’s taken a lot longer than I anticipated, but in the new Editor’s ToolKit Plus 2018, the Split Documents feature now correctly exports as .docx. But that’s a minor fix. The program includes numerous new features and important upgrades. I hope you like it; your feedback would be welcome. Here’s the scoop:
http://www.editorium.com/ETKPlus2018.htm
Jack, maybe you will think this through to determine what macros Editorium can offer given the following scenario.
Some indy designers are not interested in renting software. We own version 6 of Adobe’s Creative Suite and that’s where we plan to stay. We are happy with that, especially for typesetting print books. However, after the galleys come the revisions, which are done in InDesign. Those files are then exported for printing via an Acrobat file. And then it becomes time to convert to ePub. One way to do this is to make revisions twice (once in Word and once in InDesign). Another way is to revise in InDesign and then export (or “save as .docx”) the book file from Acrobat to Word and then fix most of the production issues discussed in the following article.
What Agents Should Know About Ebooks Made from PDFs
By: Ben Denckla | August 24, 2016
http://www.digitalbookworld.com/2016/agents-know-ebooks-made-pdfs/
CS6 designers want easier ways to fashion ePubs from Acrobat files saved to Word’s .docx without having to revise twice. For that we need perfectly styled Word .docx files. Editorium 2014 gets us part of the way there, including splitting the chapters. (The latter currently have to be handled twice in that they are .doc files that need to be saved as .docx.)
Of Denckla’s list, the contents table is least important as that is easily done. Handling images is a different story and outside the realm of this request.
Fixing the other items is labor-intensive. They are:
1. Distinguishing hard vs. soft line breaks;
2. Distinguishing hard vs. soft hyphens;
3. Handling footnotes and endnotes;
4. Rejecting headers & footers.
Thanks for giving this some thought, Jack, especially #s 1 and 2.
Wow, Bonnie, thanks for the detailed analysis. This is incredibly helpful. I’ll definitely give this some thought, and I’ll probably have some questions for you as I do that. I hope that’s okay. 🙂
I really hope others will weigh in on this as well.
Here are some of the (more or less common) words I find that start with apostrophes, other than those you mentioned:
’er (for her)
’im (for him)
’cause (for because)
’bout (for about)
’round (for around)
’n’ (for and)
’twere and ’tweren’t
’twill and ’twon’t
’tain’t
’riting and ’rithmetic
’fore (for before)
(Naturally all turned the wrong way because I cut and pasted from Word…)
Oh, look, they face the right way after all! 🙂
Bonnie, the new Editor’s ToolKit Plus 2018 addresses the first three of the items you mentioned:
1. Distinguishing hard vs. soft line breaks;
2. Distinguishing hard vs. soft hyphens;
3. Handling footnotes and endnotes.
I’m especially happy about my solutions for items 1 and 2:
The feature “Combine lines improperly broken with hard or soft returns” (in FileCleaner) does an amazing job of putting broken paragraph lines back together with no user intervention.
“Remove spurious hyphens” (also in FileCleaner) gets rid of hard hy-phens (like that one) that shouldn’t be there. The program also includes the ability to use a hy-phen-a-tion ex-cep-tions list in Word.
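To give a sense of the general idea behind these two fixes, here is a minimal Python sketch that works on plain text. It is not the FileCleaner logic, which isn't shown here. It assumes that blank lines separate paragraphs, that a line-ending hyphen followed by a lowercase letter is spurious, and that real compounds are supplied in a hypothetical `keep_hyphen` set; the function and parameter names are made up for the example.

```python
import re


def join_broken_lines(text: str, keep_hyphen=frozenset()) -> str:
    """Rejoin lines inside each paragraph and drop spurious hyphens.

    Paragraphs are assumed to be separated by blank lines. A hyphen at
    the end of a line is removed when the next line starts with a
    lowercase letter, unless the rejoined word is in keep_hyphen.
    """
    cleaned = []
    for para in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        out = ""
        for line in lines:
            if out.endswith("-") and line[:1].islower():
                head = re.search(r"[\w-]+$", out).group()  # e.g. "para-"
                tail = re.match(r"\w+", line).group()      # e.g. "graph"
                if (head + tail).lower() in keep_hyphen:
                    out += line             # real compound: keep the hyphen
                else:
                    out = out[:-1] + line   # spurious: drop the hyphen
            elif out:
                out += " " + line           # ordinary broken line
            else:
                out = line
        if out:
            cleaned.append(out)
    return "\n\n".join(cleaned)
```

For example, `join_broken_lines(text, keep_hyphen={"well-known"})` would rejoin "para-" and "graph" as "paragraph" while leaving "well-known" hyphenated. Text exported from a PDF is messier than this, so a working tool would need a dictionary or exceptions list rather than a hand-built set.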
Of course, NoteStripper (included) has a bunch of tools to deal with notes.
FileCleaner can now delete Word’s native headers and footers, but not those that might show up in the main text of a PDF export, so I still have something to work on.
I hope you’ll give these new features a try. Once again, here’s the link:
http://www.editorium.com/ETKPlus2018.htm
Oh, and it works on Macintosh (Word 2016) as well as Windows (Word 2010, 2013, 2016).
Thanks!
Eliza, I’ve added all of your apostrophed entries (and more) to the new Editor’s ToolKit Plus 2018:
http://www.editorium.com/ETKPlus2018.htm