Indexing technical documents
This article was first published (with small differences) in the City Information Group Yearbook 2004.
"A document without an index is like a country without a map." [Adapted from an unknown source.]
Finding information
If a document contains the information that a reader needs, but the reader cannot find that information, the document is useless. The document is worse than useless, because it causes problems. If I know that some information is not available, I will not waste my time looking for the information. However, if I think that the information is available, and if I cannot find it after a period of searching, I will be frustrated.
In the field of technical writing, typical documents are user guides, reference manuals, and online help. Good structure and a table of contents that contains clear headings can help a reader to find information. However, good structure and a table of contents are not sufficient. Usually, an index is also necessary. An index organizes information that is scattered through a document. An index supplies search terms that tell the reader the locations of applicable information in the document.
Many small documents do not need an index. Possibly, a quick scan through a document is all that is necessary to find the answer to a question. Some large documents do not need an index, or possibly, they need an index for only part of the document. For example, a telephone directory is a large document that is very easy to use, although it does not contain an index for the primary text. Usually, such a directory contains a small index that relates the terms that people use to the terms that are in the text. An example of an entry is 'technical author: see technical writer'. To know whether an index is necessary, you need to know the size and the structure of the document, and how the document will be used.
Common misconceptions
Some customers have misconceptions about indexes and why indexes are necessary. Before electronic versions of documents were easily available, technical manuals contained an index. Now, readers can search online help and electronic versions of printed documents for words that appear in the text. Does this mean that an electronic document does not need an index? No, as the following example shows.
Developers of Windows-based software know that you 'close' a dialog box, 'quit' a program, and 'end' a network connection. Many users do not know the different terms. Therefore, if the document contains text that refers to 'quitting xyz', and if users search for 'closing xyz', they will not find the information.
A search has a second problem. Because a search returns all instances of a term, possibly, the results are too many to be of practical use.
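The following Python sketch illustrates the two search problems. The page text, the page numbers, and the function names are invented for the illustration. They are not from a real product.

```python
# Hypothetical page text. Only the term 'quitting' appears in the text.
pages = {
    12: "Quitting xyz: on the File menu, click Exit.",
    30: "Before quitting xyz, save your work.",
}


def full_text_search(term):
    """Return each page whose text contains the term (case is ignored)."""
    return sorted(
        number for number, text in pages.items() if term.lower() in text.lower()
    )


# A hand-made index (an assumption for this sketch). The indexer selected
# the important location and added a 'see' reference from the term that
# users know to the term that is in the text.
index = {
    "quitting xyz": [12],
    "closing xyz": "quitting xyz",  # 'see' reference
}


def index_lookup(term):
    """Follow 'see' references until an entry with page numbers is found."""
    entry = index.get(term.lower(), [])
    while isinstance(entry, str):
        entry = index.get(entry, [])
    return entry
```

In this sketch, a search for 'closing xyz' returns no pages, and a search for 'quitting xyz' returns all pages that contain the term. The index entry for 'closing xyz' sends the reader to the one page that the indexer selected.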
A frequent misconception is that a word processor or software for desktop publishing can automatically create an index. Software can create a concordance, which is very different from an index. (In this article, concordance means a list of keywords. A different meaning is a list of keywords with their immediate context.) A concordance is not usually useful in the field of technical documentation. A reader needs to know about important instances of a term, not all instances of that term. Software does not know what is important and what is not important.
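To show the difference, the following Python sketch makes a concordance from three invented pages. The text, the function name, and the hand-made index entry are assumptions for the illustration. The sketch does not show how a particular word processor works.

```python
import re
from collections import defaultdict

# Hypothetical pages from a manual.
pages = {
    1: "This manual describes the printer.",
    2: "To install the printer, connect the printer cable.",
    3: "If the printer does not print, see Troubleshooting.",
}


def concordance(pages):
    """Map each word to every page on which the word occurs."""
    result = defaultdict(set)
    for number, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            result[word].add(number)
    return {word: sorted(numbers) for word, numbers in result.items()}


# An index entry for the same term. A person selected the locations and
# wrote the subentries. The software did not calculate them.
index = {"printer": {"installing": [2], "troubleshooting": [3]}}
```

The concordance gives pages 1, 2, and 3 for 'printer', and it gives the same pages for 'the'. The software cannot tell that page 1 only mentions the printer, or that 'the' is not a useful term.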
Creating an index
Until software can 'understand' the real-world meaning of text, an indexer must supply the intellectual input in creating an index (although indexers use software to prepare indexes). An indexer creates an index in one of two ways:
- An index can be created as a separate document (the historical method).
- An index can be created as part of the document (embedded indexing). Embedded indexing is becoming popular. In the field of technical writing for software, embedded indexing is the primary method.
With the first method, the indexer manually specifies the pages to which a term in the index refers. If the page layout changes, the index must be corrected manually, even if software is used to create the index.
With embedded indexing, the indexer puts a marker (also known as a tag or a code) at each location in the document at which the term is relevant. Software uses the markers to make the index. If text is moved, the markers move with the text, and the index can be made again easily. If text is deleted, the markers are also deleted, and the index can be made again. The index still requires manual checking. For example, possibly, cross-references are not correct after text is deleted.
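The following Python sketch shows the principle of embedded indexing. The marker syntax `{ix:term}`, the page text, and the function name are invented for the illustration. Each authoring tool has its own marker format.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax: {ix:term} at the location that is indexed.
MARKER = re.compile(r"\{ix:([^}]+)\}")


def build_index(pages):
    """Collect the markers from each page. Page numbers start at 1."""
    index = defaultdict(list)
    for number, text in enumerate(pages, start=1):
        for term in MARKER.findall(text):
            if number not in index[term]:
                index[term].append(number)
    return dict(sorted(index.items()))


pages = [
    "{ix:installation}To install xyz, run the setup program.",
    "{ix:quitting xyz}To quit xyz, click Exit on the File menu.",
]
first = build_index(pages)

# A new page is added at the start. The markers move with the text.
pages.insert(0, "{ix:overview}xyz is a program for technical writers.")
second = build_index(pages)

# The installation page is deleted. Its marker is deleted with the text.
del pages[1]
third = build_index(pages)
```

After each change, the sketch makes the index again from the markers, and the page numbers are correct without manual changes. The sketch does not check cross-references. That check is still a task for the indexer.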
The future
Indexing a document is an intellectual task that is helped by software. Indexers beware! The software on the market that tries to create indexes is not very good. However, it will improve. Possibly, an index that software creates will never be as good as a human-created index, but possibly, it will become good enough for many purposes.
See also
Frequently asked questions: an alternative
A publisher's job is to provide a good API for books: you can start with your index.