 Indexetera    —   Indexing et cetera

 
 
 
Slightly edited version of an article originally written as part of an online course on web site indexing held in 2001 by Broccoli Information Management.
 
 
 
Web Site Indexing — A Reliable Means in a New Environment
 
 
Introduction
The Basics of (Good) Indexing
Why a Site Index?
When to Use a Site Index
How to Create and Maintain a Site Index
Conclusion
 
 
 
Introduction

Web site indexing is a rather new discipline that has evolved over the last couple of years. The most common type of indexing in a web environment is creating an index for certain types of web sites. A site index refers to information within a site and should not be confused with collections of links to other sites (which are often called an “index”, too).

 
 
The Basics of (Good) Indexing

So what is indexing in general all about? Well, it is safe to say that an index is the most natural form of information retrieval. We all know the good old back-of-the-book index: one looks up a specific word and is referred to exactly those locations where the word is part of the topic.

However, there is a huge difference between a professionally created index and a poor one (not to mention no index at all). A good index has a clear design and a useful format and, above all, applies a certain degree of vocabulary control, which among other things prevents information about one concept from being scattered across several entries. A good index also avoids so-called passing mentions: entries referring to words that are not part of the topic of a particular chapter, section, or paragraph. It is quite frustrating to find a word in the index, only to discover that the referenced text contains no real information about it.

Imagine a lengthy index with the following entries (numbers are page references):

...
Automobiles, 25, 88, 90
...
Cars, 25, 90, 120–125
...
Trains, 48–50
...
Trucks, 88, 92
...
Vehicles, 20–24, 25, 48–50, 88, 90, 92, 120–125
...

An index user looking for Automobiles would miss quite a lot unless they happened to find the Cars entry, which denotes the same concept. Someone looking under Cars, in turn, would miss the references to Trucks, a narrower concept. Only the Vehicles entry lists all relevant references, and in a very inconvenient way: users would have to check each reference to find the specific item they are actually looking for.

A much better index would show the following:

...
Automobiles
     construction of, 120–125
     driving safely, 90
     history of, 25
     see also Trucks
...
Cars see Automobiles
...
Construction, of automobiles, 120–125
...
Driving safely, 90
...
Trains, 48–50
...
Trucks, 88, 92
...
Vehicles
     general overview, 20–24
     see also Automobiles; Trains
...

It should be clear at once that users will find a topic even if they first think of an alternative term (Cars see Automobiles), and that related and narrower topics are pointed out as well (see also Trucks). Both are achieved with cross-references. In addition, if an entry (e.g. Automobiles) has many aspects, these can be listed as subentries, avoiding long runs of undifferentiated page references.
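To make the structure of the improved index concrete, here is a minimal Python sketch of index entries with locators, subentries, and see / see also cross-references, rendered in the same plain layout as the example. The `Entry` class and the `render` function are hypothetical illustrations, not taken from any indexing program; real indexes need further rules (for sorting, punctuation, double postings such as "Driving safely", and so on).

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One index heading (hypothetical structure for illustration)."""
    heading: str
    locators: list[str] = field(default_factory=list)  # page numbers or ranges
    subentries: dict[str, list[str]] = field(default_factory=dict)  # subheading -> locators
    see: str | None = None  # "see": non-preferred term pointing to the preferred one
    see_also: list[str] = field(default_factory=list)  # "see also": related or narrower entries

def render(entries: list[Entry]) -> str:
    """Render entries alphabetically in the plain layout of the example above."""
    lines = []
    for e in sorted(entries, key=lambda x: x.heading.lower()):
        if e.see:
            # A "see" entry carries no locators of its own
            lines.append(f"{e.heading} see {e.see}")
            continue
        head = e.heading
        if e.locators:
            head += ", " + ", ".join(e.locators)
        lines.append(head)
        for sub, locs in sorted(e.subentries.items()):
            lines.append(f"     {sub}, {', '.join(locs)}")
        if e.see_also:
            lines.append("     see also " + "; ".join(e.see_also))
    return "\n".join(lines)

entries = [
    Entry("Automobiles",
          subentries={"construction of": ["120–125"], "driving safely": ["90"], "history of": ["25"]},
          see_also=["Trucks"]),
    Entry("Cars", see="Automobiles"),
    Entry("Trains", locators=["48–50"]),
    Entry("Trucks", locators=["88", "92"]),
    Entry("Vehicles", subentries={"general overview": ["20–24"]},
          see_also=["Automobiles", "Trains"]),
]
print(render(entries))
```

Keeping see and see also as separate fields mirrors the distinction in the text: a see reference replaces the entry's locators entirely, while a see also reference supplements them.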

These indexing basics apply in every environment, be it a print or a web publication.

 
 
Why a Site Index?

As the above examples show, an index refers to specific information. A site index is therefore very useful for finding specific items, even deep within individual web pages. Reliable specificity is indeed one of the big advantages of indexes over other search options.

A topical overview (often called a “site map” and similar to a table of contents) is the main navigational aid for guiding users to general topics or parts of a site. It is fundamental for any site with a good information architecture. An overview, however, typically doesn’t show specific information; hence, it is not the same as an index.

But wouldn’t it be more useful to implement a search engine within one’s site? Unfortunately, many people, including web site designers, don’t understand the shortcomings of search engines. Often, their main appeal seems to be a “cool” look and feel; in fact, they are quite weak when it comes to information retrieval.

The problem with searching for a certain string of characters is that you’ll never see sub-aspects of the concept you are looking for, nor similar or synonymous terms. It’s like wandering around in fog: one never knows whether the search engine has found all of the relevant information. On top of that, the results are often displayed in an ugly or confusing way. It is possible, however, to tame search engines, e.g. by letting them search only a well-prepared but hidden index; in that case, a search engine performs much more effectively.
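The difference can be sketched in a few lines of Python. The pages, their texts, and the tiny hand-made index below are hypothetical; the point is only that a plain string search for “cars” finds nothing, while an index with a see reference leads to Automobiles and its see also reference points on to Trucks.

```python
from __future__ import annotations

# Hypothetical mini-site: page name -> page text
pages = {
    "automobiles.html": "History of automobiles. Driving safely on motorways.",
    "trucks.html": "Loading and maintaining trucks.",
    "vehicles.html": "A general overview of vehicles.",
}

def string_search(query: str) -> list[str]:
    """Plain full-text search: pages whose text contains the query string."""
    q = query.lower()
    return sorted(page for page, text in pages.items() if q in text.lower())

# Hypothetical hand-made index with cross-references
index = {
    "automobiles": ["automobiles.html"],
    "trucks": ["trucks.html"],
    "vehicles": ["vehicles.html"],
}
see = {"cars": "automobiles"}  # non-preferred term -> preferred term
see_also = {"automobiles": ["trucks"], "vehicles": ["automobiles"]}  # related or narrower entries

def index_lookup(term: str) -> tuple[list[str], list[str]]:
    """Follow a 'see' reference if there is one, then return locators and 'see also' terms."""
    t = see.get(term.lower(), term.lower())
    return index.get(t, []), see_also.get(t, [])

print(string_search("cars"))  # the string "cars" occurs on no page
print(index_lookup("cars"))   # the index still leads to Automobiles and on to Trucks
```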

Even where the usefulness of indexes is recognized, some people think indexes can be created automatically. While this can be done, the resulting indexes are often very poor indeed. Some companies claim to have the technology to solve this problem. However, consider a concept that is not mentioned verbatim in a text: it won’t get indexed. The same is true of subentries and related terms, which automatic indexing often dismally fails to take into account. In short, artificial intelligence won’t be able to tackle this problem in the foreseeable future. Human indexers, on the other hand, can identify indexable concepts (not just words) as well as the relations between concepts, and this is what true indexing is all about!
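A deliberately naive automatic indexer, sketched below under the assumption that it simply turns every word that is not a stopword into an entry, shows both failure modes described above: the passing mention “trains” becomes an entry, while the concept “vehicles”, which the page is about but never names, does not. The page text and the stopword list are hypothetical, and actual products are more sophisticated; this is an illustration of the argument, not a model of any particular tool.

```python
import re
from collections import defaultdict

# Hypothetical page about trucks: it never uses the word "vehicles"
# and mentions trains only in passing.
pages = {
    "trucks.html": "Loading and maintaining trucks. Unlike trains, trucks need a special licence.",
}
STOPWORDS = frozenset({"a", "and", "need", "unlike"})  # assumed minimal stopword list

def naive_index(pages: dict) -> dict:
    """Word-based 'automatic indexing': every non-stopword becomes an entry."""
    idx = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                idx[word].add(page)
    return dict(idx)

idx = naive_index(pages)
print(sorted(idx))  # entries produced purely from the words on the page
```

Even with a better stopword list, nothing in this approach can produce an entry for a word that does not occur, or decide that “trains” is merely mentioned in passing.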

 
 
When to Use a Site Index

Just because there are so many web sites out there doesn’t mean that each and every site needs an index. Small sites probably don’t need one, although even some of them would benefit from an index as a navigational aid complementing a topical overview.

The sites for which an index seems most worthwhile are content-rich, medium-sized ones with a high percentage of static web pages. These could include corporate intranets as well as commercial sites. There is certainly an upper limit to the number of web pages a human indexer can handle, but then, even very large books have been indexed.

 
 
How to Create and Maintain a Site Index

Site indexes can be created either manually or with the help of dedicated indexing software that supports the human indexer in creating, editing, and maintaining the index.

Unlike a book, a web site has a non-linear structure. Hence, there are no page numbers available within a site. So how is it possible to refer to specific information?

A web site may contain many web pages. Opening an individual page is no problem, but navigating to specific information within a page depends on how that page has been prepared. In HTML, the answer to this problem lies in anchor tags inserted into the original text, e.g.

  <a name="histauto">Chapter 2: History of Automobiles</a>

Here the <a>...</a> tags enclose a chapter headline. The name of the anchor is “histauto”. This way a human indexer can identify and label specific parts of specific web pages.

Since there are no page numbers within a web site, the link text itself serves as the locator. The example from above would therefore look like this (on the web page, the entry and subentry texts are hyperlinks):

...
Automobiles
     construction of
     driving safely
     history of
     see also Trucks
...
Cars see Automobiles
...
Construction, of automobiles
...
Driving safely
...
Trains
...
Trucks, 1, 2
...
Vehicles
     general overview
     see also Automobiles; Trains
...

It is important to distinguish between links to content pages and cross-reference links leading to another entry within the site index itself. Also, where a topic occurs in several locations, these can be numbered (e.g. Trucks, 1, 2).

To navigate to, for instance, Chapter 2: History of Automobiles, the history of subentry under Automobiles has to contain the following HTML tags:

  <a href="automobiles.html#histauto">history of</a>

This creates the underlined link to the page automobiles.html and its anchor “histauto”, thus leading to the specific item represented by that subentry in the index.
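Generating such links is mechanical once the indexer has decided on the entries. The following Python sketch renders single entries as HTML list items: a single location becomes a linked entry, several locations become numbered links, and a see reference becomes an in-page link to another entry of the index. The `id` scheme for entries (the lowercased heading), the italic “see”, and all page and anchor names other than automobiles.html#histauto are assumptions for illustration.

```python
from __future__ import annotations
from html import escape

def link(href: str, text: str) -> str:
    """An HTML link; href may contain a fragment such as page.html#anchor."""
    return f'<a href="{escape(href)}">{escape(text)}</a>'

def entry_html(heading: str, locators: list[str] | None = None, see: str | None = None) -> str:
    """Render one index entry as an <li>.

    Assumption: each entry on the index page gets an id equal to its lowercased
    heading, so that cross-references can point to it with "#id".
    """
    item_id = escape(heading.lower())
    locators = locators or []
    if see:
        # Cross-reference: links to another entry of the index itself, not to a content page
        return f'<li id="{item_id}">{escape(heading)} <i>see</i> {link("#" + see.lower(), see)}</li>'
    if not locators:
        # Heading without locators (e.g. one that only has subentries)
        return f'<li id="{item_id}">{escape(heading)}</li>'
    if len(locators) == 1:
        # One location: the entry text itself is the locator
        return f'<li id="{item_id}">{link(locators[0], heading)}</li>'
    # Several locations of the same topic: number the links 1, 2, ...
    numbered = ", ".join(link(loc, str(i)) for i, loc in enumerate(locators, start=1))
    return f'<li id="{item_id}">{escape(heading)}, {numbered}</li>'

print(link("automobiles.html#histauto", "history of"))
print(entry_html("Cars", see="Automobiles"))
print(entry_html("Trucks", ["trucks.html#loading", "trucks.html#licence"]))
```

The first call reproduces the subentry link shown above; the other two show the two kinds of link the text says must be kept apart: a cross-reference within the index and links to content pages.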

Powerful dedicated indexing programs are available that transform a finished index into an HTML document. There is even specialized site-indexing software that further facilitates the compilation, maintenance, and updating of a site index, e.g. by displaying all web pages and anchors so that proper index entries and cross-references can be created. The updating feature is especially important, since a site index is never finished.
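As a rough idea of what such software might do behind the scenes, here is a Python sketch using the standard library’s `html.parser` that lists the anchors of each page and flags index links whose page or anchor no longer exists, one way an index could be checked after a site update. It collects `name` attributes on `<a>` tags, as in the example above, and also `id` attributes, since current HTML treats `name` on `<a>` as obsolete in favour of `id`. The page content and the function names are hypothetical.

```python
from __future__ import annotations
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collects fragment targets: name on <a> tags and id on any tag."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        candidates = [values.get("name")] if tag == "a" else []
        candidates.append(values.get("id"))
        for target in candidates:
            if target and target not in self.anchors:
                self.anchors.append(target)

def list_anchors(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page name to the anchors (fragment targets) found in its HTML."""
    result = {}
    for page, markup in pages.items():
        collector = AnchorCollector()
        collector.feed(markup)
        collector.close()
        result[page] = collector.anchors
    return result

def broken_links(hrefs: list[str], anchors: dict[str, list[str]]) -> list[str]:
    """Return index links whose page or anchor no longer exists on the site."""
    broken = []
    for href in hrefs:
        page, _, fragment = href.partition("#")
        if page not in anchors or (fragment and fragment not in anchors[page]):
            broken.append(href)
    return broken

# Hypothetical site content after an update
pages = {
    "automobiles.html": '<h1 id="top">Automobiles</h1>'
                        '<a name="histauto">Chapter 2: History of Automobiles</a>',
}
anchors = list_anchors(pages)
print(anchors)
print(broken_links(["automobiles.html#histauto", "automobiles.html#safety", "trucks.html"], anchors))
```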

 
 
Conclusion

If properly created and maintained, a site index is a wonderful means of information retrieval. Even specific content can be found instantly without the shortcomings of search engines, which effectively enhances findability. Only human indexers, i.e. professionals with the appropriate skills who are equipped with the right tools, can provide a proper index. For web site projects, therefore, the roles of indexers and information architects should be recognized just as those of webmasters, programmers, and graphic designers are.
