Using Backlinks in ht://Dig

by Geoff Hutchison
Copyright © 1999 Geoffrey Hutchison
12 Jun 1999

Recent releases of ht://Dig have relied on some slightly confusing techniques to enhance result rankings. In particular, the use of backlink weighting has (in my opinion) produced a significant improvement in scoring. These improvements draw on the research behind the Google search engine and the Clever Project at IBM.

Put simply, most search engines do not consider the hyperlinks themselves when ranking. Documents have not only links going out to other documents but also backlinks: links from other documents that refer to them. Just as the citations in traditional bibliographies say a great deal about how important a document is, backlinks can indicate how important a document is. Useful documents tend to have many links pointing to them, and documents on one subject tend to link to useful documents on that same subject.
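To make the idea of a backlink concrete, here is a minimal sketch that counts backlinks by inverting a table of out-links. The function name `count_backlinks`, the page names, and the dictionary layout are all invented for illustration; this is not how ht://Dig stores or gathers its link data.

```python
from collections import Counter

def count_backlinks(outlinks):
    """Count in-links per page from a {page: [pages it links to]} map.

    Hypothetical helper for illustration only. Self-links are ignored and
    repeated links from the same source page are counted once.
    """
    # Start every known page at zero so pages with no backlinks still appear.
    counts = Counter({page: 0 for page in outlinks})
    for source, targets in outlinks.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return dict(counts)

# A made-up three-page site.
site = {
    "index.html": ["faq.html", "install.html"],
    "faq.html": ["install.html"],
    "install.html": [],
}
backlinks = count_backlinks(site)
```

In this toy site, `install.html` is referred to by both other pages, so it ends up with the most backlinks even though it links to nothing itself.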

However, systems such as Clever and Google perform very sophisticated (and CPU-intensive) analysis of these backlinks. ht://Dig currently uses a simpler technique: rather than weighting the entire link network (Google) or considering the importance of the documents pointing to the current document (Clever), it makes a general assumption about the relative numbers of links. It assumes that useful documents, in general, have more in-links (backlinks) than out-links. For example, the index page of a mailing list archive has many backlinks (one from each archived message) but also links out to each of those messages, so its in-links and out-links roughly balance. A page on a particular topic, by contrast, is likely to have many backlinks from related pages but often has very few links of its own, so its backlinks clearly outnumber its out-links.
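One way to picture this heuristic is as a ratio of in-links to out-links. The sketch below is only an illustration under that assumption: the name `backlink_bonus`, the `factor` parameter, and the example link counts are hypothetical, and the text above does not state the exact formula ht://Dig applies.

```python
def backlink_bonus(backlinks, outlinks, factor=1.0):
    """Hypothetical score bonus: in-links relative to out-links.

    Assumption for illustration: the bonus grows with the ratio of
    backlinks to out-links. A page with no out-links is treated as
    having one, to avoid dividing by zero.
    """
    return factor * backlinks / max(outlinks, 1)

# Mailing list index: one backlink per message, one out-link per message.
index_bonus = backlink_bonus(backlinks=500, outlinks=500)

# Topic page: many backlinks from related pages, few links of its own.
topic_bonus = backlink_bonus(backlinks=40, outlinks=4)
```

With these made-up numbers the archive index gets a bonus of 1.0 and the topic page gets 10.0, even though the index has far more backlinks in absolute terms. That is the effect the assumption is after: the raw count matters less than how the in-links compare with the out-links.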


Last modified: $Date: 2001/01/22 01:21:58 $