Ben Pitzer (
Thu, 06 Aug 1998 10:26:07 -0400

At 04:52 AM 8/6/98 -0400, Geoff Hutchison wrote:
>> With setting the title_factor to 10 and the text_factor
>> as well as all heading_factors to 0 we still get things that
>> are between the body tags such as links to other pages
>Well the purpose of text_factor is:
> This is a factor which will be used to multiply the
> weight of words that are not in any special part of a
> document. Setting a factor to 0 will cause normal words
> to be ignored.

It occurs to me that the definition 'not in any special part of a
document' is a tad ambiguous. Would the body be considered a 'special
part' of the document? How about links? One could argue that anything
between specific tags is in a special part of the document, and is
therefore not affected by 'text_factor: 0'. According to the
documentation I've seen so far, the only tags that htdig looks for are
<title>...</title> and <h1>...</h1> through <h6>...</h6>. Does text
inside any other tag count as 'not in any special part of the
document'?
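
For reference, the settings described at the top of this message would
look roughly like this in the htdig configuration file. This is only a
sketch: the per-level names heading_factor_1 through heading_factor_6
are an assumption about what 'all heading_factors' refers to, so check
them against the attribute documentation for your version.

```
# Sketch of the configuration under discussion (attribute names for
# the heading factors are assumed, not taken from the original post).
title_factor:      10
text_factor:       0
heading_factor_1:  0
heading_factor_2:  0
heading_factor_3:  0
heading_factor_4:  0
heading_factor_5:  0
heading_factor_6:  0
```

With these values, only words in the title should carry any weight if
link text and other body markup are treated as ordinary text.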

>An alternative solution is to use META description tags and the patch I
>produced. No body text will appear in the output.

Unfortunately, we're trying to tune searches on a large, extensive web
site where adding META tags to every page is just not feasible. Thanks
for the idea, though.


