Re: [htdig] Fwd: Re: [htdig] Problems with search results

Subject: Re: [htdig] Fwd: Re: [htdig] Problems with search results
From: David Grubb (
Date: Wed Sep 27 2000 - 18:01:55 PDT

Hyphenated words it is! *slap on the forehead* - thanks for the help (sometimes it's so obvious, it's painful) ;)


David Grubb - Internet / Intranet Developer +61 2 9895-7913
Department of Land & Water Conservation
Sydney, Australia

>>> Gilles Detillieux <> 09/28 6:19 am >>>
According to David Grubb:
> Just tried a few more things on this, but still having problems. I've set
> the description_factor to 0 as suggested, set search_algorithm as exact:1,
> all other indexing and search options have been left at the default
> values. I've then rebuilt the index.
> Documents that do not contain the search word are still being returned
> with scores higher than documents with the search word. The weird thing is
> these documents don't contain the search word at all (ie the word is not
> present in the HTML source) and shouldn't be included in the results.
> Any more suggestions?

Which version of htdig are you running? The symptoms sure seem to
point to link description text or a corrupt database, but if you set
description_factor to 0 and rebuilt the database, that would seem to
rule those out. Another thing you should look for is hyphenated words
like e-mail, which will be indexed as both email and mail in recent
versions of htdig. The same goes for any punctuation character in
valid_punctuation that's situated inside a word.
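
To illustrate the compound-word behaviour described above, here is a minimal Python sketch. It is not htdig's actual code: the function name index_terms is hypothetical, and the sketch assumes that the hyphen is listed in valid_punctuation and that fragments shorter than a minimum word length of 3 are dropped (which would explain why "e" is not indexed on its own).

```python
def index_terms(token, valid_punctuation="-", minimum_word_length=3):
    """Return the terms a token would be indexed under in this sketch.

    Illustration only, not htdig's implementation. Assumes the hyphen is in
    valid_punctuation and that short fragments are discarded.
    """
    # Split the token wherever a valid_punctuation character occurs.
    parts = []
    current = ""
    for ch in token.lower():
        if ch in valid_punctuation:
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)

    # The whole word with punctuation stripped comes first,
    # followed by each individual part of a compound word.
    terms = ["".join(parts)]
    if len(parts) > 1:
        terms.extend(parts)

    # Drop fragments that are too short, and drop duplicates, keeping order.
    result = []
    for term in terms:
        if len(term) >= minimum_word_length and term not in result:
            result.append(term)
    return result


if __name__ == "__main__":
    print(index_terms("e-mail"))
```

Under these assumptions, a document containing only "e-mail" is filed under both "email" and "mail", so a search for either word can return it even though neither appears literally in the HTML source.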

> >>> David Adams <> 09/25 8:34 pm >>>
> > Hi all
> >
> > Having some trouble with the results of searches, and hoping someone
> > can offer some advice.
> >
> > An example of the problem: searching for the word "email" returns a
> > number of documents, one of those contains the word "mail" (not
> > "email") and scores higher than a number of documents that contain
> > the word "email"
> >
> > In the conf file, search_algorithm is set to
> > exact:1 synonyms:0.5 endings:0.1
> >
> > Any ideas?
>
> Take a look at the <HEAD> section of that document. Are there <META>
> statements which contain "email" as a keyword, or in the description?
> If there are, then all is explained.
> Another possibility is that you have a number of links to the document
> where the text contains the word "email". You could try adding:
> description_factor: 0
> to your configuration file and re-making the index.

Gilles R. Detillieux              E-mail: <>
Spinal Cord Research Centre       WWW: 
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------ To unsubscribe from the htdig mailing list, send a message to You will receive a message to confirm this. List archives: <> FAQ: <>


This archive was generated by hypermail 2b28 : Wed Sep 27 2000 - 17:06:51 PDT