Re: [htdig] Fwd: Re: [htdig] Problems with search results


Subject: Re: [htdig] Fwd: Re: [htdig] Problems with search results
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Sep 27 2000 - 13:19:44 PDT


According to David Grubb:
> Just tried a few more things on this, but still having problems. I've set
> the description_factor to 0 as suggested, set search_algorithm as exact:1,
> all other indexing and search options have been left at the default
> values. I've then rebuilt the index.
>
> Documents that do not contain the search word are still being returned
> with scores higher than documents with the search word. The weird thing is
> these documents don't contain the search word at all (i.e. the word is not
> present in the HTML source) and shouldn't be included in the results.
>
> Any more suggestions?

Which version of htdig are you running? The symptoms sure seem to
point to link description text or a corrupt database, but if you set
description_factor to 0 and rebuilt the database, that would seem to
rule those out. Another thing you should look for is hyphenated words
like e-mail, which will be indexed as both email and mail in recent
versions of htdig. The same goes for any punctuation character in
valid_punctuation that's situated inside a word.
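
To illustrate the hyphenation point, here is a rough Python sketch of how
a word containing valid_punctuation characters could end up in the index
under more than one term. This is only a model of the behaviour described
above, not htdig's actual code: the function name index_terms is made up
for the example, and the valid_punctuation and minimum_word_length values
are assumed, so check them against your own configuration.

```python
# Assumed settings, for illustration only; use the values from your own config.
VALID_PUNCTUATION = ".-_/!#$%^&'"
MINIMUM_WORD_LENGTH = 3


def index_terms(token, valid_punctuation=VALID_PUNCTUATION,
                min_len=MINIMUM_WORD_LENGTH):
    """Return the terms a token might be indexed under (hypothetical model)."""
    token = token.lower()
    # Compound form: punctuation removed, e.g. "e-mail" -> "email".
    compound = "".join(ch for ch in token if ch not in valid_punctuation)
    # Component parts: split at each punctuation character, e.g. "e" and "mail".
    parts = "".join(" " if ch in valid_punctuation else ch for ch in token).split()
    # Drop anything shorter than the minimum word length (here, the lone "e").
    return {term for term in [compound, *parts] if len(term) >= min_len}


print(sorted(index_terms("e-mail")))
```

Under these assumptions, a page that only ever says "e-mail" would match
searches for both "email" and "mail", even though neither string appears
on its own in the HTML source.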

> >>> David Adams <D.J.Adams@soton.ac.uk> 09/25 8:34 pm >>>
> >
> > Hi all
> >
> > Having some trouble with the results of searches, and hoping someone
> > can offer some advice.
> >
> > An example of the problem: searching for the word "email" returns a
> > number of documents, one of those contains the word "mail" (not
> > "email") and scores higher than a number of documents that contain
> > the word "email"
> >
> > In the conf file, search_algorithm is set to
> > exact:1 synonyms:0.5 endings:0.1
> >
> > Any ideas?
...
> Take a look at the <HEAD> section of that document. Are there <META>
> statements which contain "email" as a keyword, or in the description?
> If there are, then all is explained.
>
> Another possibility is that you have a number of links to the document
> where the text contains the word "email". You could try adding:
>
> description_factor: 0
>
> to your configuration file and re-making the index.
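
The two checks suggested above (META content in the document itself, and
link text on the pages that point to it) can be done by hand, but a small
script may be quicker for more than a few pages. The following is a
minimal sketch using only the Python standard library. The names
WordLocator and find_word are invented for this example and are not part
of htdig, and it does a plain substring match rather than reproducing
htdig's word parsing.

```python
from html.parser import HTMLParser


class WordLocator(HTMLParser):
    """Collect META keywords/description content and link text from one page."""

    def __init__(self):
        super().__init__()
        self.meta = []        # content of <meta name="keywords"> / "description"
        self.link_text = []   # text found between <a ...> and </a>
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.meta.append(attrs.get("content") or "")
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.link_text.append(data)


def find_word(html, word):
    """Report which META contents and link texts contain the word (substring match)."""
    parser = WordLocator()
    parser.feed(html)
    parser.close()
    word = word.lower()
    return {
        "meta": [m for m in parser.meta if word in m.lower()],
        "link_text": [t for t in parser.link_text if word in t.lower()],
    }
```

Run find_word on the source of the unexpected result to check its META
tags, and on the source of pages linking to it to check the link text.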

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Wed Sep 27 2000 - 13:23:07 PDT