Subject: Re: [htdig] Fwd: Re: [htdig] Problems with search results
From: Gilles Detillieux (firstname.lastname@example.org)
Date: Wed Sep 27 2000 - 13:19:44 PDT
According to David Grubb:
> Just tried a few more things on this, but still having problems. I've set
> the description_factor to 0 as suggested, set search_algorithm as exact:1,
> all other indexing and search options have been left at the default
> values. I've then rebuilt the index.
> Documents that do not contain the search word are still being returned
> with scores higher than documents with the search word. The weird thing is
> these documents don't contain the search word at all (i.e. the word is not
> present in the HTML source) and shouldn't be included in the results.
> Any more suggestions?
Which version of htdig are you running? The symptoms sure seem to
point to link description text or a corrupt database, but if you set
description_factor to 0 and rebuilt the database, that would seem to
rule those out. Another thing you should look for is hyphenated words
like e-mail, which will be indexed as both email and mail in recent
versions of htdig. The same goes for any punctuation character in
valid_punctuation that's situated inside a word.
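To make the hyphenation point concrete, here is a rough Python sketch of the splitting behaviour described above. It is not htdig's actual code. The function name index_terms is made up for illustration, and the valid_punctuation string and minimum word length of 3 are assumptions, so check the values in your own configuration.

```python
# Assumed settings, not taken from this thread; compare with your htdig.conf.
VALID_PUNCTUATION = ".-_/!#$%^&'"
MINIMUM_WORD_LENGTH = 3


def index_terms(token, valid_punctuation=VALID_PUNCTUATION,
                min_len=MINIMUM_WORD_LENGTH):
    """Return the terms a token such as 'e-mail' might be indexed under.

    Hypothetical helper: models the compound form (punctuation stripped)
    plus each component found between punctuation characters.
    """
    token = token.lower()

    # Compound form: drop the punctuation and join the pieces.
    joined = "".join(ch for ch in token if ch not in valid_punctuation)

    # Component forms: split the token at every punctuation character.
    parts = []
    current = ""
    for ch in token:
        if ch in valid_punctuation:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)

    # Keep terms that are long enough, without duplicates, in order.
    terms = []
    for term in [joined] + parts:
        if len(term) >= min_len and term not in terms:
            terms.append(term)
    return terms


print(index_terms("e-mail"))
print(index_terms("email"))
```

Under these assumptions a page containing only "e-mail" would be found by a search for either "email" or "mail", even though neither string appears as a standalone word in the HTML source.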
> >>> David Adams <D.J.Adams@soton.ac.uk> 09/25 8:34 pm >>>
> > Hi all
> >
> > Having some trouble with the results of searches, and
> > hoping someone can offer some advice.
> >
> > An example of the problem:
> > searching for the word "email" returns a number of documents, one of
> > those contains the word "mail" (not "email") and scores higher than a
> > number of documents that contain the word "email"
> >
> > In the conf file,
> > search_algorithm is set to exact:1 synonyms:0.5 endings:0.1
> >
> > Any ideas?
> Take a look at the <HEAD> section of that document. Are there <META>
> statements which contain "email" as a keyword, or in the description?
> If there are, then all is explained.
> Another possibility is that you have a number of links to the document
> where the text contains the word "email". You could try adding:
> description_factor: 0
> to your configuration file and re-making the index.
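The META and link-text explanations quoted above can be illustrated with a toy scoring model in Python. This is only a sketch of the general idea, not htdig's real scoring code: the function score is hypothetical, the location names loosely mirror htdig's *_factor attributes, and the weights are illustrative values rather than confirmed defaults.

```python
# Hypothetical per-location weights. The keys loosely mirror htdig's
# text_factor, title_factor, keywords_factor and description_factor,
# but the numbers are illustrative only.
FACTORS = {"text": 1, "title": 100, "keywords": 100, "description": 150}


def score(occurrences, factors=FACTORS, algorithm_weight=1.0):
    """Weighted sum of match counts per location (toy model).

    occurrences maps a location name to the number of times the search
    word was seen there; algorithm_weight stands in for the weight given
    to a search_algorithm entry such as exact:1.
    """
    total = sum(factors.get(loc, 0) * count
                for loc, count in occurrences.items())
    return algorithm_weight * total


# Page A: the word appears five times in the body text.
body_only = {"text": 5}
# Page B: the word never appears in the page itself, only once in the
# text of a link pointing at it.
link_only = {"description": 1}

print(score(body_only))
print(score(link_only))

# Same comparison with the link-text weight switched off.
no_description = dict(FACTORS, description=0)
print(score(link_only, no_description))
```

In this model a single match in link text outweighs several matches in body text until the description weight is set to 0, which is the effect the description_factor suggestion is aiming for.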
-- 
Gilles R. Detillieux              E-mail: <email@example.com>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
This archive was generated by hypermail 2b28 : Wed Sep 27 2000 - 13:23:07 PDT