[htdig] Fwd: Re: [htdig] Problems with search results


Subject: [htdig] Fwd: Re: [htdig] Problems with search results
From: David Grubb (dgrubb@dlwc.nsw.gov.au)
Date: Tue Sep 26 2000 - 20:27:57 PDT


Sorry David, didn't realise when I hit reply that I didn't include the mailing list address.

My question/problem/cry for help is below ;)

-------------------------------------------------------------------------------
David Grubb - Internet / Intranet Developer
dgrubb@dlwc.nsw.gov.au +61 2 9895-7913
Department of Land & Water Conservation
Sydney, Australia
-------------------------------------------------------------------------------

attached mail follows:


Hi all

Just tried a few more things on this, but still having problems. I've set description_factor to 0 as suggested and set search_algorithm to exact:1; all other indexing and search options have been left at their default values. I've then rebuilt the index.
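
For reference, the two settings described above would look roughly like this in the conf file. This is only a sketch: every attribute not listed is assumed to be at its default, and the comments are added here for illustration.

```
# Give no weight to link text (descriptions) that other pages use
# when pointing at a document
description_factor: 0

# Use only the exact-match algorithm; synonyms and endings are dropped
search_algorithm: exact:1
```

As suggested in the thread, the index has to be re-made after changing description_factor.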

Documents that do not contain the search word are still being returned with scores higher than documents that do. The weird thing is that these documents don't contain the search word at all (i.e. the word is not present in the HTML source) and shouldn't be included in the results.

Any more suggestions?

Cheers
Dave

-------------------------------------------------------------------------------
David Grubb - Internet / Intranet Developer
dgrubb@dlwc.nsw.gov.au +61 2 9895-7913
Department of Land & Water Conservation
Sydney, Australia
-------------------------------------------------------------------------------

>>> David Adams <D.J.Adams@soton.ac.uk> 09/25 8:34 pm >>>
>
> Hi all
>
> Having some trouble with the results of searches, and hoping someone
> can offer some advice.
>
> An example of the problem: searching for the word "email" returns a
> number of documents, one of those contains the word "mail" (not
> "email") and scores higher than a number of documents that contain
> the word "email"
>
> In the conf file, search_algorithm is set to
> exact:1 synonyms:0.5 endings:0.1
>
> Any ideas?
>
> Thanks in advance
>
> -------------------------------------------------------------------------------
> David Grubb - Internet / Intranet Developer
> dgrubb@dlwc.nsw.gov.au +61 2 9895-7913
> Department of Land & Water Conservation
> Sydney, Australia
> -------------------------------------------------------------------------------

Take a look at the <HEAD> section of that document. Are there <META>
statements which contain "email" as a keyword, or in the description?
If there are, then all is explained.

Another possibility is that you have a number of links to the document
where the text contains the word "email". You could try adding:

description_factor: 0

to your configuration file and re-making the index.
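
If the <META> tags turn out to be the cause instead, the corresponding weights could be zeroed in the same way. The fragment below is a sketch, assuming the keywords_factor and meta_description_factor attributes are available in the installed ht://Dig version (check its attribute documentation before relying on them). The index would need re-making after this change as well.

```
# Give no weight to words found only in the text of links to the document
description_factor: 0

# Assumed attributes: give no weight to words taken from the
# <META> keywords and <META> description of the document
keywords_factor: 0
meta_description_factor: 0
```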

-- 
 
David J Adams
<D.J.Adams@soton.ac.uk>
Computing Services
University of Southampton

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>



This archive was generated by hypermail 2b28 : Tue Sep 26 2000 - 19:32:33 PDT