Subject: Re: [htdig] irrelevant pages in search
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Sat Nov 27 1999 - 12:57:30 PST


At 10:54 AM +0100 11/18/99, Dave wrote:
>Try it out at:
> http://alpha.CompuCreations.com/search/
>
>Words I have tried include "buskett" (results 2/3/6/10 are
>irrelevant, i.e. 40% from the 1st page!)

I tried it out when you first sent the message and again just now. I
see that a few of the results are irrelevant, but I'm not sure all of
the ones you mention are. At the least, I can see why they're being
returned.

You don't mention how many pages you have in your database or how
closely related they are. Offhand, I think some of your "irrelevant"
pages are scoring highly because they have a high backlink weight.
You might try lowering the backlink_factor attribute:
<http://www.htdig.org/attrs.html#backlink_factor>

This factor weights the "importance" of a page, essentially by the
ratio of the number of links pointing to the page to the number of
links on the page. (Using a ratio helps to remove "link farms," which
often have many links to them.)
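To make the ratio concrete, here is a minimal Python sketch. It is not htdig's scoring code: it assumes the backlink weight is backlink_factor * (backlinks / links on the page) and that this weight is simply added to a page's word-match score. htdig may combine these differently, so the names (Page, word_score, rank) and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    word_score: float  # hypothetical score from word matches alone
    backlinks: int     # number of links pointing to the page
    links_out: int     # number of links on the page


def backlink_weight(page: Page, backlink_factor: float) -> float:
    # Assumed form: factor times (incoming links / links on the page).
    # max(..., 1) avoids dividing by zero for pages with no links out.
    ratio = page.backlinks / max(page.links_out, 1)
    return backlink_factor * ratio


def total_score(page: Page, backlink_factor: float) -> float:
    # Assumption: the backlink weight is added to the word score.
    return page.word_score + backlink_weight(page, backlink_factor)


def rank(pages: list[Page], backlink_factor: float) -> list[str]:
    # Highest total score first.
    ordered = sorted(pages, key=lambda p: total_score(p, backlink_factor),
                     reverse=True)
    return [p.url for p in ordered]


pages = [
    Page("relevant.html", word_score=10.0, backlinks=2, links_out=4),
    Page("hub.html", word_score=3.0, backlinks=40, links_out=5),
]

print(rank(pages, backlink_factor=1.0))  # ['hub.html', 'relevant.html']
print(rank(pages, backlink_factor=0.1))  # ['relevant.html', 'hub.html']
```

With a factor of 1.0, the heavily linked hub.html (ratio 40/5 = 8) outranks the page with the much better word score. At 0.1 the order flips, which is the kind of effect lowering backlink_factor is meant to have.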

Hope that helps,

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


This archive was generated by hypermail 2b25 : Sat Nov 27 1999 - 13:10:06 PST