Re: [htdig] irrelevant pages in search


Subject: Re: [htdig] irrelevant pages in search
From: David Mifsud (compu@csc.um.edu.mt)
Date: Sun Nov 28 1999 - 04:12:16 PST


The database is a merge of about 5 DBs, and contains around 20K
documents. The only relation among the documents is that they
are hosted in the same region.

I have rebuilt the database (merging in another DB) and checked
the merge log, but did not find any errors.

Now the search for "buskett" includes only a very small
number of irrelevant pages. But when I searched for "david
mifsud", 7 of the first 10 results were irrelevant (1-4, 6-8).

The search algorithm I'm using is exact:1

BTW, by irrelevant I mean, loading the page, doing a search
for both the words david and mifsud, and not finding any of
the words in the source!
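
For anyone who wants to repeat that check without eyeballing each
page, here is a rough sketch in Python. It assumes the page source
has already been fetched into a string; the function names
missing_words and is_irrelevant are made up for illustration and are
not part of ht://Dig. It only looks for whole-word, case-insensitive
matches, which may not be exactly how htsearch decides a match.

```python
import re


def missing_words(page_source, words):
    """Return the query words that never occur in page_source.

    Matching is case-insensitive and on whole words only
    (an assumption; the indexer may tokenize differently).
    """
    lowered = page_source.lower()
    return [
        w for w in words
        if not re.search(r"\b" + re.escape(w.lower()) + r"\b", lowered)
    ]


def is_irrelevant(page_source, words):
    """A page counts as irrelevant when none of the words appear."""
    return len(missing_words(page_source, words)) == len(words)
```

Calling is_irrelevant(source, ["david", "mifsud"]) on each of the
top ten results would reproduce the manual test described above.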

http://alpha.CompuCreations.com/search/

regards,

dave

* From ghutchis@wso.williams.edu Sat Nov 27 21:58:28 1999
* To: Dave <compu@csc.um.edu.mt>
* Subject: Re: [htdig] irrelevant pages in search
* Cc: htdig@htdig.org
*
* At 10:54 AM +0100 11/18/99, Dave wrote:
* >Try it out at:
* > http://alpha.CompuCreations.com/search/
* >
* >Words I have tried include "buskett" (results 2/3/6/10 are
* >irrelevant, i.e. 40% from the 1st page!)
*
* I tried it out when you first sent the message and again now--I see
* that a few of the results are irrelevant, but I'm not so sure all of
* those you mention are irrelevant. At the least, I can see why they're
* being flagged.
*
* You don't mention how many pages you have in your database or how
* closely related they are. Offhand, I think some of your "irrelevant"
* pages are scoring highly because they have a high backlink weight.
* You might try lowering the backlink_factor
* <http://www.htdig.org/attrs.html#backlink_factor>
*
* This factor weights the "importance" of pages, essentially as the
* ratio of the number of links pointing to a page to the number of
* links on the page. (The ratio helps to remove "link farms" which
* often have many links to them.)
*
* Hope that helps,
*
* -Geoff Hutchison
* Williams Students Online
* http://wso.williams.edu/
*
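
To make the ratio Geoff describes concrete, here is a small Python
sketch. It is only an illustration of "links pointing to a page
divided by links on the page", scaled by a factor; the function
name, the guard for pages with no outgoing links, and the way the
factor is multiplied in are all assumptions, not the actual ht://Dig
scoring code.

```python
def backlink_weight(links_to_page, links_on_page, backlink_factor):
    """Illustrative backlink score: incoming / outgoing, times a factor.

    Hypothetical helper, not ht://Dig's real formula.
    """
    # Assumption: treat a page with no outgoing links as having one,
    # so the division is always defined.
    outgoing = max(links_on_page, 1)
    return backlink_factor * links_to_page / outgoing
```

Under this sketch, a page with 30 incoming and 10 outgoing links
scores 3.0 at a factor of 1.0, and lowering the factor shrinks that
contribution in proportion, which is why reducing backlink_factor
should let the word matches dominate the ranking.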

This archive was generated by hypermail 2b25 : Sun Nov 28 1999 - 04:24:24 PST