Subject: Re: [htdig] index always scores 100
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Sep 07 2000 - 10:58:51 PDT
According to Geoff Hutchison:
> At 9:15 AM -0500 9/5/00, Ted Stresen-Reuter wrote:
> >If you want, go to http://www.chicagophilanthropy.com/search/ and enter the
> >word "kraft" as the search term and you'll see what I mean. I've tried
> >deleting the databases and indexing again, but I still got the same
> >results....
>
> So here's the answer. I pored through your verbose output and found
> a few links like this:
>
> href: http://www.chicagophilanthropy.com/ (Published: March 1998 Kraft
> Foods, Inc. names Amina Dickerson ...)
>
> So this is where it's getting "Kraft"--from the link text. You can
> turn this off using description_factor since it doesn't seem to be
> working very well in your case. Usually the text of links is fairly
> accurate as a description of the page (or it's so general that it's
> not likely to show up in searches like "click here.")
>
> In any case, the combination of this and possibly backlink_factor are
> probably the reason you're getting these "phantom" matches.
It's strange that I didn't find any documents containing links like
the one above when I searched for "kraft" on his web site. Do these
documents contain any <meta name="robots" content="noindex,follow">
tags, or does his search form use a hidden "restrict" or "exclude" field
that I didn't notice? My understanding is that link description text
is supposed to appear in the index for both the hyperlinked document,
using description_factor, and the document containing the link, using
text_factor.
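
For reference, here is a minimal htdig.conf sketch of what Geoff suggests
above. It assumes the standard description_factor and backlink_factor
attributes; the zero values are only illustrative, and I would expect the
databases to need rebuilding before any change shows up in the scores.

```
# htdig.conf fragment (illustrative values, not a tested configuration)

# Stop the text of links pointing AT a document from adding to that
# document's score.
description_factor: 0

# Stop the number of links pointing at a document from adding to its score.
backlink_factor: 0
```

If the link text turns out to be useful after all, a small non-zero
description_factor may be a better compromise than turning it off entirely.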
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives:  <http://www.htdig.org/mail/menu.html>
FAQ:            <http://www.htdig.org/FAQ.html>