Re: [htdig] can I filter a page from the search result


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 20 May 1999 09:27:07 -0500 (CDT)


According to George Toye:
> I have a top level HTML page which contains a list of links to lower level
> websites. HTDIG seems to index the URL string itself. Even if there is no
> visible text on this top page, it shows up in the search result page. My
> problem is that this page gets top ranking whenever text in the "<a
> href=...>" matches my search value. I'd rather this page not show up in
> the search result at all.
>
> Is there a way to force htdig to either not index the <a href> URL string
> itself or is there a way to forcibly demote this page in the ranking?

That's odd. There's been talk of adding an option to index words in
the URL strings, but that's not in the code yet. Right now, htdig does
index the text between <a href=URL> and </a> tags, but not the URL itself.

In any case, you can add a <meta name=robots content=noindex> tag at
the start of your document, and htdig will follow links in it, but not
index any of its text.

A couple of other things you'll want to look out for: 1) if your top level
HTML page has a <title>...</title> tag, make sure it occurs after the
meta tag above (i.e. put the meta tag as the first thing in the <head>
section); and 2) if any document that gets indexed contains a link
back to your top level HTML page, the description text (between the <a>
and </a> tags) of that link will be used as index entries pointing to
that page.
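
As a rough sketch of the suggestion above (the title, URLs and link text
here are made-up placeholders, not taken from the original page), the top
level page might start like this, with the robots meta tag ahead of the
title:

```html
<html>
<head>
<!-- robots meta tag placed first in the head, before the title -->
<meta name="robots" content="noindex">
<title>Site index</title>
</head>
<body>
<!-- placeholder links; htdig should still follow these -->
<a href="http://www.example.com/site1/">Site one</a>
<a href="http://www.example.com/site2/">Site two</a>
</body>
</html>
```

The attribute values are quoted here, which is equivalent to the unquoted
form shown earlier.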

I think the second point is moot if the page has any sort of noindex
tag, since that takes it right out of the index. Without one, though,
even a document that contains no text at all can still get indexed
through the description text of the links pointing to it. I say "some
sort of" noindex tag because there are many different tags that can be
used to turn off indexing, but all of these, other than the meta tag
I've shown above, also turn off following of links.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930