Re: [htdig] Indexing URLs

Subject: Re: [htdig] Indexing URLs
From: Vincent Queru (
Date: Wed Sep 27 2000 - 00:17:43 PDT

Gilles Detillieux wrote:

> According to Vincent Queru:
> > Some time ago, I read that someone wanted to index not only the HTML
> > source but also the URLs that the robot comes across when indexing a
> > site.
> >
> > I DO NOT want to index the URLs but unfortunately, they get indexed: is
> > there something I missed here?
> htdig doesn't make a point of indexing the URLs itself, but if any pages
> it indexes contain URLs as the link description text in a hypertext link,
> then that link's description text gets indexed. E.g., in this link...
> <a href="">>
> the second occurrence of the URL will be treated as plain text, as
> well as a link description, and will be indexed. There's no easy,
> automatic way of avoiding this. Your best bet is to hunt down such
> files and change them. You could set description_factor to 0, and that
> will prevent the description from being indexed for the referenced page,
> but it will do this for all link descriptions, which may be overkill and
> undesired, plus htdig will still index the description as plain text for
> the page containing the reference, so you won't get rid of it entirely.
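
A minimal sketch of the "hunt down such files" step, assuming the site's pages are available as local HTML files. It uses Python's standard html.parser to flag links whose description text repeats the href or starts with a URL scheme. The helper names (find_url_text_links, scan_site) are hypothetical and not part of ht://Dig, and the "looks like a URL" test is only a simple heuristic.

```python
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical helpers for illustration; not part of ht://Dig.
URL_PREFIXES = ("http://", "https://", "ftp://")


class UrlTextLinkFinder(HTMLParser):
    """Collect <a> links whose description text is itself a URL."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text chunks seen inside that <a> tag
        self.matches = []   # (href, text) pairs that look like URL-as-text links

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Heuristic: the text repeats the href or begins with a URL scheme.
            if text and (text == self._href or text.lower().startswith(URL_PREFIXES)):
                self.matches.append((self._href, text))
            self._href = None


def find_url_text_links(html):
    """Return (href, text) pairs for links whose description is a URL."""
    finder = UrlTextLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.matches


def scan_site(root):
    """Yield (path, link_text) for each flagged link in .htm/.html files under root."""
    for path in sorted(Path(root).rglob("*.htm*")):
        if path.is_file():
            for _href, text in find_url_text_links(path.read_text(errors="replace")):
                yield path, text
```

For example, a link written as <a href="http://www.example.com/">http://www.example.com/</a> is flagged, while one described as "Documentation" is not; scan_site() then lists the files worth editing by hand.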

OK, I set description_factor to 0 and it works fine, because the site I index
is very special: it consists of one page full of links that all point to the same
page, with only the arguments changing (it is a dynamic, PHP-coded site).

But I still have one more question: I had included a <META NAME="robots"
VALUE="noindex"> tag in the page containing the links, but they still got indexed. Is
that normal?

Furthermore, it is not the link description that got indexed but the link itself
(i.e., the URL contained in the A HREF tag).
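
For comparison, the conventional robots META tag carries its directives in a CONTENT attribute, e.g. <META NAME="robots" CONTENT="noindex">, so a robot that follows that convention may not see directives placed in VALUE. A minimal sketch, assuming Python's standard html.parser and a hypothetical helper name robots_directives (not part of ht://Dig), that reports which robots directives a page actually declares:

```python
from html.parser import HTMLParser

# Hypothetical helper for illustration; not part of ht://Dig.


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so META/NAME also match.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # Only CONTENT is read; a VALUE attribute is deliberately ignored.
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

Here robots_directives('<META NAME="robots" CONTENT="noindex">') returns {'noindex'}, while the same tag written with VALUE="noindex" returns an empty set.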
