Re: [htdig] Indexing URLs


Subject: Re: [htdig] Indexing URLs
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Sep 27 2000 - 13:48:59 PDT


According to Vincent Queru:
> But I still have one more question : I had included a META NAME="robots"
> VALUE=noindex" tag in the page containing the links but they still got indexed, is
> that normal ?
>
> Furthermore, it is not the link description that got indexed but the link itself
> (ie. the URL contained in the A HREF tag).

Yes, that is normal behaviour. With content="noindex" alone, htdig
skips the text of that page but still follows its links. You'd need
content="none" or content="noindex, nofollow" to disable both text
indexing and link following.

See http://www.htdig.org/FAQ.html#q4.15 and follow the links for
additional documentation.
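
To make the difference between those content values concrete, here is
a minimal sketch in Python of the usual robots META convention. It is
not htdig's own parsing code, and the function name robots_directives
is made up for illustration.

```python
def robots_directives(content):
    """Return (index_text, follow_links) for a robots META content value.

    Models the common convention only: "noindex" turns off text
    indexing, "nofollow" turns off link following, "none" turns off both.
    """
    tokens = {t.strip().lower() for t in content.split(",")}
    index_text = not ({"noindex", "none"} & tokens)
    follow_links = not ({"nofollow", "none"} & tokens)
    return index_text, follow_links


for value in ("noindex", "noindex, nofollow", "none"):
    print(value, "->", robots_directives(value))
```

With "noindex" alone, follow_links stays true, which is why the linked
documents still end up in the index.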

... and in a followup message...
> Anyway using the description_factor=0 DID solve the problem, so I won't
> bother you any more with my questions unless you think there might be
> a problem somewhere else :
>
> The URLs encountered all contain numbers and as I turned the number
> indexing on, this is probably the reason why they got indexed in
> first place although they do not appear as such between the <A HREF>
> and </A> tag.

I don't know if it's a problem as such, but I don't think you were
thorough enough in looking for URLs used as link descriptions. The
description_factor attribute ONLY affects text between the <A HREF>
and </A> tags, so if setting it to 0 made the offending URLs disappear
from the index, they must be between anchor tags somewhere in your
files.
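
If it helps with the search, a rough Python sketch along these lines
could list anchors whose description looks like a URL. The class and
function names are hypothetical, and the "looks like a URL" test (text
equal to the href, containing "://", or starting with "www.") is only
an assumed heuristic that you would adjust to match your own URLs.

```python
from html.parser import HTMLParser


class AnchorTextFinder(HTMLParser):
    """Collect (href, text) for every <a href=...> ... </a> pair."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text pieces seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def url_like_descriptions(html):
    """Return anchors whose description text looks like a URL (heuristic)."""
    finder = AnchorTextFinder()
    finder.feed(html)
    finder.close()
    return [(href, text) for href, text in finder.anchors
            if text == href or "://" in text or text.startswith("www.")]


sample = ('<a href="http://example.com/doc123.html">'
          'http://example.com/doc123.html</a> '
          '<a href="/about.html">About us</a>')
print(url_like_descriptions(sample))
```

Running it over each source file should point at the anchors that
description_factor was acting on.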

As for the effect of the allow_numbers attribute, it affects "words"
containing digits but not letters. Any word containing both letters
and digits will be indexed even if allow_numbers is false. In any
case, setting allow_numbers to true will not cause htdig to look for
words in contexts it wouldn't otherwise look. It doesn't parse href
parameter values as text regardless of which attribute settings you
have, unless you've modified htdig/HTML.cc to do so.
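
The allow_numbers rule as described above can be sketched in Python as
follows. This models only that one rule, not htdig's actual word
parser, and it ignores every other filter htdig applies; the function
name is_indexable_word is made up for illustration.

```python
def is_indexable_word(word, allow_numbers=False):
    """Apply only the allow_numbers rule described in this message.

    A "word" with digits but no letters is indexed only when
    allow_numbers is true; anything else passes this particular check.
    """
    has_digit = any(c.isdigit() for c in word)
    has_letter = any(c.isalpha() for c in word)
    if has_digit and not has_letter:
        return allow_numbers
    return True


for word in ("12345", "page42", "index"):
    print(word, is_indexable_word(word, allow_numbers=False))
```

So a token such as "page42" gets through whether or not allow_numbers
is set, while "12345" depends on it.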

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
