Re: [htdig] Indexing URLs


Subject: Re: [htdig] Indexing URLs
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Sep 26 2000 - 11:25:40 PDT


According to Vincent Queru:
> Some time ago, I read that someone wanted to index not only the HTML
> source but also the URLs that the robot comes across when indexing a
> site.
>
> I DO NOT want to index the URLs but unfortunately, they get indexed : is
> there something I missed here ?

htdig doesn't make a point of indexing the URLs itself, but if any pages
it indexes contain URLs as the link description text in a hypertext link,
then that link's description text gets indexed. E.g.: in this link...

  <a href="http://www.htdig.org/files/">http://www.htdig.org/files/</a>

the second occurrence of the URL will be treated as plain text, as
well as a link description, and will be indexed. There's no easy,
automatic way of avoiding this. Your best bet is to hunt down such
files and change them. You could set description_factor to 0, which
will prevent the description from being indexed for the referenced page.
However, it will do this for all link descriptions, which may be overkill.
Also, htdig will still index the description as plain text for the page
containing the reference, so you won't get rid of it entirely.
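
If you want to hunt down such links mechanically, something along these
lines might help. It is only a rough sketch in Python, not part of htdig:
the class and function names are made up for illustration, and it assumes
that a description "looks like a URL" when it starts with http://,
https:// or ftp://. Adjust that test to suit your pages.

```python
from html.parser import HTMLParser


class UrlTextLinkFinder(HTMLParser):
    """Collect links whose visible description text looks like a URL."""

    # Hypothetical heuristic: a description starting with one of these
    # schemes is treated as a URL used as link text.
    URL_PREFIXES = ("http://", "https://", "ftp://")

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>
        self.matches = []   # (href, description) pairs found so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            description = "".join(self._text).strip()
            if description.lower().startswith(self.URL_PREFIXES):
                self.matches.append((self._href, description))
            self._href = None


def find_url_descriptions(html):
    """Return (href, description) pairs where the description is a URL."""
    finder = UrlTextLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.matches
```

Calling find_url_descriptions() on the contents of each HTML file in your
document tree gives you the list of links to edit by hand. It only reports
them; it does not change any files.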

Another common problem arises when you point htdig to a directory with no
index.html file and your server automatically generates a directory
listing for the client (htdig). This listing will contain links to all
the files in the directory, with each file name as description text, so
the file names get indexed in this case. Torsten suggested a way of
disabling this in an earlier message:

  http://www.htdig.org/mail/2000/09/0118.html

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



