Re: [htdig] Going for the big dig


Subject: Re: [htdig] Going for the big dig
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Dec 19 2000 - 08:39:33 PST


According to Terry Collins:
> Geoff Hutchison wrote:
> >
> > At 10:14 AM +1100 12/19/00, Terry Collins wrote:
> > >And make sure you don't ignore robots.txt
> >
> > Yes, though someone would need to alter the code to do this.
>
> If you are doing an external site, it shouldn't be too much effort to
> just read this and set the excludes.
>
> Courtesy thing.

I think you misunderstood. htdig already does read the robots.txt file
and skips all disallowed documents. You don't need to do this manually.
Geoff was saying you'd need to alter the code in order to ignore robots.txt,
which definitely would be a bad thing if you then use the hacked htdig to
index sites that are not your own.

Actually, on my site I don't bother with exclude_urls at all, and use the
robots.txt file instead. This way, anything that I don't want indexed by
htdig won't be indexed by any other search engine either.
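
For anyone curious what "reads robots.txt and skips disallowed documents"
amounts to, here is a minimal sketch in Python using the standard
urllib.robotparser module. It is not htdig's actual code, only an
illustration of the same check. The robots.txt contents, the example.com
URLs, the helper name allowed() and the agent name "htdig" are all
assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def allowed(url, agent="htdig"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT.

    The agent name is an assumption; with "User-agent: *" the rules
    apply to every robot regardless of its name.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access
    # is needed for this sketch.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/notes.html",
    ):
        # A well-behaved indexer skips anything the file disallows.
        print(url, "->", "index" if allowed(url) else "skip")
```

Because the rules live in robots.txt rather than in one indexer's own
configuration, every robot that honours the file gets the same exclusions.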

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



