Re: [htdig] Going for the big dig


Subject: Re: [htdig] Going for the big dig
From: Terry Collins (terryc@woa.com.au)
Date: Tue Dec 19 2000 - 12:20:25 PST


Gilles Detillieux wrote:

...snip...

> I think you misunderstood. htdig already does read the robots.txt file
> and skips all disallowed documents.

Whoops, my apologies for that gaffe; my brain has started the holiday
season without me {:-).
Actually, I have given up remembering how you do (or I did) anything under
Linux: with new versions every three months, it is all different every time
I look at something.

You are correct. I now remember having to look at this in detail, because
my robots.txt excludes all the lists I archive on site from indexing bots,
and htdig very obediently acted on this. I wanted htdig to actually index
the contents of these lists but exclude everything else, which it now does
quite nicely.
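
For anyone wanting to try something similar, here is a rough sketch of how
per-robot sections in robots.txt can give this effect. It is not my actual
setup: the paths are made up, it assumes htdig matches a "User-agent: htdig"
section, and it uses Python's standard urllib.robotparser as a stand-in for
htdig's own robots.txt handling, which may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The paths /lists/, /docs/ and /software/ are
# illustrative only; they are not taken from the original message.
ROBOTS_TXT = """\
User-agent: htdig
Disallow: /docs/
Disallow: /software/

User-agent: *
Disallow: /lists/
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up robot name that falls under the "*" section.
    for agent in ("htdig", "SomeOtherBot"):
        for path in ("/lists/htdig/msg001.html", "/docs/index.html"):
            print(agent, path, allowed(agent, path))
```

A robot that finds a section naming it uses only that section, so in this
sketch htdig ignores the "*" rules and may fetch the list archives, while
every other robot is kept out of them.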

> Actually, on my site I don't bother with exclude_urls at all, and use the
> robots.txt file instead. This way, anything that I don't want indexed by
> htdig won't be indexed by any other search engine either.

I wish all search engines obeyed robots.txt.

Thanks for the development effort with htdig. Very useful app.

--
   Terry Collins {:-)}}} Ph(02) 4627 2186 Fax(02) 4628 7861  
   email: terryc@woa.com.au  www: http://www.woa.com.au  
   WOA Computer Services <lan/wan, linux/unix, novell>

"People without trees are like fish without clean water"



