[htdig] update digging


Frank Guangxin Liu (frank@ctcqnx4.ctc.cummins.com)
Fri, 26 Feb 1999 20:58:12 -0500 (EST)


Is it true that when I run htdig to update my db (without the -i option),
htdig will ignore "limit_urls_to" and try all the URLs already in the db?

Here is what I did and what I found:
1) Ran htdig with -i to create the initial db for the mycompany.com Intranet.
   start_url: http://www3.mydept.mycompany.com
   limit_urls_to: .mycompany.com
2) A week later, ran htdig to update my db (without the -i option) for
   the mydept.mycompany.com sub-domain only (rough command lines for both
   runs are sketched just below the list).
   start_url: http://www3.mydept.mycompany.com
   limit_urls_to: .mydept.mycompany.com
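
For reference, the two runs were invoked roughly like this (the conf file
paths are just placeholders for the real ones):

   # initial dig: build the db from scratch
   htdig -i -c /path/to/initial.conf

   # update dig, a week later: reuse the existing db
   htdig -c /path/to/update.conf

The only relevant difference between the two conf files is the
limit_urls_to line quoted above.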

For some reason, this update run ignored "limit_urls_to" and
went through all the servers in .mycompany.com!  It also tried to fetch
all documents on all servers instead of only the new or changed ones.
(I checked the web server log on a small server with only a few HTML
files and found a GET for every file, even though nothing had changed
on that server between the initial dig and the update dig.)

Another strange thing is that although I deleted some HTML files
on the server http://www3.mydept.mycompany.com BEFORE the update
run of htdig, those deleted URLs are still left in the db, and a
subsequent search still finds them.  I don't set "remove_bad_urls"
in my htdig.conf file, which means it should default to true.
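
For what it's worth, the relevant part of my update htdig.conf looks
roughly like this (remove_bad_urls is shown commented out only to make
the point that I am relying on its documented default of true):

   start_url:        http://www3.mydept.mycompany.com
   limit_urls_to:    .mydept.mycompany.com
   #remove_bad_urls: true    # not set, so the default (true) should apply

My understanding is that htmerge is what actually prunes the URLs that
htdig has marked as bad, but I may be wrong about that.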

Thanks for any hints!

Frank

   


