[htdig] update digging


Frank Guangxin Liu (frank@ctcqnx4.ctc.cummins.com)
Tue, 2 Mar 1999 09:10:11 -0500 (EST)


>
> Is it true that if I run htdig to update my db (without the -i option),
> htdig will ignore "limit_urls_to" and retry all the URLs already in the db?
>
> Here is what I did and what I found:
> 1) Ran htdig with -i to create the initial db for the mycompany.com intranet.
> start_url: http://www3.mydept.mycompany.com
> limit_urls_to: .mycompany.com
> 2) A week later, ran htdig again to update the db (without the -i option),
> this time for the mydept.mycompany.com sub-domain only.
> start_url: http://www3.mydept.mycompany.com
> limit_urls_to: .mydept.mycompany.com
>
> For some reason, this update run ignored "limit_urls_to" and
> went through all servers in .mycompany.com! It also tried to fetch
> all documents on all servers instead of fetching only the new
> documents. (I checked the www log file on a small server with
> only a few html files and found a GET for every file, although
> nothing had changed on that server between the initial htdig run
> and the update run.)
>
> Another strange thing: although I deleted some html files
> on the server http://www3.mydept.mycompany.com BEFORE the update
> run of htdig, the deleted URLs were still left in the db, and a
> subsequent search still returns them as matches. I don't set
> "remove_bad_urls" in my htdig.conf file, so it should be at its
> default value of true.
>
> Thanks for any hints!
>
> Frank
>
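
To make the expectation in the two runs above concrete, here is a minimal
Python sketch of how I would expect "limit_urls_to" to filter URLs. It
assumes the pattern is applied as a case-insensitive substring match against
each URL, which is my reading of the attribute and not something confirmed
in this thread. The function names and the www.sales.mycompany.com host are
hypothetical, made up for illustration; this is not htdig's actual code.

```python
# Sketch only: models limit_urls_to as a case-insensitive substring match.
# This is an assumption about htdig's behaviour, not its implementation.

def within_limits(url, limit_patterns):
    """Return True if at least one pattern occurs somewhere in the URL."""
    lowered = url.lower()
    return any(pattern.lower() in lowered for pattern in limit_patterns)


def split_by_limits(stored_urls, limit_patterns):
    """Split URLs already in the db into (revisit, out_of_scope) lists."""
    revisit = [u for u in stored_urls if within_limits(u, limit_patterns)]
    out_of_scope = [u for u in stored_urls if not within_limits(u, limit_patterns)]
    return revisit, out_of_scope


# URLs that the initial run (limit_urls_to: .mycompany.com) could have stored.
# The second host is a hypothetical example of another department's server.
stored = [
    "http://www3.mydept.mycompany.com/index.html",
    "http://www.sales.mycompany.com/index.html",
]

# Limits used for the update run in step 2.
revisit, out_of_scope = split_by_limits(stored, [".mydept.mycompany.com"])
```

Under that assumption, the update run in step 2 would only be expected to
revisit the www3.mydept.mycompany.com URL; the other server's URL is already
in the db but falls outside the new limit, which is exactly the case the
question is about.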


