Re: [htdig] Update run adds missing servers ?


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 14 Jul 1999 09:48:42 -0500 (CDT)


According to Budd, S.:
> We have a site with about 170 servers. Whenever I run an initial update
> there are always about 10 servers not running. Would running an
> update run, say a day later, pick up the missing servers which have
> come up in the meantime, and leave the pages in the index of the servers
> which have gone down in the meantime?
>
> None of the pages which referenced the missing servers will
> necessarily have been updated, so does the update run look for
> the "no server running" situations?

Yes, but there are a few things you need to know about this. First of
all, you ought to read

        http://www.htdig.org/attrs.html#remove_bad_urls

because you will have to make sure this option is turned off, so that
htmerge allows missing documents to remain in the database. Secondly,
you should be aware that even on an update run, htdig will not just look
for the documents it missed the first time. It will check every document
already indexed to see if it has been updated. This may take a while, but
it is sped up considerably if the servers understand the If-Modified-Since
header. If any new or previously missing documents are found, they
will be parsed, and any new links will be followed as well. However,
with remove_bad_urls turned off, I don't think documents will ever
get removed from the database, even if they have actually been removed
from the server on which they're stored. Perhaps someone can shed some
more light on this?
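
For reference, here is a minimal sketch of the setting discussed above,
assuming the usual "attribute: value" syntax of the ht://Dig configuration
file. The comments are only illustrative, and the rest of your
configuration is assumed to stay as it is:

```
# Keep documents in the database even when their server could not be
# reached during this run (read by htmerge).
remove_bad_urls:        false
```

As noted above, with this setting documents that really have been deleted
from a server will probably stay in the index as well, so you may want to
turn it back on for an occasional cleanup run.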

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


