Re: [htdig] less files


Jim Cole (greyleaf@yggdrasill.net)
Sun, 18 Jul 1999 18:18:08 -0600


Hi.

I am still sort of new to htdig, so I am sure someone will correct me if
I am way off base ;) When a document is missed for whatever reason, no
information regarding that document is saved in the index. So, when you
do an update, htdig doesn't know anything about the missing document and
therefore does not index it. If, in addition, a parent of that document
is reported as not having been updated since the last time it was
indexed, htdig does not bother following any of its links, which of
course is the whole point of supporting an update. But the side effect
is that a missing document will not be found until some document that
links to it has changed and is indexed again.
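
To make the mechanism concrete, here is a toy Python model of an update dig. It is not ht://Dig code. The `dig` function, the `site` and `index` dictionaries, and the `unreachable` set are all hypothetical, and a page counts as "not updated" simply when its stored modification time still matches. The example shows the missed page `/b` staying missing across an update until its parent `/` changes.

```python
# Toy model of an incremental crawl (NOT ht://Dig's actual code).
# site:  URL -> (modification time, list of linked URLs)
# index: URL -> modification time recorded by the last dig


def dig(site, start, index=None, unreachable=()):
    old = index or {}
    new = dict(old)
    # An update dig re-checks every URL it already knows, plus the start URL.
    queue = [start, *old]
    seen = set()
    while queue:
        url = queue.pop()
        if url in seen or url not in site:
            continue
        seen.add(url)
        if url in unreachable:
            continue  # fetch failed (e.g. busy server): nothing is recorded
        mtime, links = site[url]
        new[url] = mtime
        if old.get(url) == mtime:
            continue  # unchanged since the last dig: its links are not followed
        queue.extend(links)
    return new


site = {"/": (1, ["/a", "/b"]), "/a": (1, []), "/b": (1, [])}

initial = dig(site, "/", unreachable={"/b"})  # "/b" is missed
update = dig(site, "/", index=initial)        # "/" unchanged, so "/b" stays missed
print(sorted(update))                         # ['/', '/a']

site["/"] = (2, ["/a", "/b"])                 # the parent page is edited
update = dig(site, "/", index=update)
print(sorted(update))                         # ['/', '/a', '/b']
```

The key line is the `continue` on an unchanged page: skipping the links of unchanged pages is what makes updates cheap, and it is also why nothing ever leads the crawler back to `/b`.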

I am not sure there is an "easy" way around this problem short of doing
a full dig. Depending on the specifics of what was missed, you might be
able to get away with running htdig against a subset of the documents
and then merging the results of that dig with the main database. Then future
updates should catch all of the newly added documents. Of course, every
time you miss a document, this problem is going to creep back.
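
A similarly hedged sketch of that workaround, again a toy model rather than ht://Dig's real merge step. `update_dig` and `merge` are hypothetical helpers, and the update is simplified to re-checking only the URLs the index already knows. Once the small supplementary dig is merged in, `/b` becomes a known URL, so later updates track it.

```python
# Toy model (NOT ht://Dig code): an index maps URL -> modification time.


def update_dig(site, index):
    """Simplified update: re-check only the URLs the index already knows."""
    return {url: site[url] for url in index if url in site}


def merge(main, extra):
    """Simplified merge: entries from the supplementary dig take precedence."""
    merged = dict(main)
    merged.update(extra)
    return merged


site = {"/": 1, "/a": 1, "/b": 1}   # current modification times on the server
main = {"/": 1, "/a": 1}            # the initial dig missed "/b"
extra = {"/b": site["/b"]}          # small dig covering only the missed part
merged = merge(main, extra)

site["/b"] = 2                      # "/b" changes later
print(update_dig(site, main))       # {'/': 1, '/a': 1}, still blind to "/b"
print(update_dig(site, merged))     # {'/': 1, '/a': 1, '/b': 2}
```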

Jim

Frank Guangxin Liu wrote:

> I've been seeing this strange behaviour for a long time:
> if, for whatever reason (I've seen this several times for
> several servers, maybe because the server is too busy to
> respond...), an initial htdig fails to grab all files
> from a server (I know this because the statistics output
> from "htdig" shows far fewer files than actual), further
> update digs never catch up on those missing files.
> The only solution in this case is to do an initial htdig
> again, in which case the statistics output from htdig
> may give the actual number of files from that server.
>
> My question is:
> why can't an update htdig catch up on those missing files?
> Is there another solution other than redoing an initial htdig?
>
> Thanks!
> Frank
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> htdig@htdig.org containing the single word "unsubscribe" in
> the SUBJECT of the message.




This archive was generated by hypermail 2.0b3 on Sun Jul 18 1999 - 16:28:56 PDT