Re: [htdig] less files

Frank Guangxin Liu
Sun, 18 Jul 1999 18:35:46 -0500 (EST)

After sending my mail, I thought about this problem again.
Here is my guess at the cause:
the missing files are linked to ONLY from files that
already exist in the htdig db. During an update dig,
none of those existing files has changed (per the
modified-since check), so they are not processed, and
htdig never sees the links to the missing files.

One solution would be:
during an update dig, if a file hasn't been modified, htdig
should still parse it for links instead of skipping it
entirely. Of course, htdig doesn't need to extract keywords
from the file, since it is an old file and already in the db.
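
To illustrate the guess and the proposed fix, here is a small
Python sketch of an update run. It is only a toy model, not
htdig's actual code: update_dig, site, db and the
follow_unmodified_links flag are all made-up names, and it
assumes the links of an unchanged file are still available
(re-fetched or stored) when the file is not re-indexed.

```python
# Toy model only: hypothetical names, not htdig's real code or data structures.

def update_dig(start_url, site, db, follow_unmodified_links):
    """Simulate one update run.

    site: url -> (last_modified, [linked urls]), a stand-in for the server.
    db:   url -> last_modified recorded by an earlier dig (mutated in place).
    Returns the set of URLs newly added to db.
    """
    queue = [start_url]
    seen = set()
    added = set()
    while queue:
        url = queue.pop()
        if url in seen or url not in site:
            continue
        seen.add(url)
        last_modified, links = site[url]
        unchanged = db.get(url) == last_modified
        if unchanged and not follow_unmodified_links:
            # "Not modified": the file is skipped, so its links are never seen.
            continue
        if not unchanged:
            # New or changed file: (re)index it.
            if url not in db:
                added.add(url)
            db[url] = last_modified
        # Either way, queue the links found in the file.
        queue.extend(links)
    return added


site = {
    "/index.html": (1, ["/a.html", "/b.html"]),
    "/a.html": (1, []),
    "/b.html": (1, ["/c.html"]),
    "/c.html": (1, []),
}
# Incomplete initial dig: /b.html and /c.html were missed.
partial_db = {"/index.html": 1, "/a.html": 1}

# Skipping unmodified files: nothing new is found.
print(sorted(update_dig("/index.html", site, dict(partial_db), False)))  # []
# Still following links of unmodified files: the missing files are found.
print(sorted(update_dig("/index.html", site, dict(partial_db), True)))   # ['/b.html', '/c.html']
```

In the first call /index.html is unchanged and gets skipped, so
/b.html is never reached. In the second call its links are still
followed, and only the two missing files get indexed.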


---------- old message ----------
Date: Sun, 18 Jul 1999 17:12:57 -0500 (EST)
From: Frank Guangxin Liu <>
Subject: less files

I've been seeing this strange behaviour for a long time:
if, for whatever reason (I've seen this several times on
several servers, maybe because the server is too busy to
respond...), an initial htdig fails to grab all files
from a server (I know this because the statistics output
from "htdig" shows far fewer files than there actually are),
further update digs will never pick up the missing files.
The only solution in this case is to do an initial htdig
again, in which case the statistics output from htdig
may give the actual number of files on that server.

My question is:
why can't an update dig pick up the missing files?
Is there a solution other than redoing the initial htdig?


