Re: [htdig] less files

Frank Guangxin Liu
Sun, 18 Jul 1999 21:17:45 -0500 (EST)

On Sun, 18 Jul 1999, Geoff Hutchison wrote:

> > since none of those existing files got changed (modified-since),
> > they won't be processed and thus those missing files
> > can't be seen by htdig.
> This is partly correct. If you have set remove_bad_urls, this is correct.
> From the documentation:
> If TRUE, htmerge will remove any URLs which were marked as unreachable by
> htdig from the database. If FALSE, it will not do this. When htdig is run
> in initial mode, documents which were referred to but could not be
> accessed should probably be removed, and hence this option should then be
> set to TRUE, however, if htdig is run to update the database, this may
> cause documents on a server which is temporarily unavailable to be
> removed. This is probably NOT what was intended, so hence this option
> should be set to FALSE in that case.
> > should, instead of skipping this file (won't process
> > it at all), still parse the file for links. Of course,
> In general, the slowest part of the indexing is retrieving the document.
> So the update dig saves a *lot* of time by just sending out
> If-Modified-Since headers. So if an update dig "reparsed looking for
> URLs," it really wouldn't be any faster than the initial dig. In that
> case, why bother doing an update dig?
It's my fault. I thought the contents of the files were already
saved in the db. "Reparse looking for URLs" shouldn't require
a re-retrieval of the file, since it has not been modified.
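
To make the idea concrete, here is a minimal sketch in Python. It is not
ht://Dig code, and it rests on an assumption the thread does not confirm:
that the database keeps each document's outgoing links from the previous
dig. Under that assumption, a page that answers "not modified" can still
contribute its stored links, so a linked file that has since disappeared
gets checked even though no unchanged page is downloaded again. The names
update_dig, make_fetch, db, and site are all hypothetical.

```python
# Hypothetical sketch, not ht://Dig code: every name here is illustrative.

def update_dig(start_url, db, fetch):
    """Walk the site from start_url, reusing stored links for unmodified pages.

    db maps url -> {"mtime": int, "links": [urls]} from the previous dig.
    fetch(url, since) returns (status, mtime, links): 304 when the page is
    unchanged since `since`, 200 with fresh data, or 404 when it is gone.
    """
    seen, queue, new_db = set(), [start_url], {}
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        old = db.get(url)
        status, mtime, links = fetch(url, old["mtime"] if old else None)
        if status == 304:
            # Not modified: no body was transferred, so reuse the stored links.
            new_db[url] = old
        elif status == 200:
            new_db[url] = {"mtime": mtime, "links": links}
        else:
            continue  # unreachable: left out of the new database
        queue.extend(new_db[url]["links"])
    return new_db


def make_fetch(site):
    """Build a stand-in for a conditional HTTP GET against an in-memory site."""
    def fetch(url, since):
        page = site.get(url)
        if page is None:
            return 404, None, []
        if since is not None and page["mtime"] <= since:
            return 304, page["mtime"], []
        return 200, page["mtime"], page["links"]
    return fetch


# Previous dig: "/" links to "/a" and "/b". Since then "/b" was deleted,
# but "/" itself was not modified.
old_db = {
    "/": {"mtime": 1, "links": ["/a", "/b"]},
    "/a": {"mtime": 1, "links": []},
    "/b": {"mtime": 1, "links": []},
}
site = {
    "/": {"mtime": 1, "links": ["/a", "/b"]},
    "/a": {"mtime": 1, "links": []},
}
new_db = update_dig("/", old_db, make_fetch(site))
```

In this toy run every surviving page answers 304, yet "/b" is still
requested through the links stored for "/" and is dropped from the new
database when it turns out to be gone.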

> -Geoff Hutchison
> Williams Students Online


This archive was generated by hypermail 2.0b3 on Sun Jul 18 1999 - 18:35:37 PDT