Re: [htdig] How to do incremental indexing, and how to estimate indexing time?

Date: Tue Apr 11 2000 - 17:16:12 PDT

The documentation for htdig on '-i' says:
Initial. Do not use any old databases. This is accomplished by first erasing the databases.

So you would use this if you wanted to create an entirely new database. That is useful if some pages have been deleted, since htdig doesn't remove links that no longer exist.

I think there are a few more factors to consider for this one. Some of them are:
- connection speed (if you are indexing other servers)
- type of webserver
- amount of memory
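
To put rough numbers on the connection-speed point, here is a back-of-envelope sketch in Python. It is not how htdig measures anything: the function name, the page size, the bandwidth, and the per-request overhead are all made-up assumptions, and it assumes pages are fetched one after another. It ignores memory, webserver type, and the time spent parsing and writing the databases, so treat the result as an order-of-magnitude guess only.

```python
def estimate_dig_seconds(pages, avg_page_kb, bandwidth_kb_per_s, per_request_overhead_s):
    """Rough serial estimate: each page costs a fixed overhead plus its transfer time."""
    transfer_s = avg_page_kb / bandwidth_kb_per_s
    return pages * (per_request_overhead_s + transfer_s)


# All figures below are illustrative assumptions, not htdig measurements.
fast_lan = estimate_dig_seconds(
    10_000, avg_page_kb=20, bandwidth_kb_per_s=1000, per_request_overhead_s=0.05
)
slow_wan = estimate_dig_seconds(
    10_000, avg_page_kb=20, bandwidth_kb_per_s=50, per_request_overhead_s=0.5
)

print(f"fast local network: about {fast_lan / 60:.0f} minutes")
print(f"slow remote link:   about {slow_wan / 60:.0f} minutes")
```

Plugging in figures measured on your own network (for example, by timing a dig of a few hundred pages) should give a much better guess than these placeholders.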

Just giving a couple of pointers; the pros might have some insight here :)

>>> "Felciano, Ramon" <> 04/11/00 07:06PM >>>

Hello --

I've just downloaded and installed htdig on our intranet. I have two
questions:

1. Although it doesn't seem to be documented explicitly, I assume that
htdig's default mode of operation is incremental. That is, htdig will only
index new pages unless you specify -i on the command line. Can someone
confirm this for me? If this is the case, it seems like a worthwhile
addition to the documentation given that the mnemonic switch for
non-incremental mode is "-i" (!).

2. Is there any way to get a sense for how long htdig will run? For example,
assuming no pages need to be updated, how long would it take htdig to scan
through 10,000 pages? Obviously this is CPU dependent, etc. but I'm trying
to get an order-of-magnitude feel.




Ramon M. Felciano
INGENUITY Systems, Inc.

