Re: [htdig] How to do incremental indexing, and how to estimate indexing time


From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Apr 11 2000 - 20:25:45 PDT


At 5:06 PM -0700 4/11/00, Felciano, Ramon wrote:
>1. Although it doesn't seem to be documented explicitly, I assume that
>htdig's default mode of operation is incremental. That is, htdig will only

Yes. Actually the terminology is "initial" and "update," so the
switch isn't really mislabeled. Of course you may want to include a
-a switch if you're updating...
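
For illustration, here is a rough sketch of the two kinds of run. The
config path and the function names are placeholders, not anything from
a stock install. It assumes -i forces an initial dig and -a makes htdig
and htmerge write to alternate work files, so the live databases stay
searchable during the run. Moving the work files into place afterwards
depends on your setup and is not shown.

```shell
# Placeholder path; point this at your own configuration file.
HTDIG_CONF="${HTDIG_CONF:-/path/to/htdig.conf}"

# Initial dig: -i ignores any existing databases and starts from scratch.
initial_dig() {
    htdig -i -c "$HTDIG_CONF" && htmerge -c "$HTDIG_CONF"
}

# Update dig: no -i, so existing databases are reused.
# -a writes to alternate work files instead of the live databases.
update_dig() {
    htdig -a -c "$HTDIG_CONF" && htmerge -a -c "$HTDIG_CONF"
}
```

Run initial_dig once to build the databases, then call update_dig from
cron or by hand for later runs.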

>2. Is there any way to get a sense for how long htdig will run? For example,
>assuming no pages need to be updated, how long would it take htdig to scan
>through 10,000 pages? Obviously this is CPU dependent, etc. but I'm trying
>to get an order-of-magnitude feel.

Just scan through them? If it doesn't actually have to download or
re-parse anything, it's going to be pretty fast. The installation on
htdig.org currently has about 15,000 URLs and mostly just has to index
new mail messages. The update usually takes a few minutes. (It also
has local_urls set, which helps a fair amount.)

In general, updates will be *much* faster than the initial dig.
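
If you want a number for your own site rather than an order-of-magnitude
guess, one option is simply to time an update run. A minimal sketch,
assuming a placeholder config path and a made-up function name; -s asks
htdig to print statistics when the dig finishes.

```shell
# Placeholder path; point this at your own configuration file.
HTDIG_CONF="${HTDIG_CONF:-/path/to/htdig.conf}"

# Run one update dig and report the wall-clock time in seconds.
timed_update() {
    start=$(date +%s)
    htdig -s -c "$HTDIG_CONF"
    status=$?
    end=$(date +%s)
    echo "update dig took $((end - start)) seconds"
    return $status
}
```

The result will vary with network speed, how many documents changed,
and whether local_urls is in use, so treat a single run as a rough guide.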

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



