Re: [htdig] Indexing large amount of non-related files


Subject: Re: [htdig] Indexing large amount of non-related files
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue May 23 2000 - 15:09:28 PDT


At 6:51 PM +0200 5/23/00, Marcel Hicking wrote:
>I have about 200,000 plain text files
>spread over a few hundred, maybe 1,000, directories.
>File size is between a few bytes and, sometimes,
>above 1 MB. All in all this ends up at 1.2 GB
>of data, growing daily. The files do not
>contain HTML code and I need them to be
>indexed at least daily (that is, nightly ;-)
>Most of the files are static, only a few of them
>change, say, 100-200 a day.

Well, I don't think you'll have much of a problem indexing them with
ht://Dig. As for performance, it depends a lot on your machine and the
data itself. It sounds like you might get some use out of the local_urls
attribute, which lets htdig read files straight from the filesystem
instead of fetching them over HTTP. If the files don't have extensions,
though, you might see it hit the HTTP server a lot as it tries to figure
out the MIME type.
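
To get a rough idea of how many files would fall back to HTTP for this
reason, a small script can list the files whose names give no MIME type
hint. This is only a sketch: it uses Python's standard mimetypes table as
a stand-in for whatever extension mapping ht://Dig applies, and the
function name files_without_known_type is made up for illustration.

```python
import mimetypes
import os


def files_without_known_type(root):
    """Return sorted paths under root whose file name yields no MIME type.

    Assumption: Python's mimetypes table approximates the extension
    lookup the indexer would do; the real mapping may differ.
    """
    unknown = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mime_type, _encoding = mimetypes.guess_type(name)
            if mime_type is None:
                unknown.append(os.path.join(dirpath, name))
    return sorted(unknown)
```

If the list is long, renaming the files (or serving them under names with
a .txt extension) may keep more of the indexing run on the local disk.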

Also remember that ht://Dig currently doesn't have any sort of "index
this directory" feature, so every file has to be reachable through a URL
it is given or a link it can follow.
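
One possible workaround is to generate the list of URLs yourself from the
directory tree and feed it to the indexer, for example through start_url.
The sketch below assumes the files are served under a single URL prefix
that mirrors the directory layout; the function name urls_for_tree and
the example host are hypothetical, and how best to pass such a long list
to htdig should be checked against the documentation for your version.

```python
import os
from urllib.parse import quote


def urls_for_tree(doc_root, url_prefix):
    """Map every file under doc_root to a URL under url_prefix.

    Assumption: the web server exposes doc_root at url_prefix with the
    same relative paths.
    """
    base = url_prefix.rstrip("/")
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            # Use forward slashes and percent-encode unsafe characters.
            urls.append(base + "/" + quote(rel.replace(os.sep, "/")))
    return sorted(urls)
```

Calling urls_for_tree("/var/www/docs", "http://www.example.com/docs/") and
writing one URL per line to a file gives a list that can be regenerated
before each nightly run.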

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



