Subject: Re: [htdig] Indexing 30Gb of text
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon Feb 28 2000 - 19:10:21 PST


At 6:25 PM +0300 2/28/00, Andrey Novikov wrote:
>How can I smoothly index 30Gb of HTML text without disturbing the
>existing index? Can I do it incrementally, in several steps?
>What hints can you share with me for such a big job?

Well, if you mean you have an existing index that you want to keep
querying, you'll need to run htdig with -a at the very least, so it
works on alternate (.work) copies of the databases and leaves the live
ones searchable while it digs. :-)
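
Off the top of my head, an update run looks something like this (just
a sketch, untested here; the config path is only an example, so check
the flags against the man pages for your version):

    # dig and merge into the alternate (.work) databases; the live
    # databases keep answering searches in the meantime
    htdig -a -v -c /etc/htdig/htdig.conf
    htmerge -a -v -c /etc/htdig/htdig.conf

    # afterwards, move the .work files over the live databases --
    # if I remember right, the stock rundig script's -a mode does
    # this swap for you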

As for hints, I'd say you'll want a lot of disk space, RAM, and swap.
You'll probably want to do it incrementally. You don't say much about
how your data is organized, but I'd try splitting it into several
pieces, indexing each piece into its own database, and then using
htmerge to combine them. Of course the resulting database is going to
be huge, so if you never need to search the whole collection at once,
I wouldn't bother merging at all.
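
For example (again only a sketch; the config names below are made up,
and each one would point database_dir at its own directory and
start_url at its own slice of the collection):

    # build each piece as its own database
    htdig -i -v -c part1.conf
    htmerge -v -c part1.conf
    htdig -i -v -c part2.conf
    htmerge -v -c part2.conf

    # if you do want one big searchable database, fold the pieces
    # into the first one (assuming your htmerge has the -m option)
    htmerge -v -c part1.conf -m part2.conf

If you skip the merge, htsearch can be pointed at the right piece via
the config input parameter in each search form.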

Beyond that, you're in somewhat uncharted territory. I can think of
one or two people who have that much indexed, but they have their
databases split over many categories.

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



