[htdig] suggestions for large multi-server indexing?

From: Albert Lunde (Albert-Lunde@nwu.edu)
Date: Fri Jun 09 2000 - 09:11:13 PDT

I'd like to hear your suggestions for doing large-scale multi-server
indexing with htdig.

In particular:

(1) What are the pros and cons of building a single big index
(giving it starting URLs across all servers) vs. building a number of
smaller indexes and merging them?

(2) What issues are likely to cause problems when scaling up?

(3) How large are some of the indexes people have created successfully,
and what hardware and how much time did it take to build them?

The case I'm interested in is creating a campus-wide index of the
semi-official servers at our university.

No one knows exactly how much is out there to index, but rough
guesses suggest 200-300 servers with something like 100,000-200,000
HTML pages.

(I've been following this list for a while, but haven't been able to
get far with experiments of my own because of difficulties building the
software on HP-UX 10.20 and a lack of time.)

     Albert Lunde                      Albert-Lunde@nwu.edu

