Subject: RE: [htdig] suggestions for large multi-server indexing?
From: Budd, S (s.budd@ic.ac.uk)
Date: Fri Jun 09 2000 - 09:23:29 PDT


We index using 3.1.3

htmerge: Total documents: 177428
htmerge: Total doc db size (in K): 1676650
from 181 servers. It takes about 28 hours, with one
initial dig followed by an update dig two hours later.

-----Original Message-----
From: Albert Lunde [mailto:Albert-Lunde@nwu.edu]
Sent: Friday, June 09, 2000 5:11 PM
To: htdig@htdig.org
Subject: [htdig] suggestions for large multi-server indexing?

I'd like to hear your suggestions for doing large-scale multi-server
indexing with htdig.

In particular:

(1) What are the pros and cons of doing a single big index
(giving it starting URLs across all servers) vs. doing a number of
small indexes and merging them?

(2) What are issues likely to cause problems in scaling up?

(3) How large are some indexes that people have created successfully,
and what hardware/time does it take to do it?
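On question (1), the merge approach can be sketched roughly as follows. This is only an illustrative sketch, not a tested recipe: the config file names are made up, and it assumes each per-server config points at its own database set while htmerge's -m option (present in 3.1.x) is used to fold one database into another.

```shell
# Hypothetical sketch: dig each server into its own database set,
# then combine them with htmerge -m. All file names are invented
# for illustration.

for conf in server1.conf server2.conf server3.conf; do
    # -i forces an initial (from-scratch) dig of that config's start_url
    htdig -i -c "$conf"
done

# Fold each per-server database into the master set:
# -m names the config of the database being merged IN,
# -c names the config of the database being merged INTO.
for conf in server2.conf server3.conf; do
    htmerge -m "$conf" -c server1.conf
done

# Finally build the word index for the combined database.
htmerge -c server1.conf
```

Whether this beats one big dig presumably depends on how often individual servers change and whether you want to re-dig them on separate schedules.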

The case I'm interested in is creating a campus-wide index of the
semi-official servers at our university.

No one knows exactly how much is out there to index, but rough
guesses suggest 200-300 servers, with something like 100,000 -
200,000 HTML pages.

(I've been following this list for a bit, but haven't been able to
get far with experiments on my own due to difficulties building the
software on HP-UX 10.20 and lack of time.)

---
     Albert Lunde                      Albert-Lunde@nwu.edu

------------------------------------ To unsubscribe from the htdig mailing list, send a message to htdig-unsubscribe@htdig.org You will receive a message to confirm this.




This archive was generated by hypermail 2b28 : Fri Jun 09 2000 - 07:13:40 PDT