[htdig] Indexing very large number of URLs


Subject: [htdig] Indexing very large number of URLs
From: David Schwartz (davids@webmaster.com)
Date: Thu Feb 03 2000 - 21:27:49 PST


        I have a very large list (just over a million) of URLs that I want to index
(just these URLs, not any links from them). If I just hand them to htdig, it
runs very slowly -- DNS lookups and dead/slow servers slow it down to almost
nothing.

        What I suppose I need to do is write a program that parses the list of URLs
and splits it into about twenty sub-lists. I then need to create a configuration
file for each sub-list and launch a copy of htdig. When all twenty digs
finish, I need to run htmerge repeatedly to combine the databases. (I have the
hardware to pull this off, I think.)
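
        A minimal sketch of the splitting step, in Python. The file names (urls.txt,
urls.N.txt, htdig.N.conf), the database paths, and the layout of the template are
illustrative assumptions only; database_dir, start_url, and max_hop_count are
existing ht://Dig configuration attributes, and the backquoted value is htdig's
file-include syntax for reading the start URLs from a file, with max_hop_count
set to 0 so only the listed URLs are fetched:

    from pathlib import Path

    NUM_SPLITS = 20

    # Read the full URL list; one URL per line is assumed.
    urls = [u for u in Path("urls.txt").read_text().splitlines() if u.strip()]

    # Per-dig configuration template; paths and names are made up for this sketch.
    CONFIG_TEMPLATE = """\
    database_dir:   /var/htdig/db.{i}
    start_url:      `{url_file}`
    max_hop_count:  0
    """

    for i in range(NUM_SPLITS):
        chunk = urls[i::NUM_SPLITS]                  # round-robin split of the list
        url_file = Path(f"urls.{i}.txt").resolve()
        url_file.write_text("\n".join(chunk) + "\n")
        Path(f"htdig.{i}.conf").write_text(
            CONFIG_TEMPLATE.format(i=i, url_file=url_file))

        Each sub-dig could then be started with its own config (htdig -c htdig.N.conf),
and the resulting databases combined with htmerge once all twenty finish, as described
above.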

        I just want to check two things:

        1) Has anybody already done this? No reason for me to reinvent the wheel.

        2) Is this the right way to do it? Is there an easier way? Is this a
mistake?

        DS
