Subject: Re: [htdig] Indexing very large number of URLs
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Feb 03 2000 - 21:39:18 PST
On Thu, 3 Feb 2000, David Schwartz wrote:
> What I suppose I need to do is write a program to parse a list of URLs and
> split it into about twenty sub-lists. I then need to create a configuration
> file for each sub-list.
I would imagine you could do this very quickly with a shell script or Perl
script. A quick pass with "sort" and "wc -l" and you could even do it by
hand if you only want twenty sub-lists.
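
Something along these lines ought to do it (an untested sketch; it assumes
your URLs live one per line in a file called urls.txt, and that twenty is
the count you settle on):

    # Sort and de-duplicate the URL list, then count the lines.
    sort -u urls.txt > urls.sorted
    total=`wc -l < urls.sorted`

    # Split into 20 roughly equal sub-lists, using expr for a ceiling
    # division. split names the pieces urls.sorted.aa, urls.sorted.ab, ...
    split -l `expr \( $total + 19 \) / 20` urls.sorted urls.sorted.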
> 2) Is this the right way to do it? Is there an easier way? Is this a
> mistake?
You might want to take a look at the multidig shell scripts. They were
written to simplify exactly this sort of thing; they won't split your list
for you, but they will manage multiple databases and merge them together in
pretty close to the most efficient way.
See <http://www.htdig.org/files/contrib/scripts/>
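
If you'd rather roll your own, the per-list setup might look roughly like
this (a sketch only, not the multidig scripts themselves; it assumes your
common settings live in a hypothetical htdig-base.conf, and that ht://Dig
will read an attribute's values from a file when the filename is wrapped in
backquotes):

    # Give each sub-list its own config and database directory, then dig
    # it. The ?? glob matches split's two-letter suffixes (aa, ab, ...).
    for list in urls.sorted.??; do
        cat htdig-base.conf > $list.conf
        echo "start_url: \`$list\`" >> $list.conf
        echo "database_dir: /opt/htdig/db.$list" >> $list.conf
        htdig -i -c $list.conf
    done

Merging the twenty resulting databases back into one searchable set is the
step the multidig scripts really help with; if memory serves, htmerge's -m
option (merging one configuration's databases into another's) is the
underlying mechanism.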
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/