Subject: Re: [htdig] Indexing very large number of URLs
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Feb 03 2000 - 21:39:18 PST


On Thu, 3 Feb 2000, David Schwartz wrote:

> What I suppose I need to do is write a program to parse a list of URLs and
> split it into about twenty sub-lists. I then need to create a configuration

I would imagine you could do this very quickly with a shell script or Perl
script. A quick "sort" and "wc -l" will tell you how big each piece should
be, and if you only want twenty, you could even split it by hand...
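
Something along these lines would do it (a sketch only, untested: the
file names, the @URLFILE@ placeholder in a template config, and reading
the URL list via htdig's backquoted-filename syntax on the start_url
line are all my assumptions):

    #!/bin/sh
    # Split urls.txt into 20 roughly equal sub-lists
    # (urls.aa, urls.ab, ...), then stamp out one config
    # file per sub-list from a template.
    lines=`wc -l < urls.txt`
    per=`expr \( $lines + 19 \) / 20`
    split -l $per urls.txt urls.
    n=0
    for f in urls.??; do
        n=`expr $n + 1`
        # base.conf is assumed to contain a line like
        #   start_url: @URLFILE@
        # where the backquotes tell htdig to read the URLs
        # from that file (use an absolute path in practice).
        sed "s|@URLFILE@|\`$f\`|" base.conf > sub$n.conf
    done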

> 2) Is this the right way to do it? Is there an easier way? Is this a
> mistake?

You might want to take a look at the multidig shell scripts. They were
written to simplify this sort of thing a bit: they won't split your list
for you, but they will manage multiple databases and merge them together
in pretty close to the most efficient way.

See <http://www.htdig.org/files/contrib/scripts/>
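
If you do end up doing it by hand, the steps look roughly like this
(again just a sketch, reusing the invented sub*.conf names from above;
each config needs its own database_dir so the sets don't collide):

    #!/bin/sh
    # Dig and index each sub-list into its own database set.
    for conf in sub*.conf; do
        htdig -i -c $conf    # initial dig for this sub-list
        htmerge -c $conf     # build the word index for it
    done
    # Fold every other set into the first one with htmerge -m.
    for conf in sub*.conf; do
        case $conf in
            sub1.conf) ;;                       # skip the target itself
            *) htmerge -c sub1.conf -m $conf ;; # merge into sub1's databases
        esac
    done

multidig automates essentially this loop for you, plus the bookkeeping
around the separate databases.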

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
