Re: [htdig] Multiple merging


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 3 Nov 1999 08:55:55 -0600 (CST)


According to Geoff Hutchison:
> At 11:23 AM +0100 11/2/99, Andrea Carpani wrote:
> >I have dug separately 1200 sites and this operation took a few hours. The
> >problem is that the merging of the whole data is taking more than 10 days.
> >Is this normal? How long would it take to build a single database from a
> >single dig (feeding ht://dig with the whole list of sites at once)?
>
> It's hard to know what's "normal" or which option would be faster.
> Remember we're all digging very different servers, pages, etc. For
> example, you don't mention how many URLs you have or the size of your
> database.
>
> I'm guessing the merging is taking a while because either (or both):
> a) 1200 sites => many, many URLs => large databases
> b) the machine you're using doesn't have much RAM and is swapping to merge
>
> These are obviously intertwined. The amount of RAM you need is
> related to the size of your databases...

I'm wondering how Andrea is merging these 1200 separate databases.
I haven't measured it, but I'd guess that merging them hierarchically
would be faster than merging them linearly. E.g., for 8 databases (1-8),
you could merge 2 through 8 in turn into database 1, but then every merge
has to work against an ever-larger database 1. It seems it would be more
efficient to merge 2 into 1, 4 into 3, 6 into 5, 8 into 7, then 3 into 1
and 7 into 5, and finally 5 into 1, so that most merges involve small
databases. I'm only guessing, though. I don't know that anyone has ever
benchmarked it.
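
To make the two orderings concrete, here is a rough Python sketch that
only prints merge schedules and compares them under a made-up cost model.
It does not call htmerge at all. The function names are hypothetical, and
the cost model (every database starts at size 1, and a merge costs the
combined size of source and destination) is purely an assumption for
illustration, not a measurement of how ht://Dig actually behaves.

```python
def linear_schedule(n):
    """Merge databases 2..n, one at a time, into database 1."""
    return [(src, 1) for src in range(2, n + 1)]


def hierarchical_schedule(n):
    """Merge neighbours pairwise, doubling the stride each round."""
    steps = []
    stride = 1
    while stride < n:
        for dst in range(1, n + 1, 2 * stride):
            src = dst + stride
            if src <= n:  # an unpaired database waits for a later round
                steps.append((src, dst))
        stride *= 2
    return steps


def total_cost(steps, n):
    """Sum of (source size + destination size) over all merges.

    ASSUMPTION: each database starts at size 1 and a merge costs the
    combined size of both databases. Real merge cost is not benchmarked.
    """
    size = {i: 1 for i in range(1, n + 1)}
    cost = 0
    for src, dst in steps:
        cost += size[src] + size[dst]
        size[dst] += size.pop(src)  # source is absorbed into destination
    return cost


if __name__ == "__main__":
    n = 8
    for name, schedule in (("linear", linear_schedule(n)),
                           ("hierarchical", hierarchical_schedule(n))):
        print(name, schedule, "cost:", total_cost(schedule, n))
```

For 8 databases the hierarchical schedule comes out in the same order as
the example above, and under this assumed cost model it totals 24 units
against 35 for the linear order. If a merge instead cost only as much as
the database being merged in, the linear order would come out ahead
(7 against 12), so this doesn't settle anything without a real benchmark.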

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
