Re: [htdig] scalability of htdig

Geoff Hutchison
Mon, 15 Mar 1999 23:31:00 -0500

At 10:55 PM -0500 3/15/99, Philip Jenkins wrote:
>I noticed that some sites that use ht://dig have
>over 5,000 items indexed. I was wondering if you could tell me how well
>it scales to larger sites.

Actually, judging from e-mail on the list and sites I know, you would rank
in the "smaller" category for ht://Dig. My site handles around 75,000
documents quite nicely. For me, generating a new db takes about 3-4 hrs
(this is improving with some new code), and updates take around 1/2 hr.

>as I am needing to do. Does ht://dig handle both Indexes like Yahoo and
>normal searching?

By itself, ht://Dig does not produce a Yahoo-style index. However, there is
quite a bit of software around that does this sort of thing.

>How well does it crawl sites to index them, does it crash on large

I think I speak for the whole developer community when I say it should not
crash. I have never heard of it crashing simply because it was "too large"
unless you ran out of RAM or disk space. Crashing because of strange HTML
files does happen occasionally.

>One last question, I wanted to add a link to
>have people submit their own sites, if I do this does ht://dig
>automatically index them?

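You could easily set up a form for this. ht://Dig can index all the URLs
listed in a file, so you'd just need a CGI script that appends each submitted
URL to that file. New submissions would then be picked up the next time you
run the indexer.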
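For the submission side, here is a minimal sketch, assuming Python with a WSGI handler standing in for a classic CGI script. The file name `submitted_urls.txt`, the form field `url`, and the helpers `clean_url` and `add_submitted_url` are all hypothetical. How that file is wired into the ht://Dig configuration is not shown; check the ht://Dig attribute documentation for reading start URLs from a file.

```python
"""Hypothetical 'submit your site' handler: appends URLs to a list file."""
import os
from urllib.parse import parse_qs, urlsplit

URL_FILE = "submitted_urls.txt"  # hypothetical path that ht://Dig would read


def clean_url(raw):
    """Return a stripped http(s) URL, or None if the input is unacceptable."""
    url = raw.strip()
    # Reject embedded whitespace so one submission can't inject extra lines.
    if not url or any(c.isspace() for c in url):
        return None
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 host such as "http://["
        return None
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return None
    return url


def add_submitted_url(path, raw):
    """Append a URL to the list file, one per line, skipping duplicates.

    Returns True if the URL was added, False if rejected or already listed.
    """
    url = clean_url(raw)
    if url is None:
        return False
    existing = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = {line.strip() for line in f}
    if url in existing:
        return False
    with open(path, "a", encoding="utf-8") as f:
        f.write(url + "\n")
    return True


def app(environ, start_response):
    """Minimal WSGI handler for a POSTed form with a field named 'url'."""
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        size = 0
    body = environ["wsgi.input"].read(size).decode("utf-8", "replace")
    url = parse_qs(body).get("url", [""])[0]
    if add_submitted_url(URL_FILE, url):
        message = b"Thanks, your site will be indexed on the next run.\n"
    else:
        message = b"URL rejected or already listed.\n"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [message]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough. The sketch does no file locking, so simultaneous submissions could race, and it accepts any well-formed URL; on a public form you would probably want locking and some manual review before a site reaches the list.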

