[htdig] htdig performance on large databases

Daniel Marek Gradzikiewicz (gdaniel@softhome.net)
Sat, 15 May 1999 16:12:49 -0400

I am running htdig on about 120 000 documents and about 4200 servers.
It took me 2 weeks to get the results due to the limited bandwidth.
Everything is nice and fine, but if I search for a popular word it takes
up to 5 minutes to get the results. Currently the database is about
1.6 GB. Less 'popular' words come up relatively faster. How do I improve
the performance?

I have upgraded the server to a PII with 64 MB of RAM, but I am still
running it on a 6.4 GB IDE hard drive... would more memory help? How much
should I buy?
I am using Linux 2.0.35 SuSE and htdig 3.1.2.

I was thinking of a way of NOT sorting the results. That should make it
faster. I mean, leave out the RELEVANCE factor and the sorting completely
and just spit out the results of the search. I think Inktomi does
that - they don't sort for relevance or date or anything at all.
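
To make the idea concrete, here is a rough Python sketch. It is not how htsearch works internally; the function names and the (document, score) pairs are made up for illustration. It contrasts returning the first N matches untouched with picking only the N best by score, which avoids sorting the whole result list.

```python
import heapq
from itertools import islice

def first_n_unsorted(matches, n):
    """Return the first n matches in whatever order they arrive; no sorting."""
    return list(islice(matches, n))

def top_n_by_score(matches, n):
    """Return the n highest-scoring matches without sorting the full list."""
    return heapq.nlargest(n, matches, key=lambda m: m[1])

# Hypothetical (document, score) pairs for illustration only.
matches = [("a.html", 0.2), ("b.html", 0.9), ("c.html", 0.5)]
print(first_n_unsorted(iter(matches), 2))
print(top_n_by_score(matches, 2))
```

Whether either approach would help here depends on where the 5 minutes actually go; if most of the time is spent reading the word index from disk, skipping the sort alone may not change much.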

Any ideas or previous experience?

I don't mind digging for 2 weeks, but at least after that I want to be
able to retrieve the stuff within seconds ;-)

Has anybody tried to put the database into a HUGE RAM disk (1 or 2 GB of
RAM) to increase the speed and performance? Should I switch to SCSI?

Is there a way of caching the results so that next time the search does
not take so long ?
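
What I have in mind is something like the following Python sketch. It is only an illustration under my own assumptions: slow_search is a hypothetical stand-in for the expensive lookup (not an htdig function), and DOCS is a toy data set. The first search for a word pays the full cost; repeated searches for the same word are answered from memory.

```python
from functools import lru_cache

# Toy document set standing in for the real database (hypothetical data).
DOCS = {
    "a.html": "linux search engine",
    "b.html": "open source linux",
    "c.html": "ram disk",
}

CALLS = {"count": 0}  # counts how often the expensive lookup really runs

def slow_search(query):
    """Hypothetical stand-in for the expensive database lookup."""
    CALLS["count"] += 1
    return tuple(sorted(doc for doc, text in DOCS.items()
                        if query in text.split()))

@lru_cache(maxsize=1024)
def cached_search(query):
    """Answer repeated queries from memory instead of searching again."""
    return slow_search(query)

print(cached_search("linux"))  # first call runs the slow lookup
print(cached_search("linux"))  # second call is served from the cache
print(CALLS["count"])
```

The cache would have to be emptied (cached_search.cache_clear() in this sketch) whenever the database is rebuilt, otherwise stale results would be returned.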

Otherwise the package is great ! Long live open source !

Daniel Gradzikiewicz

SubNet Group Ltd.

