Re: [htdig] Limit amount of found pages?


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Mon, 5 Apr 1999 11:37:12 -0500 (CDT)


According to Jaap de Heer:
> I used htdig to index quite a large site as a test
> (www.planetquake.com). The database takes up about 280 MB.
> Now when I perform a search, the more matches htsearch
> finds, the longer the search takes.
> Sounds logical, but when you search for a word that happens
> to be found on 7928 pages, the search takes about 15 seconds
> and severely abuses the webserver.
> Is there a nice way to limit this, so that you simply get no
> more than, say, 1000 matches?
> I guess another way to solve this 'problem' would be for
> htdig to use a cache... could anyone tell me if this is
> possible, and if not, if it's planned?

This should maybe go into the FAQ...

When you run htsearch with its default settings on a large database and a
query gets a lot of hits, processing those hits tends to take a long time.
Some users with large databases have reported much better performance on
such searches after setting the backlink_factor attribute in htdig.conf
to 0 and sorting by score. The scores calculated this way aren't quite as
good, but htsearch can process hits much faster when it doesn't need to
look up the docdb record for each hit just to get the backlink count,
date or title, whether for scoring or for sorting.
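
As a rough sketch, the change might look like this in htdig.conf. The
comments are only illustrative, and the sort line is an assumption about
how the ordering is set on your installation: score is normally the
default, and the sort order can also be chosen from the search form, so
check that nothing there overrides it.

```
# htdig.conf fragment (sketch; all other attributes left unchanged)

# Ignore backlink counts when scoring, so htsearch need not fetch
# the docdb record for each hit just to read that count.
backlink_factor:	0

# Sort results by score rather than by date or title, either of
# which would also need the docdb record for every hit.
sort:			score
```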

This slowdown affects 3.1.0b3 and up, but should be handled better in 3.2,
which is due out in a few months.

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930