Re: [htdig] One solution for slow dig on Linux.


Subject: Re: [htdig] One solution for slow dig on Linux.
From: Torsten Neuer (tneuer@inwise.de)
Date: Tue Dec 21 1999 - 00:09:54 PST


> Sean Pecor wrote:
>
> Hello all,
>
[...]
>
> My second problem was related to the first. Since my cgi engine needed
> its own query string to work its magic, and indirectly controlled
> htsearch, I had to modify the htsearch source so that it didn't detect
> the presence of the query string (I renamed REQUEST_METHOD in
> htlib/cgi.cc). In this manner, I could then pass the goods directly as
> arguments to an external call to htsearch during the execution of my
> cgi (i.e. htsearch -c /my/custom.conf
> "page=2&words=woah&cmd=command&searchtype=mysearch"). I then had to
> modify the portion of Display.cc that built the hrefs to the next,
> previous and page number links so that my own special query string
> name/value pairs were piggy-backed. Whew, that was pretty easy too.
> After whipping up my own set of html templates for htsearch (simple
> ones really, since the interface framework was actually being provided
> by my group collaboration engine) I was ready to start the real fun
> stuff.

A patch for this has been available since 3.1.2 and has been
incorporated into 3.1.4. It is much cleaner than just renaming
REQUEST_METHOD. In other words, you patched in something the search
engine is already able to do ;-)

>
[...]
>

Regarding the time-out settings, I think that this heavily depends upon
the production system used. If you have good routes to every site, you
will probably be fine with it. If not, it might cause some trouble.

I'd rather work around that by trapping the site response codes prior to
indexing, using a tool like DLC (dead-link-check). If you have plenty of
disk space, I'd even keep a single small database for every site being
indexed and merge them after the index run. In that case you can run
multiple instances of the indexer concurrently, with a merger process
waiting for new input to merge into the new search database. That should
further increase the speed of the indexer process.
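
To make the idea a bit more concrete, here is a rough Python sketch of
such a driver; treat it as an illustration under assumptions, not a
tested setup. It assumes one ht://Dig config file per site (each with
its own database_dir), a plain HEAD request standing in for a real
dead-link checker, and that your htmerge accepts "-m <config>" to merge
another database into the one named by "-c". Whether each per-site
database first needs its own htmerge run is something to check against
the documentation of your version. The names probe_site, run_command and
index_sites are made up for this example.

```python
import subprocess
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def probe_site(url, timeout=10):
    """Return the HTTP status code for url, or None if the site is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code      # the server answered, but with an error status
    except OSError:
        return None            # no route, DNS failure, timeout, ...


def run_command(command):
    """Run an external command and return its exit code."""
    return subprocess.run(command).returncode


def index_sites(sites, main_conf, workers=4, probe=probe_site, run=run_command):
    """Dig every reachable site concurrently, merging each result as it finishes.

    sites maps a site name to (start_url, per_site_config_file).
    Returns the list of per-site config files that were merged.
    """
    # Step 1: trap bad response codes before any indexer is started.
    confs = []
    for url, conf in sites.values():
        status = probe(url)
        if status is not None and 200 <= status < 400:
            confs.append(conf)

    merged = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 2: one indexer instance per site, each with its own database.
        digs = {pool.submit(run, ["htdig", "-i", "-c", conf]): conf
                for conf in confs}
        # Step 3: the "merger" waits here and handles one finished dig at a
        # time, so the main database is never written by two merges at once.
        for finished in as_completed(digs):
            conf = digs[finished]
            if finished.result() == 0:
                # Assumption: htmerge -m merges the database described by
                # conf into the one described by main_conf.
                run(["htmerge", "-c", main_conf, "-m", conf])
                merged.append(conf)
    return merged
```

A call would look something like
index_sites({"inwise": ("http://www.inwise.de/", "/my/inwise.conf")},
"/my/custom.conf"), where the second config path is only a placeholder.
Keeping the merges serial while the digs run in parallel is the point of
the design: the slow network part overlaps, the database writes do not.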

cheers,
  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de


This archive was generated by hypermail 2b28 : Tue Dec 21 1999 - 00:24:23 PST