[htdig] Larger db files with 3.1.2; pdf; importance of backlink_factor:0

Albert Desimone jr (bdesimon@arches.uga.edu)
Fri, 14 May 1999 08:44:02 -0400 (EDT)

Hi ...

   This is more of a "sharing my experience" thing than anything else.
   A few observations while I sing the praises of ht://Dig.

   ht://Dig is bad to the bone, make no mistake about it. I give it
   high praise with one of my favorite Georgia (USA) expressions: Finer
   than frog hair split nine ways. (BTW: Frogs have *very* fine hair.)

   For several months now, a search box for websites has been available
   right on the UGA homepage (www.uga.edu) -- powered by ht://Dig.

   But surely I digress.

   Anyway, I am working with 3.1.2, upgrading from 3.0.8b2. I am now
   including PDF files (acroread as parser), with no change to
   max_doc_size (assuming the default is still 10K). My db files
   grew by a factor of 2.5 with the same hop count (-h 6). No big deal
   since I have plenty of disk space, but I was just a little surprised.
   The growth in the db files can *certainly* be related to the
   increased number of documents being indexed, which was also a
   little curious.

   Search results were really slow, until I added:
   backlink_factor: 0

   to htdig.conf

   WOW!!! What a difference; the trade-off with back linking is well
   worth it (IMHO).
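
   For anyone wanting to try the same thing, here is a minimal sketch of
   the change. It assumes the usual "attribute: value" syntax with "#"
   comments; the comment text is illustrative, and everything else in
   htdig.conf is assumed to stay as it was.

```
# htdig.conf (fragment)
# Set the back-link weight to zero, so the number of links pointing
# at a document no longer affects how search results are scored.
backlink_factor: 0
```

   The cost is that result ranking no longer takes back links into
   account.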

   I was wondering (if anyone has really read this far) how do you handle
   upgrading ht://Dig? I have an upgrade path in mind, but it isn't
   pretty. Any thoughts on this?


