Re: [htdig] Larger db files with 3.1.2; pdf; importance of

Geoff Hutchison
Fri, 14 May 1999 09:27:23 -0400 (EDT)

On Fri, 14 May 1999, Albert Desimone jr wrote:

> grew by a factor of 2.5 with the same hop count (-h 6). No big deal
> since I have plenty of disk space, but was just a little surprised.
> The size of the db files can *certainly* be related to the
> increased number of documents being indexed, which was also a
> little curious.

If you are indexing the same number of documents, the databases in 3.1.x
are actually smaller. However, you are also now indexing PDF files
(usually longer than HTML files on average), so you should expect larger
databases. Furthermore, 3.1.x stores the META description as well as the
excerpt, so if you have some of these in your documents you can also
expect some increase in size.

Finally you note that more documents are indexed. For one, you now have
PDF files indexed, right? :-) For another, 3.1.x has fixed several bugs in
the limiting code that would result in some files being rejected for no
obvious reason.

> WOW!!! What a difference; the trade-off with back linking is well
> worth it (IMHO).

That's your preference. Personally, I see a significant improvement in
search accuracy with backlink weighting. Since there isn't a perceptible
slowdown on my system, I use it. It's slightly slower when timed, but
only by about 0.4 seconds. But it all depends on what you're using to
return searches and the size of your databases.

> I was wondering (if anyone has really read this far) how do you handle
> upgrading ht://Dig? I have an upgrade path in mind, but it isn't
> pretty. Any thoughts on this?

I'm not exactly sure what you mean. My upgrade path consisted of indexing
new databases with 3.1.x while the old databases were still around. Then I
moved in the new databases and the new CGI. There was a short time when the
CGI was broken (when the old CGI and the new databases were present), but
it was all of 10 minutes.
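
For anyone who wants to script the "move in the new databases" step, here
is a minimal sketch in Python. It is only an illustration, not how I
actually did it: the function name and the three directory arguments are
made up for the example, and it assumes the new databases were already
built in a staging directory on the same filesystem as the live one.
Installing the new CGI would happen right after the swap, which is what
keeps the mismatch window short.

```python
import os
import shutil
import tempfile
from pathlib import Path


def swap_in_new_databases(live_dir: Path, staging_dir: Path, backup_dir: Path) -> None:
    """Move freshly built databases into place, keeping the old ones as a backup.

    All three paths are hypothetical; they should live on the same
    filesystem so that each rename is a quick metadata operation.
    """
    if not staging_dir.is_dir():
        raise FileNotFoundError(f"no freshly built databases in {staging_dir}")

    # Drop any backup left over from a previous upgrade.
    if backup_dir.exists():
        shutil.rmtree(backup_dir)

    # Step 1: set the old databases aside (they remain available for rollback).
    if live_dir.exists():
        os.rename(live_dir, backup_dir)

    # Step 2: move the new databases into the live location.
    os.rename(staging_dir, live_dir)


if __name__ == "__main__":
    # Small self-contained demonstration using throwaway directories
    # and a placeholder file name.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        live, staging, backup = root / "db", root / "db.new", root / "db.old"
        live.mkdir()
        staging.mkdir()
        (live / "placeholder.db").write_text("old index")
        (staging / "placeholder.db").write_text("new index")

        swap_in_new_databases(live, staging, backup)
        print((live / "placeholder.db").read_text())
```

Keeping the old directory around as a backup means a rollback is just the
same two renames in reverse, together with putting the old CGI back.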

-Geoff Hutchison
Williams Students Online


This archive was generated by hypermail 2.0b3 on Fri May 14 1999 - 06:42:59 PDT