Subject: Re: [htdig] Stability of beta, and a couple newbie questions
From: Gilles Detillieux
Date: Fri May 19 2000 - 15:34:08 PDT

According to Joe Sanderson:
> I've been using the 3.2.0b2 beta for the last two weeks, and have found
> it to be quite stable. I need to make the decision about using this
> beta as a site search engine, or to use the latest 3.1 released
> version. I'm looking for input on the stability of this beta in order
> to make the best decision. I'd like to go with it, for the expression
> matching and improvements in match weighting. I've looked at the source
> code change comments, and the release notes, but is there a higher level
> feature list of "what's new in 3.2"?

No, I think those two files are all the "what's new" information you'll
find at this point. As for stability, your own testing should be your best
guide - just test it thoroughly, and of course report any bugs you find.
Just be aware, if you use it on a production system, that there may be
issues that come up that will force you to upgrade. We also can't yet
promise that the database format won't change, and indeed it may well
do so, requiring a reindexing after an upgrade.

> How far is the 3.2.0 version from being considered "release" code?
> I've built and will be using htdig on Linux only.

Good question! That depends a lot on how much testing it gets, and how
much time the developers have to put into it. The current STATUS file for
it lists a lot of outstanding issues to address, so I'd say it's still
many months away. However, that doesn't mean you can't use it now
on a production system, if you've determined it works reliably in your
environment. Just be aware of the risks. The code has had more testing
on Linux than other systems, so that works in your favour.

> I also have a few other newbie questions:
> 1) I have not been using the incremental (update) index build in htdig,
> but have been building the index using the "-i" option each time. How
> robust is the incremental index build feature?

Only testing will tell, and this is an area that needs much testing, as
it's very different from 3.1.x.

> If a page that was
> referenced in the index is deleted, then I rebuild the index
> incrementally, will the stale reference still show up in the search
> results (provided there's a match)?

If htdig gets a 404 for the page, it should remove that page from the
database.

> How does the update work - does it
> check the date on all html files indexed, and only re-index files that
> have changed since the last index build?

Essentially. It remembers the Last-Modified time of each file, and if it's
different this time around, it reindexes. For dynamic content, it will
reindex each time.
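
To make the update rule concrete, here is a rough sketch of the decision
as described above. It is only a model in Python, not htdig's actual code:
the names Action and decide_action are made up for illustration, and
treating a missing Last-Modified header as "dynamic content" is an
assumption on my part.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """What an update run does with one previously indexed URL (hypothetical)."""
    REMOVE = "remove"
    REINDEX = "reindex"
    SKIP = "skip"


def decide_action(status: int,
                  stored_last_modified: Optional[str],
                  current_last_modified: Optional[str]) -> Action:
    """Model of the update rule; not taken from the htdig source."""
    # A page that now returns 404 is dropped from the database.
    if status == 404:
        return Action.REMOVE
    # Assumption: dynamic content sends no Last-Modified header,
    # so there is nothing to compare and it is reindexed every time.
    if current_last_modified is None:
        return Action.REINDEX
    # A different (or previously unknown) Last-Modified means reindex.
    if stored_last_modified != current_last_modified:
        return Action.REINDEX
    # Same Last-Modified as last time: leave the existing entry alone.
    return Action.SKIP
```

For example, decide_action(200, "t1", "t1") gives Action.SKIP, while
decide_action(200, "t1", "t2") gives Action.REINDEX.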

> 2) (A related question) Does htsearch check for stale links (to pages
> that do not exist) in the results?

No. htsearch does not itself weed out stale links, so they show up
and you'll get a 404 if you click on the link.

> 3) If I use the -i option to htdig, and the databases already exist,
> does htdig do a complete rebuild of the index or does it just do an
> update?

I believe that with -i, it will actually remove the existing database
before starting.
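
As a rough illustration of the difference, assuming -i really does discard
the old database first: an update run starts from the remembered
Last-Modified times, while an initial run starts from nothing, so every
page looks new. The function names below are hypothetical, and this is
a Python model rather than anything from the htdig source.

```python
from typing import Dict, List


def starting_database(existing: Dict[str, str], initial: bool) -> Dict[str, str]:
    """Return the URL -> Last-Modified map the dig starts from (hypothetical)."""
    # With -i (initial=True), nothing is carried over from the old database.
    return {} if initial else dict(existing)


def urls_to_reindex(db: Dict[str, str], site: Dict[str, str]) -> List[str]:
    """List the URLs whose Last-Modified differs from what the database remembers."""
    return sorted(url for url, modified in site.items() if db.get(url) != modified)


# Previously indexed state and the site as it looks now (made-up data).
existing = {"/a.html": "t1", "/b.html": "t2"}
site = {"/a.html": "t1", "/b.html": "t3"}

update_run = urls_to_reindex(starting_database(existing, initial=False), site)
initial_run = urls_to_reindex(starting_database(existing, initial=True), site)
```

Here update_run contains only "/b.html", whereas initial_run contains both
pages, since there are no remembered times left to compare against.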

