[htdig] database probs

Gabriel Fenteany (fenteany@calvin.bwh.harvard.edu)
Fri, 25 Jun 1999 13:31:49 -0400

Hello. Again, I have to say htdig is great, and when phrase searching
is available, it'll be insanely great. Question, though...

We're now indexing about 300 servers, with db files of about 1/3-1/2 GB each
(except for the .index file, of course). A dig can take 48 hours.
We're indexing quite promiscuously: 2 word minimum, all 8859-1 characters,
max of 6 hop counts. None of this is a problem, except every once in a
while when I run htdig and htmerge, the resulting databases seem unreadable;
searches may result in either the wrong pages being fetched or no results at
all where there definitely should be. I always back up old db files, so when
this happens, I mv the old files back. A problem like this seems to occur
once in every four or five digs, and I can't explain why. Everything is run
the same way, and it can happen when I haven't touched the htdig.conf
between runs.
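
For illustration, the backup-and-restore routine described above could be sketched roughly as follows. This is a minimal sketch, not the actual procedure in use here: the function names backup_db and restore_db and the directory arguments are hypothetical, and it assumes all db files sit in a single directory.

```python
import shutil
from pathlib import Path


def backup_db(db_dir, backup_dir):
    """Copy every regular file in db_dir into backup_dir, preserving timestamps.

    Both directory arguments are hypothetical; point them at wherever the
    db files and their backups actually live.
    """
    src, dst = Path(db_dir), Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file():
            shutil.copy2(f, dst / f.name)  # copy2 keeps mtime and permissions
            copied.append(f.name)
    return copied


def restore_db(backup_dir, db_dir):
    """Copy the saved files back over the live ones after a bad dig or merge."""
    return backup_db(backup_dir, db_dir)
```

Copying rather than moving keeps the last known-good set in place until the new databases have been checked with a few test searches.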

Does anyone have an idea about what might be going wrong occasionally?
Could there be some other executable on our server (DEC Alpha running their
4.0D version of Unix) that might interfere with digging or merging? I am
not the only one with root on our server; however, I have no evidence that
the other superuser inadvertently interferes in any way with digging or merging.
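
One way to gather such evidence would be to fingerprint the db files right after a run and compare again before searching. The following is only a sketch under assumptions: the names fingerprint, snapshot and changed_files are hypothetical helpers, not part of htdig, and a change in size or SHA-256 digest only shows that a file was modified, not what modified it.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(db_dir):
    """Map each regular file name in db_dir to a (size, digest) pair."""
    return {
        p.name: (p.stat().st_size, fingerprint(p))
        for p in sorted(Path(db_dir).iterdir())
        if p.is_file()
    }


def changed_files(before, after):
    """List file names that were added, removed, or altered between two snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Taking one snapshot when the merge finishes and another just before the first search would show whether anything touched the files in between.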

If I re-merge after this occurs, the resulting db files are still unreadable
or misread. So I don't know if it's a problem with the merging process.
Again, because of the irregular appearance of such problems - only once
every few times even when htdig.conf is held constant - I cannot explain the

Any ideas/solutions would be appreciated.

Thanks very much!

Gabriel Fenteany
