[htdig] Hit count>0 but no URLs returned

Malcolm Austen (malcolm@sable.ox.ac.uk)
Fri, 17 Sep 1999 09:20:37 +0100 (BST)


I have lurked for a few weeks ... but may still be seen barking up the
wrong tree 8-)

My comment below might connect with several reports I've seen just
recently on the list.

On Sun, 12 Sep 1999, Sadhunathan Nadesan wrote:

+ i've been running htdig for about a year. it's always worked
+ flawlessly. but lately it is returning the wrong url's.

I have taken over a system (it's old, running htdig 3.0.8b2; I'm slowly
building a new system with the latest release) and am not yet confident I
have got it all sussed! You can expect dumb questions from me in the
future ... 8-(


+ the symptoms are: when you do a search, all the url's returned are
+ nothing to do with your search! they are pages on the sites alright,
+ but don't really contain any of the keywords from the search. i have
+ tried re-indexing several times and get consistent, incorrect results.

I saw a very similar result on the system I was left with. It would return
hit counts > 0 but sometimes failed to deliver any links, and sometimes it
delivered links that were clearly wrong. Eventually I linked this to the
database growing every week (until it filled the disk!). In my case htdig
was being run with the -i flag, but this only deleted and rebuilt from
scratch the files that htdig itself built. htmerge was not starting from
scratch: it was adding new words/references and leaving old
words/references to go bad. I fixed the problem by actively deleting the
contents of the database directory rather than asking htdig to start from
scratch.
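
As a rough illustration of that cleanup step (a sketch only, not the script
used on the system described here), the function below empties a directory
before a re-index. The name clear_directory and the idea of calling it from
a weekly job are assumptions made for the example.

```python
import shutil
from pathlib import Path


def clear_directory(db_dir: Path) -> int:
    """Delete everything inside db_dir but keep db_dir itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in db_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subdirectory and its contents
        else:
            entry.unlink()         # remove a plain file or a symlink
        removed += 1
    return removed
```

Run before htdig and htmerge, this should guarantee that neither program
finds leftover files from an earlier run.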

I have not looked to see if the problem has gone away with the latest
version (ok, I'm not even sure it is a problem with htdig; maybe my
predecessor just failed to set it up right). Since my brief is to re-index
from scratch once a week, I just put a new empty directory in place and
subsequently rename the directories, so I always have an old directory
holding a backup database from the previous week.
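
The weekly rotation described above might be sketched like this. It is one
possible reading of the procedure, not the actual script: the function name
weekly_reindex, the three directory arguments and the build callback (which
stands in for running htdig and htmerge against the fresh directory) are
all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def weekly_reindex(live: Path, fresh: Path, backup: Path,
                   build: Callable[[Path], None]) -> None:
    """Build into an empty directory, then rotate the directories."""
    # Start from a guaranteed-empty directory so no stale data survives.
    if fresh.exists():
        shutil.rmtree(fresh)
    fresh.mkdir(parents=True)

    # Hypothetical hook: run the indexer so that it writes into `fresh`.
    build(fresh)

    # Rotate: last week's live database becomes the backup,
    # and the freshly built one takes its place.
    if backup.exists():
        shutil.rmtree(backup)
    if live.exists():
        live.rename(backup)
    fresh.rename(live)
```

Because the rotation happens only after build returns, a failed build leaves
the live directory untouched. How the indexer is pointed at the fresh
directory (for example through the database_dir attribute in the htdig
configuration file) is not stated in this post and is left to the callback.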

