Subject: [htdig] update dig problems
From: Adam Rice (adam@newsquest.co.uk)
Date: Fri Sep 01 2000 - 04:26:55 PDT

I'm having a problem where my search results are out-of-date with
respect to the site, even though htdig is definitely running, and
definitely fetching the files from the web server, and not giving
errors. Perhaps I am misunderstanding what an update dig does? I thought
that it checked every document in its database, rescanned it if it
had changed, followed any links to new documents, and removed it if the
server returned a 404.

I run htdig and htmerge with the -a command-line option. I then move the
*.docdb.work, *.docs.index.work and *.words.db.work files to *.docdb,
*.docs.index and *.words.db respectively. I don't actually use
wildcards, the *s are just there because I have different databases for
different sites. I then copy the *.docdb file back to *.docdb.work so
that it is there for the next update dig. The *.wordlist.work file is
left alone ready for the next update.
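
For reference, the rotation step described above could be sketched as the
shell function below. This is only an illustration: the function name
rotate_work_files and its prefix argument are hypothetical, it assumes the
second file is meant to end up as *.docs.index, and it assumes htdig and
htmerge have already been run with -a before it is called.

```shell
#!/bin/sh
# Sketch only: promote the work files written by "htdig -a" and "htmerge -a"
# to their live names. The prefix argument is a hypothetical per-site
# database prefix, e.g. /path/to/db/site1.
rotate_work_files() {
    prefix=$1

    # Move each finished work file over its live counterpart.
    for suffix in docdb docs.index words.db; do
        mv "$prefix.$suffix.work" "$prefix.$suffix" || return 1
    done

    # Copy the document database back so the next update dig can start from it.
    cp "$prefix.docdb" "$prefix.docdb.work" || return 1

    # $prefix.wordlist.work is deliberately left untouched.
}
```

It would be called once per site database, for example
rotate_work_files /path/to/db/site1, where the path is a placeholder.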

Does that procedure sound correct? All the pages on the sites use
server-side includes, and hence don't have Last-Modified: headers. Could
that be confusing matters?
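
One way to confirm whether a page really lacks the header is to inspect the
raw response headers. The helper below is a minimal sketch: the name
has_last_modified is hypothetical, it expects the headers on standard
input, and it assumes a grep that supports the POSIX -q option.

```shell
#!/bin/sh
# Sketch only: read raw HTTP response headers on stdin and exit 0 if a
# Last-Modified header is present, non-zero otherwise.
has_last_modified() {
    grep -qi '^Last-Modified:'
}
```

If curl is available, the headers could be fetched and piped in with
something like: curl -sI http://www.example.com/ | has_last_modified
(the URL is a placeholder).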

I have been running tail -f on *.wordlist.work while htdig is running,
and it just seems to be adding lines like


rather than any new words. This seems odd to me, but then I have never
done this before. After htmerge is finished those lines aren't there any
more.

I'm running htdig 3.1.5, compiled with gcc 2.8.1 on Solaris 2.6. I've
attached one of my config files, with the comments removed to save
space.

Adam Rice


