[htdig] update dig problems


Subject: [htdig] update dig problems
From: Adam Rice (adam@newsquest.co.uk)
Date: Fri Sep 01 2000 - 04:26:55 PDT


I'm having a problem where my search results are out of date with
respect to the site, even though htdig is definitely running,
definitely fetching the files from the web server, and not reporting
errors. Perhaps I am misunderstanding what an update dig does? I thought
that it checked every document in its database and rescanned it if it
had changed, followed any links to new documents, and removed any
document that returned a 404.

I run htdig and htmerge with the -a command-line option. I then move the
*.docdb.work, *.docs.index.work and *.words.db.work files to *.docdb,
*.docs.index and *.words.db respectively. I don't actually use
wildcards; the *s are there only because I have a separate database for
each site. I then copy the *.docdb file back to *.docdb.work so
that it is there for the next update dig. The *.wordlist.work file is
left alone, ready for the next update.
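For anyone wanting to reproduce the cycle exactly, here is a minimal Python sketch of the procedure above. It assumes the second move target is meant to be *.docs.index, that htdig and htmerge accept -c to name the config file (only -a is mentioned in the post), and that `base` is a hypothetical per-site database prefix. By default it only prints the steps.

```python
import os
import shutil
import subprocess

def update_commands(conf):
    # Assumed invocation: -a selects the .work files, -c names the config file.
    return [["htdig", "-a", "-c", conf], ["htmerge", "-a", "-c", conf]]

def rotation_plan(base):
    """Moves and copies for one update cycle; `base` is a hypothetical site prefix."""
    moves = [
        (base + ".docdb.work", base + ".docdb"),
        (base + ".docs.index.work", base + ".docs.index"),
        (base + ".words.db.work", base + ".words.db"),
    ]
    # Copy docdb back to .work so the next update dig starts from it.
    copies = [(base + ".docdb", base + ".docdb.work")]
    return moves, copies

def run_update(conf, base, dry_run=True):
    """Print each step; only execute the commands and file operations if dry_run is False."""
    for cmd in update_commands(conf):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    moves, copies = rotation_plan(base)
    for src, dst in moves:
        print("mv", src, dst)
        if not dry_run:
            os.replace(src, dst)
    for src, dst in copies:
        print("cp", src, dst)
        if not dry_run:
            shutil.copy2(src, dst)
    # <base>.wordlist.work is deliberately left in place for the next update.
```

Calling run_update("lancashire.conf", "/path/to/db/lancashire") with the default dry_run=True lists the steps without touching any files, which makes it easy to compare against what the cron job actually does.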

Does that procedure sound correct? All the pages on the sites use
server-side includes and hence don't send Last-Modified: headers. Could
that be confusing matters?
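One way to confirm the Last-Modified question is to look at the headers the server actually returns for an SSI page. A minimal Python sketch, assuming the pages are reachable over plain HTTP and that the server answers HEAD requests:

```python
from urllib.request import Request, urlopen

def has_last_modified(headers):
    """True if a header mapping contains Last-Modified (names compared case-insensitively)."""
    return any(name.lower() == "last-modified" for name in headers.keys())

def check_url(url, timeout=10):
    """Send a HEAD request; return (header present?, header value or None)."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return has_last_modified(resp.headers), resp.headers.get("Last-Modified")
```

For example, check_url("http://www.example.com/index.shtml"), where the URL is a placeholder for one of the indexed pages, shows whether the header is really absent.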

I have been running tail -f on *.wordlist.work while htdig is running,
and it just seems to be adding lines like

+707
+494
+689
+495
+709
+478
+1072
+504

rather than any new words. This seems odd to me, but then I've never
done this before. After htmerge has finished, those lines are no longer
there.
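Rather than eyeballing tail -f output, the markers can be tallied. A minimal Python sketch, assuming only what the sample above shows (a marker line is a "+" followed by digits); it makes no claim about what the numbers mean:

```python
import re

MARKER = re.compile(r"^\+(\d+)$")

def summarize_wordlist(lines):
    """Split wordlist lines into '+NNN' marker numbers and a count of all other lines."""
    markers, other = [], 0
    for line in lines:
        m = MARKER.match(line.strip())
        if m:
            markers.append(int(m.group(1)))
        else:
            other += 1
    return markers, other
```

Passing an open file works directly, e.g. summarize_wordlist(open("lancashire.wordlist.work")), where the file name is a hypothetical per-site name. A count of zero for "other" lines during a dig would confirm that no word entries are being written.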

I'm running htdig 3.1.5, compiled with gcc 2.8.1 on Solaris 2.6. I've
attached one of my config files, with the comments removed to save
space.

Adam Rice


lancashire.conf
