htdig: updating / reindexing and other questions


Jerry Preeper (preeper@cts.com)
Fri, 18 Sep 1998 13:15:11 -0500


Thanks for a great product. I am just getting started with it, but so far
so good. I am running htdig 3.0.8b2 on FreeBSD 2.2.6 with Apache, and it
installed with no problems whatsoever on the first shot.

I have a couple of quick questions:

1) I used the rundig script to create the initial database and everything
seemed to work fine. I ran it again several days later to reindex the
site. This time, however, I changed the default htdig line from the -i
option to -a so people could use the search while htdig was running.
However, a page I added to the site between the two runs is not being
indexed: I have searched for several different words on the page and it
never comes up. I noticed that the db directory now contains two new files
with a .work extension (db.docdb.work and db.wordlist.work), so I saved a
backup of the main files and then copied the .work files over the main
files. That didn't make the recently added page show up either. Any ideas?

Also, do I need to modify the script to copy the .work files over the
current files (or remove the .work files)? Which set holds the new index
that should actually be used? I want to work out all these minor glitches
before I put rundig in a cron job to run every night.
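
In case it helps, the manual swap I did amounts to roughly the following.
This is only a sketch: the function name and the db directory argument are
placeholders of mine, not anything taken from rundig.

```shell
#!/bin/sh
# Hypothetical helper, not part of htdig: back up the live database files,
# then copy each .work file over its live counterpart.
# $1 is the db directory (the path is an assumption for illustration).
promote_work_files() {
    dbdir=$1
    for name in db.docdb db.wordlist; do
        work="$dbdir/$name.work"
        live="$dbdir/$name"
        # Only act if this run actually produced a .work file for $name.
        if [ -f "$work" ]; then
            # Keep the previous live file as a backup before overwriting it.
            if [ -f "$live" ]; then
                cp -p "$live" "$live.bak"
            fi
            cp -p "$work" "$live"
        fi
    done
}
```

I would call it as `promote_work_files /path/to/db` after rundig finishes,
if that turns out to be the right thing to do at all.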

2) Does anyone know if there is a way (without using the endings) to prefix
match or match parts of words without slowing down the searching too much?
When I add endings, a search for something like New York Rangers comes up
with some really weird results :)

3) I know that htdig indexes pages by fetching them over HTTP, and I was
wondering if there is an alternative to this. My real goal is to keep the
indexing runs out of the server access logs so I don't have to keep
excluding them when I run weekly reports on the logs. Also, I would like
the indexing not to execute SSI (i.e., for my banner ads: I don't want
the alt text on the page or increased counts for the ad banners). Any ideas?

4) Is there a way to index only changed pages and new pages (i.e., use
the date of each file and run only against files dated later than the
database) so that it doesn't have to run the whole site every night?
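
What I have in mind for the date comparison is something like the sketch
below. It only lists candidate files on disk; I don't know whether htdig
can be told to index just those. The function name, the '*.html' pattern
and both arguments are my own placeholders.

```shell
#!/bin/sh
# Hypothetical helper, not part of htdig: list HTML files under a document
# root that were modified more recently than a reference file (for example
# the document database from the previous run).
# $1 is the document root, $2 is the reference file; both are assumptions.
changed_since() {
    docroot=$1
    reference=$2
    # -newer compares each file's modification time with the reference's.
    find "$docroot" -type f -name '*.html' -newer "$reference" -print
}
```

For example, `changed_since /path/to/htdocs /path/to/db/db.docdb` would
print the pages touched since the database file was last written.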

Thanks for everything and I hope someone can help.

Jerry Preeper
preeper@cts.com




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:50 PST