Re: htdig: updating / reindexing and other questions

Geoff Hutchison
Fri, 18 Sep 1998 15:10:44 -0400

>.work files to the main files. This didn't show me the page I added
>recently either. Any ideas?

Is the page linked from another page already in the database? ht://Dig
discovers pages only by following links, so if nothing links to the page,
it can't find it.

>Also, do I need to modify the script to copy the .work files over the
>current files (or remove the .work files)? Which of these is the real new

You need to move (or copy) the .work files over the current files. The
.work files are the new ones, as you can tell from the dates in ls -l.
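
As a rough sketch of that step, assuming the databases live in a single
directory and the new files simply carry a .work suffix. The function
name and its arguments are made up for illustration; they are not part
of ht://Dig or the stock script:

```shell
#!/bin/sh
# Sketch: promote ht://Dig ".work" databases to the live file names.
# Usage: promote_work_files DIR [move|copy]   (hypothetical helper)
promote_work_files() {
    dir=$1
    mode=${2:-move}
    for f in "$dir"/*.work; do
        # If nothing matches, the glob stays literal; skip it.
        [ -f "$f" ] || continue
        live=${f%.work}            # strip the .work suffix
        if [ "$mode" = copy ]; then
            cp -p "$f" "$live"     # keep the .work file, preserve dates
        else
            mv -f "$f" "$live"     # replace the live file outright
        fi
    done
}
```

Call it with your own database directory, e.g.
promote_work_files /path/to/your/db move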

>2) Does anyone know if there is a way (without using the endings) to prefix
>match or match parts of words without slowing down the searching too much?

Define "too much." There are substring and prefix fuzzy matching
algorithms. Try them out and see if they're fast enough for you.
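
For instance, something along these lines in the config file would be a
starting point. The weights are only illustrative, and you should check
that your version's search_algorithm attribute actually lists the
prefix and substring algorithms before relying on them:

```
# Config fragment: weights are guesses, tune them and time the results
search_algorithm: exact:1 prefix:0.5 substring:0.2
```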

>3) I know that htdig indexes the pages through a http process and I was
>wondering if there was another alternative to this. My real desire is not

If you're running htdig on the server itself, you can upgrade to 3.1.0b1
and use the local_urls directive to index through the filesystem. Since
the files are then read directly rather than served by the web server,
any SSI directives in them are not processed.
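
A minimal sketch of what that looks like in the config file. The host
name and the document root below are placeholders, not values from this
thread; substitute your own:

```
# Config fragment (3.1.0b1 or later); URL prefix and path are examples
start_url:   http://www.example.com/
local_urls:  http://www.example.com/=/usr/local/apache/htdocs/
```

URLs starting with the prefix on the left of the "=" are read from the
directory on the right instead of being fetched over HTTP.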

>4) Is there a way to index only changes to pages and new pages (ie, use
>date of file and run against files dated later than database date) so that
>it doesn't have to run the whole site every night?

Yup. If you don't run with "-i", htdig reuses the existing databases
instead of starting from scratch. If you also use "-a", though, you'll
want to *copy* the .work files instead of moving them. That way the .work
files, and the dates in them, are still around for htdig to use on the
next run.
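
A nightly update run might then look roughly like this. It is a sketch,
not the stock script: the function name and the database directory
argument are hypothetical, and the htmerge step is my assumption about
what your existing script already does after htdig:

```shell
#!/bin/sh
# Sketch of an update dig that keeps the .work files for the next run.
HTDIG=${HTDIG:-htdig}
HTMERGE=${HTMERGE:-htmerge}

# Usage: update_dig DBDIR   (hypothetical helper)
update_dig() {
    dbdir=$1
    # No -i, so the existing databases are reused; -a writes to .work
    "$HTDIG" -a || return 1
    # Assumed step: merge the .work databases, as the usual script does
    "$HTMERGE" -a || return 1
    for f in "$dbdir"/*.work; do
        [ -f "$f" ] || continue
        # Copy, don't move, so the .work files survive for next time
        cp -p "$f" "${f%.work}"
    done
}
```

Run it from cron with your own database directory as the argument.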

-Geoff Hutchison
Williams Students Online

