Subject: [htdig] removing docs from the database when merging?
From: John Caldwell (jcald@veedub.nu)
Date: Tue Jan 18 2000 - 15:16:00 PST

I've got a whole bunch of pages that need to be indexed every week, with
the documents added that week going into the existing index. Since all of
the pages are dynamic, If-Modified-Since headers don't work at all. My
solution is to dig all of the newly added pages separately and then
merge the new db that is created into the old one.

On occasion a page may be removed, and I figured the logical way to
remove it from the db would be to have the spider get a 404 when it
requests that page. I tried this with a few of the pages in the
database: the merge notes that each page wasn't found, but doesn't
actually remove it from the main db. Is there any way to do this? Since
the number of documents could potentially get quite large (about 250-500
added per day), I sure would hate to have to reindex the whole thing!
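
To make the behaviour I'm after concrete, here is a rough Python sketch.
It is not ht://Dig code and says nothing about how the real databases
are laid out: the dictionaries, the example.com URLs and the name
merge_index are all made up for illustration, with a status of 404
standing in for "the spider could not find the page".

```python
def merge_index(main_db, new_db):
    """Merge new_db into main_db, dropping URLs the new dig saw as 404.

    Both arguments map URL -> (http_status, document_text). This is a
    hypothetical model only, not ht://Dig's actual database format.
    """
    merged = dict(main_db)  # work on a copy; leave the old db untouched
    for url, (status, text) in new_db.items():
        if status == 404:
            # Page has gone away: remove it from the merged index.
            merged.pop(url, None)
        else:
            # New or changed page: add it or overwrite the old entry.
            merged[url] = (status, text)
    return merged


main_db = {
    "http://example.com/a": (200, "old a"),
    "http://example.com/b": (200, "old b"),
}
new_db = {
    "http://example.com/b": (404, ""),       # removed page
    "http://example.com/c": (200, "new c"),  # page added this week
}

merged = merge_index(main_db, new_db)
print(sorted(merged))
```

In this toy version /b disappears from the merged result while /a is
kept without being dug again, which is what I would like the real merge
to do.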

John Caldwell

