[htdig] htdig database related questions

Subject: [htdig] htdig database related questions
From: Haeberlen@RUS.Uni-Stuttgart.DE
Date: Wed Dec 06 2000 - 00:35:58 PST


Is there a way of "editing" the htdig documents database after the
dig is finished? We tried the BerkeleyDB tools that are included
in the htdig distribution, but db_dump, for example, refuses to do
anything with any of the database files. The error messages always
look like this:

db_dump: database_file: page X doesn't exist, create flag not set
db_dump: dbp->stat: I/O error

Is there anything wrong with our db files? htsearch seems to be able
to use them, though. Am I missing something?

Why do I want to "edit" the db files at all? We have a large database
containing quite a number of documents we'd like to exclude from the
search results. The obvious solution would be to exclude them from the
dig in the first place, but I don't consider this practical because
a) it would make the config quite bulky and b) we would like to be able
to delete individual documents from the database between the regular
digs without having to run a "full update" for each newly discovered
"exclude candidate".
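
To illustrate point a): excluding at dig time would mean growing the
exclude_urls attribute in htdig.conf roughly as sketched below. The
patterns are made-up placeholders, not our real URLs; only the attribute
name and its usual default patterns come from htdig itself.

```
# Hypothetical sketch: every new "exclude candidate" adds another pattern.
# /cgi-bin/ and .cgi are the usual defaults; the rest are invented examples.
exclude_urls:   /cgi-bin/ .cgi \
                /old-archive/ \
                /drafts/ \
                /internal/reports/
```

With many such candidates this list grows with every discovery, and each
change only takes effect on a later dig.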

Any suggestions? Many thanks in advance.



PS: How does htdig handle the case where a document is already in the docs
database and its URL is later added to the exclude list? Will the document
be deleted from the db on the next update run, or would I have to delete the
db and run a "full index" again?

Thomas Haeberlen
Email: haeberlen@rus.uni-stuttgart.de

