Subject: Re: [htdig] Looking for start_url strategies
From: Geoff Hutchison (email@example.com)
Date: Thu Nov 30 2000 - 19:51:45 PST
At 8:16 PM -0500 11/30/00, David Gewirtz wrote:
>So, first question: is it possible to REMOVE a site and its
>associated URLs from a database without reinitializing?
There is no easy way of removing a URL (much less a site) in 3.1
without reindexing. That said, read on.
>One thought was to index each site at a time and check it out. But
>that'll take forever. Another thought was to index all the sites,
>but if one seems crappy, remove it from the start_url set, do an
>htdig -i, and clean out the database. But that'll require us to
>bring down the database for a re-index time and once the server goes
>live, that's not really acceptable.
You don't need to run with htdig -i. Personally, I usually do all my
runs with -a so that updates don't affect the "live" DB and so that I
have a backup in case things go south. (For an example, see the
rundig.sh script, e.g. <http://www.htdig.org/files/contrib/scripts/>.)
If you're using htdig -a, then you can just remove the .work
databases before starting and it will clearly be reindexing from
scratch.
(N.B. The -i flag just means that htdig deletes the existing databases
before it starts, and the -a flag just appends .work to the database
filenames before reading anything in or starting.)
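
As a rough sketch of that workflow (not the actual rundig.sh), something
like the following could work. The DBDIR and CONF paths and the three
function names are assumptions made up for illustration, and it assumes
your htmerge also accepts -a; adjust everything to your installation.

```shell
#!/bin/sh
# Sketch only: DBDIR and CONF are assumed paths, not ht://Dig defaults.
DBDIR="${DBDIR:-/var/lib/htdig}"
CONF="${CONF:-/etc/htdig/htdig.conf}"

# Delete leftover work copies so the next -a run starts from nothing.
clear_work_dbs() {
    rm -f "$DBDIR"/*.work
}

# Rename every "name.work" file to "name", replacing the live copy.
promote_work_dbs() {
    for f in "$DBDIR"/*.work; do
        [ -e "$f" ] || continue   # glob matched nothing
        mv -f "$f" "${f%.work}"
    done
}

# Full cycle: build into the .work files, then swap them in.
# The live databases stay untouched until both steps succeed.
reindex() {
    clear_work_dbs
    htdig -a -c "$CONF" &&
    htmerge -a -c "$CONF" &&
    promote_work_dbs
}
```

Nothing runs when the file is sourced; you would call reindex yourself
(from cron, say) after dropping the bad site from start_url. If htdig
or htmerge fails, the .work files are never promoted, so the live
databases remain as your backup.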
Does this sound like a slightly better solution?
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
This archive was generated by hypermail 2b28 : Thu Nov 30 2000 - 20:22:31 PST