Re: [htdig] Looking for start_url strategies


Subject: Re: [htdig] Looking for start_url strategies
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Nov 30 2000 - 19:51:45 PST


At 8:16 PM -0500 11/30/00, David Gewirtz wrote:
>So, first question: is it possible to REMOVE a site and its
>associated URLs from a database without reinitializing?

There is no easy way of removing a URL (much less a site) in 3.1
without reindexing. That said, read on.

>One thought was to index each site at a time and check it out. But
>that'll take forever. Another thought was to index all the sites,
>but if one seems crappy, remove it from the start_url set, do an
>htdig -i, and clean out the database. But that'll require us to
>bring down the database for a re-index time and once the server goes
>live, that's not really acceptable.

You don't need to run with htdig -i. Personally, I usually do all my
runs with -a so that updates don't affect the "live" DB and so that I
have a backup in case things go south. (For an example, see the
rundig.sh script, e.g. <http://www.htdig.org/files/contrib/scripts/>.)

If you're using htdig -a, you can just remove the .work databases
before starting, and the run will then reindex from scratch without
touching the live databases.

(N.B. The -i flag just means that htdig deletes the databases before
it starts, and the -a flag just appends .work to the database
filenames before reading anything in or starting.)
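
As a rough sketch of that cycle (not the actual rundig.sh), something
like the following could work. The DB_DIR and CONF paths and the three
function names are made up for illustration. The "htmerge -a" call and
the final rename of the .work files over the live ones are my
assumptions about how the rest of the run would go, so check them
against your own setup before relying on this.

```shell
#!/bin/sh
# Sketch only: DB_DIR and CONF are assumed paths, not taken from any real install.
DB_DIR="${DB_DIR:-/var/lib/htdig/db}"
CONF="${CONF:-/etc/htdig/htdig.conf}"

# Delete leftover .work databases so the next "htdig -a" run starts from scratch.
clear_work_dbs() {
    [ -n "$1" ] || return 1          # refuse to run with an empty directory name
    rm -f "$1"/*.work
}

# Move each finished .work database over its live counterpart
# (e.g. db.docdb.work becomes db.docdb).
promote_work_dbs() {
    [ -n "$1" ] || return 1
    for f in "$1"/*.work; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv -f "$f" "${f%.work}"
    done
}

# Full cycle: clean out old work files, index and merge into .work copies,
# then swap them in. Each step runs only if the previous one succeeded.
reindex_from_scratch() {
    clear_work_dbs "$DB_DIR" &&
    htdig -a -c "$CONF" &&
    htmerge -a -c "$CONF" &&         # assumption: htmerge is run with -a as well
    promote_work_dbs "$DB_DIR"
}
```

Calling reindex_from_scratch after editing start_url would rebuild
everything in the .work copies, and searches keep hitting the old
databases until the final rename.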

Does this sound like a slightly better solution?

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



