Re: [htdig] Looking for start_url strategies

Subject: Re: [htdig] Looking for start_url strategies
From: Geoff Hutchison
Date: Thu Nov 30 2000 - 19:51:45 PST

At 8:16 PM -0500 11/30/00, David Gewirtz wrote:
>So, first question: is it possible to REMOVE a site and its
>associated URLs from a database without reinitializing?

There is no easy way of removing a URL (much less a site) in 3.1
without reindexing. That said, read on.

>One thought was to index each site at a time and check it out. But
>that'll take forever. Another thought was to index all the sites,
>but if one seems crappy, remove it from the start_url set, do an
>htdig -i, and clean out the database. But that'll require us to
>bring down the database for a re-index time and once the server goes
>live, that's not really acceptable.

You don't need to run with htdig -i. Personally, I usually do all my
runs with -a so that updates don't affect the "live" DB and so that I
have a backup in case things go south. (For an example, see the script, e.g. <>.)

If you're using htdig -a, then you can just remove the .work
databases before starting, and it will clearly be reindexing from
scratch.

(N.B. The -i flag just means that htdig deletes the databases before
going, and the -a flag just appends .work to the database file names
before reading anything in or starting, so the run builds a second
copy and leaves the live files alone.)
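
As a rough sketch of that workflow (not the script mentioned above),
something along these lines should work. The function names, DBDIR,
and CONF are made up for illustration; point them at whatever
database_dir and config file your setup actually uses.

```shell
#!/bin/sh
# Sketch only: the function names and the DBDIR/CONF variables are
# assumptions for illustration, not part of ht://Dig itself.

# Remove leftover .work databases so the next `htdig -a` run starts empty.
clear_work_dbs() {
    dbdir=$1
    rm -f "$dbdir"/*.work
}

# Move each finished .work database over its live counterpart.
promote_work_dbs() {
    dbdir=$1
    for f in "$dbdir"/*.work; do
        [ -e "$f" ] || continue     # glob matched nothing
        mv -f "$f" "${f%.work}"     # strip the trailing .work
    done
}

# Typical sequence (left as comments so the file can be sourced safely):
#   clear_work_dbs "$DBDIR"
#   htdig -a -c "$CONF"       # index into the .work copies
#   htmerge -a -c "$CONF"     # merge using the .work copies
#   promote_work_dbs "$DBDIR"
```

The live databases stay searchable while htdig and htmerge run; only
the final renames touch them. The files are moved one at a time, so
there is a short moment where the live set is mixed.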

Does this sound like a slightly better solution?

-Geoff Hutchison
Williams Students Online


This archive was generated by hypermail 2b28 : Thu Nov 30 2000 - 20:22:31 PST