Re: [htdig] URL deletions and maintenance.


From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Jul 11 2000 - 09:18:40 PDT


According to John Dispirito:
> My question is how do I remove a website once I no longer wish to store its
> pages in my search database?
> Is this possible? Or do I need to completely rebuild the database without
> running an htdig -a?
> (after I remove them from the start_url option in htdig.conf)
>
> It seems as if the htdig databases are almost one way, you can only put
> data in, but it can't be deleted as of yet.
> Am I correct?

Essentially, yes. htdig will tag a page for deletion when it gets a
404 error for it, or when the host address of the server no longer
exists, and htmerge will actually delete the tagged pages if the
remove_bad_urls attribute is true (which it is by default). Apart from
that, however, there's no easy way of instructing it to delete
individual pages, or all the pages for a particular site. The 3.2
release will include a utility for doing just this.
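
As a rough illustration of the settings involved, an htdig.conf fragment
might look like the one below. The example.com URL is a hypothetical
placeholder, not taken from anyone's real configuration. Dropping a site
from start_url only stops it from being crawled as a starting point; it
does not by itself purge pages that are already in the database.

```
# htdig.conf fragment (sketch; the URL below is a made-up placeholder)
# List only the sites you still want indexed.
start_url:        http://www.example.com/

# Let htmerge delete pages that htdig has tagged as gone.
# true is already the default, so this line is only for clarity.
remove_bad_urls:  true
```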

For 3.1.x, the best way to clean up your database is to reindex from
scratch using htdig -i. You can still run this with -a if you want,
and then move the .work files over the non-.work files afterward.
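
The procedure above could be scripted roughly as follows. This is only a
sketch under some assumptions: the CONF and DBDIR paths are guesses that
you would adjust to your own installation, the function names are made
up for this example, and it assumes htmerge is run with the same -a
option so that it also works on the .work copies.

```shell
#!/bin/sh
# Sketch: rebuild an ht://Dig 3.1.x database from scratch, then swap it in.
# CONF and DBDIR are assumed locations; change them for your installation.
CONF="${CONF:-/etc/htdig/htdig.conf}"
DBDIR="${DBDIR:-/var/lib/htdig}"

# Move every "name.work" file in the given directory over "name".
promote_work_files() {
    for f in "$1"/*.work; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv -f "$f" "${f%.work}"      # strip the .work suffix
    done
}

# Reindex from scratch (-i) into alternate .work files (-a), merge,
# and replace the live files only if both steps succeeded.
reindex() {
    htdig -i -a -c "$CONF" &&
    htmerge -a -c "$CONF" &&
    promote_work_files "$DBDIR"
}
```

After removing the unwanted site from start_url, you would call reindex
once. Because of -a, searches can keep using the old files until the
final move.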

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



