Subject: Re: [htdig] URL deletions and maintenance.
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Jul 11 2000 - 09:18:40 PDT
According to John Dispirito:
> My question is how do I remove a website once I no longer wish to store its
> pages in my search database?
> Is this possible? Or do I need to completely rebuild the database without
> running an htdig -a?
> (after I remove them from the start_url option in htdig.conf)
>
> It seems as if the htdig databases are almost one way, you can only put
> data in, but it can't be deleted as of yet.
> Am I correct?
Essentially, yes. htdig will tag pages for deletion when it gets a
404 error for them or when the host address of the server no longer
exists, and htmerge will then actually delete these pages if the
remove_bad_urls attribute is true (which it is by default). Apart from
that, however, there's no easy way of instructing it to delete
individual pages, or all the pages for a particular site. The 3.2
release will include a utility for doing just this.
For 3.1.x, the best way to clean up your database is to reindex from
scratch using htdig -i. You can still combine this with -a if you want,
and then move the .work files over top of the non-.work files afterward.
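
As a rough sketch of that sequence (not a tested recipe): the config
and database paths below are made up, the function names are my own,
and the htmerge step is an assumption based on the usual
htdig-then-htmerge order rather than something spelled out above.
Adjust everything to match your installation.

```shell
#!/bin/sh
# Sketch only. Function names and example paths are hypothetical.

# Rename every "*.work" file in a directory over its non-.work counterpart.
promote_work_files() {
    dbdir=$1
    for f in "$dbdir"/*.work; do
        [ -e "$f" ] || continue      # glob matched nothing, skip
        mv -f "$f" "${f%.work}"      # db.docdb.work -> db.docdb
    done
}

# Reindex from scratch into .work files, then swap them into place.
# Assumes htdig and htmerge are on PATH and both accept -a and -c.
reindex_from_scratch() {
    conf=$1
    dbdir=$2
    htdig -i -a -c "$conf" &&
    htmerge -a -c "$conf" &&
    promote_work_files "$dbdir"
}

# Example call (hypothetical paths):
# reindex_from_scratch /etc/htdig/htdig.conf /var/lib/htdig
```

Because the new index is built in the .work files, searches keep using
the old database until the final rename. Sites removed from start_url
simply never make it into the rebuilt files.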
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.