[htdig] avoiding binary attachments when indexing email archives

From: Brett Dikeman (brett@artelsoft.com)
Date: Wed Feb 02 2000 - 09:03:15 PST

What's the best way to avoid indexing attachments in archived email?
Otherwise, fuzzy searches end up matching random "words" made up of
runs of arbitrary characters, drawn from data that htdig treated as
"text"; I can trace them to emails that included binary attachments.

Second, if htdig is on the same machine as the site I'm searching,
how do I avoid the overhead of using HTTP to do the indexing? Giving
full path names in "start_url" broke the rundig script and trashed my
index files.

I tried searching the archives with several different keywords; came
up with lots of stuff, but nothing I wanted :-)
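
For the attachment question, one possible workaround is to pre-filter each
archived message before htdig ever sees it, keeping only the text/plain parts.
The following is a minimal sketch in Python, not anything htdig provides: it
assumes the archive is available as raw RFC 822 / MIME messages, and the
function name text_only_body is hypothetical. It would not catch attachments
that were uuencoded inline in a text/plain body.

```python
from email import message_from_string


def text_only_body(raw_message: str) -> str:
    """Return only the text/plain parts of a raw message (hypothetical helper).

    Parts marked as attachments, and every non-text part, are dropped so
    that encoded binary data never reaches the indexer.
    """
    msg = message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        # Multipart containers hold other parts; they have no body of their own.
        if part.is_multipart():
            continue
        # Skip anything that is not plain text (images, octet-streams, HTML...).
        if part.get_content_type() != "text/plain":
            continue
        # Skip text files that were explicitly sent as attachments.
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoding undone
        charset = part.get_content_charset() or "ascii"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to a lossless single-byte decoding.
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)
```

The filtered text could then be written out as the documents that get indexed,
instead of the original messages.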



Brett Dikeman				Network/System Admin
Artel Software				617-451-9900
381 Congress Street				617-451-9916(fax)
Boston, MA 02210				http://www.borisfx.com


This archive was generated by hypermail 2b28 : Wed Feb 02 2000 - 09:04:56 PST