Subject: [htdig3-dev] HEAD before GET to allow exclusion by MIME content-type
From: Simon Pickup (simon@adacel.com.au)
Date: Tue Feb 01 2000 - 20:56:12 PST


We use htdig to index a remote site over a WAN, but that site serves a
number of large binary files, and we want to save bandwidth by
preventing these from being downloaded (as they cannot be indexed
anyway).

We are currently using "bad_extensions" to skip them by URL matching,
but it would be more reliable to skip them based on the MIME type
returned in the HTTP Content-Type header. Doing this would mean sending
a HEAD request first, and then a GET only if the content-type is one we
can index. This adds an extra round trip per document, so latency
increases noticeably, but we can live with that.
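
To make the request flow concrete, here is a minimal sketch in Python
(not htdig's own C++, and not a patch). The function name
fetch_if_indexable and the indexable_types parameter are hypothetical
names chosen only for illustration.

```python
import http.client


def fetch_if_indexable(host, port, path,
                       indexable_types=("text/html", "text/plain")):
    """Send HEAD first; issue GET only when the Content-Type is indexable.

    Returns the body as bytes, or None when the document is skipped.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        head = conn.getresponse()
        head.read()  # a HEAD response has no body; this completes it
        # Drop parameters such as "; charset=..." and normalise case.
        ctype = (head.getheader("Content-Type") or "")
        ctype = ctype.split(";")[0].strip().lower()
        if head.status != 200 or ctype not in indexable_types:
            return None  # skipped: the body is never downloaded
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

The cost is visible here: every indexable document takes two requests
instead of one, while every skipped document costs only the headers.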

I was considering something like "valid_content_types" and
"bad_content_types" analogous to "valid_extensions" and
"bad_extensions". They would default to empty, resulting in the current
GET-only behaviour; if either is non-empty, the behaviour would be
HEAD-then-GET.
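
The intended semantics of the two proposed attributes could be sketched
as follows, again in Python purely for illustration. The exact-match
comparison and the rule that a non-empty "valid" list acts as a
whitelist are my assumptions, modelled on how I understand the
extension attributes to behave; the function names are hypothetical.

```python
def needs_head_request(valid_content_types=(), bad_content_types=()):
    """HEAD-then-GET is only needed when either list is non-empty."""
    return bool(valid_content_types or bad_content_types)


def content_type_allowed(content_type,
                         valid_content_types=(), bad_content_types=()):
    """Decide whether a document with this Content-Type should be fetched."""
    # Compare only the MIME type, ignoring parameters and case.
    mime = content_type.split(";")[0].strip().lower()
    if mime in {t.lower() for t in bad_content_types}:
        return False  # explicitly excluded
    if valid_content_types:
        # Assumption: a non-empty valid list admits only the listed types.
        return mime in {t.lower() for t in valid_content_types}
    return True  # both lists empty: current GET-only behaviour
```

With both lists empty, needs_head_request() is false and every
document is fetched with a plain GET, exactly as today.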

Has anybody considered this before? Any thoughts?

Presumably it would not be too difficult to implement, and I'd consider
writing a patch myself.

Regards,
Simon
______________________________________________________________________
Simon Pickup                         Adacel Technologies Limited
Senior Software Engineer             ACN 079 672 281
simon@adacel.com.au                  250 Bay Street
                                     BRIGHTON 3186
                                     Australia
        A D A C E L
                                     t. +61 3 9596 2991
advancedsoftwaresolutions            f. +61 3 9596 2960
        www.adacel.com.au


This archive was generated by hypermail 2b28 : Tue Feb 01 2000 - 20:56:57 PST