Re: [htdig] Local digging files without extension


Subject: Re: [htdig] Local digging files without extension
From: Marcel Hicking (hicking@du.gtn.com)
Date: Wed Oct 18 2000 - 00:21:19 PDT


Peter L. Peres <plp@actcom.co.il> 17 Oct 2000, at 19:27:

> On Mon, 16 Oct 2000, Gilles Detillieux wrote:
>
> >According to Marcel Hicking:
> >> I'm trying to dig local files using a generated start_url
> >> list and local_url.
[...]
> >Yes, the problem is the missing extension. No, there's no
> >config attribute you can set to assign a default mime type
> >to files with no extension. Yes, there is a workaround,
> >which would be to change the RetrieveLocal() method in
> >htdig/Document.cc to handle this case and assign the type
> >you want. We don't do this in the distributed source
> >because there isn't universal agreement on what mime type
> >these files should have, and we haven't worked out a
> >better, more configurable scheme for this code yet.
>
> In other words, if I'd make a patch to allow extension-less
> files to be indexed (first remote, then maybe also local),
> then it would be rather welcome ? Especially since I need it
> myself ;-)

It surely would. Although I would prefer having local files
checked first and remote files second, at least when I use
local_urls. The fallback to remote checking happens anyway,
and the httpd usually delivers a mimetype even for files
without extensions.

I could imagine two different ways of solving the problem
in a more general way:

The first would be an Apache-like mime.types file, as
this is a well-proven feature. Maybe call it mime.local and
add keywords for unknown/default types and extensions, as well
as a special mimetype that forces htdig to get the type via HTTP.

Say:
text/css css
text/html html htm
text/plain txt
text/richtext rtf
text/plain (none)
(http) php3
(http) (other)
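
To make the table above concrete, here is a rough Python sketch of how such a file might be read and consulted. Nothing here is htdig code: mime.local, the (none)/(other)/(http) keywords, and the function names parse_mime_local and lookup are only the proposal from this mail plus my own placeholders, and the "#" comment handling is an assumption borrowed from Apache's mime.types.

```python
import os

# The example table from this mail, verbatim (a proposed format, not an htdig file).
MIME_LOCAL = """\
text/css css
text/html html htm
text/plain txt
text/richtext rtf
text/plain (none)
(http) php3
(http) (other)
"""

def parse_mime_local(text):
    """Map each extension keyword to its mimetype (or the '(http)' marker)."""
    table = {}
    for line in text.splitlines():
        # Assumption: '#' starts a comment, as in Apache's mime.types.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            table[ext.lower()] = mime_type
    return table

def lookup(table, filename):
    """Return the configured type; '(http)' means 'ask the web server'."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    key = ext if ext else "(none)"
    # Extensions not listed fall through to the '(other)' entry, if present.
    return table.get(key, table.get("(other)"))
```

With this table, a file such as README would resolve through the (none) entry, while index.php3 and any unlisted extension would be marked for retrieval via HTTP.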

Since we already have some nice regex parsing for config
files, why not use it here as well and extend the match to
the full filename?
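
Matching on the full filename could look roughly like the sketch below. The rule list, its patterns, and the name match_type are purely illustrative assumptions; no such syntax exists in htdig today, and the last pattern is just one possible way to express "no extension".

```python
import re

# Hypothetical rules: (regex applied to the whole filename, resulting type).
# The first matching rule wins.
RULES = [
    (r"\.html?$", "text/html"),
    (r"\.txt$", "text/plain"),
    (r"\.php\d*$", "(http)"),          # let the web server decide
    (r"(^|/)[^./]+$", "text/plain"),   # last path component has no dot
]

def match_type(filename, rules=RULES):
    """Return the type of the first rule whose regex matches, else None."""
    for pattern, mime_type in rules:
        if re.search(pattern, filename, re.IGNORECASE):
            return mime_type
    return None
```

A None result would mean that no rule applied, so the caller could fall back to fetching the document via HTTP.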

A different idea would be to have a local_mimetype keyword
similar to local_urls:
local_mimetype txt=text/plain \
                htm*=text/html \
                php*=(http) \
                (none)=text/plain

Might be easier to use the existing config parser this way.
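
As a sketch of how the local_mimetype value above might be interpreted, assuming shell-style wildcards on the extension and first match wins (both assumptions on my part, as are the names parse_local_mimetype and resolve):

```python
import fnmatch
import os

# The attribute value from this mail, with the line continuations joined.
LOCAL_MIMETYPE = "txt=text/plain htm*=text/html php*=(http) (none)=text/plain"

def parse_local_mimetype(value):
    """Split 'pattern=type' pairs into an ordered list of rules."""
    rules = []
    for item in value.split():
        pattern, _, mime_type = item.partition("=")
        rules.append((pattern, mime_type))
    return rules

def resolve(rules, filename):
    """Return the type for filename, or None if no rule applies."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    for pattern, mime_type in rules:
        if pattern == "(none)":
            # Special keyword: only matches files without an extension.
            if not ext:
                return mime_type
        elif ext and fnmatch.fnmatchcase(ext, pattern):
            return mime_type
    return None
```

Here index.html and index.htm would both match htm*, README would match (none), and an unlisted extension would return None so that the usual fallback to remote retrieval can take over.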

Just some quick thoughts accompanying my first coffee for
today...

Regards, Marcel
