Re: [htdig] Search results


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Wed, 25 Aug 1999 10:28:22 -0500 (CDT)


According to peter karlsson:
> > In any case, it seems it would be a good idea to make htdig remember the
> > content-type of previously indexed documents, and use it by default.
>
> Or perhaps to assume that pages without a Content-Type are text/html?

So if you re-index, say, a PDF, and phttp assumes you don't need to
be told the content-type again, then you assume the PDF is HTML and
attempt to parse it as such? Not a good idea. I'd say, rather, that
if the Content-Type is missing, you should base your assumption about
the type on either the URL suffix or what the beginning of the file
looks like. That means we're back to the proposal of adding mime.types
and/or mime-magic processing - this has been suggested before, to extend
local_urls processing, and handle other transport types, but takers for
implementation haven't been forthcoming.
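
To make the fallback order concrete, here is a rough sketch in Python
rather than htdig's own C++. It is not htdig code: the function name
guess_content_type, the MAGIC_PREFIXES table, and the order of the checks
are all assumptions for illustration. Python's standard mimetypes module
stands in for a mime.types lookup, and a handful of hard-coded prefixes
stand in for a real mime-magic database.

```python
import mimetypes

# Hypothetical, deliberately tiny magic-number table. A real mime-magic
# file would cover far more formats than these few prefixes.
MAGIC_PREFIXES = [
    (b"%PDF-", "application/pdf"),
    (b"%!PS", "application/postscript"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF8", "image/gif"),
]


def guess_content_type(url, head=b"", header_type=None):
    """Return a MIME type string, or None if nothing matches.

    url         -- the document URL
    head        -- the first bytes of the document body, if available
    header_type -- the Content-Type header value, if the server sent one
    """
    # A header supplied by the server always wins; drop any parameters
    # such as "; charset=..." and normalise case.
    if header_type:
        return header_type.split(";")[0].strip().lower()

    # Fallback 1: the URL suffix, looked up mime.types-style.
    path = url.split("?", 1)[0].split("#", 1)[0]
    by_suffix, _encoding = mimetypes.guess_type(path)
    if by_suffix:
        return by_suffix

    # Fallback 2: what the beginning of the file looks like.
    for prefix, mime in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return mime
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doctype html", b"<html")):
        return "text/html"

    # Unknown: let the caller decide instead of assuming text/html.
    return None
```

Returning None for the unknown case leaves the decision to the caller,
so a re-indexed PDF served without a Content-Type is never silently
handed to the HTML parser.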

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



