Re: [htdig] external_parsers: ignored


Subject: Re: [htdig] external_parsers: ignored
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Thu Sep 07 2000 - 10:38:54 PDT


According to Geoff Hutchison:
> On Thu, 7 Sep 2000, Klaus Gröger wrote:
> > external_parsers: "application/pdf; charset=iso-8859-1" "/usr/share/htdig/parse_doc.pl"
>
> Hmm. Does it work if you just have "application/pdf;"? I'm assuming the
> ExternalParser code thought that the semicolon was part of the MIME-Type.

The ExternalParser.cc code seems to be making assumptions about the
Content-Type header that it shouldn't. Elsewhere, this header value
is compared using mystrncmp(), which is case insensitive and doesn't
go beyond the length of the string to which it's compared, so any
extra stuff (such as a trailing "; charset=iso-8859-1") is ignored.
In ExternalParser's constructor, in canParse(), and elsewhere, the
code does Dictionary lookups using contentType as the key, without
first trimming and lowercasing it, so any uppercase letters or extra
information will cause the match to fail.
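
To illustrate the kind of normalization that seems to be missing, here
is a minimal, self-contained sketch. It is not the ht://Dig code: the
names normalizeContentType() and findParser() are hypothetical, and
std::unordered_map merely stands in for the Dictionary class. It
assumes the key should be the bare MIME type, lowercased, with any
parameters after the first semicolon dropped.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of ht://Dig): reduce a raw Content-Type
// header value to a bare, lowercase MIME type usable as a lookup key.
std::string normalizeContentType(const std::string &raw)
{
    // Drop any parameters, e.g. "; charset=iso-8859-1".
    // find() returns npos when there is no ';', which keeps the whole string.
    std::string type = raw.substr(0, raw.find(';'));

    // Trim surrounding whitespace.
    const char *ws = " \t\r\n";
    std::string::size_type first = type.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    std::string::size_type last = type.find_last_not_of(ws);
    type = type.substr(first, last - first + 1);

    // Lowercase the result, since MIME type names are case insensitive.
    std::transform(type.begin(), type.end(), type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return type;
}

// Assumed stand-in for the parser table: maps a MIME type to a parser command.
// Returns a pointer to the command, or nullptr if no parser is registered.
const std::string *findParser(
    const std::unordered_map<std::string, std::string> &parsers,
    const std::string &contentType)
{
    auto it = parsers.find(normalizeContentType(contentType));
    return it == parsers.end() ? nullptr : &it->second;
}
```

With a table keyed on "application/pdf", a header value of
"Application/PDF; charset=iso-8859-1" would then find the parser,
whereas a lookup on the raw header string would not.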

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
