[htdig] Re: Parsing PDF Files Follow-up


Subject: [htdig] Re: Parsing PDF Files Follow-up
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Jun 23 2000 - 16:14:53 PDT


At 10:01 AM -0400 6/23/00, Wayne Fool wrote:
>was combined and jumbled into one file. So I did a binary download, should
>have done that first, and everything works great.

Good to hear.

>I wanted to ask another question that I couldn't find the answer to in the
>archives or FAQ. Is it possible to have htdig search the keywords line of a
>PDF file's document info section and if so, what format does that line have
>to be in (comma delimited, space, etc.)? Thanks for your help. I
>appreciate it.

The short answer is "I don't know." As you're already aware, ht://Dig
does not handle PDF internally. (If someone wishes to write a *real*
PDF parser, this might be useful.) So the question boils down to
whether the keywords line is converted when you use a conversion
program. My suggestion is to look at the output of doc2html or
conv_doc.pl or parse_doc.pl or whatever and see if the keywords show
up.
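
One way to do that check, as a rough sketch only: save the converter's output to a file by hand, then test whether each keyword from the PDF's document info section shows up in it as a whole word. The function name and the sample strings below are made up for illustration; nothing here is part of ht://Dig or of the converter scripts.

```python
import re

def missing_keywords(converted_text, keywords):
    """Return the keywords that do not occur as whole words in converted_text.

    converted_text is whatever the external converter produced (assumed to
    have been saved or captured by hand); matching is case-insensitive.
    """
    # Break the converter output into lowercase alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", converted_text.lower()))
    return [k for k in keywords if k.lower() not in words]

# Hypothetical converter output and keyword list, for illustration only.
sample = '<meta name="keywords" content="alpha beta">'
print(missing_keywords(sample, ["alpha", "gamma"]))
```

An empty list means every keyword made it through the conversion; anything listed was dropped somewhere before ht://Dig ever saw it.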

As for format, if the keywords are actually converted, I'd make sure
there are at least spaces between words. Commas will be ignored, but
may not always be treated as word separators (e.g. word1,word2 might
become word1word2, depending on your configuration).
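
To make the difference concrete, here is a toy model of the two behaviours. It is not ht://Dig's actual word-splitting code, and the function and parameter names are invented for this sketch; it only shows why a space after each comma is the safe choice.

```python
def toy_tokenize(text, ignored=",", separators=""):
    """Toy word splitter (an illustration, not ht://Dig's parser).

    Characters in `separators` act as word breaks; characters in `ignored`
    are dropped without breaking the word; whitespace always breaks words.
    """
    out = []
    for ch in text:
        if ch in separators:
            out.append(" ")   # treated as a word break
        elif ch in ignored:
            continue          # silently dropped, neighbours get glued together
        else:
            out.append(ch)
    return "".join(out).split()

print(toy_tokenize("word1,word2"))                  # comma ignored only
print(toy_tokenize("word1,word2", separators=","))  # comma also a separator
print(toy_tokenize("word1, word2"))                 # comma plus space
```

With the comma merely ignored, the first call yields a single glued word; the other two calls yield two separate words. Writing the keywords as "word1, word2" gives the same result under either setting.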

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



