[htdig] Re: HTDIG: PDF output


Subject: [htdig] Re: HTDIG: PDF output
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Wed Jan 10 2001 - 16:47:32 PST


Hi, Mark. Please direct questions such as this to the htdig@htdig.org
mailing list.

According to Mark Gendron:
> My problem is the title output when I search for X and get hits
> that are PDF files. The output is for the title. Note that it appears
> PDF titles (entered by me) are not being parsed.
>
> After reading through the htdig mailing list archives and then
> downloading both the parse_doc. and doc2html files, I thought that you might
> be able to help me with my query.
>
> Although I do not run the server or administer the search engine
> software I'm hoping you might be able to provide me with some info that
> I could pass along to a "very-overworked" systems admin person here. The
> end result would be to provide our website audience with descriptive
> titles when they search the site.

Hmm, that's odd. I've seen those two characters appear in PDF files before,
but usually only in the Producer field, not in the Title field. How did
you enter the PDF titles? Did you use the File->Document Info->General...
selection (or Ctrl-D) in Acrobat Exchange, or some other facility?

How are you parsing the PDFs in htdig? Are you using acroread
through the pdf_parser attribute, or parse_doc.pl, conv_doc.pl or
doc2html.pl through the external_parsers attribute? What is your current
setting of whichever attribute you use? If you're using an external
parser or converter, did you modify it in any way? What does its output
look like on one of these problem PDF files? What does pdfinfo's output
look like? What version of xpdf did you install to support the external
parser/converter?
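To answer the pdfinfo question for several files at once, it may help to
look at just the Title field. The following is a minimal Python sketch,
not part of htdig. It assumes the pdfinfo program from xpdf is on your
PATH and prints its usual "Key:   value" lines; the function names
parse_pdfinfo and pdf_title are hypothetical. Printing the title with
repr() makes any stray characters in it easy to spot.

```python
import subprocess
import sys


def parse_pdfinfo(text):
    """Split pdfinfo-style "Key:   value" lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as dates keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pdf_title(path):
    """Run pdfinfo (assumed to be installed) on one file; return its Title or None."""
    result = subprocess.run(["pdfinfo", path], capture_output=True,
                            text=True, errors="replace", check=True)
    return parse_pdfinfo(result.stdout).get("Title")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # repr() shows non-printing or unexpected characters explicitly.
        print(path, repr(pdf_title(path)))
```

For example, running the script (saved under any name, say pdftitle.py)
with one or more PDF file names prints each name followed by the title
exactly as pdfinfo reports it, which is the information asked for above.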

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930




This archive was generated by hypermail 2b28 : Wed Jan 10 2001 - 17:01:29 PST