Re: [htdig] Still no luck with indexing PDF's


Subject: Re: [htdig] Still no luck with indexing PDF's
From: Anthony Peacock (a.peacock@chime.ucl.ac.uk)
Date: Mon Feb 07 2000 - 07:07:32 PST


Hi,

> >> I am still having no luck getting my PDF files indexed.
> >>
> >> If anyone has succeeded in doing this, I would love to be able to ask
> >> them some questions.
> >>
> >> Getting this working is fairly important, and I am having no luck at all with
> >> it. I have read all the information I can find in the FAQ on it, and still
> >> seem to be doing something wrong.
> >>
> >> Could someone take a minute or 2 for an email exchange with me on this?
> >
> >How about describing your problem to the list?
>
> Can do. I just did not want to repeat what I posted over the weekend.
>
> I am running a small site that serves vendor documentation to an
> internal group. I use Apache and Squid on a FreeBSD 3.4 STABLE
> machine.
>
> Up until now, all the docs have been in HTML, and all has worked well.
> Now, however, I have a large documentation set which is pretty much all
> PDF files, with just enough HTML pages to allow for navigation.
>
> These docs consist of some pretty complex (I think) manuals, which are,
> for instance, multi-page.
>
> I have tried indexing them using both acroread 3 (which results in no
> keys being found, I believe) and acroread 4, which core dumps.
>
> Then I downloaded parse_doc.pl and conv_doc.pl. I also downloaded and
> compiled the latest xpdf. When I use a config file setting like:
>
> external_parsers: application/pdf /usr/local/doc.parse_doc.pl
>
> Then I get errors about "no Current point in closepath"
>
> When I put this in the config file:
>
> external_parsers: application/pdf->text/html /usr/local/bin/conv_doc.pl
>
> Then acroread gets called anyway.

Can you try converting a single PDF file by running either Acroread or
pdftotext directly from the command line, and send the results to the list?

First of all, we need to work out whether the failure happens with the
converter on its own, or only when it is run through htdig/parse_doc.pl.
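
If it helps, here is a rough sketch of that standalone test in Python. It
is only an illustration, not part of ht://Dig: the function names are made
up, and it assumes pdftotext (from xpdf) is on your PATH and accepts "-" as
the output file to write the text to stdout. It runs the converter on one
file and reports the exit status, how many words came out, and anything
written to stderr.

```python
import subprocess


def run_converter(cmd):
    """Run a converter command and summarize what it produced."""
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return {
        # On POSIX a negative value means the process was killed by a signal,
        # which is what a core dump would look like.
        "returncode": proc.returncode,
        # Zero words from a PDF that clearly contains text points at the
        # converter rather than at htdig.
        "word_count": len(proc.stdout.split()),
        "stderr": proc.stderr.strip(),
    }


def check_pdf(path):
    """Hypothetical helper: run pdftotext on a single PDF file."""
    # Assumption: "pdftotext FILE -" sends the extracted text to stdout.
    return run_converter(["pdftotext", path, "-"])
```

Calling check_pdf("manual.pdf") on one of the problem files (the file name
is just a placeholder) and posting the resulting dictionary would tell us
whether the converter itself is failing before htdig ever gets involved.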

>
> >
> >There are plenty of people on this list that have PDF indexing working.
>
> Wonderful, I appreciate the help on this.
> >
> >Have you checked the list archives?
>
>
> To the best of my ability. I am a total newbie at this :-(

I seem to recall that somebody had problems with Acroread 4. I use pdftotext
from xpdf-0.90, with no problems.

---
Fare Thee Well
Anthony Peacock       
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"The social dynamics of the net are a direct consequence of the fact that
nobody has yet developed a Remote Strangulation Protocol." --Larry Wall



