Re: htdig: Searching PDF or Word Files


Shyam B S (shyambs@hotmail.com)
Tue, 12 Jan 1999 04:37:20 PST


>At 6:30 AM -0400 1/11/99, Shyam B S wrote:
>>I am trying to index and Search MS Word and PDF files. I am using catdoc
>>and acrobat as the external parsers for these documents. htsearch finds
>
>Do you mean that you've specified catdoc and acrobat in the external parser
>attribute? If so, it's not going to work reliably (if at all). The external
>parser support expects output to follow certain guidelines documented in
>http://www.htdig.org/attrs.html#external_parser so you can't just plug any
>program in.
>
>If you're running any of the 3.1.0bX series, they include a PDF parser that
>works with acrobat (and should work out of the box). More recent betas
>include scripts to handle Word documents using catdoc.
>

Thanks. I am using htparsedoc as the external parser; it calls catdoc for
Word documents. I solved the problem by modifying htparsedoc to return the
h (head) record type along with the t (title) and w (word) record types.
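The kind of record stream described above can be sketched as follows. This is a minimal Python illustration, not htparsedoc itself: the function name make_records, the 500-character excerpt length, and the regex word split are assumptions made for the example. The tab-separated layout (t title; h head; w word, location 0-1000, heading level) reflects one reading of the external_parser documentation at the URL quoted above, so check the field order against the docs for your ht://Dig version.

```python
import re

# Hypothetical helper (not part of ht://Dig): turn plain text, e.g. catdoc
# output, into external-parser records. Record layout is an assumption based
# on the external_parser docs; verify it for your htdig version.

WORD_RE = re.compile(r"[A-Za-z0-9]+")  # assumed word definition
HEAD_CHARS = 500                       # assumed excerpt length, not an htdig default


def make_records(title, text, max_head=HEAD_CHARS):
    """Return tab-separated record lines: a t record, an h record, then one w per word."""
    records = []
    clean_title = " ".join(title.split())  # collapse tabs/newlines so the record stays on one line
    if clean_title:
        records.append("t\t" + clean_title)
    # h record: the beginning of the document, used for search-result excerpts.
    head = " ".join(text.split())[:max_head]
    if head:
        records.append("h\t" + head)
    words = WORD_RE.findall(text)
    total = len(words)
    for i, word in enumerate(words):
        # Normalised location in 0..1000, where 0 is the top of the document.
        location = (i * 1000) // total
        # Heading level 0 is used here for ordinary body text.
        records.append("w\t%s\t%d\t0" % (word, location))
    return records
```

A wrapper script in the role of htparsedoc would run catdoc on the input file, pass its output to make_records, and print each returned line to standard output for htdig to read. Emitting the h record is what supplies the excerpt text that htsearch shows in results.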

Shyam

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-request@sdsu.edu containing the single word "unsubscribe" in
the body of the message.



This archive was generated by hypermail 2.0b3 on Wed Jan 13 1999 - 09:13:05 PST