Re: [htdig] Word documents indexing problem


Subject: Re: [htdig] Word documents indexing problem
From: D.J.Adams@soton.ac.uk
Date: Wed Jun 07 2000 - 04:51:40 PDT


>
> Hello,
>
> I use htdig 3.1.5 on Red Hat Linux 6.1.
>
> I have configured the htdig.conf file as follows:
>
> valid_extensions: .html .htm .doc .pdf .txt
> local_default_doc: new_index.html index.html index.htm main.htm main_frame.htm frame.htm content.htm title.htm main2.htm
>
> local_urls_only: true
>
> local_urls: http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/
>
> #
> # Since ht://Dig does not (and cannot) parse every document type, this
> # attribute is a list of strings (extensions) that will be ignored
> # during indexing. These are *only* checked at the end of a URL, whereas
> # exclude_url patterns are matched anywhere.
> #
> bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
> .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
>
> max_doc_size: 20000000
>
> external_parsers: application/msword->text/html /usr/local/bin/parse_doc.pl \
> application/postscript->text/html /usr/local/bin/parse_doc.pl \
> application/pdf->text/html /usr/local/bin/parse_doc.pl
>
> Indexing PDF files works fine, but I get the following message when
> indexing MS Word files:
>
> 30:30:2:http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc:
> Trying local files
> found existing file /var/opt/web/doc/tech/casc/details_casc.doc
> not found
>
> The file /var/opt/web/doc/tech/casc/details_casc.doc actually exists...
>
> I don't understand what the problem can be. Running rundig with several
> additional -v options does not help.
>
> Could somebody help me?
>
> Thanks,
> Jean-Francois.
> --

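I think the "not found" may come not from htdig itself but from the
utility that parse_doc.pl calls to convert Word documents. htdig reports
that it found the file, so the message more likely means that the
converter program could not be found.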

Try calling parse_doc.pl from the command line:

        parse_doc.pl /var/opt/web/doc/tech/casc/details_casc.doc arg2 arg3

and see what happens.
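
If it helps, here is a rough sketch of the checks I would make by hand
first. It only defines two small shell functions; nothing in it is part
of htdig or parse_doc.pl. The name "catdoc" in the example calls is an
assumption on my part: use whichever Word converter your copy of
parse_doc.pl is actually set up to call.

```sh
#!/bin/sh
# Sketch only: helper checks to run before blaming htdig itself.
# Assumption: "catdoc" below stands in for whatever Word converter
# your copy of parse_doc.pl invokes; check the script to be sure.

# Report whether a program can be found on the current PATH.
check_helper() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found: $1"
        return 1
    fi
}

# Report whether a file exists and is readable by the current user.
check_readable() {
    if [ -r "$1" ]; then
        echo "readable: $1"
    else
        echo "not readable: $1"
        return 1
    fi
}

# Example calls (uncomment to use):
#   check_helper perl
#   check_helper catdoc
#   check_readable /var/opt/web/doc/tech/casc/details_casc.doc
```

Run the checks as the same user that runs rundig, since that user's PATH
and file permissions may differ from those of your login shell. If
parse_doc.pl refers to its converter by a full path, check that path
directly rather than relying on PATH.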

-- 
 
David J Adams
<D.J.Adams@soton.ac.uk>
Computing Services
University of Southampton



