Subject: [htdig] Word documents indexing problem
From: Jean-Francois Le Carre Petit (jean-francois_le-carre-petit@hp-france-om18.om.hp.com)
Date: Wed Jun 07 2000 - 02:56:29 PDT


I use htdig 3.1.5 on Red Hat Linux 6.1.

I have configured the htdig.conf file as follows:

valid_extensions: .html .htm .doc .pdf .txt
local_default_doc: new_index.html index.html index.htm main.htm main_frame.htm frame.htm content.htm title.htm main2.htm

local_urls_only: true

local_urls: http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/

# Since ht://Dig does not (and cannot) parse every document type, this
# attribute is a list of strings (extensions) that will be ignored during
# indexing. These are *only* checked at the end of a URL, whereas
# exclude_url patterns are matched anywhere.
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
                .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi

max_doc_size: 20000000

external_parsers: application/msword->text/html /usr/local/bin/parse_doc.pl \
                  application/pdf->text/html /usr/local/bin/parse_doc.pl
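As a sanity check on the external_parsers value above, here is a minimal sketch (a hypothetical Python helper, not part of htdig) that splits the value into content-type/command pairs. It assumes the attribute is a whitespace-separated list of pairs, with the first string of each pair in the `type->output_type` form used above, and that the backslash continuations have already been joined. An odd token count would mean an entry is missing or duplicated.

```python
# Hypothetical helper (not htdig code): split an external_parsers value into
# (source type, output type, command) triples. Assumes whitespace-separated
# pairs and that backslash line continuations have already been joined.

def parse_external_parsers(value):
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(
            f"odd number of tokens ({len(tokens)}): entries must come in pairs")
    entries = []
    for content_type, command in zip(tokens[0::2], tokens[1::2]):
        # "in->out" as written in the config: the command turns "in" into "out"
        if "->" in content_type:
            source, target = content_type.split("->", 1)
        else:
            source, target = content_type, None
        entries.append((source, target, command))
    return entries


value = ("application/msword->text/html /usr/local/bin/parse_doc.pl "
         "application/pdf->text/html /usr/local/bin/parse_doc.pl")
for source, target, command in parse_external_parsers(value):
    print(source, "->", target, "via", command)
```

With the two entries above, this should print one line for application/msword and one for application/pdf, both handled by /usr/local/bin/parse_doc.pl.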

Indexing PDF files works fine, whereas I get the following message when
indexing MS Word files:
Trying local files
  found existing file /var/opt/web/doc/tech/casc/details_casc.doc
 not found

The file /var/opt/web/doc/tech/casc/details_casc.doc actually exists...
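The "found existing file" line suggests the local_urls mapping itself is working. A minimal sketch of that prefix substitution, assuming plain string-prefix matching on the `url_prefix=directory` entry from the config (htdig's own implementation may differ), is below. The document URL used here is an assumption, reconstructed from the path in the log and the local_urls setting.

```python
# Hypothetical sketch of the local_urls substitution: if a URL starts with a
# configured prefix, replace that prefix with the matching local directory.
# Assumes simple string-prefix matching; not htdig's actual code.

def url_to_local_path(url, local_urls):
    """local_urls: list of 'url_prefix=directory' entries, as in htdig.conf."""
    for entry in local_urls:
        prefix, _, directory = entry.partition("=")
        if url.startswith(prefix):
            return directory + url[len(prefix):]
    return None  # no mapping applies: the URL is not treated as local


local_urls = ["http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/"]
# Assumed URL, inferred from the path reported in the log above
url = "http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc"
print(url_to_local_path(url, local_urls))
# → /var/opt/web/doc/tech/casc/details_casc.doc
```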

I don't understand what the problem can be. Running rundig with several
additional -v options does not help.

Could somebody help me?



