Subject: [htdig] Word documents indexing problem
From: Jean-Francois Le Carre Petit (jean-francois_le-carre-petit@hp-france-om18.om.hp.com)
Date: Wed Jun 07 2000 - 02:56:29 PDT
Hello,
I use htdig 3.1.5 on Red Hat Linux 6.1.
I have configured the htdig.conf file as follows:
valid_extensions: .html .htm .doc .pdf .txt
local_default_doc: new_index.html index.html index.htm main.htm main_frame.htm frame.htm content.htm title.htm main2.htm
local_urls_only: true
local_urls: http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/
#
# Since ht://Dig does not (and cannot) parse every document type, this
# attribute is a list of strings (extensions) that will be ignored during
# indexing. These are *only* checked at the end of a URL, whereas
# exclude_url patterns are matched anywhere.
#
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
    .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi
max_doc_size: 20000000
external_parsers: application/msword->text/html /usr/local/bin/parse_doc.pl \
    application/postscript->text/html /usr/local/bin/parse_doc.pl \
    application/pdf->text/html /usr/local/bin/parse_doc.pl
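
For reference, the suffix-only matching described in the comment above can be sketched roughly as follows. This is a simplified illustration in Python, not ht://Dig's actual code: the function name passes_extension_filters is hypothetical, and the case-insensitive comparison and the handling of URLs without an extension are assumptions.

```python
# Hypothetical helper, NOT ht://Dig source code: a simplified sketch of
# checking extensions only at the end of a URL.

VALID_EXTENSIONS = [".html", ".htm", ".doc", ".pdf", ".txt"]
BAD_EXTENSIONS = [
    ".wav", ".gz", ".z", ".sit", ".au", ".zip", ".tar", ".hqx", ".exe",
    ".com", ".gif", ".jpg", ".jpeg", ".aiff", ".class", ".map", ".ram",
    ".tgz", ".bin", ".rpm", ".mpg", ".mov", ".avi",
]


def passes_extension_filters(url, valid=VALID_EXTENSIONS, bad=BAD_EXTENSIONS):
    """Return True if the URL would survive both extension lists.

    Assumption: comparison is case-insensitive and looks only at the
    end of the URL, never at the middle.
    """
    lowered = url.lower()
    # Reject anything ending in a listed bad extension.
    if any(lowered.endswith(ext) for ext in bad):
        return False
    # If a valid list is given, require one of its extensions.
    if valid:
        return any(lowered.endswith(ext) for ext in valid)
    return True


doc_url = "http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc"
print(passes_extension_filters(doc_url))
```

Under these assumptions the .doc URL passes both lists, so the extension settings alone would not be expected to exclude it.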
Indexing PDF files works fine, but I get the following message when
indexing MS Word files:
30:30:2:http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc: Trying local files
found existing file /var/opt/web/doc/tech/casc/details_casc.doc
not found
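
The path in the output above is consistent with a simple prefix substitution based on the local_urls setting. A minimal sketch of that mapping, assuming each entry has the form prefix=path and that the first matching prefix wins (the function name map_to_local_path is hypothetical and this is not ht://Dig's implementation):

```python
# Hypothetical helper, NOT ht://Dig source code: illustrates how a
# "URL-prefix=filesystem-path" entry could translate a URL into a file path.

LOCAL_URLS = ["http://gnbuxsl.grenoble.hp.com:8090/=/var/opt/web/"]


def map_to_local_path(url, local_urls):
    """Return the local path for url, or None if no prefix matches.

    Assumption: each entry is split at its first "=" and the first
    matching prefix is used.
    """
    for entry in local_urls:
        prefix, _, path = entry.partition("=")
        if url.startswith(prefix):
            # Replace the URL prefix with the filesystem path.
            return path + url[len(prefix):]
    return None


url = "http://gnbuxsl.grenoble.hp.com:8090/doc/tech/casc/details_casc.doc"
print(map_to_local_path(url, LOCAL_URLS))
# prints /var/opt/web/doc/tech/casc/details_casc.doc
```

This reproduces the file name reported as "found existing file", which suggests the URL-to-path translation itself is working and the "not found" message arises at a later step.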
The file /var/opt/web/doc/tech/casc/details_casc.doc actually exists...
I don't understand what the problem can be. Running rundig with several
additional -v options does not help.
Could somebody help me?
Thanks,
Jean-Francois.