Subject: Re: [htdig] htdig ignores *.doc file extension
From: Evelio Martinez (evelio.martinez@testanet.com)
Date: Mon Jan 15 2001 - 03:15:06 PST
Geoff Hutchison wrote:
> On Fri, 12 Jan 2001, Evelio Martinez wrote:
>
> > htdig is ignoring the files with pdf and doc extension.
>
> By this, I assume you mean they're not indexed.
Correct.
>
>
> Try running htdig -vvv and take a look at what happens when it encounters
> a link to a PDF file. Does it reject the link? Or does it get to the link
> and try to index it later?
I ran bin/htdig -i -vvv -s | tee /tmp/ht, and the 3 .doc and 2 .pdf
files under /home/httpd/html are not mentioned anywhere in the debug
file /tmp/ht.
Is this normal?
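
A minimal sketch of how the debug file can be searched for the two extensions, for anyone repeating this check. The function name `lines_mentioning` is hypothetical, and nothing is assumed about the format of the -vvv output beyond it being plain text; only the /tmp/ht path comes from the command above.

```python
import re


def lines_mentioning(lines, extensions=("doc", "pdf")):
    """Return (line_number, text) pairs for lines that mention a name
    ending in one of the given extensions (case-insensitive)."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # A dot, one of the extensions, then a word boundary, so ".docx"
    # or ".document" is not counted as ".doc".
    pattern = re.compile(r"\.(?:%s)\b" % alternatives, re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Usage (path taken from the tee command above):
#   with open("/tmp/ht", errors="replace") as log:
#       for number, text in lines_mentioning(log):
#           print(number, text)
```

An empty result would match what is reported here: the crawl never saw a URL for those files at all, as opposed to seeing one and rejecting it.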
>
>
> If it's the former, then one of your limits is set incorrectly. (e.g.
> bad_extensions, valid_extensions, exclude_urls, limit_urls_to ...)
I don't see anything obviously wrong. Do you?
I have attached my htdig.conf.
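
A rough sketch of how a config file could be checked for the limits mentioned above. It assumes the usual `name: value` layout with `#` comment lines and trailing-backslash continuation, and it is a simplification rather than htdig's own parser. The function names and the sample values are hypothetical; they are not taken from the attached file.

```python
def parse_htdig_conf(text):
    """Parse 'name: value' attributes from config text.
    Lines starting with '#' are comments; a trailing backslash
    continues the value on the next line."""
    attrs = {}
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        full = (pending + line).strip()
        pending = ""
        name, sep, value = full.partition(":")
        if sep:
            attrs[name.strip().lower()] = value.strip()
    return attrs


def in_bad_extensions(attrs, extension):
    """True if the extension appears in the bad_extensions list."""
    listed = attrs.get("bad_extensions", "").lower().split()
    return extension.lower() in listed


# Made-up sample, not the attached htdig.conf:
sample = (
    "# sample only\n"
    "start_url: http://localhost/\n"
    "bad_extensions: .wav .gz \\\n"
    "    .doc .zip\n"
)
attrs = parse_htdig_conf(sample)
print(in_bad_extensions(attrs, ".doc"), in_bad_extensions(attrs, ".pdf"))
```

The same lookup could be repeated for valid_extensions, exclude_urls and limit_urls_to, though those would need their own matching rules rather than a plain list test.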
>
>
> If it's the latter, then make sure you can run a .doc or a .pdf through
> the external converter itself and get reasonable-looking output.
If I run /usr/local/bin/catdoc /home/httpd/html/*.doc, I get
reasonable-looking output.
Any ideas?
Thanks
--
Evelio Martínez
Testanet. Software Development Dept.
Av. Reino de Valencia, 15 - 5
46005 Valencia (Spain)
Tel: +34 96 395 90 00  Fax: +34 96 316 23 19
This archive was generated by hypermail 2b28 : Mon Jan 15 2001 - 03:29:46 PST