Re: [htdig] htdig ignores *.doc file extension


From: Evelio Martinez (evelio.martinez@testanet.com)
Date: Mon Jan 15 2001 - 03:15:06 PST


Geoff Hutchison wrote:

> On Fri, 12 Jan 2001, Evelio Martinez wrote:
>
> > htdig is ignoring the files with pdf and doc extension.
>
> By this, I assume you mean they're not indexed.

Correct.

>
>
> Try running htdig -vvv and take a look at what happens when it encounters
> a link to a PDF file. Does it reject the link? Or does it get to the link
> and try to index it later?

I have run bin/htdig -i -vvv -s | tee /tmp/ht, and the 3 .doc and 2 .pdf
files under /home/httpd/html are not referenced anywhere in the debug
file /tmp/ht.

Is this normal?
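
One way to double-check that the debug output really contains no mention of those files is to scan it for the extensions. The following is only a minimal sketch: the function names are hypothetical, it assumes nothing about the layout of htdig's debug lines, and it simply looks for ".doc" or ".pdf" anywhere in each line of the file written by tee (/tmp/ht above).

```python
import re


def find_extension_mentions(lines, extensions=("doc", "pdf")):
    """Return (line_number, text) pairs for lines mentioning one of the extensions.

    The match is case-insensitive and requires the extension to end at a
    word boundary, so ".doc" matches but ".docs" does not.
    """
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    pattern = re.compile(r"\.(?:%s)\b" % alternatives, re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path):
    """Print every line of the debug file at `path` that mentions .doc or .pdf."""
    with open(path, errors="replace") as log:
        hits = find_extension_mentions(log)
    for number, text in hits:
        print("%d: %s" % (number, text))
    if not hits:
        print("no .doc or .pdf mentions found in", path)
    return hits
```

Calling scan_log("/tmp/ht") would then list any matching lines. An empty result would suggest the links were never seen at all, rather than seen and rejected.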

>
>
> If it's the former, then one of your limits is set incorrectly. (e.g.
> bad_extensions, valid_extensions, exclude_urls, limit_urls_to ...)

I have not seen anything obviously wrong. Do you?
I have attached the htdig.conf.
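
To review the four limits named in the quoted text in one place, a small script can pull them out of the configuration file. This is a rough sketch under stated assumptions: it assumes the file uses "name: value" lines, "#" comments, and a trailing backslash for line continuation, and it does not follow include directives. The function names are hypothetical and not part of ht://Dig.

```python
# Attributes named in the quoted advice as the likely limits to check.
LIMIT_ATTRIBUTES = ("bad_extensions", "valid_extensions",
                    "exclude_urls", "limit_urls_to")


def read_attributes(text):
    """Parse 'name: value' lines, joining lines that end with a backslash."""
    attributes = {}
    logical = ""
    # The extra empty line flushes a continuation left open at end of file.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line.endswith("\\"):
            logical += line[:-1].strip() + " "
            continue
        logical += line
        if logical and not logical.startswith("#"):
            name, separator, value = logical.partition(":")
            if separator:
                attributes[name.strip()] = value.strip()
        logical = ""
    return attributes


def report_limits(text):
    """Return the limit attributes found in the config text (None if unset)."""
    attributes = read_attributes(text)
    return {name: attributes.get(name) for name in LIMIT_ATTRIBUTES}
```

For example, report_limits(open("htdig.conf").read()) returns a dictionary in which an unset attribute shows up as None, meaning the built-in default applies. A value of bad_extensions that lists .doc or .pdf would be one candidate explanation for files being skipped.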

>
>
> If it's the latter, then make sure you can run a .doc or a .pdf through
> the external converter itself and get reasonable-looking output.

If I execute /usr/local/bin/catdoc /home/httpd/html/*.doc, I get
reasonable-looking output.

Any ideas?

Thanks

--
Evelio Martínez
Testanet. Software Development Dept.
Av. Reino de Valencia, 15 - 5
46005 Valencia (Spain)
Tel: +34 96 395 90 00
Fax: +34 96 316 23 19


htdig.conf




This archive was generated by hypermail 2b28 : Mon Jan 15 2001 - 03:29:46 PST