Re: [htdig] htdig ignores *.doc file extension

Subject: Re: [htdig] htdig ignores *.doc file extension
From: Evelio Martinez
Date: Mon Jan 15 2001 - 03:15:06 PST

Geoff Hutchison wrote:

> On Fri, 12 Jan 2001, Evelio Martinez wrote:
> > htdig is ignoring the files with pdf and doc extension.
> By this, I assume you mean they're not indexed.


> Try running htdig -vvv and take a look at what happens when it encounters
> a link to a PDF file. Does it reject the link? Or does it get to the link
> and try to index it later?

I ran bin/htdig -i -vvv -s | tee /tmp/ht, and the 3 .doc and 2 .pdf
files under /home/httpd/html are not referenced anywhere in the debug
output.

Is this normal?
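
A rough way to double-check that claim is to count the log lines that mention each extension. This is only a sketch: the function name count_ext_refs is made up for illustration, and it assumes the log is the /tmp/ht file written by tee above.

```sh
# Count the lines in an htdig debug log that mention a file extension.
# count_ext_refs is a hypothetical helper name, not part of ht://Dig.
# $1 = extension without the dot (e.g. doc), $2 = log file (default /tmp/ht).
count_ext_refs() {
    ext="$1"
    log="${2:-/tmp/ht}"
    # -i ignores case, -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so mask that with || true.
    grep -ic -- "\.${ext}" "$log" || true
}

# Example (not run here):
#   count_ext_refs doc
#   count_ext_refs pdf
```

A count of 0 for both would suggest the links are never seen at all, rather than seen and then rejected.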

> If it's the former, then one of your limits is set incorrectly. (e.g.
> bad_extensions, valid_extensions, exclude_urls, limit_urls_to ...)

I don't see anything obviously wrong. Do you?
I have attached the htdig.conf.
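
For anyone checking their own setup, the four attributes named above can be pulled out of a config file in one pass. This is a sketch under assumptions: show_limits is a hypothetical helper name, the path to htdig.conf must be supplied by the caller, and only the first line of each attribute is printed, so values continued over several lines with a trailing backslash will appear truncated.

```sh
# Print the URL-limiting attributes that are actually set in a config file.
# show_limits is a hypothetical helper name, not part of ht://Dig.
# $1 = path to htdig.conf
show_limits() {
    conf="$1"
    # Match only lines that start with one of the attribute names, so
    # commented-out settings (lines starting with #) are skipped.
    grep -E '^[[:space:]]*(bad_extensions|valid_extensions|exclude_urls|limit_urls_to)[[:space:]]*:' "$conf" || true
}

# Example (not run here; the path is an assumption):
#   show_limits /path/to/htdig.conf
```

Attributes that do not show up are not set in the file, in which case htdig uses its built-in defaults.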

> If it's the latter, then make sure you can run a .doc or a .pdf through
> the external converter itself and get reasonable-looking output.

If I run /usr/local/bin/catdoc /home/httpd/html/*.doc, I get
reasonable-looking output.
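
To check each file separately rather than all at once, something like the following could be used. It is a sketch only: check_converter is a hypothetical helper name, and it just reports how many bytes the converter writes per file, which says nothing about whether the text is correct.

```sh
# Report how many bytes of output a converter produces for each file.
# check_converter is a hypothetical helper name.
# $1 = converter command, remaining arguments = files to convert.
check_converter() {
    conv="$1"
    shift
    for f in "$@"; do
        # wc -c counts the bytes of converter output; tr removes the
        # padding spaces that some wc implementations print.
        bytes=$("$conv" "$f" 2>/dev/null | wc -c | tr -d ' ')
        printf '%s\t%s\n' "$bytes" "$f"
    done
}

# Example (not run here):
#   check_converter /usr/local/bin/catdoc /home/httpd/html/*.doc
```

A file that reports 0 bytes is one the converter could not handle.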

Any ideas?


Evelio Martínez
Testanet. Dept. desarrollo software.
Av. Reino de Valencia, 15 - 5
46005 Valencia (Spain)
Tel: +34 96 395 90 00
Fax: +34 96 316 23 19



This archive was generated by hypermail 2b28 : Mon Jan 15 2001 - 03:29:46 PST