Re: [htdig] htdig ignores *.doc file extension


Subject: Re: [htdig] htdig ignores *.doc file extension
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Fri Jan 12 2001 - 13:49:25 PST


On Fri, 12 Jan 2001, Evelio Martinez wrote:

> htdig is ignoring the files with pdf and doc extension.

By this, I assume you mean they're not indexed.

Try running htdig -vvv and take a look at what happens when it encounters
a link to a PDF or .doc file. Does it reject the link? Or does it accept
the link and try to index the file later?
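
The -vvv output gets long, so it can help to save it to a file and pull out
only the lines that mention the extensions in question. Here is a minimal
Python sketch of that. It assumes nothing about htdig's log format beyond
the URL appearing somewhere on the line; the function name and the log file
name are made up for illustration.

```python
def lines_mentioning(log_text, extensions=(".pdf", ".doc")):
    """Return the log lines that contain any of the given extensions.

    Matching is a plain case-insensitive substring test, so it does not
    depend on how htdig formats its verbose output.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    return [
        line
        for line in log_text.splitlines()
        if any(ext in line.lower() for ext in wanted)
    ]


# Usage (htdig.log is a hypothetical file holding the saved -vvv output):
#
#     with open("htdig.log", errors="replace") as fh:
#         for line in lines_mentioning(fh.read()):
#             print(line)
```

Reading the lines around each match should show whether the link was
rejected on the spot or accepted for later retrieval.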

If it's the former, then one of your URL-filtering attributes is set
incorrectly (e.g. bad_extensions, valid_extensions, exclude_urls,
limit_urls_to ...).
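
To see which of those attributes might be catching a URL, you can model
them roughly in a few lines of Python. This is only a sketch of how I'd
expect the four attributes to interact, not htdig's actual code: it assumes
limit_urls_to and exclude_urls are substring matches and that the two
extension lists are compared case-insensitively against the end of the URL.
The function name and the example values are hypothetical; plug in the
values from your own config file.

```python
def why_rejected(url, bad_extensions=(), valid_extensions=(),
                 exclude_urls=(), limit_urls_to=()):
    """Return the name of the first attribute that would reject url, or None.

    Simplified model (an assumption, not htdig's implementation):
      - limit_urls_to: if set, the URL must contain one of the patterns
      - exclude_urls: the URL must not contain any of the patterns
      - bad_extensions: the URL must not end with any of these
      - valid_extensions: if set, the URL must end with one of these
    """
    lowered = url.lower()
    if limit_urls_to and not any(p in url for p in limit_urls_to):
        return "limit_urls_to"
    if any(p in url for p in exclude_urls):
        return "exclude_urls"
    if any(lowered.endswith(ext.lower()) for ext in bad_extensions):
        return "bad_extensions"
    if valid_extensions and not any(
        lowered.endswith(ext.lower()) for ext in valid_extensions
    ):
        return "valid_extensions"
    return None


# Hypothetical example: .pdf was added to bad_extensions by mistake.
print(why_rejected("http://www.example.com/manual.pdf",
                   bad_extensions=[".gif", ".pdf"]))  # prints bad_extensions
```

A result of None means none of the four attributes, as modelled here,
explains the rejection, and the -vvv output deserves a closer look.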

If it's the latter, then make sure you can run a .doc or a .pdf through
the external converter itself and get reasonable-looking output.
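
One way to make "reasonable-looking" a bit more concrete is to run the
converter by hand and check that what comes back is mostly printable text
rather than binary junk. A rough Python sketch follows. The 0.9 threshold
is an arbitrary assumption, the helper names are made up, and the converter
commands in the comments (pdftotext, catdoc) are only examples; use
whatever converter your config actually calls.

```python
import subprocess


def looks_like_text(data, min_ratio=0.9):
    """True if data is non-empty and mostly printable characters.

    data is the raw bytes written by the converter. The threshold is an
    arbitrary choice for illustration, not anything htdig itself checks.
    """
    text = data.decode("latin-1")  # latin-1 never fails to decode
    if not text.strip():
        return False
    ok = sum(ch.isprintable() or ch in "\n\r\t" for ch in text)
    return ok / len(text) >= min_ratio


def run_converter(command):
    """Run a converter command (a list of arguments) and return its stdout."""
    return subprocess.run(command, capture_output=True, check=True).stdout


# Usage (assumes these converters are installed and the sample files exist):
#
#     print(looks_like_text(run_converter(["pdftotext", "sample.pdf", "-"])))
#     print(looks_like_text(run_converter(["catdoc", "sample.doc"])))
```

If the converter exits with an error, or the check comes back False, the
problem is with the converter setup rather than with htdig's URL handling.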

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



