Re: [htdig3-dev] feedback on ht://Dig documentation


Subject: Re: [htdig3-dev] feedback on ht://Dig documentation
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Nov 18 1999 - 15:51:51 PST


At 5:41 PM -0500 11/18/99, Tom Metro wrote:
>There actually appear to be two sets of fall-back values:
>1. If not found by config, it gets set to /usr/local/bin/acroread.
>2. If explicitly set to nothing, the code tries to find it in the
>path.
>
>I'm guessing 2 was implemented before 1 came about, and it's been left
>there as something that is mostly harmless. Though see below...

This is correct. I added the configure-time detection because many
people *didn't* have acroread in the path of the user running htdig.
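
For reference, the two fall-backs amount to roughly the following. This
is only a sketch, not the code in PDF.cc: findInPath() and
resolvePdfParser() are made-up names, and it assumes the configured
value holds just the program path, with no extra arguments.

```cpp
#include <sstream>
#include <string>
#include <unistd.h>   // access(), X_OK (POSIX)

// Hypothetical helper, not taken from PDF.cc: look for an executable
// called `name` in a colon-separated, PATH-style list of directories.
std::string findInPath(const std::string& name, const std::string& pathList)
{
    std::istringstream dirs(pathList);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty())
            continue;   // empty entries are skipped to keep the sketch short
        std::string candidate = dir + "/" + name;
        if (access(candidate.c_str(), X_OK) == 0)
            return candidate;
    }
    return "";   // not found anywhere in the list
}

// Hypothetical: `configured` stands for the value left after the config
// file and the configure-time default have been applied.
std::string resolvePdfParser(const std::string& configured,
                             const std::string& pathList)
{
    if (!configured.empty())
        return configured;                    // fall-back 1, or an explicit setting
    return findInPath("acroread", pathList);  // fall-back 2: search the path
}
```

A caller would pass the contents of the PATH environment variable as
pathList. An empty result means no parser was found, which is the case
where it could simply be left disabled.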

>I guess more importantly, is the second parameter useful?
>An experiment shows that it doesn't seem to be needed:

At one point, acroread actually came with a man page (for v. 3), which
implied the second parameter was not just useful, but necessary.

>I would say that because the Acrobat parser is not an integral part of
>the ht://Dig package, if not found by configure, it should be disabled
>by default. Someone installing a parser later can make the appropriate
>settings in htdig.conf to enable it, just as they would with any other
>parser.

A good point, I think.

> > printf("PDF::parse: cannot find pdf parser %s\n", arg0.get());
>BTW, is that going to STDOUT as it appears, rather than STDERR? Is
>that normal practice for htdig's error messages?

htdig's error messages are somewhat inconsistent in this regard. This
one should probably go to STDERR.
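
Something along these lines would do it. This is a sketch only:
missingParserMessage() and reportMissingParser() are made-up names, and
the parser name is passed as a std::string rather than the object used
in the quoted line.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: build the same text as the quoted printf() call.
std::string missingParserMessage(const std::string& parser)
{
    return "PDF::parse: cannot find pdf parser " + parser + "\n";
}

// Write the diagnostic to stderr instead of stdout.
// Returns what fprintf() returns: the number of characters written,
// or a negative value on error.
int reportMissingParser(const std::string& parser)
{
    return std::fprintf(stderr, "%s", missingParserMessage(parser).c_str());
}
```

Keeping diagnostics on stderr means they do not get mixed into anything
that captures or pipes the program's normal output.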

>BTW, if you have a bad_extensions directive, why add .cgi to
>exclude_urls?
>
> exclude_urls: /cgi-bin/ .cgi

Because most people don't think of .cgi as an extension. Or that
would be my guess.

>Also, the documentation for exclude_urls makes mention of "patterns",
>yet if I understand correctly (I haven't checked the code) it simply
>performs a (case sensitive?) sub-string match. To me, pattern implies
>the inclusion of wildcards or other meta characters.

The documentation is not necessarily perfect. As many people will
point out, developers are often not the best at writing documentation.
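
If the matching really is a plain sub-string test, as suggested above
(I have not confirmed this against the code either), it would behave
like the sketch below. isExcluded() is a made-up name, not a function
in htdig.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the exclude_urls check: true if any entry
// occurs anywhere in the URL. Case-sensitive, no wildcards.
bool isExcluded(const std::string& url,
                const std::vector<std::string>& excludes)
{
    for (std::vector<std::string>::size_type i = 0; i < excludes.size(); ++i) {
        const std::string& entry = excludes[i];
        if (!entry.empty() && url.find(entry) != std::string::npos)
            return true;
    }
    return false;
}
```

With the entries "/cgi-bin/" and ".cgi", this would exclude
http://example.com/cgi-bin/search and also
http://example.com/form.cgi?x=1, where .cgi is not at the end of the
URL. It would not exclude http://example.com/FORM.CGI, since the
comparison in this sketch is case sensitive.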

>The question that comes to mind is why is pdf_parser treated specially
>and not implemented via the generalized external parser interface?

Gilles can probably answer this better than I can, but at the time
PDF.cc was contributed, acroread was essentially the only reliable
way to translate PDF to text. At this point, xpdf is probably a
better program (for a variety of reasons, some of them
license-related).

Of course, a built-in parser is almost always faster than an
external one.

-Geoff



