Re: [htdig] 3.1.5: Completed large index


Subject: Re: [htdig] 3.1.5: Completed large index
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Sun Oct 15 2000 - 19:51:01 PDT


On Tue, 17 Oct 2000, Peter L. Peres wrote:

> In the end, it was not a bug. The database size is almost 400 MB, double
> from what I had done before, but the work time was about 8 times longer
> than before. I assume that having much more RAM is better.

Sure. If your process is bigger than your installed RAM, then it's going
to take a while as it swaps frequently.

> I am very happy that this has worked out in the end ;-) Next, I'll likely
> write a patch to allow quoted strings (esp. the null quoted string) in the
> config file. The current dig missed all the README LICENSE etc files
> because of this missing feature (imho).

The config file has accepted quoted strings for a *long* time. I'd bet
your config file already has quoted strings in it... I'm not entirely sure
I'd like the idea of a null string being accepted as a config value, but
feedback is welcome.

As I mentioned earlier (maybe you didn't get my message?), it's not just
an issue of the configuration code. The code in Retriever.cc that matches
extensions would need a special case for "no extension", or it would still
try to exclude README because it's not "README."

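To make the "no extension" case concrete, here is a rough sketch of the
kind of special case I mean. This is not the actual Retriever.cc code; the
function names extensionOf and extensionAllowed are made up for
illustration, and it assumes that an empty entry in the list of valid
extensions stands for "no extension".

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from Retriever.cc): return the extension of the
// last path component, including the leading dot, or "" if it has none.
std::string extensionOf(const std::string &path)
{
    // Look only at the last path component so that dots in directory
    // names (e.g. "/a.b/LICENSE") are not mistaken for an extension.
    std::string::size_type slash = path.find_last_of('/');
    std::string name = (slash == std::string::npos) ? path
                                                    : path.substr(slash + 1);

    std::string::size_type dot = name.find_last_of('.');
    if (dot == std::string::npos)
        return "";              // "README", "LICENSE": no extension
    return name.substr(dot);    // "index.html" gives ".html"
}

// Hypothetical check: accept the path only if its extension is listed.
// Assumption: an empty string in 'valid' means "no extension".
bool extensionAllowed(const std::string &path,
                      const std::vector<std::string> &valid)
{
    const std::string ext = extensionOf(path);
    for (std::vector<std::string>::size_type i = 0; i < valid.size(); ++i) {
        if (valid[i] == ext)
            return true;
    }
    return false;
}
```

With something like this, README only passes if the empty string is in the
list, which is why the config parser accepting a null string would not be
enough on its own: the matching code has to treat it as a real entry.
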
> PS: Has anyone seen Adobe PDF files (from Adobe, f.ex. BDF font
> specification documents) which cause strange acroreader (and htdig)
> problems ? I have a few and I am not glad about this.

Oh, tons, esp. if you're using acroread v. 4. If you check the FAQ, you'll
see that we do not recommend using acroread unless your documents cannot
be parsed at all by xpdf.

<http://www.htdig.org/FAQ.html#q4.9>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



