Re: htdig: (Not) translating entities

Gilles Detillieux
Tue, 12 Jan 1999 14:59:57 -0600 (CST)

According to Marjolein Katsma:
> At 13:49 1999-01-12 -0500, you wrote:
> >I picked a patch that supplies the filename if no title is found. I
> >thought this was more appropriate. As always, I'm open to comments. May
> >the best patch win. ;-)
> Not a bad solution either - but again not configurable. I like things to be
> configurable (as you must have noticed by now ;-)).

Especially where English text is involved! Quite understandable, and
I think finding these areas is a very worthwhile goal.

> How about this idea:
> - keep my configuration parameter
> - change the *interpretation* so that if the parameter is set to "filename"
> (as a literal string and default) it will substitute the file name; for any
> other text it will use that text instead (somewhat similar as is used now
> for format--template matching in template_map).

Works for me.
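
For what it's worth, here is a rough sketch of how I read that interpretation. The function name and the `setting` argument are made up for illustration (this is not code from the patch, and the real parameter name isn't settled in this thread), and it uses std::string rather than our String class just to keep it self-contained:

```cpp
#include <string>

// Hypothetical helper: pick the text to show for a document with no title.
// `setting` stands in for the value of the configuration parameter discussed
// above; `url` is the document's URL.
std::string titleForUntitled(const std::string &setting, const std::string &url)
{
    // Any value other than the literal "filename" is used verbatim.
    if (setting != "filename")
        return setting;

    // The literal "filename" means: substitute the last path component,
    // ignoring any query string or fragment.
    std::string::size_type end = url.find_first_of("?#");
    std::string path = url.substr(0, end);
    std::string::size_type slash = path.rfind('/');
    std::string name =
        (slash == std::string::npos) ? path : path.substr(slash + 1);

    // For a URL ending in '/', fall back to the whole path rather than "".
    return name.empty() ? path : name;
}
```

The one wrinkle with this scheme is that nobody can ever use the literal word "filename" as their replacement text, which seems like an acceptable trade-off.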

> >> However, I don't really think your solution is necessarily 'better'. A
> >> better solution would be to extend the String class with a compare method
> >> that accepts a "string" (a char *) as a parameter rather than another
> >> String object. More object-oriented and more encapsulated. You could then
> >> simply write:
> >> if ("lt") == 0 ) ....
> >
> >Yes, this was the suggestion I was going to make. The "new String"
> >portions of your patch are definitely memory leaks (albeit small ones).
> I'll look into doing that but I'm getting pressed for time myself by now.
> So far I've concentrated on making things work rather than writing 'clean'
> code... (I prefer the latter, of course, but I do have priorities that
> prevent me from doing this right now).

I hear you! I've been feeling guilty about the amount of time I've been
spending working on ht://Dig, when I should be updating my own documentation.

> One thing I cannot estimate is just how large / small the memory leak would
> be in this case and when it would become an actual problem (surely all
> memory is released when the program terminates?)

Here's a quick, back-of-the-envelope calculation. If you assume 50,000
documents, with an average of 5 &foo; entities per document, you've got
a total of 250,000 entities per dig. Each entity could result in up to
8 string comparisons, and therefore up to 8 new strings. Let's assume
a minimum of 8 bytes allocated per string for Data, and 12 bytes for
the String structure, so a total of 8 x 20 bytes wasted per entity, or
about 40 MB wasted per dig. I may be off by a lot, and the numbers would
vary greatly depending on circumstances, but I think it's a significant
enough leak that it's worth addressing, especially if the patch is to
go in 3.1.0.
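
To make the arithmetic explicit, the same estimate can be written as a tiny function. The name `leakedBytes` and its parameters are only illustrative, and every input is one of the guesses above, not a measurement:

```cpp
// Back-of-the-envelope leak estimate: every string comparison is assumed to
// allocate one string object that is never freed.
constexpr unsigned long long leakedBytes(unsigned long long documents,
                                         unsigned long long entitiesPerDocument,
                                         unsigned long long comparesPerEntity,
                                         unsigned long long bytesPerString)
{
    return documents * entitiesPerDocument * comparesPerEntity * bytesPerString;
}

// Assumed figures from the estimate above:
//   50,000 documents, 5 entities each, up to 8 comparisons per entity,
//   8 bytes of Data plus 12 bytes of String structure per leaked string.
constexpr unsigned long long kEstimatedLeak =
    leakedBytes(50000, 5, 8, 8 + 12);

static_assert(kEstimatedLeak == 40000000ULL,
              "250,000 entities x 160 bytes = 40,000,000 bytes (about 40 MB)");
```

Halve or double any of the inputs and the result scales with it, which is why I say the numbers could be off by a lot either way.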

Yes, the memory is released when htdig quits, but if the dig lasts
a while, the virtual memory it's wasting is tied up all that time.
The wasted memory will also lead to more paging, slowing down the dig.

If you want a quick fix, using strcmp is an option, but I think using
Andrew's idea of StringMatch objects would be pretty simple too, if
you can find a good place to initialise them.
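
Here is a minimal sketch of the strcmp route, combined with the idea of a compare method that takes a char * directly. `MiniString` is a hypothetical stand-in I made up for this example, not our String class, and `isLessThanEntity` is likewise just an illustrative name:

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for a string class (NOT ht://Dig's String).
class MiniString
{
public:
    explicit MiniString(const char *s) : data_(s ? s : "") {}

    const char *get() const { return data_.c_str(); }

    // Compare against a plain C string. Nothing is allocated here, so
    // there is nothing to leak, however often this is called.
    int compare(const char *other) const
    {
        return std::strcmp(get(), other);
    }

private:
    std::string data_;
};

// By contrast, a pattern such as
//     name.compare(*new MiniString("lt"))
// would allocate a new object on every call and never delete it.
bool isLessThanEntity(const MiniString &name)
{
    return name.compare("lt") == 0;
}
```

Since the literal is passed straight through to strcmp, the per-entity cost is just the comparisons themselves.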

Gilles R. Detillieux
Spinal Cord Research Centre
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

This archive was generated by hypermail 2.0b3 on Wed Jan 13 1999 - 09:13:06 PST