Re: htdig: (Not) translating entities


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Tue, 12 Jan 1999 14:31:39 -0600 (CST)


According to Marjolein Katsma:
> 'noindex_start' and 'noindex_end' are used in the patch for HTML.cc -
> already posted; see message titled "Skipping parts of a document". It
> allows filtering out parts of a document from being indexed.

That's what happens when I reply before catching up on reading posts to
the list. Sorry. I second Andrew's comment: good idea!

> >Correct me if I'm wrong, but won't all these new String("blah") constructs
> >lead to major memory leaks in htdig? I think a better way of comparing the
> >entity String to a char * would be:
> > if (strcmp(entity.get(), "blah") == 0)
> >
>
> Quite possible it causes a memory leak; I'm only a beginner at C++ and am
> more used to languages like Smalltalk and Java which take care of their own
> garbage collection.
> No problem on my (so far) small site but if there is indeed a memory leak
> it might cause problems on large sites...

I'm just a beginner at C++ too, but I'm surprised at how much I learned
just by diving into the code, and following the discussion on this list.
Geoff pointed out to me maybe a month ago that C++ doesn't do garbage
collection, and so I've been keeping an eye out for potential leaks since
then. Every object allocated with "new" should eventually be released with
a matching "delete", unless you intend for the object to stick around.
Sometimes it can be subtle though: there was a nasty memory leak in
htsearch that wasn't obvious until you dug into the code and realised that
docDB["url"] actually calls a function that returns a new DocumentRef
object, which must be deleted.
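To make the ownership rule concrete, here is a minimal C++ sketch. DocRecord, lookup(), urlOf() and urlOfSafe() are hypothetical names made up for illustration; lookup() only mimics the way docDB["url"] hands back a freshly allocated object that the caller must delete. isEntity() is just the strcmp() comparison suggested above, which avoids allocating a temporary String at all.

```cpp
#include <cassert>   // for assert() in the usage checks
#include <cstring>   // std::strcmp
#include <memory>    // std::unique_ptr (C++11 and later)
#include <string>

// Hypothetical stand-in for a record type such as DocumentRef.
struct DocRecord {
    std::string url;
    explicit DocRecord(const std::string &u) : url(u) {}
};

// Hypothetical lookup that returns a new object: the caller owns it.
DocRecord *lookup(const std::string &url) {
    return new DocRecord(url);
}

// Compare against a literal without allocating anything:
// strcmp() returns 0 when the two strings are equal.
bool isEntity(const char *entity, const char *name) {
    return std::strcmp(entity, name) == 0;
}

// Pairs the hidden "new" inside lookup() with an explicit delete.
std::string urlOf(const std::string &key) {
    DocRecord *ref = lookup(key);
    std::string result = ref->url;
    delete ref;  // without this, every call leaks one DocRecord
    return result;
}

// Same thing, but unique_ptr deletes the object automatically
// when it goes out of scope.
std::string urlOfSafe(const std::string &key) {
    std::unique_ptr<DocRecord> ref(lookup(key));
    return ref->url;
}
```

std::unique_ptr needs a C++11 compiler; with older compilers the explicit delete in urlOf() is the way to go.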

> However, I don't really think your solution is necessarily 'better'. A
> better solution would be to extend the String class with a compare method
> that accepts a "string" (a char *) as a parameter rather than another
> String object. More object-oriented and more encapsulated. You could then
> simply write:
> if ( entity.compare("lt") == 0 ) ....

Yes, that would be a good idea. You could even define the "==" operator
for comparing a String to a char *, so you could write: if (entity == "lt") ...
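Here is a rough sketch of what that could look like. This String class is hypothetical and deliberately tiny; it is not htdig's String class, whose interface differs. It only shows how compare(const char *) and operator==(const char *) let you compare against a char * without constructing a temporary String.

```cpp
#include <cassert>   // for assert() in the usage checks
#include <cstring>   // std::strlen, std::strcpy, std::strcmp

// Hypothetical minimal String class, for illustration only.
class String {
public:
    explicit String(const char *s = "") {
        data = new char[std::strlen(s) + 1];
        std::strcpy(data, s);
    }
    ~String() { delete[] data; }  // every new[] paired with delete[]

    // Copying is disabled to keep the sketch short.
    String(const String &) = delete;
    String &operator=(const String &) = delete;

    const char *get() const { return data; }

    // strcmp-style result (< 0, 0 or > 0) against a plain C string.
    int compare(const char *s) const { return std::strcmp(data, s); }

    // Lets callers write: if (entity == "lt") ...
    bool operator==(const char *s) const { return compare(s) == 0; }
    bool operator!=(const char *s) const { return !(*this == s); }

private:
    char *data;
};
```

With that in place, entity.compare("lt") == 0 and entity == "lt" are equivalent, and neither allocates anything.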

However, I like Andrew's idea of using the StringMatch class. You'd need
one StringMatch object for each of the three options you add, and each
one could be initialised with a single statement, e.g.:

        StringMatch match_lt_gt;
        match_lt_gt.Pattern("lt|#60|gt|#62");
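StringMatch itself isn't reproduced here. As a stand-in, the following sketch shows the same idea with the standard library: the '|'-separated alternatives are split once, when the pattern is set, and each test is then an exact comparison. EntityMatcher and its setPattern() and matches() methods are hypothetical names, not StringMatch's API; the real class is more general than this.

```cpp
#include <cassert>   // for assert() in the usage checks
#include <string>
#include <vector>

// Hypothetical matcher: NOT htdig's StringMatch, just the same idea.
class EntityMatcher {
public:
    // Split e.g. "lt|#60|gt|#62" into its alternatives, once.
    void setPattern(const std::string &pattern, char sep = '|') {
        alts.clear();
        std::string::size_type start = 0;
        for (;;) {
            std::string::size_type end = pattern.find(sep, start);
            if (end == std::string::npos) {
                alts.push_back(pattern.substr(start));
                break;
            }
            alts.push_back(pattern.substr(start, end - start));
            start = end + 1;
        }
    }

    // True if `entity` equals one of the alternatives exactly.
    bool matches(const std::string &entity) const {
        for (const std::string &a : alts)
            if (a == entity) return true;
        return false;
    }

private:
    std::vector<std::string> alts;
};
```

Usage: after EntityMatcher match_lt_gt; match_lt_gt.setPattern("lt|#60|gt|#62");, match_lt_gt.matches("#60") is true and match_lt_gt.matches("amp") is false. One such object per option mirrors the one-StringMatch-per-option suggestion.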

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930



This archive was generated by hypermail 2.0b3 on Wed Jan 13 1999 - 09:13:06 PST