Re: htdig: (Not) translating entities


Marjolein Katsma (webmaster@javawoman.com)
Tue, 12 Jan 1999 19:00:54 +0100


Gilles,

At 11:20 1999-01-12 -0600, Gilles Detillieux wrote:
>Hi. I have a couple questions/comments about Marjolein's patch:
>
>> *************** ConfigDefaults defaults[] =
>> *** 168,173 ****
>> --- 182,190 ----
>> {"no_excerpt_show_top", "false"},
>> {"no_next_page_text", "[next]"},
>> {"no_prev_page_text", "[prev]"},
>> + {"no_title_text", "[No title]"}, //mk19990110
>> + {"noindex_start", "<!--htdig_noindex-->"}, //mk19990106
>> + {"noindex_end", "<!--/htdig_noindex-->"}, //mk19990106
>> {"nothing_found_file", "${common_dir}/nomatch.html"},
>> {"page_list_header", "<hr noshade size=2>Pages:<br>"},
>> {"prefix_match_character", "*"},
>
>You add these new attributes, but where are they used?

'no_title_text' is used in Display.cc (a post with the changes for that will
follow shortly; I was already late for work this morning ;-). It makes the
text displayed when a document has no title configurable at run time rather
than hard-coded in the source. Useful for making the output speak your
visitor's language. I found this in the mailing list archives, from about a
year ago; I can't remember the name of the poster. I can't think why it
didn't make it into the distribution, but I found it useful.
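
To illustrate the idea (this is only a sketch, not the actual Display.cc change): the lookup falls back to the built-in default when the attribute is not set. The Config type and the displayTitle() helper are made up for this example and stand in for htdig's own configuration object.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for htdig's configuration object: attribute -> value.
using Config = std::map<std::string, std::string>;

// Hypothetical helper: pick the text to show as a document's title.
// A non-empty title is used as-is; otherwise the configured
// "no_title_text" is used, falling back to the compiled-in default.
std::string displayTitle(const Config& config, const std::string& title)
{
    if (!title.empty())
        return title;

    Config::const_iterator it = config.find("no_title_text");
    if (it != config.end())
        return it->second;

    return "[No title]";
}
```

With no_title_text set to, say, "[Geen titel]" in the configuration, untitled documents would show that text instead of the English default.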

'noindex_start' and 'noindex_end' are used in the patch for HTML.cc, which I
already posted; see the message titled "Skipping parts of a document". They
let you exclude marked parts of a document from being indexed.
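
Roughly, the effect is as in the sketch below. This is not the HTML.cc patch itself (which works inside the parser); stripNoindex() is a hypothetical standalone helper using std::string, and treating an unterminated start marker as "skip to end of document" is my assumption here.

```cpp
#include <string>

// Hypothetical helper, not taken from htdig: returns a copy of `html`
// with every region from a `start` marker through the matching `end`
// marker removed. The defaults mirror the attribute defaults in the patch.
std::string stripNoindex(const std::string& html,
                         const std::string& start = "<!--htdig_noindex-->",
                         const std::string& end = "<!--/htdig_noindex-->")
{
    // Empty markers would match everywhere; treat them as "no filtering".
    if (start.empty() || end.empty())
        return html;

    std::string out;
    std::string::size_type pos = 0;
    while (pos < html.size()) {
        std::string::size_type s = html.find(start, pos);
        if (s == std::string::npos) {
            // No further marker: keep the rest of the document.
            out.append(html, pos, std::string::npos);
            break;
        }
        // Keep the text before the marker, then skip the marked region.
        out.append(html, pos, s - pos);
        std::string::size_type e = html.find(end, s + start.size());
        if (e == std::string::npos)
            break;  // assumption: an unterminated region runs to the end
        pos = e + end.size();
    }
    return out;
}
```

So a page's navigation bar, wrapped in the two marker comments, would simply never reach the word list.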

>
>> *************** SGMLEntities::translateAndUpdate(unsigne
[snip]
>> !
>> ! if ( !config.Boolean("translate_quot") ) //mk19990111
>> ! { //mk19990111
>> ! // //mk19990111
>> ! // Do NOT translate entities for '"' (quote). //mk19990111
>> ! // //mk19990111
>> ! if (entity.compare(new String("quot")) == 0 || //mk19990111
>> ! entity.compare(new String("#34")) == 0 ) //mk19990111
>> ! { //mk19990111
>> ! entityStart = orig + 1; //mk19990111
>> ! return '&'; //mk19990111
>> ! } //mk19990111
>> ! } //mk19990111
[snip]

>
>Correct me if I'm wrong, but won't all these new String("blah") constructs
>lead to major memory leaks in htdig? I think a better way of comparing the
>entity String to a char * would be:
> if (strcmp(entity.get(), "blah") == 0)
>

Quite possibly it does cause a memory leak; I'm only a beginner at C++ and
am more used to languages like Smalltalk and Java, which take care of their
own garbage collection.
It's no problem on my (so far) small site, but if there is indeed a memory
leak it might cause problems on large sites...

However, I don't really think your solution is necessarily 'better'. A
better solution would be to extend the String class with a compare method
that accepts a "string" (a char *) as a parameter rather than another
String object. More object-oriented and more encapsulated. You could then
simply write:
        if ( entity.compare("lt") == 0 ) ....
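
A minimal sketch of what I mean, assuming nothing about how htdig's String is actually implemented: the String class below is a cut-down stand-in written for this example (only get() is borrowed from your strcmp suggestion), and isQuoteEntity() is a hypothetical helper.

```cpp
#include <cstring>
#include <string>

// Cut-down stand-in for htdig's String class; only what the sketch needs.
class String {
public:
    String(const char* s) : data_(s ? s : "") {}

    const char* get() const { return data_.c_str(); }

    // Compare against another String object.
    int compare(const String& other) const
    {
        return std::strcmp(get(), other.get());
    }

    // Proposed overload: compare directly against a C string, so the
    // caller never has to construct a String just for the comparison.
    int compare(const char* s) const
    {
        return std::strcmp(get(), s ? s : "");
    }

private:
    std::string data_;
};

// Hypothetical helper: the test from the patch, without any `new`.
bool isQuoteEntity(const String& entity)
{
    return entity.compare("quot") == 0 || entity.compare("#34") == 0;
}
```

A string literal selects the char * overload directly, so nothing is allocated on the heap and there is nothing to leak. The strcmp() call is still there, but it stays hidden inside the class.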

>--
>Gilles R. Detillieux E-mail: <grdetil@scrc.umanitoba.ca>
>Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
>Dept. Physiology, U. of Manitoba Phone: (204)789-3766
>Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930

Marjolein Katsma webmaster@javawoman.com
Java Woman - http://javawoman.com/


