Re: [htdig] Problem with &..; entities in meta tags


Lennart Almkvist (la@nrm.se)
Fri, 30 Jul 1999 14:35:58 +0200


Some more testing gave the following results:

The German flower name "Stiefmütterchen" and the Icelandic
"þrenningarfjóla" are treated differently in meta content
than in the body or title part of an HTML document.

When in the body or in the title, the "ü", "þ" and "ó" entities
(&uuml;, &thorn; and &oacute;) are decoded to single-byte characters
in the .wordlist and .words.db files.

In meta content, however, these words end up as "stiefmuuml;t"
and "thorn;rennin" in the .wordlist and .words.db files. That is, the "&" is
removed and the rest is kept as letters ("&" is in valid_punctuation but
";" is not, by default).
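
To make the difference concrete, here is a small Python sketch of the two
outcomes. It is not ht://Dig code, only an illustration. It assumes the
pages spell the words with entities (&uuml;, &thorn;, &oacute;), and the
12-character cut is a guess inferred from the length of the two strings
reported above. The function names are made up for this example.

```python
import html

def decode_like_body(text):
    # What the body/title path appears to do: turn each entity into
    # the character it names, then lowercase the word.
    return html.unescape(text).lower()

def observed_meta_result(text, max_len=12):
    # What the meta path appears to do: drop the "&", keep everything
    # else (including ";") as letters, lowercase, and cut the word.
    # max_len=12 is an assumption that matches the reported strings.
    return text.replace("&", "").lower()[:max_len]

if __name__ == "__main__":
    for word in ("Stiefm&uuml;tterchen", "&thorn;renningarfj&oacute;la"):
        print(decode_like_body(word), "|", observed_meta_result(word))
```

With these assumptions the second function reproduces "stiefmuuml;t" and
"thorn;rennin", while the first gives the words as they appear when
indexed from the body or title.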

Shouldn't they be decoded the same way as in the title or body?

Lennart Almkvist
Museum of Natural History, Stockholm




This archive was generated by hypermail 2.0b3 on Fri Jul 30 1999 - 04:51:54 PDT