Re: [htdig] Problem with &..; entities in meta tags


Torsten Neuer (tneuer@inwise.de)
Fri, 30 Jul 1999 19:55:44 +0200


According to Gilles Detillieux:
>According to Lennart Almkvist:
>> Some more testing gave the following results:
>>
>> The German flower name "Stiefmütterchen" and the Icelandic
>> "þrenningarfjóla" are treated differently in meta content
>> than in the body or title part of an HTML document.
>>
>> When in the body or in the title, the entities "&uuml;", "&thorn;" and
>> "&oacute;" are each decoded to a one-byte character in the .wordlist
>> and .words.db files.
>>
>> In meta content, however, these words end up as "stiefmuuml;t"
>> and "thorn;rennin" in the .wordlist and .words.db files. That is, the "&"
>> is removed and the rest is kept as letters ("&" is in valid_punctuation
>> but the ";" is not, by default).
>>
>> Shouldn't they be decoded the same way as in the title or body?
>
>Here's a patch for 3.1.2 that should do what you want. Please give it a
>try and let us know if it fixes this bug.
[...]
> }
>

Something else is going wrong now.

It seems that the patch also strips off the character that follows
the entity (not everywhere, but in most cases).

E.g., instead of "über" I get "üer".
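
The symptom is consistent with a decoder that resumes scanning one
character too far past the ";" that closes the entity. Below is a minimal
sketch of that behaviour, written in Python rather than ht://Dig's C++.
The names decode_entities and extra_skip are hypothetical, and the
off-by-one is only an assumption about the kind of error involved, not a
diagnosis of the actual patch.

```python
from html.entities import name2codepoint


def decode_entities(text, extra_skip=0):
    """Replace named entities such as &uuml; with the character they denote.

    extra_skip is a hypothetical knob: 0 resumes right after the ';',
    1 imitates an off-by-one that swallows the following character.
    """
    out = []
    i = 0
    while i < len(text):
        # Only look for a closing ';' when we are sitting on an '&'.
        end = text.find(";", i + 1) if text[i] == "&" else -1
        name = text[i + 1:end] if end != -1 else ""
        if name in name2codepoint:
            out.append(chr(name2codepoint[name]))
            i = end + 1 + extra_skip  # correct only when extra_skip == 0
        else:
            # Not a known entity: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(decode_entities("&uuml;ber"))                # über
print(decode_entities("&uuml;ber", extra_skip=1))  # üer
```

With extra_skip=0 the "b" survives; with extra_skip=1 it is dropped,
which matches the output described above.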

cheers,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de



