Torsten Neuer (email@example.com)
Fri, 30 Jul 1999 18:46:29 +0200
According to Gilles Detillieux:
>> Lennart Almkvist wrote:
>> > Some more testing gave the following results:
>> > The German flower word "Stiefmütterchen" and the Icelandic
>> > "þrenningarfjóla" are treated differently in meta content
>> > than in the body or title part of an HTML document.
>> > When in the body or in the title, the "ü", "þ" and "ó"
>> > are decoded to a one byte character in the .wordlist and .words.db files.
>> > In meta content, however, these words are decoded to "stiefmuuml;t"
>> > and "thorn;rennin" in the .wordlist and .words.db files. That is, the "&"
>> > is removed and the rest is kept as letters ("&" is in valid_punctuation
>> > but the ";" is not, by default).
>> > Shouldn't they be decoded the same way as in the title or body?
>OK, we do clearly have a problem with SGML entities in 3.1.2, as well
>as 3.2. (3.2 has some more serious problems, which I was hoping to
>tackle, but that's another story.) So, right now, it only translates
>&foo; entities outside of any HTML tags. I think there are reasons
>not to translate them in all tags, but where is it valid to do so?
>Certainly in keywords text, alt text in img tags, and meta description
>text. How about htdig-email-subject? Any others I've missed?
- HTML 4.0 "title" attribute (not yet handled by ht://Dig, but would be
nice to improve search results)
- Most Dublin Core META information content (would be nice if ht://Dig
could directly support this META standard).
- Alt text in client side image maps.
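To make the idea concrete, here is a minimal sketch in Python (not ht://Dig's actual C++ code) of decoding entities only in a whitelist of attributes. The names DECODE_ATTRS, should_decode, attr_text and naive_strip are hypothetical, the whitelist just collects the candidates mentioned in this thread, and the 12-character truncation is an assumption inferred from the strings Lennart reported.

```python
import html

# Hypothetical whitelist of (tag, attribute) pairs whose values should have
# SGML entities decoded before indexing. "*" stands for "any tag".
DECODE_ATTRS = {
    ("meta", "content"),  # keywords, description, Dublin Core content
    ("img", "alt"),       # alt text in img tags
    ("area", "alt"),      # alt text in client-side image maps
    ("*", "title"),       # HTML 4.0 "title" attribute
}


def should_decode(tag, attr):
    """Return True if entities in this attribute value should be decoded."""
    tag, attr = tag.lower(), attr.lower()
    return (tag, attr) in DECODE_ATTRS or ("*", attr) in DECODE_ATTRS


def attr_text(tag, attr, raw_value):
    """Decode &foo; entities only for whitelisted attributes."""
    if should_decode(tag, attr):
        return html.unescape(raw_value)
    return raw_value


def naive_strip(raw_value, max_len=12):
    """Mimic the reported symptom: drop "&", keep the rest as letters.

    max_len=12 is an assumption that matches the truncated words in the
    report; it is not taken from the ht://Dig sources.
    """
    return raw_value.replace("&", "").lower()[:max_len]


if __name__ == "__main__":
    word = "Stiefm&uuml;tterchen"
    print(naive_strip(word))                 # what ends up in the word list now
    print(attr_text("meta", "content", word))  # what we would like to index
```

The whitelist approach keeps the decision about where decoding is valid in one place, so adding htdig-email-subject or other META names later would only mean extending the table.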
-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14            Tel: +49-4101-403605
D-25474 Ellerbek            Fax: +49-4101-403606
E-Mail: firstname.lastname@example.org
Internet: http://www.inwise.de