Re: htdig: european chars

Tim Maroney (
Wed, 29 Apr 1998 10:24:05 -0700

This is all quite fascinating, but it's not my issue, which is that in
HTML, accented and otherwise modified characters of the Roman character
set are supposed to be represented with HTML character entities
(multi-character sequences such as &eacute;), not as high-bit-set ASCII,
since the conventions for interpreting extended ASCII vary from platform
to platform. htdig is mapping these HTML entities to a single-byte
internal representation on text acquisition but then not mapping them
back to HTML on output, leading to weird-looking search displays -- try
searching for
"Rabelais" at, and look at how the words "Francois" and
"Theleme" come out in Mac Explorer. Is there any way to fix this, short
of writing a wrapper around htsearch that does character mapping?
Shouldn't htdig just do the right thing to start with?
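
To make the wrapper idea concrete, here is a minimal sketch of the
character mapping such a wrapper would have to do. It assumes the
single-byte internal representation is ISO-8859-1, which has not been
confirmed, and the function name latin1_to_entities is made up for
illustration; it is not part of htdig or htsearch.

```python
from html.entities import codepoint2name


def latin1_to_entities(raw: bytes) -> str:
    """Turn high-bit bytes into HTML entities (hypothetical helper).

    Assumption: the input is ISO-8859-1, so each byte is one character.
    """
    out = []
    for ch in raw.decode("latin-1"):
        cp = ord(ch)
        if cp > 127:
            # Prefer a named entity; fall back to a numeric reference.
            name = codepoint2name.get(cp)
            out.append(f"&{name};" if name else f"&#{cp};")
        else:
            # Plain ASCII, including markup, passes through untouched.
            out.append(ch)
    return "".join(out)


print(latin1_to_entities(b"Fran\xe7ois"))  # prints Fran&ccedil;ois
```

A wrapper would run the search output through a function like this
before sending it to the browser. Because only bytes above 127 are
rewritten, the HTML markup in the results is left alone.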

Also, is the answer to my other questions (about unwanted backslash
removal in search results, and restricting a search to exclude
subdirectories) that nothing can be done short of modifying htdig source
code? My ISP is finicky about binary executables and even if I dug up a
UNIX shell login somewhere and made this change I wouldn't be able to use
a custom version on my web site. If the answer is that these things can't
be done in vanilla htdig I'd like to know. Thanks!

Tim Maroney
"The world is made possible, in part, by murk."

