Subject: Re: [htdig] Problem with accents...
From: Daniele Bufarini (email@example.com)
Date: Fri Jun 02 2000 - 02:43:21 PDT
I had the same problem with the site of the company I work for (where all
the pages are in Italian), and I solved it exactly the way you suggested!
I modified both HTML.cc and htsearch.cc: htsearch.cc now converts the input
word into its unaccented equivalent, and HTML.cc does the same, i.e. whenever
it encounters an accented word while indexing, it "strips" all the accents.
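A minimal C++ sketch of the normalization both files need, assuming single-byte ISO-8859-1 text. The function name `strip_accents_latin1` and its lookup table are hypothetical illustrations, not the code from the patch. Each accented letter in the range 0xC0-0xFF maps to its ASCII base letter; everything else passes through unchanged.

```cpp
#include <string>

// Hypothetical helper, not the code from the patch: maps each ISO-8859-1
// letter with a diacritic (bytes 0xC0-0xFF) to its unaccented ASCII base
// letter. '-' marks bytes that are left unchanged (Æ, Ð, ×, Ø, Þ, ß, ...).
static const char kLatin1Base[] =
    "AAAAAA-CEEEEIIII"   // 0xC0-0xCF: À..Å Æ Ç È..Ë Ì..Ï
    "-NOOOOO--UUUUY--"   // 0xD0-0xDF: Ð Ñ Ò..Ö × Ø Ù..Ü Ý Þ ß
    "aaaaaa-ceeeeiiii"   // 0xE0-0xEF: à..å æ ç è..ë ì..ï
    "-nooooo--uuuuy-y";  // 0xF0-0xFF: ð ñ ò..ö ÷ ø ù..ü ý þ ÿ
static_assert(sizeof(kLatin1Base) == 64 + 1, "one entry per byte 0xC0-0xFF");

// Returns a copy of a single-byte ISO-8859-1 word with accents removed.
// Bytes below 0xC0 (including all of ASCII) are copied unchanged.
std::string strip_accents_latin1(const std::string& word) {
    std::string out;
    out.reserve(word.size());
    for (char ch : word) {
        const unsigned char c = static_cast<unsigned char>(ch);
        const char base = (c >= 0xC0) ? kLatin1Base[c - 0xC0] : '-';
        out.push_back(base == '-' ? ch : base);
    }
    return out;
}
```

In this scheme the same function would be applied to every word the parser extracts before it is stored, and to the word typed into the search form before it is looked up, so that "città" in a query and "citta" in the index end up as the same key.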
Here are the steps to follow.
The first thing to do is to add the following lines to the configuration file:
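Going by the explanation that follows, the two lines would look something like this. This is a sketch: only `translate_amp: true` is named in this message, and the attribute name `extra_word_characters` on the first line is an assumption. The file would presumably need to be saved in ISO-8859-1 so these characters match the bytes htdig reads from the pages.

```
extra_word_characters: òàùèéìÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝáâãäåçêëíîïñóôõöúûüý
translate_amp: true
```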
With the first line you tell the parser that the characters
"òàùèéìÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝáâãäåçêëíîïñóôõöúûüý" are valid word
characters; the "translate_amp: true" directive tells htdig to translate
SGML entities (e.g. "Gar&ccedil;on") into accented characters ("Garçon").
In both cases, my patch then converts the accented characters into
unaccented ones.
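With translate_amp enabled, an entity such as "&ccedil;" is first turned into "ç" and then stripped by the patch. For the entities covered here, the unaccented letter can also be read straight off the entity name, because HTML names them as base letter plus diacritic (ccedil, eacute, Agrave). The sketch below does that as a shortcut. The function name `entities_to_base_letters` is hypothetical, this is an alternative illustration rather than what the patch does, and numeric references such as "&#231;" are deliberately left alone.

```cpp
#include <cctype>
#include <string>

namespace {
// True if s is one of the diacritic suffixes used in HTML's Latin-1
// entity names, e.g. "cedil" in "&ccedil;" or "acute" in "&eacute;".
bool is_diacritic_suffix(const std::string& s) {
    static const char* const kDiacritics[] = {
        "grave", "acute", "circ", "tilde", "uml", "cedil", "ring"};
    for (const char* d : kDiacritics)
        if (s == d) return true;
    return false;
}
}  // namespace

// Hypothetical helper, not part of ht://Dig: replaces named entities of the
// form &<letter><diacritic>; with the bare letter, e.g. "&ccedil;" -> "c".
// Any other '&' sequence ("&amp;", "&szlig;", "&#231;") is copied as-is.
std::string entities_to_base_letters(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    std::string::size_type i = 0;
    while (i < text.size()) {
        if (text[i] == '&' && i + 2 < text.size() &&
            std::isalpha(static_cast<unsigned char>(text[i + 1]))) {
            const std::string::size_type semi = text.find(';', i + 2);
            if (semi != std::string::npos &&
                is_diacritic_suffix(text.substr(i + 2, semi - (i + 2)))) {
                out.push_back(text[i + 1]);  // keep only the base letter
                i = semi + 1;                // skip past the ';'
                continue;
            }
        }
        out.push_back(text[i]);
        ++i;
    }
    return out;
}
```

Applied to "Gar&ccedil;on" it yields "Garcon", the same unaccented form the message describes for entity-encoded words.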
I've attached the source files I modified; my patches are clearly marked
by two comments: "// Begin: patch by Daniele Bufarini." and "// End: patch
by Daniele Bufarini."
Once the files are in the right directories, all you have to do is
recompile the sources with make and then rebuild the htdig databases with
the "rundig" script (or whatever you normally use).
I hope you can use them successfully!
P.S.: You can try my patches at http://www.alice.it/index/ihome.htm; enter
an accented word in the field labeled "Trova:" and it will also find the
same word without the accent!
Ing. Daniele Bufarini
I.E. Informazioni Editoriali s.r.l. - Via Bergonzoli 1/5 - 20127 Milano - ITALY
Voice: +39 02 283151 Fax: +39 02 28315900