Re: [htdig3-dev] Accentuated characters.


Torsten Neuer (tneuer@inwise.de)
Fri, 4 Jun 1999 09:00:35 +0200


According to Neil Mansilla:
>I am having problems getting international characters indexed.
>For example: balisés
>
>The documentation gets into "locale", and dictionaries, etc.
>Can someone share some insight as to how we can modify the
>ht://dig source so that it indexes ALL single-byte characters
>(including standard AND accentuated characters) in the
>htdig and htmerge process?

Read the documentation again and be enlightened ;-)

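You don't need to modify ht://Dig's sources to achieve this.
Only a little configuration is required, namely in the file
that by default is called "htdig.conf". The attribute that
matters for accented characters is "locale"; the others just
point at the language-specific files.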

Here is an excerpt from our German config:

[...]
nothing_found_file: ${database_dir}/nomatch.html
search_results_footer: ${database_dir}/footer.html
search_results_header: ${database_dir}/header.html
syntax_error_file: ${database_dir}/syntax.html
template_map: DE de ${database_dir}/template.html
method_names: and Und or Oder boolean Boolean
bad_word_list: ${common_dir}/de.bad_words
endings_affix_file: ${common_dir}/de.aff
endings_dictionary: ${common_dir}/de.0
endings_root2word_db: ${common_dir}/de.root2word.db
endings_word2root_db: ${common_dir}/de.word2root.db
synonym_dictionary: ${common_dir}/de.synonyms
synonym_db: ${synonym_dictionary}.db
locale: de_DE
date_format: %d.%m.%Y
[...]

The same part in our English config:

[...]
nothing_found_file: ${database_dir}/nomatch.html
search_results_footer: ${database_dir}/footer.html
search_results_header: ${database_dir}/header.html
syntax_error_file: ${database_dir}/syntax.html
template_map: EN en ${database_dir}/template.html
bad_word_list: ${common_dir}/en.bad_words
endings_affix_file: ${common_dir}/en.aff
endings_dictionary: ${common_dir}/en.0
endings_root2word_db: ${common_dir}/en.root2word.db
endings_word2root_db: ${common_dir}/en.word2root.db
synonym_dictionary: ${common_dir}/en.synonyms
synonym_db: ${synonym_dictionary}.db
locale: en_EN
date_format: %d %B %Y
[...]
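
For the French example from the question, the relevant line
would presumably look like the sketch below. It assumes that
a locale named fr_FR is installed on the system (the exact
name differs between systems), and the index most likely has
to be rebuilt with htdig and htmerge after the change:

# assumption: fr_FR is a locale name your system knows
locale: fr_FR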

hth,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de
