Subject: Re: [htdig] using 2 languages at the same time?
From: Geoff Hutchison (email@example.com)
Date: Thu Nov 02 2000 - 05:31:28 PST
At 4:47 PM +0800 11/2/00, Mathias Körber wrote:
>a) index pages which may occur in any of 2 or more languages
>b) automatically identify which language the files are in (no,
>there is no identifier, this is an email archive which has
>mails in English, German and a few other languages)
No, I'm afraid not. There isn't much "intelligence" in this regard.
Even so, you pose a difficult problem--the code would need to
"recognize" the language from the text, which is one of the harder
problems in text processing. The HTML standard offers several methods
for indicating the language of a document, which would help, but from
what you say, these are not used on your pages.
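
To give a rough idea of what "recognizing" the language from the text involves, here is a minimal sketch in Python that counts common stopwords. It is not part of ht://Dig; the function name guess_language and the tiny word lists are made up for illustration, and a usable version would need much larger lists and would still misjudge short or mixed-language mails.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch);
# real language identification needs far more data than this.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it", "for", "with"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "von"},
}


def guess_language(text, stopwords=STOPWORDS):
    """Return the language whose stopwords occur most often, or None if none occur."""
    # Split into lowercase runs of letters (Unicode-aware, so umlauts survive).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = {
        lang: sum(1 for word in words if word in vocab)
        for lang, vocab in stopwords.items()
    }
    # Ties go to whichever language is listed first in the dictionary.
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

Even in this toy form the weakness is visible: a two-line mail, or one that quotes text in another language, gives the counter very little to work with.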
>c) use more than one .aff file, the correct one for each language?
Certainly it would help if ht://Dig kept some metadata for the
language of a document--this would enable language-specific searches
and language-specific fuzzy matching as you describe. But this would
likely be dependent on the META information available in the document.
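
For pages that do declare their language, picking up that information is straightforward. The following is a sketch only, using Python's standard html.parser rather than anything in ht://Dig; the class and function names are hypothetical, and it checks just two of the common places (the lang attribute on the html tag and a Content-Language META tag).

```python
from html.parser import HTMLParser


class DeclaredLanguage(HTMLParser):
    """Record the first language declared by <html lang=...> or a META tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if self.lang is not None:
            return  # keep the first declaration found
        attrs = dict(attrs)  # names arrive lowercased; values may be None
        if tag == "html" and attrs.get("lang"):
            self.lang = attrs["lang"].strip().lower()
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            content = (attrs.get("content") or "").strip().lower()
            if content:
                self.lang = content


def declared_language(html_text):
    """Return the declared language code, or None if the page declares none."""
    parser = DeclaredLanguage()
    parser.feed(html_text)
    return parser.lang
```

A plain email archive without such markup would simply yield None here, which is the situation described above.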
>The FAQ seems to say that I should create a subdir $COMMON/german
>and install the german language files there, but that would make the
>English ones unused, no?
That is correct. Of course you can perform searches on all languages
at the same time--the only restriction is that most fuzzy algorithms
won't work well.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
This archive was generated by hypermail 2b28 : Thu Nov 02 2000 - 05:44:26 PST