Re: [htdig] using 2 languages at the same time?


Subject: Re: [htdig] using 2 languages at the same time?
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Thu Nov 02 2000 - 05:31:28 PST


At 4:47 PM +0800 11/2/00, Mathias Körber wrote:
>a) index pages which may occur in any of 2 or more languages

Well, sure.

>b) automatically identify which language the files are in (no,
>there is no identifier, this is an email archive which has
>mails in English, German and a few other languages)

No, I'm afraid not. There isn't much "intelligence" in this regard.
Even so, you pose a difficult problem: the code would need to
"recognize" the language from the text itself, which is one of the
harder problems in text processing. The HTML standard offers several
methods for indicating the language of a document, which would help,
but from what you say, these are not used on your pages.
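
To give a feel for what "recognizing" the language from the text involves, here is a rough sketch in Python. It is not part of ht://Dig and is only an assumption about how one might pre-sort an archive before indexing: it counts hits against tiny hand-picked stopword lists. The function name guess_language, the word lists and the min_hits threshold are all made up for illustration, and a serious approach would need far more data than this.

```python
import re

# Tiny hand-picked stopword lists: purely illustrative, not a real language model.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "that", "this", "with", "not", "for"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "ich", "ein", "für"},
}


def guess_language(text, min_hits=3):
    """Return the code of the language whose stopwords occur most often.

    Returns None when there is too little evidence or the top two
    languages are tied, rather than guessing.
    """
    # Split into runs of letters (Unicode-aware), ignoring digits and underscores.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    best_lang, best_count = ranked[0]
    if best_count < min_hits:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None
    return best_lang
```

Short messages, quoted text in another language and mixed-language mails will all confuse a counter like this, which is part of why the problem is hard.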

>c) use more than one .aff file, the correct one for each language?

Certainly it would help if ht://Dig kept some metadata for the
language of a document--this would enable language-specific searches
and language-specific fuzzy matching as you describe. But this would
likely be dependent on the META information available in the
documents themselves.
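
As an illustration of what "META information available in the documents" could look like to an indexer, the sketch below reads the two common HTML declarations: the lang attribute on the html element and a Content-Language META tag. It is a Python sketch using the standard html.parser module, not ht://Dig code; the names DeclaredLanguage and declared_language are hypothetical.

```python
from html.parser import HTMLParser


class DeclaredLanguage(HTMLParser):
    """Collect the language a page declares for itself, if any."""

    def __init__(self):
        super().__init__()
        self.html_lang = None   # from <html lang="...">
        self.meta_lang = None   # from <meta http-equiv="Content-Language" ...>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.html_lang = attrs["lang"]
        elif tag == "meta":
            http_equiv = (attrs.get("http-equiv") or "").lower()
            if http_equiv == "content-language" and attrs.get("content"):
                self.meta_lang = attrs["content"]


def declared_language(html_text):
    """Return the declared language code in lowercase, or None if absent."""
    parser = DeclaredLanguage()
    parser.feed(html_text)
    parser.close()
    lang = parser.html_lang or parser.meta_lang
    return lang.strip().lower() if lang else None
```

For pages generated from an email archive with no such declarations, this simply returns None, which is the situation described above.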

>The FAQ seems to say that I should create a subdir $COMMON/german
>and install the german language files there, but that would make the
>English ones unused, no?

That is correct. Of course you can perform searches on all languages
at the same time--the only restriction is that most fuzzy algorithms
won't work well.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



