Re: htdig: Determining the language of one page ...


Stephane Bortzmeyer (stephane@sources.org)
Fri, 11 Sep 1998 17:51:28 +0200


On Thursday 10 September 1998, at 20 h 43, the keyboard of Jose Agustin Lopez
Bueno <Agustin.Lopez@uv.es> wrote:

> Are there any way to determinate which is it?

The simplest way is to count the occurrences of language-specific words (like
'and', 'or', 'then' in English). I have a procmail recipe which uses procmail's
scoring to find the language of a message and sort it accordingly. Warning:
it's far from being 100 % reliable.
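
To make the word-counting idea concrete, here is a minimal sketch in Python. It
is not the procmail recipe mentioned above: the marker-word lists, the MARKERS
table and the guess_language function are all assumptions made up for
illustration, and the lists are far too short for real use.

```python
import re

# Hypothetical marker-word lists (assumption: chosen only for illustration;
# a usable detector would need longer, carefully selected lists).
MARKERS = {
    "english": {"the", "and", "or", "then", "of", "is"},
    "french": {"le", "la", "et", "ou", "les", "des", "est"},
    "spanish": {"el", "la", "y", "o", "los", "que", "es"},
}


def guess_language(text):
    """Return the language whose marker words occur most often, or None."""
    # Split into lowercase runs of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    # One point per occurrence of a marker word, per language.
    scores = {
        lang: sum(1 for w in words if w in markers)
        for lang, markers in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # No marker word seen at all: refuse to guess.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("The cat and the dog or the bird"))
    print(guess_language("Le chat et le chien sont dans la maison"))
```

Short words shared between languages (here 'la') and very short texts are the
obvious weak points, which is why the result should be treated as a guess.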

I patched Harvest (remember Harvest?) to add such a feature: SOIF files were
written with a new attribute, "language". As you know, Harvest died before
incorporating my patch, but I can find it if you like.




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:46 PST