Re: htdig: Determining the language of one page ...

Stephane Bortzmeyer
Fri, 11 Sep 1998 17:51:28 +0200

On Thursday 10 September 1998, at 20 h 43, the keyboard of Jose Agustin Lopez
Bueno wrote:

> Is there any way to determine which language it is?

The simplest way is to count occurrences of common, language-specific words
(like 'and', 'or', 'then' in English). I have a procmail recipe which uses
procmail's scoring to guess the language of a message and sort it accordingly.
Warning: it's far from 100% safe.
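For illustration, here is a minimal Python sketch of the word-counting idea. It is not my procmail recipe. The word lists, the `guess_language` name and the `min_hits` threshold are all hypothetical choices made for this example. Each language scores one point per occurrence of one of its common words, and the highest score wins:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lists of very common words per language.
# Real use would need longer lists. Note the overlaps ('la', 'de' appear in
# both French and Spanish): one reason this method is far from 100% safe.
STOP_WORDS = {
    "english": {"the", "and", "or", "then", "of", "to", "is", "in", "that", "it"},
    "french": {"le", "la", "les", "et", "ou", "de", "des", "est", "un", "une"},
    "spanish": {"el", "la", "los", "y", "o", "de", "que", "es", "en", "un"},
    "german": {"der", "die", "das", "und", "oder", "ist", "nicht", "ein", "zu", "den"},
}


def guess_language(text, min_hits=3):
    """Return the language whose common words occur most often, or None."""
    # Split into runs of letters (accented letters included, digits excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Score = total number of occurrences of that language's common words.
    scores = {
        lang: sum(counts[w] for w in stops) for lang, stops in STOP_WORDS.items()
    }
    # On a tie, max() keeps the first language in dict order.
    best = max(scores, key=scores.get)
    # Too little evidence: refuse to guess rather than return noise.
    if scores[best] < min_hits:
        return None
    return best
```

For example, `guess_language("The cat and the dog went to the park, and then it rained.")` returns `"english"`, while a short text with no listed words returns `None`. This is the same idea as procmail's scoring: many weak clues added up rather than one decisive test.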

I patched Harvest (remember Harvest?) to add such a feature: SOIF files were output with a new attribute, "language". As you know, Harvest died before incorporating my patch, but I can find it if you like.

