Re: htdig: Determining the language of one page ...


Colin Viebrock (cmv@privateworld.com)
Fri, 11 Sep 1998 13:42:51 -0400


Thus spake Stephane Bortzmeyer (at 05:51 PM 9/11/98 +0200) ...
>The simplest way is to count the number of specific words (like 'and', 'or',
>'then' in English).
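The quoted word-counting idea could look roughly like this in Python. This is a minimal sketch, not htdig code. The tiny stopword lists and the function name `guess_language` are hypothetical, chosen only for illustration. A real indexer would need much larger lists and some minimum score before trusting the answer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists; real ones would be far larger.
STOPWORDS = {
    "en": {"and", "or", "then", "the", "of", "is", "to"},
    "fr": {"et", "ou", "puis", "le", "la", "les", "de", "est"},
    "de": {"und", "oder", "dann", "der", "die", "das", "ist"},
}

def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    # Lowercase and split into word tokens, then count each token.
    words = Counter(re.findall(r"\w+", text.lower()))
    # Score each language by the total occurrences of its stopwords.
    scores = {lang: sum(words[w] for w in stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `guess_language("Le chat et le chien")` scores three French stopword hits and no others, so it returns `"fr"`. The weakness is visible straight away: short pages, or pages that mix languages, give scores too low or too close to mean much.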

Ugh! That's not very reliable. How about parsing the DOCTYPE declaration
of the document:

     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
                                                  ^^
                                          you'll want this

Of course, *so* many people actually put this into their HTML ...
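Pulling the code out of the DOCTYPE takes only a few lines. The following is a minimal sketch that assumes Python and a regular expression rather than a real SGML parser. The name `doctype_language` is hypothetical. One caveat: in a formal public identifier, the trailing "EN" strictly labels the language of the DTD's own text, not the page's content, so it is a hint at best. And as noted above, most pages have no DOCTYPE at all, so expect `None` often.

```python
import re

# Matches e.g. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> and captures
# the two-letter code after the last "//" inside the quoted public identifier.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"[^"]*//([A-Za-z]{2})"',
    re.IGNORECASE,
)

def doctype_language(html):
    """Return the language code from an HTML DOCTYPE, or None if there is none."""
    m = DOCTYPE_RE.search(html)
    return m.group(1).upper() if m else None
```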

________________________________________________________________________
Colin Viebrock Creative Director
cmv@privateworld.com Private World Communications
                                             http://www.privateworld.com

                                                   Your mouse has moved.
                                           Windows must be restarted for
                                              the change to take effect.



This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:47 PST