RE: [htdig] how to defind word


Subject: RE: [htdig] how to defind word
From: NEPOTE Charles (Neuilly Gestion) (charles.nepote@cetelem.fr)
Date: Mon Jul 10 2000 - 00:36:03 PDT


I found something that might interest you.
See CTTeX, a general-purpose Thai word segmentation program.

Sources are available at:
http://thaigate.nacsis.ac.jp/ftp/thaisoft/new/cttex/
A binary version is also available for Linux Mandrake (it may also
work on Red Hat); it will soon be available on a Mandrake mirror.
See: http://www.linux-mandrake.com/en/cookerdevel.php3

Here is the description of the Linux Mandrake RPM:

--=-=-=
Name : cttex
Relocations : (not relocateable)
Version : 1.21
Vendor : MandrakeSoft
Release : 1mdk
Build Date : Thu Jun 29 04:55:09 2000
Install date : (not installed)
Build Host : kenobi.mandrakesoft.com
Group : System/Internationalization
Source RPM : (none)
Size : 442255
License : Distributable
Packager : Pablo Saratxaga <pablo@mandrakesoft.com>
URL : http://thaigate.nacsis.ac.jp/files/index.html
Summary : Cttex, Thai word separator program
Description :
The main part of Cttex is A Thai Word Separator algorithm using
a dictionary. A wrapper for formatting Thai LaTeX document file is provided
to demonstrate the use of this word-sep routine. The program can also
be used as a simple word-sep filter.
--=-=-=
* Wed Jun 28 2000 Pablo Saratxaga <pablo@mandrakesoft.com> 1.21-1mdk
- first rpm version for Mandrake
--=-=-=
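
To illustrate what a dictionary-based word separator does, here is a minimal Python sketch. It is not CTTeX's actual algorithm, only a greedy longest-match approach under my own assumptions; the function name `segment` and the toy word list are hypothetical, with English strings standing in for Thai words as in the example from the original question.

```python
def segment(text, dictionary):
    """Split unspaced text into words by greedy longest match against a dictionary."""
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word fits.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words


# Toy word list standing in for a Thai dictionary (hypothetical).
TOY_DICTIONARY = {"this", "is", "terst1", "test2", "sister"}

text = "thisisterst1"
tokens = segment(text, TOY_DICTIONARY)

print(tokens)              # ['this', 'is', 'terst1']
print("sister" in text)    # True: a raw substring match gives a false hit
print("sister" in tokens)  # False: after segmentation, "sister" is not a word
```

If the text is segmented like this and the words are joined with spaces before indexing, a search for "sister" should no longer match inside "thisisterst1". Greedy matching can still split some inputs wrongly, so treat this only as a picture of the idea.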

Best regards,
Charles Népote.

> -----Original Message-----
> From: Prisda Gomutputra [mailto:prisda@loxinfo.co.th]
> Sent: Saturday, June 24, 2000 19:51
> To: htdig@htdig.org
> Subject: [htdig] how to defind word
>
>
> I am currently trying to fine-tune ht://Dig to work more accurately
> with the Thai (8-bit) language. I can get it to work, but the search
> results are not very relevant, since the Thai language does not use
> spaces to separate words. Spaces are only used to separate sentences.
>
> For example, the English sentence "this is teRst1. this is test2" would
> be written in Thai as follows: "thisisteRst1. thisistest2"
> ^^^^
> 1) Is there a way to tell ht://Dig to identify the words and
> index them properly?
> 2) When words are combined without spaces in between, it
> introduces a new problem, as in the example above,
> "thiSISTERst1". When a user searches for the word "sister",
> "thiSISTERst1" will be returned too. Is there
> a way to prevent this problem from happening?
>
> Highly appreciated
> Prisda



This archive was generated by hypermail 2b28 : Sun Jul 09 2000 - 21:54:40 PDT