Re: htdig: [Patch] non english text parser broken


Vadim Chekan (vadim@gc.lviv.ua)
Mon, 9 Nov 1998 14:32:15 +0200


>No, but the encoding I'm using is iso-latin1, which is the default.
>The non-ASCII characters are all in iso-latin1, not HTML entities.
>I'm using the same configuration file I used for 3.1.0b1, and things
>worked fine in 3.1.0b1.

Do you have non-Latin words in the debug output?
./htdig -isvvvvvvv > ttt
tail ttt
word: Ukraine*@2
word: Kiev*@2
word: Reitarskaya,@2
word: 8-a@2
<cut>
word: ?????@993
word: ?????????????@994
word: ??@996
word: ??????????@998
word: ????????.@1000
 size = 23040
pick: gate.gc.lviv.ua:80, # servers = 1
htdig: Run complete
htdig: 1 server seen:
htdig: gate.gc.lviv.ua:80 10 documents

I have Russian words in the parser debug output.
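
If the dump is too long to check by eye, a small filter can pick out the relevant lines. This is only a rough sketch: it assumes the "word: <text>@<position>" line format seen in the output above, and the function name non_ascii_words is made up for illustration, not part of htdig.

```python
import re

# Matches debug lines such as b"word: Kiev*@2" (format taken from the
# output above; other htdig versions may print something different).
WORD_LINE = re.compile(rb"^word: (.*)@(\d+)$")


def non_ascii_words(lines):
    """Return (word, position) pairs whose word has a byte >= 0x80.

    `lines` is any iterable of bytes objects, e.g. a file opened in
    binary mode. Working on raw bytes avoids guessing the charset.
    """
    found = []
    for raw in lines:
        match = WORD_LINE.match(raw.rstrip(b"\r\n"))
        if match and any(byte >= 0x80 for byte in match.group(1)):
            found.append((match.group(1), int(match.group(2))))
    return found
```

To try it, open the redirected output in binary mode, for example open("ttt", "rb"), and pass the file object to non_ascii_words. An empty result would suggest the parser is not emitting any 8-bit words at all.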

Vadim Chekan.

> 3. You can check whether it works by looking in db.wordlist.
> As long as there are no non-English words in it, you have a problem.
>
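
The db.wordlist check can be done the same way without reading the whole file by hand. A minimal sketch, assuming db.wordlist is a plain text file that can be read line by line (its exact field layout is not assumed here); count_high_bit_lines is a hypothetical helper name.

```python
def count_high_bit_lines(lines):
    """Return (total, high) for an iterable of bytes lines.

    total is the number of lines seen; high is how many of them contain
    at least one byte >= 0x80, i.e. a character outside plain ASCII.
    """
    total = 0
    high = 0
    for line in lines:
        total += 1
        if any(byte >= 0x80 for byte in line):
            high += 1
    return total, high
```

Pass it the word list opened in binary mode. If the second number is 0 even though the indexed pages contain non-English text, those words were lost before they reached the word list.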



