Re: [htdig] problems with accents


Torsten Neuer (tneuer@inwise.de)
Thu, 20 May 1999 14:48:54 +0200


According to Philippe Riviere:
>>>1) some browsers don't like URLs containing accented letters (it would
>>>be better to have them escaped). This happens on the results page when a
>>>search for an accented word yields many results: the "1 2 3 4 5 next"
>>>links contain accents
>>
>>It would certainly be better not to have accented letters in URLs
>>in general. IMHO this is more a matter of proper naming of document
>>files than of having search engines recognize them. I'd bet you'll
>>run into trouble with that with more than just ht://Dig.
>
>True. But htsearch itself generates URLs pointing back to itself ("go to
>next page of results") and should not use accents in these.
>
>
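For illustration only, here is a minimal sketch of what "escaping" the
accented letters in such a link could look like. It is not how htsearch
builds its links; the helper name escape_query_word and the default
Latin-1 encoding are assumptions made for this example.

```python
from urllib.parse import quote


def escape_query_word(word: str, encoding: str = "latin-1") -> str:
    """Percent-encode a search word so it is safe inside a URL query.

    The encoding is an assumption: Latin-1 matches a typical fr_FR setup
    of the time, while UTF-8 would be the usual choice today.
    """
    # safe="" makes quote() escape every reserved character as well,
    # so only unreserved ASCII letters, digits and "_.-~" pass through.
    return quote(word, safe="", encoding=encoding)


if __name__ == "__main__":
    print(escape_query_word("étude"))           # %E9tude
    print(escape_query_word("étude", "utf-8"))  # %C3%A9tude
```

Plain ASCII words such as "etudes" pass through unchanged, so only the
accented characters in the generated links would be affected.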
>>>2) searching for "étude" does not yield "etudes" and vice versa. I'd
>>>prefer that it did.
>>>
>>
>>Look at ht://Dig documentation, set your locale to a proper value
>>(probably fr_FR), get a french dictionary and affix rule file for
>>the endings algorithm and re-index your site.
>
>locale is currently set to fr_FR; is there anything else to add?

- Check whether this value is valid on your system.
  It might differ in some cases.

- You'll only get plurals and other inflected forms if you're using the
  "endings" algorithm. To use it, you need to generate the proper
  endings.db from a dictionary of French words and an affix rule file.
  The substring matching algorithm should work, too, but it takes a lot
  more execution time, which might be significant if your document.db
  is large.
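One rough way to check the first point, i.e. whether a locale name is
actually known to the system, is to try to activate it. The sketch below
uses Python's standard locale module; the helper name locale_is_valid is
made up for this example, and the name that works on your machine may be
spelled differently (for instance with a suffix such as ".ISO8859-1").

```python
import locale


def locale_is_valid(name: str) -> bool:
    """Return True if the C library accepts `name` as a locale."""
    # Remember the current LC_CTYPE setting so the check has no side effects.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the locale is not installed or the name is misspelled.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Candidate spellings are assumptions; the result depends on the system.
    for candidate in ("fr_FR", "fr_FR.ISO8859-1", "fr_FR.UTF-8"):
        print(candidate, locale_is_valid(candidate))
```

On most Unix systems, "locale -a" lists the installed names, which is
usually the quicker check.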

cheers,
  Torsten

--
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de




This archive was generated by hypermail 2.0b3 on Thu May 20 1999 - 05:12:40 PDT