[htdig] new to htdig


Subject: [htdig] new to htdig
From: Robert Marchand (robert.marchand@UMontreal.CA)
Date: Tue Feb 22 2000 - 13:36:13 PST


Hi,

   We have finished the first phase of our search engine evaluation for
indexing our domain (umontreal.ca), and ht://Dig seems to be the
winner.

However, there are some issues that need to be resolved before we
adopt it.

1) We badly need the 'fuzzy accent' algorithm, or whatever the solution
turns out to be, so that a word can be searched with or without accents:
for example, "Montréal" and "Montreal" should return the same results.
This is very important for us. I've looked at some of the discussion on
this topic here and would like to know whether it will be released soon.
If not, we will have to find a quick-and-dirty solution, such as
patching some files ourselves.

I've looked a little at the code (I'm not a C++ expert) and I understand
that it would need several patches to meet the following requirements:

- search for either "Montréal" or "Montreal" and get all the occurrences,
as if someone had typed "Montreal or Montréal".

- highlight the word that was searched for.

I know the code already does this for upper and lower case. Could the
same be done for accents?
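
To make the two requirements above concrete, here is a minimal sketch of
accent-insensitive matching and highlighting. It is written in Python
only for illustration and is not ht://Dig code: the function names
fold() and highlight() and the <b> markup are assumptions, and the idea
(reduce both the query and each word to a lowercase, accent-free key
before comparing) is just one possible approach.

```python
import re
import unicodedata


def fold(word):
    """Return a lowercase, accent-free comparison key (hypothetical helper)."""
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def highlight(text, query):
    """Wrap each word whose folded form equals the folded query in <b>...</b>."""
    target = fold(query)
    # Compose accents first so that every accented letter is a single character.
    text = unicodedata.normalize("NFC", text)

    def mark(match):
        word = match.group(0)
        return "<b>" + word + "</b>" if fold(word) == target else word

    return re.sub(r"\w+", mark, text)


if __name__ == "__main__":
    print(highlight("Université de Montréal, Montreal (Canada)", "montreal"))
```

Because the same key is used for the query and for the document words,
typing either spelling finds both, and the highlighted text keeps the
spelling found in the document.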

----

2) We have a problem with robots.txt and the database. It seems that if
a robots.txt file is added or modified after a complete reindex from
scratch and BEFORE an update reindex, some files that are no longer
allowed are kept in the database. Does this mean that a complete reindex
has to be done after every change to a robots.txt? That seems a bit
harsh, since we have no control over all the sites we index.

Am I wrong? Is this a bug?
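
For illustration, the behaviour we would expect from an update run can
be sketched as follows: every URL already in the database is checked
again against the current robots.txt, and the ones that are now
disallowed are dropped. This is only a sketch in Python using the
standard urllib.robotparser module, not a description of what ht://Dig
3.1.4 does; the function name split_by_robots(), the agent name "htdig"
and the example URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser


def split_by_robots(indexed_urls, robots_lines, agent="htdig"):
    """Split already-indexed URLs into (kept, removed) under the current rules.

    indexed_urls: URLs stored by an earlier indexing run.
    robots_lines: lines of the site's current robots.txt.
    agent: robot name to match against User-agent lines (assumed value).
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)

    kept, removed = [], []
    for url in indexed_urls:
        # can_fetch() is True when the current rules still allow this URL.
        (kept if parser.can_fetch(agent, url) else removed).append(url)
    return kept, removed


if __name__ == "__main__":
    robots = ["User-agent: *", "Disallow: /private/"]
    urls = [
        "http://www.example.org/index.html",
        "http://www.example.org/private/report.html",
    ]
    kept, removed = split_by_robots(urls, robots)
    print("kept:", kept)
    print("removed:", removed)
```

With a check like this on each update, a changed robots.txt would take
effect without a complete reindex from scratch.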

I'd appreciate any responses! We're using release version 3.1.4.

Thanks.

-------
Robert Marchand
tél: 343-6111 poste 5210
DiTER-SDI
e-mail: marchanr@diter.umontreal.ca
Université de Montréal
Montréal, Canada



