Re: htdig: priority


Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Thu, 3 Dec 1998 11:58:00 -0600 (CST)


According to Geoff Hutchison:
> I guess I'm not quite sure what you're asking. At the moment, it's
> difficult to do searches on "author," "title," "topic" or whatever.
> Ht://Dig is really designed as a general-purpose search engine, so it
> indexes the text of the pages. Personally, I find it does a good job on
> turning up titles, authors, etc. from the pages themselves.

That works if you haven't indexed many papers or articles, or if they
don't contain many references. If you have many papers indexed, each
with dozens of references, then a search for an author or title will
pull up more papers that merely cite the author or title in question
than papers written by that author or carrying the desired title
keywords.

For author searches, the technique of using separate databases as you and
Paul Wolstenholme discussed in the Meta Tags Question thread would be the
best approach.

A similar approach could be used for title searches, but that's where you'd
really want a phrase search capability. Of course, if the article's title
is in HTML <title></title> tags, htsearch will rank it higher when the title
contains the keywords for which you're searching. If your articles are
marked up that way, then a general-purpose text search will make a decent
title search as well.
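
To illustrate why title markup helps, here is a minimal sketch of
title-weighted ranking. It is not htsearch's actual scoring code; the
weights, the tokenizer, and the document layout (a dict with "title"
and "body" keys) are all assumptions made up for the example.

```python
import re

# Hypothetical weights: a keyword hit in the title counts for more
# than a hit in the body text. These are not ht://Dig defaults.
TITLE_WEIGHT = 5.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc, query):
    """Sum weighted keyword hits over one document's title and body."""
    words = set(tokenize(query))
    title_hits = sum(1 for w in tokenize(doc["title"]) if w in words)
    body_hits = sum(1 for w in tokenize(doc["body"]) if w in words)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits


def rank(docs, query):
    """Return the documents ordered from best to worst score."""
    return sorted(docs, key=lambda d: score(d, query), reverse=True)
```

With weights like these, a paper whose title contains the keywords
outranks a paper that only mentions them a few times in its reference
list, which is the effect described above.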

Topic searches would be greatly enhanced by building up the common/synonyms
file with all the synonyms relevant to the topics covered on your site,
and running htfuzzy to build up the synonym database for fuzzy matches.
(Something I still need to do on my site.)
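
As a rough picture of what synonym matching buys you, the sketch below
expands a query with synonyms before searching. It assumes a simple
layout of one whitespace-separated synonym group per line; it does not
reproduce the real common/synonyms format or the database that htfuzzy
builds, and the function names are hypothetical.

```python
def load_synonyms(lines):
    """Map every word to the full set of words sharing its line."""
    table = {}
    for line in lines:
        words = line.lower().split()
        if len(words) < 2:
            continue  # blank line, or a word with no synonyms listed
        group = set(words)
        for w in words:
            table.setdefault(w, set()).update(group)
    return table


def expand(query, table):
    """Return the query words plus their synonyms, sorted for stable output."""
    expanded = set()
    for w in query.lower().split():
        expanded.add(w)
        expanded |= table.get(w, set())
    return sorted(expanded)
```

For instance, with a group "spine backbone vertebrae" loaded, a query
for "spine injury" would also match pages that only say "backbone".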

> As far as languages, there is support for indexing languages other than
> English. I'm not the best person to talk to, though. :-(

Despite my French name, I too have only indexed English text on our site,
so I don't know much about international support. In general, though, you'd
need separate dictionaries, databases, and configuration files for each
language you index.
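
A minimal sketch of the one-config-per-language idea follows. It only
emits the database_dir and start_url attributes; the directory layout,
URLs, and function name are hypothetical, and the dictionary settings
each language would also need are left out rather than guessed at.

```python
from pathlib import PurePosixPath


def language_config(lang, base_dir, start_url):
    """Build the text of a per-language config with its own database directory."""
    db_dir = PurePosixPath(base_dir) / lang  # e.g. <base_dir>/fr (hypothetical layout)
    lines = [
        f"# hypothetical config for '{lang}' pages",
        f"database_dir: {db_dir}",
        f"start_url: {start_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example sites are placeholders, not real ht://Dig installations.
    for lang, url in [("en", "http://www.example.com/en/"),
                      ("fr", "http://www.example.com/fr/")]:
        print(language_config(lang, "/var/htdig/db", url))
```

Each generated file would then be indexed and searched on its own, so
that words from one language never land in another language's database.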

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


