Re: htdig: Narrowing Scope of Search using Meta tags


Andrew Scherpbier (andrew@contigo.com)
Thu, 18 Jun 1998 06:23:47 -0700


Paul Wolstenholme wrote:
>
> We are using HtDig to archive our online journal. Recently, I was asked
> whether the HtDig form could be modified so that it could search
> exclusively by author or title of the journal article.
>
> The problem is that if you are looking for an article written by Richard
> Smith, the result set returned can be quite extensive because he may have
> been mentioned in many articles besides those he authored.
>
> At first I didn't think there was much we could do about this, but I
> looked at the htdig documentation and was wondering whether or not htsearch
> could be configured to search for values of keys in the htdig keywords
> meta tag.
>
> For example, if each document contained a meta tag like:
>
> <META NAME="htdig-keywords" AUTHOR="Richard Smith" TITLE="Intellectual
> Property">
>
> Is it then possible to set your search form so that it only returns the
> pages where AUTHOR contains "Richard Smith"? Or would this require
> significant rewrites to htdig, htmerge and htsearch?
>
> /Paul Wolstenholme

Unfortunately, there is no clean way to do this. However, you could set up two
databases. One will be a normal one, while the second one is created with a
config file that has the following:

keyword_factor: 10
text_factor: 0
title_factor: 0
heading_factor_1: 0
heading_factor_2: 0
heading_factor_3: 0
heading_factor_4: 0
heading_factor_5: 0
heading_factor_6: 0

This will cause only words in the keyword list to have any weight in searches
and hence matches will be ordered accordingly.
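
To make the effect of those factors concrete, here is a rough toy model in
Python. It is only a sketch: it assumes a document's score is the sum of
(occurrence count * factor) over the places a word appears, which is a
simplification and not ht://Dig's actual scoring code. The document names, the
occurrence counts, and the "normal" factor values are all made up for
illustration.

```python
# Toy model only (an assumption, not ht://Dig's real algorithm):
# score = sum over places of (occurrences in that place * factor for that place).

# Hypothetical "normal" database: every place counts equally.
NORMAL_FACTORS = {"keyword": 1, "title": 1, "heading": 1, "text": 1}

# Mirrors the config fragment above: only the keyword list carries weight.
KEYWORD_ONLY_FACTORS = {"keyword": 10, "title": 0, "heading": 0, "text": 0}


def score(occurrences, factors):
    """Weighted sum of how often the search word occurs in each place."""
    return sum(count * factors.get(place, 0)
               for place, count in occurrences.items())


def rank(docs, factors):
    """Return document names ordered from highest to lowest score."""
    scored = [(score(occ, factors), name) for name, occ in docs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Made-up occurrence counts for the word "smith" in two documents.
docs = {
    "by_smith.html": {"keyword": 1, "text": 2},   # Smith is the author
    "mentions_smith.html": {"text": 9},           # Smith is only cited
}

print(rank(docs, NORMAL_FACTORS))        # cited-often page comes first
print(rank(docs, KEYWORD_ONLY_FACTORS))  # authored page comes first
```

In this model the page that merely mentions the name scores zero against the
keyword-only factors, so the authored page sorts ahead of it. Whether
zero-weight matches are dropped from real results or just pushed to the bottom
is not something this sketch tries to answer.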

(This is one of the things I want to fix in ht://Dig4, by the way.)

-- 
Andrew Scherpbier <andrew@contigo.com>
Contigo Software <http://www.contigo.com/>


