Re: [htdig] I need some reasonable settings for: search_algorithm ?

Dr. Thomas-M. Stein
Mon, 5 Jul 1999 12:32:51 +0200 (CEST)

Dear Geoff, Dear Gabriel

Many thanks for your very quick reply. I have adjusted my 'slightly'
out-of-balance search options and am still testing. The problem I am
facing is that I will have to index varied data from various people and
'net epochs' on different sites. That might be quite similar to your VL
site, Gabriel (by the way, I am running the VL Irrigation, which I am
trying to 'put in shape again'). Getting the information that best
represents the search term ranked on top is not easy. I will have to index
sites and pages built in the usual ('old') way with <h...> tags etc., but
also sites with 'pure text', where the font tag has been used to make the
headers, as well as sites starting with frames and others keeping their
keywords in meta tags. Any ideas about what else to watch out for, or
which sources to read for additional helpful information, would be
appreciated. I have included the following settings, which I hope will
best suit my needs of getting entire sites, single pages, etc.:

# A file including all urls to be indexed
start_url: `${config_dir}/url_start`

# to limit to an entire site, e.g.
# to get the entire site
# to get only one file
limit_urls_to: ${start_url}

# excluding certain directories, trees or files
exclude_urls: `${config_dir}/url_exclude`

keywords_meta_tag_names: keywords

# still experimenting
keyword_factor: 100

# not sure whether to leave this at the default or not
# title_factor: 12

# the same for heading factor
# left on default
# heading_factor

# Any ideas about this one?
text_factor: 1

# What does the following line do?
# substring_max_words: 15
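
To make the question about the factors more concrete, here is a rough
sketch in Python of how I picture such weights influencing the ranking. It
is only an illustration under my own assumptions, not ht://Dig's actual
scoring code: the function score(), the FACTORS table and the sample pages
are hypothetical, and the title value is simply the one from the
commented-out line above.

```python
# Hypothetical weights per location; the names mirror the settings above.
# This is NOT ht://Dig's real ranking algorithm, only a simplified model.
FACTORS = {"keyword": 100, "title": 12, "text": 1}


def score(hits):
    """Weight the number of matches in each location by its factor and sum."""
    return sum(FACTORS[location] * count for location, count in hits.items())


# Made-up pages: number of times the search term occurs in each location.
pages = {
    "meta_only": {"keyword": 1},           # term appears once in meta keywords
    "plain_text": {"text": 20},            # 'pure text' page, many body matches
    "titled": {"title": 1, "text": 5},     # term in the title and the body
}

# Highest score first.
ranked = sorted(pages, key=lambda name: score(pages[name]), reverse=True)
```

In this simplified model a single match in the meta keywords outweighs
twenty matches in the body text, which is the kind of imbalance I am
trying to avoid for the 'pure text' sites.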

Many thanks for any hints.



 Dr.-Ing. Thomas-M. Stein Email :
 University of Kassel WWW :
 D-37213 Witzenhausen (Germany) List owner:


This archive was generated by hypermail 2.0b3 on Mon Jul 05 1999 - 02:55:06 PDT