The search program "htsearch" ranks the web pages that satisfy the search terms before they are returned in the results page. It uses a complex rule to rank the pages. This rule takes into account the following factors, which can be set either on the search form or in the site configuration file.

description_factor
Plain old "descriptions" are the text of a link pointing to
a document. This factor gives weight to the words of these
descriptions of the document. Not surprisingly, these can be pretty
accurate summaries of a document's content.
default: 150
example: description_factor: 350

heading_factor
This is a factor which will be used to multiply the weight of words
between <h1> and </h1> tags, as well as headings of levels
<h2> through <h6>. It is used to assign the level of
importance to headings. Setting a factor to 0 will cause words in
these headings to be ignored. The number may be a floating point number.
default: 5
example: heading_factor: 20.9

keywords_factor
This is a factor which will be used to multiply the weight of words in the
list of keywords of a document. The number may be a floating point number.
default: 10
example: keywords_factor: 12

meta_description_factor
This is a factor which will be used to multiply the weight of words in any
META description tags in a document. The number may be a floating point
number.
default: 50
example: meta_description_factor: 20

text_factor
This is a factor which will be used to multiply the weight of words that
are not in any special part of a document. Setting a factor to 0 will
cause normal words to be ignored. The number may be a floating point
number.
default: 1
example: text_factor: 0

title_factor
This is a factor which will be used to multiply the weight of words in the
title of a document. Setting a factor to 0 will cause words in the title to
be ignored. The number may be a floating point number.
default: 100
example: title_factor: 12

backlink_factor
This is a weight of "how important" a page is, based on
the number of URLs pointing to it. It's actually multiplied by the
ratio of the incoming URLs (backlinks) and outgoing URLs, to balance
out pages with lots of links to pages that link back to them. This
factor can be changed without changing the database in any
way. However, setting this value to something other than 0 incurs a
slowdown on search results.
default: 1000
example: backlink_factor: 501.1

date_factor
This factor, like backlink_factor, can be changed without modifying the
database. It gives higher rankings to newer documents and lower rankings
to older documents. Before setting this factor, it's advised to make sure
your servers are returning accurate dates (check the dates returned in
the long format). Additionally, setting this to a nonzero value incurs a
performance hit on searching.
default: 0
example: date_factor: 0.35
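As a rough illustration of how these factors interact, the following Python sketch multiplies per-location word counts by the matching factor and computes a separate backlink term. It is a simplified model, not htsearch's actual ranking rule, which is only described above as complex. The names `DEFAULT_FACTORS`, `word_score` and `backlink_score` are hypothetical, and treating a page with no outgoing links as having one is an assumption made only to keep the example well defined.

```python
# Simplified model only: htsearch's real ranking rule is more involved.
# The defaults are the values listed above for each factor.
DEFAULT_FACTORS = {
    "text": 1.0,
    "heading": 5.0,
    "keywords": 10.0,
    "meta_description": 50.0,
    "title": 100.0,
    "description": 150.0,
}


def word_score(counts, factors=DEFAULT_FACTORS):
    """Sum a search word's occurrences per location, scaled by that location's factor."""
    return sum(factors.get(location, 0.0) * n for location, n in counts.items())


def backlink_score(incoming, outgoing, backlink_factor=1000.0):
    """Scale the ratio of incoming to outgoing links by backlink_factor.

    Assumption: a page with no outgoing links is treated as having one,
    purely to avoid dividing by zero in this sketch.
    """
    return backlink_factor * incoming / max(outgoing, 1)


if __name__ == "__main__":
    # A word found once in the title and three times in the body text.
    print(word_score({"title": 1, "text": 3}))
    # With text_factor set to 0, body-text occurrences no longer count.
    print(word_score({"title": 1, "text": 3}, {**DEFAULT_FACTORS, "text": 0.0}))
    print(backlink_score(incoming=4, outgoing=8))
```

In this model, raising a factor makes matches in that part of the document dominate the score, and setting it to 0 removes that part from consideration, which mirrors the behaviour described for heading_factor, text_factor and title_factor.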
In HTML, any number of <META> tags can be used between the <HEAD> and </HEAD> tags of a document. This tag has three possible attributes, two of which are recognized by ht://Dig: NAME, which names a specific property, and CONTENT, which supplies the value for the named property. For example, a document could start with something like the following:
<HTML>
<HEAD>
<META NAME="htdig-keywords" CONTENT="phone telephone online electronic directory">
<META NAME="htdig-email" CONTENT="email@example.com">
<TITLE>Some document title</TITLE>
</HEAD>
<BODY>
Body of document
</BODY>
</HTML>
Htdig recognizes the following values for NAME:

NAME="htdig-keywords"
The value of this property should be a blank-separated list of keywords, which will get a very high weight when searching. This can be used to get around some problems with common synonyms for words in the document. For example, if a document is a telephone directory, possible keywords could be "telephone phone directory book list". Now, regardless of what text is actually in the document, it can be found if these keywords are used in the search. The weight that words in the content string will have in a search can be modified using the keywords_factor attribute, as outlined above.

NAME="keywords"
The value of this property should be a blank-separated list of keywords, just as for the htdig-keywords property. Htdig treats the two as equivalent. There are two different properties because the keywords property is also used by other search engines, while the htdig-keywords property can be used for words you want indexed only by htdig. You can get htdig to treat other property names as equivalent to htdig-keywords, or disable the htdig-keywords or keywords properties, by changing the keywords_meta_tag_names attribute in your configuration.

NAME="description"
The value allows you to specify an alternate excerpt (description) of a page. If the config-file attribute use_meta_description is set to true, then any documents with META descriptions will use them instead of the automatically generated excerpts. The weight that words in the content string will have in search results is controlled by the meta_description_factor attribute in your configuration.
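To make the handling of these properties concrete, here is a minimal Python sketch that collects <META> NAME/CONTENT pairs and splits the keyword properties into individual words. It uses the standard library `html.parser` module and is not htdig's own parser. The names `MetaCollector` and `extract_keywords` are hypothetical, and the default `tag_names` value is an assumption that simply treats keywords and htdig-keywords as equivalent, as described above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect CONTENT values of <META> tags, grouped by lower-cased NAME."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lower-cased.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.properties.setdefault(name.lower(), []).append(content)


def extract_keywords(html, tag_names=("keywords", "htdig-keywords")):
    """Return the blank-separated words found in the listed META properties."""
    collector = MetaCollector()
    collector.feed(html)
    words = []
    for name in tag_names:
        for content in collector.properties.get(name, []):
            words.extend(content.split())
    return words


if __name__ == "__main__":
    page = (
        '<HTML><HEAD>'
        '<META NAME="htdig-keywords" CONTENT="phone telephone online electronic directory">'
        '<META NAME="htdig-email" CONTENT="email@example.com">'
        '<TITLE>Some document title</TITLE>'
        '</HEAD><BODY>Body of document</BODY></HTML>'
    )
    print(extract_keywords(page))
```

Passing a different `tag_names` tuple plays the role that the keywords_meta_tag_names attribute plays in the configuration: only the listed property names contribute keywords.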
There is also the possibility of introducing arbitrary <META NAME="xxx" tags. For example:
<META NAME="dc.creator" CONTENT="Paul Wolstenholme">
<META NAME="dc.creator" CONTENT="Richard Smith">
To do this you have to introduce the following two configuration entries:

keywords_meta_tag_names (needed when digging is done)
The words in this list are used to search for keywords in HTML META
tags. This list can contain any number of strings that each will be
seen as the name for whatever keyword convention is used. The META
tags have the following format: <META NAME="somename"
default: keywords htdig-keywords
example: keywords_meta_tag_names: keywords description dc.creator
In the above example you would use keywords_meta_tag_names: dc.creator

max_meta_description_length (needed when digging is done)
While gathering descriptions from meta description tags, htdig will truncate descriptions which are longer than this length. This guards against a webmaster trying to swamp a search result by repeating a keyword many times.
default: 512
example: max_meta_description_length: 1000
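The effect of this limit can be sketched in a few lines of Python. This is an illustration only: the function name `truncate_description` is hypothetical, and measuring the length in characters is an assumption, since the text above does not say how htdig counts it.

```python
def truncate_description(description, max_meta_description_length=512):
    """Keep at most max_meta_description_length characters of a META description.

    Assumption: the limit is counted in characters.
    """
    return description[:max_meta_description_length]


if __name__ == "__main__":
    # A description stuffed with one repeated keyword is cut at the limit.
    stuffed = "directory " * 200
    print(len(stuffed), len(truncate_description(stuffed)))
    # A short description passes through unchanged.
    print(truncate_description("Online telephone directory"))
```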
It is possible to have the NAME="description" CONTENT=" xxx ..... " meta tag used for the description of a found page instead of the usual excerpts. This is accomplished with the following configuration parameter:

use_meta_description
If set to true, any META description tags will be used as excerpts by
htsearch. Any documents that do not have META descriptions will retain
their normal excerpts.
default: false
example: use_meta_description: true
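The selection logic described for use_meta_description can be summarized with a short Python sketch. It models only the choice between the two kinds of excerpt and is not htsearch code. The function name `choose_excerpt` and its parameters are hypothetical.

```python
def choose_excerpt(generated_excerpt, meta_description=None, use_meta_description=False):
    """Pick the text shown for a found page.

    The META description is used only when the option is enabled and the
    document actually has one; otherwise the normal excerpt is kept.
    """
    if use_meta_description and meta_description:
        return meta_description
    return generated_excerpt


if __name__ == "__main__":
    excerpt = "... call the front desk to reach any extension ..."
    summary = "Online telephone directory for all staff"
    print(choose_excerpt(excerpt, summary, use_meta_description=True))
    print(choose_excerpt(excerpt, None, use_meta_description=True))
    print(choose_excerpt(excerpt, summary))
```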