Subject: [htdig3-dev] Fwd: RE: [htdig] How does the DIG the ranking?
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Mon Mar 13 2000 - 18:33:49 PST
This is from S. Budd <s.budd@ic.ac.uk> on the subject of ranking (in 3.1)
>Ranking pages and the use of Meta tags with Htdig
>
>1. How pages are ranked.
>
>The search program "htsearch" ranks the web pages which satisfy the
>search terms before they are returned in the results page. It uses a
>complex rule to rank the pages. This rule takes into account the
>following factors which can be set either on the search form or in the site
>configuration file.
>
>
>description_factor
>A "description" is the text of a link pointing to a document. This factor
>gives weight to the words in those link descriptions. Not surprisingly,
>these can be pretty accurate summaries of a document's content.
>Default: 150. Example: description_factor: 350
>
>
>heading_factor
>This is a factor which will be used to multiply the weight of words between
><h1> and </h1> tags, as well as headings of levels <h2> through <h6>. It
>is used to assign the level of importance to headings. Setting the factor to
>0 will cause words in these headings to be ignored. The number may be a
>floating point number. Default: 5. Example: heading_factor: 20.9
>
>
>keywords_factor
>This is a factor which will be used to multiply the weight of words in the
>list of keywords of a document. The number may be a floating point number.
>Default: 10. Example: keywords_factor: 12
>
>
>meta_description_factor
>This is a factor which will be used to multiply the weight of words in any
>META description tags in a document. The number may be a floating point
>number. Default: 50. Example: meta_description_factor: 20
>
>
>
>text_factor
>This is a factor which will be used to multiply the weight of words that
>are not in any special part of a document. Setting a factor to 0 will
>cause normal words to be ignored. The number may be a floating point
>number. Default: 1. Example: text_factor: 0
>
>
>title_factor
>This is a factor which will be used to multiply the weight of words in the
>title of a document. Setting a factor to 0 will cause words in the title to
>be ignored. The number may be a floating point number. Default: 100.
>Example: title_factor: 12
>
>
>backlink_factor
>This is a weight of "how important" a page is, based on the number of
>URLs pointing to it. It's actually multiplied by the ratio of the incoming
>URLs (backlinks) and outgoing URLs, to balance out pages with lots of
>links to pages that link back to them. This factor can be changed
>without changing the database in any way. However, setting this value to
>something other than 0 incurs a slowdown on search results. Default: 1000.
>Example: backlink_factor: 501.1
>
>
>date_factor
>This factor, like backlink_factor, can be changed without modifying the
>database. It gives higher rankings to newer documents and lower rankings
>to older documents. Before setting this factor, make sure your servers
>are returning accurate dates (check the dates returned in the long
>format). Additionally, setting this to a nonzero value incurs a
>performance hit on searching. Default: 0. Example: date_factor: 0.35
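
To make the factor descriptions above concrete, here is a minimal Python sketch of how such weights could combine. The post does not give htsearch's actual formula, so the simple sum below, the helper names word_score and backlink_score, and the guard against zero outgoing links are all assumptions for illustration. Only the default values come from the text.

```python
# Hypothetical sketch: the exact htsearch ranking rule is not given in this
# post. Only the default factor values below are taken from the text.
FACTORS = {
    "text": 1.0,
    "heading": 5.0,
    "keywords": 10.0,
    "meta_description": 50.0,
    "title": 100.0,
    "description": 150.0,
}
BACKLINK_FACTOR = 1000.0


def word_score(hits, factors=FACTORS):
    """Sum factor * occurrence count over the places a search word was found."""
    return sum(factors[place] * count for place, count in hits.items())


def backlink_score(incoming, outgoing, backlink_factor=BACKLINK_FACTOR):
    """Multiply backlink_factor by the ratio of incoming to outgoing links."""
    # Treating zero outgoing links as one is an assumption of this sketch.
    return backlink_factor * incoming / max(outgoing, 1)


# A word found once in the title, twice in headings, seven times in body text,
# on a page with 3 incoming and 12 outgoing links.
hits = {"title": 1, "heading": 2, "text": 7}
total = word_score(hits) + backlink_score(incoming=3, outgoing=12)
print(total)
```

Setting a factor to 0 in the dictionary drops that part of the document from the sum, which matches the "will be ignored" wording in the descriptions.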
>
>2. Using <META .... > tags.
>
> In HTML, any number of <META> tags can be used between the <HEAD> and
></HEAD> tags of a document. There are three possible attributes to this tag,
>two of which are recognized by ht://Dig: One is NAME which is used to
>name a specific property and the other is CONTENT which is used to supply
>the value for a named property. For example, a document could start with
>something like the following:
>
> <HTML>
> <HEAD>
> <META NAME="htdig-keywords" CONTENT="phone telephone online electronic directory">
> <META NAME="htdig-email" CONTENT="pat.user@nowhere.net">
> <TITLE>Some document title</TITLE>
> </HEAD>
> <BODY>
>
> Body of document
>
> </BODY>
> </HTML>
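
As a rough illustration of reading NAME and CONTENT out of the example document, the following Python sketch uses the standard library's html.parser. It is not how htdig itself parses HTML. The class MetaCollector and the function collect_meta are hypothetical names, and lowercasing the NAME value is an assumption of the sketch.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <META> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name is not None and content is not None:
            # Lowercasing the NAME value is an assumption of this sketch.
            self.pairs.append((name.lower(), content))


def collect_meta(html):
    """Return every (name, content) pair found in the given HTML text."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


doc = """<HTML><HEAD>
<META NAME="htdig-keywords" CONTENT="phone telephone online electronic directory">
<META NAME="htdig-email" CONTENT="pat.user@nowhere.net">
<TITLE>Some document title</TITLE>
</HEAD><BODY>Body of document</BODY></HTML>"""
print(collect_meta(doc))
```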
>
>Htdig recognizes the following values for NAME:
>
>NAME="Htdig-keywords"
>The value of this property should be a blank separated list of keywords
>which will get a very high weight when searching. This can be used to get
>around some problems with common synonyms for words in the document. For
>example, if a document is a telephone directory, possible keywords could be
>"telephone phone directory book list". Now, regardless of what text is
>actually in the document, it can be found if these keywords are used in the
>search. The weight that words in the content string will have in search
>results is controlled by the keywords_factor attribute in your configuration.
>
>NAME="keywords"
> The value of this property should be a blank separated list of keywords,
>just as for the htdig-keywords property. They are treated as equivalent by
>htdig. The reason for two different properties is that the keywords property
>is used by other search engines as well, while the htdig-keywords property
>can be used for words you want indexed only by htdig. You can get htdig to
>treat other property names as equivalent to htdig-keywords, or disable the
>htdig-keywords or keywords properties, by changing the
>keywords_meta_tag_names attribute in your configuration.
>
>NAME="description"
>The value allows you to specify an alternate excerpt (description) of a
>page. If the config-file attribute use_meta_description is set to true, then
>any documents with META descriptions will use them instead of the
>automatically generated excerpts. The weight that words in the content
>string will have in search results is controlled by the
>meta_description_factor attribute in your configuration.
>
>There is also the possibility of introducing arbitrary <META NAME="xxx">
>tags. For example:
>
> <META NAME="dc.creator" CONTENT="Paul Wolstenholme">
> <META NAME="dc.creator" CONTENT="Richard Smith">
>
>
>To do this you have to introduce the following two configuration entries:
>
>keywords_meta_tag_names (needed when digging is done)
>The words in this list are used to search for keywords in HTML META tags.
>This list can contain any number of strings, each of which will be seen as
>the name for whatever keyword convention is used. The META tags have the
>following format: <META NAME="somename" CONTENT="somevalue">
>Default: keywords htdig-keywords
>Example: keywords_meta_tag_names: keywords description
>
>In the above example you would use keywords_meta_tag_names: dc.creator
>
>
>
>max_meta_description_length (needed when digging is done)
>While gathering descriptions from meta description tags, htdig will
>truncate descriptions which are longer than this length. This is required
>in case a webmaster tries to swamp a search result by repeating a keyword
>many times. Default: 512. Example: max_meta_description_length: 1000
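
The two dig-time entries above can be sketched in Python as follows. This is only an illustration of the described behaviour, not htdig's code. The function names gather_keywords and clip_description are hypothetical, and counting the length in characters and matching names case-insensitively are assumptions.

```python
def gather_keywords(pairs, keywords_meta_tag_names=("keywords", "htdig-keywords")):
    """Return the blank-separated words of every META tag whose NAME is listed."""
    # Case-insensitive name matching is an assumption of this sketch.
    wanted = {name.lower() for name in keywords_meta_tag_names}
    words = []
    for name, content in pairs:
        if name.lower() in wanted:
            words.extend(content.split())
    return words


def clip_description(text, max_meta_description_length=512):
    """Truncate a META description to the configured maximum length."""
    # Measuring the length in characters is an assumption of this sketch.
    return text[:max_meta_description_length]


# (NAME, CONTENT) pairs as they might be read from a document's META tags.
pairs = [
    ("dc.creator", "Paul Wolstenholme"),
    ("dc.creator", "Richard Smith"),
    ("keywords", "phone directory"),
]
print(gather_keywords(pairs, keywords_meta_tag_names=["dc.creator"]))
print(len(clip_description("spam " * 500)))
```

With keywords_meta_tag_names set to dc.creator only, the words under NAME="keywords" are no longer picked up, which is how the text describes disabling the default names.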
>
>
>
>
>It is possible to have the NAME="description" CONTENT=" xxx ..... " meta
>tag used for the description of a found page instead of the usual excerpts.
>This is accomplished with the following configuration parameter:
>
>use_meta_description
>If set to true, any META description tags will be used as excerpts by
>htsearch. Any documents that do not have META descriptions will retain
>their normal excerpts. Default: false. Example: use_meta_description: true
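
The selection rule just described is small enough to state as code. The Python sketch below assumes a hypothetical function choose_excerpt that receives the document's META description (or None) and its automatically generated excerpt. It illustrates the rule only and is not htsearch's implementation.

```python
def choose_excerpt(meta_description, generated_excerpt, use_meta_description=False):
    """Pick the text shown for a found page in the results."""
    # Use the META description only when the option is on and one exists.
    if use_meta_description and meta_description:
        return meta_description
    # Otherwise the document keeps its normal, generated excerpt.
    return generated_excerpt


print(choose_excerpt("Staff phone list", "...call extension 1234 for...", True))
print(choose_excerpt(None, "...call extension 1234 for...", True))
```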
>
>