Re: [htdig] Custom factors?


Subject: Re: [htdig] Custom factors?
From: David Robley (huntsman@www.nisu.flinders.edu.au)
Date: Wed Dec 15 1999 - 15:35:15 PST


On 16 Dec, Simon Blake wrote:
> Hi there
>
> I've just set up htdig to index a website that has a sitemap on every page,
> included as a drop down <select> menu. Because this is on every page, if
> you search for a term that occurs in the drop down list, every page in the
> webspace is returned, which isn't wonderful on a website with several GB
> of static pages!
>
> Therefore, I'd like to prevent htdig from indexing material between
> <select name=url> and </select>. Is there a straightforward way to achieve
> this? Looking at the factor system, it struck me that a neat way to do
> this would be with a custom factor - you define the start and end tags,
> maybe with a regexp, and everything in between gets the relevant weight.
>
> I've had a good look through htdig.org, and I don't see anything -
> apologies if this is a FAQ...
>
> Cheers
> Si
>

How about the noindex_start and noindex_end attributes? From the htdig
configuration documentation:

noindex_start, noindex_end
    type:
        string
    used by:
        htdig
    default:
        <!--htdig_noindex--> <!--/htdig_noindex-->
    description:

The text encompassing a section of an HTML file that should be
completely ignored when indexing. As in the defaults, this can be SGML
comment declarations that can be inserted anywhere in the documents to
exclude different sections from being indexed. However, existing tags
can also be used; this is especially useful to exclude some sections
from being indexed where the files to be indexed cannot be edited. The
example shows how SCRIPT sections in 'uneditable' documents can be
skipped; note how noindex_start does not contain an ending >: this
allows for all SCRIPT tags to be matched regardless of attributes
defined (different types or languages). Note that the match for this
string is case insensitive.

    example:
        noindex_start: <SCRIPT
        noindex_end: </SCRIPT>
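
For the sitemap menu in the question, the analogous settings would
presumably be noindex_start: <select name=url and noindex_end: </select>,
assuming the pages write the tag without quotes around the attribute
value. The Python sketch below only illustrates the matching described
above (literal markers, case insensitive, no closing > on the start
marker); it is not htdig's actual code, and strip_noindex and the sample
page are made up for the illustration.

```python
import re


def strip_noindex(html: str, start: str, end: str) -> str:
    """Drop every span from `start` through the next `end`, ignoring case."""
    # re.escape makes both markers literal strings rather than regexps.
    # DOTALL lets a span cross line breaks; the non-greedy .*? stops at the
    # first end marker, so text between two separate spans is kept.
    pattern = re.compile(
        re.escape(start) + r".*?" + re.escape(end),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub(" ", html)


# Hypothetical page: the tag is upper case and carries an extra attribute,
# yet the start marker still matches because it has no closing '>'.
page = (
    "<p>Injury statistics</p>\n"
    '<SELECT NAME=url onchange="go()">\n'
    "<option>Sitemap entry</option>\n"
    "</SELECT>\n"
    "<p>Footer</p>"
)

cleaned = strip_noindex(page, "<select name=url", "</select>")
print(cleaned)
```

In this sketch a start marker with no matching end marker is left alone;
how htdig itself treats that case is not covered by the description above.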

Cheers

-- 
David Robley

WEBMASTER | Phone +61 8 8374 0970
RESEARCH CENTRE FOR INJURY STUDIES | http://www.nisu.flinders.edu.au/
AusEinet | http://auseinet.flinders.edu.au/
Flinders University, ADELAIDE, SOUTH AUSTRALIA
Visit the PHP mirror at http://au.php.net:81/





This archive was generated by hypermail 2b28 : Wed Dec 15 1999 - 16:14:55 PST