Re: [htdig] How can I remove some keywords, from search results?


Subject: Re: [htdig] How can I remove some keywords, from search results?
From: Torsten Neuer (tneuer@inwise.de)
Date: Sat May 06 2000 - 12:49:39 PDT


> Tasos Angelis wrote:
>
> I noticed that in the search results only html tags are removed.
> But, how can I remove some other unwanted words from the results of
> htsearch?
>
> For example in a site that htdig indexes there are some javascript
> functions.
> I want to selectively remove those functions.
> e.g. .... something something something .... findit();.... something
> to be:
> .... something something something .... .... something
> without findit(); in it.
> There are 10 or a few more of those functions.

You should put anything inside of <SCRIPT> tags in SGML comments.
This applies not only to Ht://Dig but also to most other robot
software, which will otherwise interpret the contents of <SCRIPT> as
normal text. Another way to keep it out is to put the JavaScript
code in an external file (<SCRIPT SRC="file.js">). I don't know of any
crawler that will follow the SRC attribute of the <SCRIPT> tag.
You can also configure Ht://Dig to ignore anything inside <SCRIPT>
by using the noindex_start/noindex_end configuration attributes.
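
A minimal sketch of that last option, assuming the attributes are set
in the htdig.conf used for digging. The values below are illustrative
only: the matching may be sensitive to the exact spelling and case used
in your pages, so check them against the documentation for your
Ht://Dig version.

```
# htdig.conf (sketch; values are assumptions, adjust to your markup)
# Skip everything from the start of a <SCRIPT ...> tag ...
noindex_start:  <SCRIPT
# ... up to and including the closing tag.
noindex_end:    </SCRIPT>
```

Since the skipping happens when the pages are parsed, the documents
have to be re-indexed before the change shows up in search results.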

hth,
  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: info@inwise.de            Internet: http://www.inwise.de



