Re: [htdig] Stripping java script from pages


Marjolein Katsma (webmaster@javawoman.com)
Fri, 12 Feb 1999 17:22:34 +0100


Hugh,

The new 3.1.0 release has the options noindex_start and noindex_end to
delimit sections of HTML documents to be ignored.

If you can't edit the pages to wrap the scripts in the default comment
markers for noindex_start and noindex_end, you could try setting
noindex_start to <SCRIPT> and noindex_end to </SCRIPT>. This would cause
htdig to ignore all script sections (but not script code embedded in tag
attributes, such as event handlers).
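
As a rough sketch of the suggestion above, the two settings might look like
this in the htdig configuration file. It assumes the usual "attribute: value"
line format, and it writes the closing attribute as noindex_end, which is the
name in the ht://Dig attribute reference as far as I know. Please check the
documentation for your release before relying on it. Note also that a literal
<SCRIPT> marker may not match opening tags that carry attributes, such as
<SCRIPT LANGUAGE="JavaScript">.

```
# Assumed fragment for the htdig configuration file (untested sketch).
# Text between the start and end markers is skipped during indexing.
noindex_start:  <SCRIPT>
noindex_end:    </SCRIPT>
```

After changing these values the pages would need to be re-indexed for the
script text to disappear from the database.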

Hope this helps.

At 17:34 1999-02-12 +1100, you wrote:
>
>Hi all,
>
>I have a growing number of sites that I need to index that have JavaScript
>in them. I need a way to strip out the JavaScript prior to it being
>indexed by htdig.
>
>In the archive someone suggested the use of muffin (a proxy server), which
>would be fine; however, it seems to require the presence of the X Window
>System, which I will not install.
>
>I'm running FreeBSD, so if someone can suggest a Perl script or some other
>way of doing this I would much appreciate it.
>
>Regards,
>
>Hugh Blandford.
>------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig@htdig.org containing the single word "unsubscribe" in
>the SUBJECT of the message.
>

Marjolein Katsma webmaster@javawoman.com
Java Woman - http://javawoman.com/



This archive was generated by hypermail 2.0b3 on Wed Feb 17 1999 - 10:10:03 PST