Re: htdig: htdig, muffin and javascript

Colin Viebrock (
Thu, 17 Sep 1998 15:29:38 -0400

Thus spake Geoff Hutchison (at 01:37 PM 9/17/98 -0400) ...
>I guess the "problem" is this: ht://Dig interprets JavaScript in HTML
>files as text. So if we can take the code Muffin uses to strip JavaScript
>and add it to a "remove JavaScript" pass over the HTML files before
>ht://Dig begins the real indexing, we'd be set.
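For illustration, a "remove JavaScript" pre-pass could look roughly like the sketch below. This is not Muffin's code. It is an assumed, regex-based approximation that drops whole <script> elements before the text reaches an indexer. The name strip_javascript is hypothetical. The sketch does not touch inline event handlers such as onClick="...".

```python
import re

# Assumption: JavaScript lives inside <script ...> ... </script> elements.
# Match each element case-insensitively and across line breaks. The match is
# non-greedy, so two adjacent script blocks are removed separately.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_javascript(html: str) -> str:
    """Replace every <script> element with a space so its code is not indexed as text."""
    return SCRIPT_RE.sub(" ", html)


page = """<html><body>
<p>Welcome</p>
<SCRIPT language="JavaScript">
document.write("var x = 1;");
</SCRIPT>
<p>Contact us</p>
</body></html>"""

print(strip_javascript(page))
```

A regex is enough for a sketch. A real pass would more likely sit inside the indexer's own HTML parser.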

What about the "problem" of people using JS to pop up windows and other
URLs and such? If you simply strip all the JS code from a document, you'll
lose these links (and the info in them).

And I haven't even mentioned JS that creates URL references on the fly, or
based on other variables. Good luck coding a parser for that!
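To make the trade-off concrete, here is a hypothetical sketch that harvests links from scripts before they are stripped. It only recognizes quoted string literals passed to window.open(...) or assigned to location / location.href. The function name and the set of idioms are assumptions chosen for illustration. As noted above, it cannot recover a URL that is assembled from variables at run time.

```python
import re
from typing import List

# Hypothetical pattern covering a few common navigation idioms:
#   window.open('url', ...)    location = 'url'    location.href = "url"
# The URL must be a complete quoted literal. The lookahead rejects literals that
# are only the first piece of a concatenation such as '/docs/' + page.
NAV_RE = re.compile(
    r"""(?:window\.open\s*\(\s*|location(?:\.href)?\s*=\s*)"""
    r"""(["'])([^"'\n]*)\1(?=\s*[,);]|\s*$)""",
    re.IGNORECASE | re.MULTILINE,
)


def literal_js_links(script: str) -> List[str]:
    """Return URLs that appear as plain string literals in navigation calls."""
    return [m.group(2) for m in NAV_RE.finditer(script)]


script = """
function popup() { window.open('help.html', 'help', 'width=300'); }
function go(page) { location.href = '/docs/' + page + '.html'; }
document.location = "news.html";
"""

print(literal_js_links(script))
```

The URL built inside go() is skipped, which is exactly the case a simple parser cannot handle.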

The only complete solution I can see is to write a program that emulates a
browser and follows every link, button, image map, etc. reachable from that
page.

[or do the digging on the server side ... but then what URL do you present
to the user?]

Colin Viebrock Creative Director Private World Communications

                                                      Keyboard not found:
                                            Press any key to continue ...

This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:27:47 PST