Re: [htdig] Htdig search problem

david bernick
Thu, 01 Jul 1999 12:54:56 -0400

>Not likely. If you want to submit something, feel free. However, I don't
>see how you can pick up JavaScript links since there are essentially an
>infinite ways of making "links" in JavaScript. If you have some general
>solution, I'd be interested in seeing it.

i must first assume that one writes "good" javascript and abides by certain
rules when writing it. i also assume that you are indexing pages within an
organization that has standardized ways of writing HTML, PDFs and
javascript. if these things are standardized, it's quite easy to pattern
match for javascript links.

the href property is officially associated with only the location, area and
link objects. this means that location.href=, area.href=, and link.href= are
the only standardized ways to put links in javascript. location.href= is the
most commonly used of the three, and we can pattern match for it quite
easily. under the conventions above, the link itself is either root relative
or absolute.
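
to make the pattern-matching idea concrete, here is a rough sketch in python
(not htdig code; htdig itself is C++). the function name and the regular
expression are my own assumptions for illustration. it only picks up quoted
string literals assigned to location.href, area.href or link.href, and it
skips anything where the URL is computed at runtime.

```python
import re

# Assumed pattern: <object>.href = "<url>" or '<url>', where <object> is one
# of the three names discussed above. A comparison such as
# location.href == "..." is not matched, because a quote must follow the
# single "=" (optionally after whitespace).
HREF_ASSIGNMENT = re.compile(
    r"""\b(?:location|area|link)\.href\s*=\s*(["'])(.*?)\1"""
)


def extract_js_links(source):
    """Return the URL string literals assigned to an .href property.

    Hypothetical helper: only literal strings are found, so URLs built
    by concatenation or stored in variables are missed.
    """
    return [match.group(2) for match in HREF_ASSIGNMENT.finditer(source)]


if __name__ == "__main__":
    sample = 'function go() { window.location.href = "/products/index.html"; }'
    for url in extract_js_links(sample):
        print(url)
```

the extracted strings would still have to be resolved against the page URL
when they are root relative, the same way ordinary HTML links are.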

>It might be easier to do this in Flash, but I don't know much about the

this is a lot trickier. flash movies are compacted and vector based, almost
like compiled code, and are normally only readable by a flash plugin or a
special flash reader. however, if you open a flash movie in a hex editor
(or even a good text editor), you can find the URLs and just follow them. the
main issue for htdig is parsing the raw bytes when it encounters an .swf
file. i figure (and this is only theory so far) you can use one of the
freeware C++ objects that do hex parsing. as i said, this is very much
theory and i'd like to hear some input on it if anyone has any.
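
as a rough illustration of the theory, here is a python sketch that scans
the raw bytes of a file for URL-looking strings. it assumes the URLs sit in
the .swf as plain, uncompressed ASCII (which is what you see in a hex
editor) and that they are absolute. the function names and the pattern are
hypothetical, and relative URLs would not be found this way.

```python
import re

# Assumption: an absolute URL appears as a run of printable, non-space
# ASCII bytes starting with a scheme. Any other byte (such as a NUL
# terminator) ends the match.
URL_BYTES = re.compile(rb"(?:https?|ftp)://[\x21-\x7e]+")


def extract_urls_from_bytes(data):
    """Return absolute URLs found as plain ASCII runs in binary data."""
    return [match.group(0).decode("ascii") for match in URL_BYTES.finditer(data)]


def extract_urls_from_file(path):
    """Read a file in binary mode and scan it for embedded URLs."""
    with open(path, "rb") as handle:
        return extract_urls_from_bytes(handle.read())
```

this does no real parsing of the flash format; it just treats the movie as a
bag of bytes, so it may pick up strings that are not actually links.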


David Bernick - Senior Technologist
One to One Interactive
186 South Street 6th Floor
Boston, MA 02111
vox: 617-574-5020 x 762
fax: 617-574-5022
pager: 617-560-6616

Quote of the Day
"Nobody likes you when you're 23.
And you're still more amused by TV shows.
What the hell is ADD
What's my age again?"
Blink 182

This archive was generated by hypermail 2.0b3 on Thu Jul 01 1999 - 09:09:38 PDT