[htdig3-dev] pre-parsing pages


Subject: [htdig3-dev] pre-parsing pages
From: Gareth Watts (gareth@omnipotent.net)
Date: Mon Dec 25 2000 - 13:07:26 PST


Hi

I'm playing with the latest snapshot and I need a way to "filter" the
HTML files that are being indexed. Specifically, I need to strip out
the <script></script> blocks that are in the pages.

I tried using an ExternalParser to call a little Perl script to do the
job, using it as a converter from text/html to text/html, but htdig loops
over the same file repeatedly (as I rather expected would happen). I
guess I could have the script output text/plain instead, but that seems
like a lot of work when it's only parsing an HTML doc and htdig does that
already.
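
For reference, the filtering step itself is roughly the following. This is
only a minimal sketch, written in Python rather than the Perl I actually
used, and the names strip_scripts and filter_stream are made up for
illustration. It assumes the page arrives as one string and that a regular
expression is good enough for well-formed <script> blocks. It only shows
the stripping; it does nothing about the looping problem described above.

```python
import re
import sys

# Matches a whole <script ...>...</script> block, case-insensitively and
# across newlines. A regex is an assumption that the markup is reasonably
# well formed; it is not a full HTML parser.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_scripts(html):
    """Return the HTML text with every <script> block removed."""
    return SCRIPT_BLOCK.sub("", html)


def filter_stream(src, dst):
    """Read all HTML from src, write the script-free version to dst."""
    dst.write(strip_scripts(src.read()))
```

Used as a plain stdin-to-stdout filter, this would be called as
filter_stream(sys.stdin, sys.stdout). How htdig would hand the document to
such a script is a separate matter and is not shown here.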

Is there some technique that I've missed, or is this one for the feature
wish-list?

Thanks

Gareth



