Re: [htdig] re: parsing stuff


Subject: Re: [htdig] re: parsing stuff
From: gil cohen (cohengil4@hotmail.com)
Date: Thu May 11 2000 - 14:05:32 PDT


Would it be possible to do this in Perl?

>From: Gilles Detillieux <grdetil@scrc.umanitoba.ca>
>To: cohengil4@hotmail.com (gil cohen)
>CC: htdig@htdig.org
>Subject: Re: [htdig] re: parsing stuff
>Date: Thu, 11 May 2000 15:33:40 -0500 (CDT)
>
>According to gil cohen:
> > Okay, here's a program I wrote:
> >
> > -------------
> > tr -d '\012' < "$1" | sed -e 's/.*<title>//' -e 's/<\/title>.*//' >> /test
> > echo "Content-Type: text/html"
> > echo ''
> > cat "$1"
> > -------------
> >
> > Then, I put the following in the config file:
> > text/html->text/html "sh /RIDOF.sh"
>
>That's not quite complete. You need to have "external_parsers: " in
>front of that. However, that still won't work - in fact, it will cause
>htdig's ExternalParser module to recursively call itself until it blows
>its stack. Right now, external converters must convert one mime type
>to a different type, and the chain must eventually lead to an actual
>parser (whether internal or external).
>
>You have two choices: you can modify the existing internal HTML parser,
>or you can write a full external parser for HTML, that will grab all
>the information you want from the documents, including links to other
>documents.
>
>--
>Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
>Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
>Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
>Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
>
>------------------------------------
>To unsubscribe from the htdig mailing list, send a message to
>htdig-unsubscribe@htdig.org
>You will receive a message to confirm this.
>
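
For the second option Gilles describes (a full external parser rather than
a text/html->text/html converter), here is a rough sketch of what such a
script might look like. It is written in sh to match the script quoted
above; nothing in it is specific to the shell, so the same approach should
carry over to Perl. Several things here are assumptions, not taken from
this thread: that htdig calls the parser with the input file as its first
argument, that it reads tab-separated records such as "t<TAB>title" and
"u<TAB>url<TAB>description" from stdout, the function names, and the
optional TITLE_LOG variable. Please check the external_parsers
documentation for your htdig version before relying on any of it.

```shell
#!/bin/sh
# Sketch of a full external parser (not a converter), so htdig has no
# reason to call it recursively.
# ASSUMPTION: htdig runs it as "parser infile content-type URL config-file"
# and reads tab-separated records from stdout. Verify against the docs.

# Print the file as one line, with line breaks turned into spaces so that
# tags and titles split across lines are still matched.
flatten() {
    tr '\012\015' '  ' < "$1"
    echo
}

# Print the text of the <title> element, if the document has one.
extract_title() {
    flatten "$1" |
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p'
}

# Print one URL per line for each <a ... href="..."> tag.
# Only double-quoted href values are handled in this sketch.
extract_links() {
    flatten "$1" | tr '<' '\012' |
        sed -n 's/^[Aa] [^>]*[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p'
}

# Write the records for one document to stdout.
emit_records() {
    title=$(extract_title "$1")
    if [ -n "$title" ]; then
        # Assumed title record: t<TAB>title
        printf 't\t%s\n' "$title"
        # Optionally keep a copy of the title, as the original script did.
        if [ -n "${TITLE_LOG:-}" ]; then
            printf '%s\n' "$title" >> "$TITLE_LOG"
        fi
    fi
    # Assumed link record: u<TAB>url<TAB>description. The URL doubles as
    # the description here to keep the sketch short.
    extract_links "$1" | while IFS= read -r url; do
        printf 'u\t%s\t%s\n' "$url" "$url"
    done
}

# Only run when called with arguments, as htdig would call it.
if [ "$#" -ge 1 ]; then
    emit_records "$1"
fi
```

This only covers titles and links. A parser that htdig can actually index
from would also have to emit the words of each document, which is the bulk
of the work Gilles is referring to. It would be hooked up with an
external_parsers: line in the config file that names this script as the
parser for text/html.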





This archive was generated by hypermail 2b28 : Thu May 11 2000 - 11:53:35 PDT