Re: htdig: Question about HTML parser.


Jongpil Won (flute@doongji.com)
Mon, 9 Nov 1998 19:13:03 +0900 (KST)


Dear ht://Dig developers and users,

On Wed, 4 Nov 1998, Geoff Hutchison wrote:

> At 1:56 AM -0500 10/30/98, Jongpil Won wrote:
> >1. &lt;img&gt; <a href="view.html">view</a><br>
> >
> >but, ht://Dig do not digging view.html.
> >and, I use verbose mode,
> >'Tag: img&gt; <a href="view.html"> matched 18'
> >is displayed.
> >so, I think this is bug,
>
> I would agree that is a bug. However, a good chunk of the HTML parser was
> revised in version 3.1.0b2 and this eliminated the code you patched.
>
> Have you tried version 3.1.0b2? Do you still see this problem?
>

I checked the HTML parser in version 3.1.0b2,
but the bug is still there.

In the 3.1.0b2 HTML parser,
the first step of parsing converts all SGML entities to ASCII text.
That means it changes every "&lt;" to "<" and every "&gt;" to ">".
In the case above, that causes no problem.

BUT
Case 1:
"&lt;1. <a href="view.html">view</a> &gt;"
is changed to
-> "<1. <a href="view.html">view</a> >"
so the view.html file is not parsed.
-> The parser gets "<1. <a href="view.html">" as a single tag and does not
   process it as an HTML tag, so the link is lost.
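
To make case 1 easier to reproduce, here is a minimal Python sketch. It is
not ht://Dig code: decode_entities and naive_tags are hypothetical helpers
that only imitate the decode-first order described above, and the tag
scanner is assumed to take everything from a "<" up to the next ">".

```python
import re

def decode_entities(html):
    # Hypothetical stand-in for the parser's first pass; only the two
    # entities discussed in this message are handled.
    return html.replace("&lt;", "<").replace("&gt;", ">")

def naive_tags(html):
    # Assumption: a tag is whatever lies between a "<" and the next ">".
    return re.findall(r"<([^>]*)>", html)

source = '&lt;1. <a href="view.html">view</a> &gt;'
decoded = decode_entities(source)
tags = naive_tags(decoded)

print(decoded)  # <1. <a href="view.html">view</a> >
print(tags)     # ['1. <a href="view.html"', '/a']
```

The first "tag" starts with "1." and not with "a", so a scanner like this
never sees the href to view.html.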

Case 2:
"<B>&lt;it is important word&gt;</B>"
is changed to
-> "<B><it is important word></B>"
so "<it is important word>" looks like a tag, and the words
"it is important word" are not added to the word list.
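
One possible way around both cases might be to split the input into tags
and text first, and decode entities only inside the text parts. The Python
sketch below shows that ordering under my own assumptions; tokenize and
decode_text are hypothetical names, not functions from the ht://Dig source,
and only three entities are handled.

```python
import re

def decode_text(text):
    # Decode "&amp;" last so that "&amp;lt;" becomes "&lt;" and not "<".
    return (text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"))

def tokenize(html):
    # Split on real tags BEFORE decoding entities. With one capture group,
    # re.split alternates text, tag, text, ... so odd indexes are tags.
    tokens = []
    for i, part in enumerate(re.split(r"(<[^>]*>)", html)):
        if i % 2 == 1:
            tokens.append(("tag", part[1:-1]))
        elif part:
            tokens.append(("text", decode_text(part)))
    return tokens

case1 = tokenize('&lt;1. <a href="view.html">view</a> &gt;')
case2 = tokenize('<B>&lt;it is important word&gt;</B>')

print(case1)
print(case2)
```

With this order, case 1 still yields a tag 'a href="view.html"', and in
case 2 the string "<it is important word>" stays text, so its words could
be added to the word list.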

> Thanks for your report,
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>
>
>

                                 vV^Vv
                                / O O \
                               o| | |o
                                 \ - /
-------------------------- oOOO ------- OOOo -----------------------
        Jongpil Won Doongji
        Director of Development OA, Internet, Java, Linux
TEL: (+82)-2-3789-1596 (+82)-2-3789-1596
FAX: (+82)-2-703-9374 (+82)-2-703-9374
EMAIL: Jongpil.Won@doongji.com root@doongji.com
URL: http://www.doongji.com/~flute http://www.doongji.com/
ICQ: #19768749
--------------------------------------------------------------------
                          ooooO Ooooo




This archive was generated by hypermail 2.0b3 on Sat Jan 02 1999 - 16:28:46 PST