Subject: Re: [htdig] htdig parsing
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Fri Oct 06 2000 - 08:01:10 PDT
According to Rzepa, Henry:
> <object> solves both these problems.
>
> Our only problem is that htdig 3.2 does not parse object.
>
> A long time ago, we hacked htdig 3.1 to parse <embed> and
> <object>, but these mods do not appear to have been incorporated
> into htdig 3.2.
htdig 3.2 does indeed parse both <embed> and <object> tags, and in fact parses them both the same way, just as it does for <frame> tags. It looks for a "src=" parameter in either, and passes it to got_href() to queue up the link. I'm not sure how you hacked 3.1 to handle them, but the distributed 3.1.5 code parses them in much the same way as 3.2.
The only major difference, other than some subtle internal differences in the parsing code, is that the current 3.2 development code (post 3.2.0b2) also looks for a "title=" parameter in the embed, object or frame tag, and uses it as the link description text.
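To make the behaviour described above concrete, here is a minimal sketch in Python. It is not htdig's code (htdig is written in C++), and the names SrcLinkCollector and collect_links are made up for illustration. It only mimics the described logic: treat embed, object and frame tags alike, take the "src=" parameter as the link, and use "title=", when present, as the link description.

```python
from html.parser import HTMLParser

# Tags handled the same way in this sketch, following the description above.
LINK_TAGS = {"embed", "object", "frame"}


class SrcLinkCollector(HTMLParser):
    """Hypothetical helper: collect (url, description) pairs from src= parameters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in LINK_TAGS:
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            # A missing title is recorded as None, i.e. no description text.
            self.links.append((src, attributes.get("title")))


def collect_links(html):
    """Return the queued links found in an HTML string, in document order."""
    collector = SrcLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, collect_links('<object src="mol.pdb" title="Caffeine"></object>') yields one link to mol.pdb with the description "Caffeine". A tag without a "src=" parameter yields nothing in this sketch.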
> If someone could rescue them, we would be very grateful.
> On this point, if htdig could also be persuaded to index the
> title attribute of elements such as <object> it would be a great
> help. As part of the xhtml conversion process, we build a title
> if none exists, and it would be nice to have htdig pick it up!
3.2.0b3 will; earlier releases don't. You can change 3.2.0b2 or b1 to do likewise by finding the code in do_tag() that handles object tags and changing the first 0 argument to got_href() to transSGML(attrs["title"]). (3.2.0b3 will also handle the title parameter in <a href...> tags, and index both the title and the anchor text as two separate descriptions for the link.)
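The <a href...> handling mentioned above can be illustrated the same way. Again this is only a Python sketch of the described outcome, not the htdig implementation; AnchorDescriptionCollector and describe_anchors are hypothetical names. Each link ends up with the title parameter and the anchor text stored as two separate descriptions.

```python
from html.parser import HTMLParser


class AnchorDescriptionCollector(HTMLParser):
    """Hypothetical helper: map each href to a list of its descriptions."""

    def __init__(self):
        super().__init__()
        self.descriptions = {}
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        self._href = href
        self._text = []
        title = attributes.get("title")
        if title:
            # First description: the title parameter.
            self.descriptions.setdefault(href, []).append(title)

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Second description: the anchor text, with whitespace collapsed.
            text = " ".join("".join(self._text).split())
            if text:
                self.descriptions.setdefault(self._href, []).append(text)
            self._href = None


def describe_anchors(html):
    """Return {href: [descriptions...]} for the anchors in an HTML string."""
    collector = AnchorDescriptionCollector()
    collector.feed(html)
    collector.close()
    return collector.descriptions
```

An anchor with no title parameter gets only its anchor text as a description in this sketch.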
Perhaps you could show us an example of how you use the <object> tag, as I'm having difficulty seeing why the 3.2 code isn't working for you. Have you tried htdig -vvvvv to see what it does when it comes to one of these tags?
-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>
This archive was generated by hypermail 2b28 : Fri Oct 06 2000 - 08:05:17 PDT