Subject: [htdig] htdig parsing <object>
From: Rzepa, Henry (h.rzepa@ic.ac.uk)
Date: Fri Oct 06 2000 - 05:21:44 PDT
We are extensively converting our "legacy" html to xhtml, using a combination of htdig, JTidy and locally written JChemTidy.
One element we have focused on heavily is <object>. We replace all instances of <embed> by <object>, because
a) <embed> is not well formed (i.e. it should be <embed />), and
b) it is not validatable. This is because the attributes of <embed> are not defined by a DTD, but are instead implicit in whatever attributes the plugin that <embed> resolves to supports. Thus two users with different plugins may well be running implicitly different DTDs for the same document. This is not good.
<object> solves both these problems.
Our only problem is that htdig 3.2 does not parse <object>.
A long time ago, we hacked htdig 3.1 to parse <embed> and <object>, but these mods do not appear to have been incorporated into htdig 3.2.
If someone could rescue them, we would be very grateful. On this point, if htdig could also be persuaded to index the title attribute of elements such as <object> it would be a great help. As part of the xhtml conversion process, we build a title if none exists, and it would be nice to have htdig pick it up!
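To make the request concrete, the behaviour we are after can be sketched outside htdig. This is only a minimal illustration in Python using the standard html.parser module, not htdig's own parser code; the class name ObjectTitleCollector, the helper collect_object_titles and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class ObjectTitleCollector(HTMLParser):
    """Collect title attributes from <object> (and legacy <embed>) tags."""

    def __init__(self):
        super().__init__()
        self.titles = []  # title strings, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before this call.
        # Self-closing tags such as <embed ... /> also arrive here, via the
        # default handle_startendtag implementation.
        if tag in ("object", "embed"):
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)


def collect_object_titles(markup):
    """Return the title text an indexer could pick up from the markup."""
    parser = ObjectTitleCollector()
    parser.feed(markup)
    parser.close()
    return parser.titles


# Hypothetical sample markup, for illustration only.
sample = (
    '<object data="caffeine.mol" type="chemical/x-mdl-molfile" '
    'title="Caffeine 3D model"><p>fallback text</p></object>'
    '<embed src="a.pdb" title="Legacy" />'
)
print(collect_object_titles(sample))
```

An indexer could then treat each collected title as ordinary document text, much as alt text on images is commonly handled.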
Thanks.
Henry Rzepa. +44 (0)20 7594 5774 (Office) +44 (0)20 7594 5804 (Fax) Dept. Chemistry, Imperial College, London, SW7 2AY, UK. http://www.ch.ic.ac.uk/rzepa/
------------------------------------ To unsubscribe from the htdig mailing list, send a message to htdig-unsubscribe@htdig.org You will receive a message to confirm this. List archives: <http://www.htdig.org/mail/menu.html> FAQ: <http://www.htdig.org/FAQ.html>
This archive was generated by hypermail 2b28 : Fri Oct 06 2000 - 05:26:35 PDT