Re: [htdig] cgi-bin meta information and htdig


Geoff Hutchison (ghutchis@wso.williams.edu)
Wed, 29 Sep 1999 07:28:34 -0500


At 11:28 AM +0100 9/29/99, Rzepa, Henry wrote:
>Actually, on this final point, can someone let me know whether htdig
>exactly conforms to any W3 spec of HTML? For example, does it
>track TITLE attributes in the various elements that have them, eg
><object> etc etc.

Sadly, no. If anything, it's probably closest to HTML 2.0. For
one, no one has made much of an effort to check which tags need to be
added to the HTML parser. For another, the later standards (esp. 4.0)
are very flexible about where metadata can live. This is nice for
authors, but it makes life very hard for an indexer like htdig.

For example, you could include metadata about a document in another
URL entirely:

<LINK REL="author" HREF="author.html">

For full compliance, htdig would have to download author.html as well
and treat its contents as metadata for the referring document.
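
To make the extra work concrete, here is a rough sketch of the first
half of that job: finding the LINK tags and resolving where they point.
It is Python for illustration only, not htdig's C++ parser code, and the
names LinkMetadataCollector and find_linked_metadata are made up for
this example. Actually fetching and indexing each URL is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkMetadataCollector(HTMLParser):
    """Collect (rel, absolute URL) pairs from <LINK REL=... HREF=...> tags.

    Hypothetical helper for illustration; not part of htdig.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel")
        href = attributes.get("href")
        if rel and href:
            # Resolve relative HREFs against the URL of the referring page.
            self.links.append((rel.lower(), urljoin(self.base_url, href)))


def find_linked_metadata(html_text, base_url):
    """Return the list of (rel, url) pairs an indexer would have to fetch."""
    collector = LinkMetadataCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.links
```

Each URL returned is one more document the indexer would need to
retrieve before it could claim to have all the metadata for the page.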

It would obviously be very useful to go through the HTML parser and
compare it against the HTML 4.0 standard. I would recommend doing this
in the 3.2 code, since the parser there is a little cleaner and it is
easier to add new tags.

(To answer your question, it would be pretty easy to add TITLE
parsing for <a href> tags.)
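
As a rough idea of what that involves, the sketch below pulls the TITLE
attribute out of anchors that also carry an HREF. Again this is Python
for illustration, not htdig's parser, and AnchorTitleCollector and
anchor_titles are hypothetical names.

```python
from html.parser import HTMLParser


class AnchorTitleCollector(HTMLParser):
    """Collect (href, title) pairs from <A HREF=... TITLE=...> tags.

    Hypothetical helper for illustration; not part of htdig.
    """

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        title = attributes.get("title")
        # Only anchors with both an HREF and a TITLE are of interest here.
        if href and title:
            self.titles.append((href, title))


def anchor_titles(html_text):
    """Return the (href, title) pairs found in html_text."""
    collector = AnchorTitleCollector()
    collector.feed(html_text)
    collector.close()
    return collector.titles
```

An indexer could then add the collected title text to the words it
stores for the page, alongside the link text itself.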

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
