Re: [htdig] cgi-bin meta information and htdig

Geoff Hutchison
Wed, 29 Sep 1999 07:28:34 -0500

At 11:28 AM +0100 9/29/99, Rzepa, Henry wrote:
>Actually, on this final point, can someone let me know whether htdig
>exactly conforms to any W3 spec of HTML? For example, does it
>track TITLE attributes in the various elements that have them, eg
><object> etc etc.

Sadly, no. If anything, it's probably pretty close to HTML 2.0. For
one, no one has made much of an effort to check which tags need to be
added to the HTML parser. For another, the later standards (esp. 4.0)
are very flexible in terms of metadata. This is nice, but it makes it
very hard for an indexer like htdig.

For example, you could include metadata about a document in another
URL entirely:

<LINK REL="author" HREF="author.html">

To comply, htdig would need to download the linked metadata document
(author.html here) as well.
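
To make that concrete, here is a rough sketch in Python (not htdig's
actual C++ parser) of the first half of the job: spotting LINK REL tags
and resolving their HREFs against the page URL. The names
LinkRelCollector and find_linked_metadata and the example.com URL are
made up for illustration. Fetching and indexing the linked documents is
the hard part and is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkRelCollector(HTMLParser):
    """Collect (rel, absolute URL) pairs from <LINK REL=... HREF=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page being parsed (assumed known)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            # Resolve relative HREFs such as "author.html" against the page URL.
            self.links.append((rel.lower(), urljoin(self.base_url, href)))


def find_linked_metadata(html, base_url):
    """Return the list of (rel, url) pairs an indexer would still have to fetch."""
    collector = LinkRelCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<LINK REL="author" HREF="author.html">'
    print(find_linked_metadata(page, "http://example.com/docs/page.html"))
```

Each URL returned is one more document the indexer would have to
retrieve before it could claim to have all the metadata for the page.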

It would obviously be very useful to go through the HTML parser and
compare it to the HTML 4.0 standard. I would recommend doing this in
the 3.2 code, since the parser there is a little cleaner and it is
easier to add new tags.

(To answer your question, it would be pretty easy to add TITLE
parsing for <a href> tags.)

-Geoff Hutchison
Williams Students Online

