Rzepa, Henry
Wed, 29 Sep 1999

A surprising omission in the five-year history of indexing HTML is the
cgi-bin request. Normally, such URLs are excluded, for obvious reasons,
in the default conf file settings.

It does mean, however, that even their existence is not captured by the digging.

Both the 3.2 and 4.0 HTML specs allow
<A TITLE=string HREF=cgi-bin request>

where at least the title can describe the type of resource accessed by the
HREF pointer to e.g. a remote database. My question is:

can the TITLE of a cgi-bin anchor be indexed easily by htdig?
It would in effect represent metadata about the resource. I am not sure other
metadata schemas (e.g. DC) can easily flag such information.
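
To make the question concrete, here is a minimal sketch of what "indexing
the TITLE of a cgi-bin anchor" could mean. It is NOT how htdig works; it
uses Python's standard html.parser module, and the function name
cgi_anchor_titles and the "/cgi-bin/" substring test are hypothetical
choices for illustration only.

```python
from html.parser import HTMLParser


class CgiTitleCollector(HTMLParser):
    """Collect (href, title) pairs for <A> elements whose HREF looks like a cgi-bin request."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A TITLE=...> arrives as "a"/"title".
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        title = attr.get("title")
        # Hypothetical heuristic: any HREF containing "/cgi-bin/" counts as a CGI request.
        # The URL itself would still be excluded from digging; only the TITLE is kept
        # as metadata describing the resource behind it.
        if "/cgi-bin/" in href and title:
            self.found.append((href, title))


def cgi_anchor_titles(html):
    """Return (href, title) pairs for titled cgi-bin anchors in an HTML string."""
    parser = CgiTitleCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

For example, feeding it
<A TITLE="Crystal structure search" HREF="/cgi-bin/search?db=csd">Search</A>
yields one pair, ('/cgi-bin/search?db=csd', 'Crystal structure search'),
which an indexer could store as words pointing at the containing page
without ever fetching the cgi-bin URL.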

My next point is to observe that, amazingly, the formal 3.2 and 4.0
definitions of the <FORM> attributes do not include a title!! This means
that <FORM ACTION=cgi-bin request> cannot have a title
attribute. Formally, <FORM> and <A> should be treated as equivalent,
and if one has a title, the other should too.

Actually, on this final point, can someone let me know whether htdig
exactly conforms to any W3C spec of HTML? For example, does it
track TITLE attributes in the various elements that have them, e.g.
<object>, etc.?
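
As a rough illustration of what "tracking TITLE attributes in the various
elements" would involve, the sketch below records every TITLE attribute
found, grouped by element name. Again this is a hypothetical example using
Python's html.parser, not a description of htdig. Note that a lenient
parser like this one records a TITLE on <FORM> whether or not the HTML
spec in use formally allows it.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleAttributeCollector(HTMLParser):
    """Record the TITLE attribute of every element that has one, grouped by tag name."""

    def __init__(self):
        super().__init__()
        self.titles = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names;
        # value is None for a bare attribute, so skip empty titles.
        for name, value in attrs:
            if name == "title" and value:
                self.titles[tag].append(value)

    # Self-closing forms such as <object ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def collect_titles(html):
    """Return a dict mapping tag name to the list of TITLE values seen on that tag."""
    parser = TitleAttributeCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.titles)
```

Run over a page containing a titled <A>, <FORM> and <OBJECT>, it returns
one entry per element type, e.g. {'a': [...], 'form': [...], 'object': [...]},
which is the kind of per-element metadata an indexer could choose to store.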

If anyone has any thoughts on how to handle meta indexing of cgi-bin
requests, please let me know.


