Geoff Hutchison (ghutchis@wso.williams.edu)
Sun, 20 Jun 1999 20:58:26 -0400 (EDT)
On Wed, 16 Jun 1999, Neil Mansilla wrote:
> Can someone help me identify the file and subroutine that
> is the FIRST to see and strip out the HREFs? I think that
> this is the best place to lowercase a URL, before it gets
> any further in the spidering process..
I would assume that Retriever::got_href is the first routine to see HREFs.
Strictly speaking the HTML parser sees them first, but the Retriever class
is better suited for lowercasing URLs.
> At that location, we'll check the conf["case_sensitive"]
> (or pass that value to that subroutine) and IMMEDIATELY
> lowercase the URL(s).
Do you want to submit a patch?
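As a rough starting point, here is a minimal sketch of the lowercasing step
itself. The helper name normalize_href and the plain bool parameter are
hypothetical, not existing ht://Dig code; how the case_sensitive value is
read from the configuration and passed in is left out.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of ht://Dig.
// Returns the URL unchanged when case_sensitive is true,
// otherwise returns an all-lowercase copy.
std::string normalize_href(std::string url, bool case_sensitive)
{
    if (!case_sensitive) {
        // Cast through unsigned char so std::tolower never sees a
        // negative value for bytes above 0x7F.
        std::transform(url.begin(), url.end(), url.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
    }
    return url;
}
```

Something like this could be called at the top of Retriever::got_href, before
the URL is compared or queued, so nothing later in the spidering process sees
the mixed-case form.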
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/