Neil Mansilla (neil@aol2.com)
Wed, 16 Jun 1999 17:41:40 -0400
Can someone help me identify the file and subroutine that
is the FIRST to see and strip out the HREFs? I think that
is the best place to lowercase a URL, before it gets
any further in the spidering process.
At that location, we would check conf["case_sensitive"]
(or pass that value to that subroutine) and, if it is false,
IMMEDIATELY lowercase the URL(s).
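
Roughly what I have in mind, as a standalone sketch only. The
function name normalize_url and its bool parameter are made up for
illustration and are not existing ht://Dig code; the real change
would use whatever URL/String classes that subroutine already has.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of ht://Dig: returns the URL in the form
// that should be queued and stored. When case_sensitive is false, the
// whole URL is folded to lowercase before anything else sees it.
std::string normalize_url(std::string url, bool case_sensitive)
{
    if (!case_sensitive) {
        // Cast through unsigned char so std::tolower is well defined
        // for bytes outside the 7-bit ASCII range.
        std::transform(url.begin(), url.end(), url.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
    }
    return url;
}
```

The caller would pass the case_sensitive setting in, so the
subroutine that extracts the HREFs does not need to read the
configuration itself.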
>> I found that if I uncommented each line in DocumentDB.cc
>> that contains "url.lowercase()", htdig's verbose report
>> still looks like this:
>
>I think this will still be necessary.
>
>> I would like to avoid any uppercase representation
>> altogether if case_sensitive = false;
>
>I think you'll need to do this in Retriever.cc, probably for the Need2Get
>portion.