[htdig3-dev] URL case sensitivity -- doing it the right way.


From: Patrick (patrick@aol2.com)
Date: Wed Jul 19 2000 - 12:55:16 PDT


I've been battling with the case_sensitive issue for a while now. It
seems that declaring "case_sensitive: false" automatically lowercases
the URLs (this is done in ../htlib/URL.cc). This seems like a great
idea; however, I think a more logical procedure would be to leave the
URL untouched from the get-go and lowercase it only temporarily when
comparing it to previously crawled/queued URLs.
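
To illustrate the idea, here is a rough sketch. It is not htdig's actual
code: comparisonKey and SeenUrls are made-up names, and I am assuming a
plain ASCII lowercase of the whole URL string is good enough for the
comparison. The caller keeps the original URL string and uses it for the
request; only the lookup key is lowercased.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of htdig): returns a lowercased COPY of
// the URL, to be used only as a comparison key. The argument is untouched.
std::string comparisonKey(const std::string& url) {
    std::string key(url);
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return key;
}

// Hypothetical record of crawled/queued URLs, compared case-insensitively.
class SeenUrls {
public:
    // Returns true if no URL with the same lowercased key was seen before.
    // Only the key is stored; the caller retains the original-case URL.
    bool markSeen(const std::string& url) {
        return seen_.insert(comparisonKey(url)).second;
    }

private:
    std::unordered_set<std::string> seen_;
};
```

With something like this, markSeen() would report /DOCUMENT and
/document as the same URL, while the request itself could still go out
with whatever case the link originally had.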

Basically, what is happening is that the university's web server uses
Apache's mod_speling. Upon a URL case mismatch (e.g.
http://www.foo.com/DOCUMENT is the request, but http://www.foo.com/document
is the true document name), the module sends an automatic
301 Moved Permanently response -- a response that htdig does NOT follow,
regardless of the case_sensitive setting.
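
For what it's worth, the kind of redirect I am describing can be
recognized with a check along these lines. This is only a sketch:
differsOnlyByCase is a made-up name, and I am assuming the requested URL
and the redirect target are both available as plain strings.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical check (not part of htdig): true when the redirect target
// is the requested URL with only letter case changed.
bool differsOnlyByCase(const std::string& requested,
                       const std::string& location) {
    // Identical strings are not a case mismatch; different lengths
    // cannot be a case-only difference.
    if (requested == location || requested.size() != location.size()) {
        return false;
    }
    for (std::size_t i = 0; i < requested.size(); ++i) {
        const int a = std::tolower(static_cast<unsigned char>(requested[i]));
        const int b = std::tolower(static_cast<unsigned char>(location[i]));
        if (a != b) {
            return false;
        }
    }
    return true;
}
```

For the example above, the check is true for the pair
http://www.foo.com/DOCUMENT and http://www.foo.com/document.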

Long story short: where/how can the code be modified so that the actual
URL is NOT lowercased automatically, but rather, is only lowercased
temporarily when doing a comparison to other queued/crawled URLs
(which will also be temporarily lowercased during the comparison
process)?

Thanks,
Patrick




This archive was generated by hypermail 2b28 : Wed Jul 19 2000 - 02:55:43 PDT