Re: [htdig] Identifying non-indexed URLs


Subject: Re: [htdig] Identifying non-indexed URLs
From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Tue Mar 14 2000 - 07:50:46 PST


On Tue, 14 Mar 2000, Bigler, Tyson MT SSI wrote:

> knowing which URLs were seen but not indexed because they weren't
> "parsable". Is this easily done?

I'm not quite sure what you mean. I'm assuming you want a listing of
malformed URLs that appear in <a href="..."></a> tags?

For better or worse, the URL-parsing code doesn't itself reject malformed
URLs, so they should show up as rejected by the normal means. Granted, I
haven't run it through every possible URL-ish input (malformed or not),
so it's possible there are bugs.

Remember, if you want to take a look at every URL seen, you can set
create_url_list to true in your configuration file.
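
If what you're after is the "seen but not indexed" set, one rough way to
get it is to diff that URL list against a list of URLs you know made it
into the index. Here's a minimal sketch, assuming both lists are plain
text files with one URL per line. The function names and the second file
(the indexed URLs) are my own assumptions for illustration; htdig doesn't
produce that second list for you as far as this message is concerned, so
you'd have to build it some other way.

```python
from typing import Iterable, List


def normalize(line: str) -> str:
    """Strip surrounding whitespace and the trailing newline only.

    No URL canonicalization is attempted; two spellings of the same
    URL are treated as different entries.
    """
    return line.strip()


def unindexed_urls(seen: Iterable[str], indexed: Iterable[str]) -> List[str]:
    """Return URLs present in `seen` but absent from `indexed`.

    Order follows first appearance in `seen`; duplicates and blank
    lines are dropped.
    """
    indexed_set = {normalize(u) for u in indexed if normalize(u)}
    result: List[str] = []
    emitted = set()
    for raw in seen:
        url = normalize(raw)
        if not url or url in indexed_set or url in emitted:
            continue
        emitted.add(url)
        result.append(url)
    return result


def report(seen_path: str, indexed_path: str) -> None:
    """Print the unindexed URLs, one per line.

    `seen_path` is assumed to be the file written when create_url_list
    is enabled; `indexed_path` is a hypothetical list you supply.
    """
    with open(seen_path) as seen_f, open(indexed_path) as indexed_f:
        for url in unindexed_urls(seen_f, indexed_f):
            print(url)
```

Calling report() with the two file paths prints whatever was seen but is
missing from your indexed list. It won't tell you why a URL was skipped,
only that it was.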

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



