Re: [htdig] Remove_bad_urls ?!


From: Geoff Hutchison (ghutchis@wso.williams.edu)
Date: Wed Apr 26 2000 - 06:26:37 PDT


At 11:50 AM +0000 4/26/00, bs@hi.is wrote:
>I posted a message here a while back about receiving what seemed to be
>random bad_urls (which all were bad urls indeed) in the errors to take note
>of in htdig's report. I got the response to put a remove_bad_urls: false

Hmm. I would have told you the reverse. If these seem to be "random"
URLs, then you probably want htmerge to get rid of them, which is what
remove_bad_urls: true (the default) does.
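To illustrate the idea, here is a minimal Python sketch of a merge pass that honors a remove_bad_urls-style flag. The record type and its `bad` field are hypothetical; this is not htmerge's actual code or database format, only the filtering behavior the attribute controls.

```python
from dataclasses import dataclass


@dataclass
class DocRecord:
    url: str
    bad: bool  # hypothetical flag: the digger could not retrieve this URL


def merge(records, remove_bad_urls=True):
    """Sketch of a merge pass: optionally drop records flagged as bad."""
    if not remove_bad_urls:
        return list(records)  # keep everything, bad URLs included
    return [r for r in records if not r.bad]


db = [DocRecord("http://example.com/", False),
      DocRecord("http://example.com/typo.htm", True)]
print([r.url for r in merge(db)])                         # only the good URL survives
print([r.url for r in merge(db, remove_bad_urls=False)])  # the bad URL stays in the database
```

With the flag set to false, the bad URL stays in the database and will be revisited on every later update dig.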

The bad-URL section of the report includes links with typos and the
like, so its contents can look very strange indeed.

>Shouldn't I be receiving all the bad urls every time I do a update since
>remove_bad_urls is set to false and has been ever since I did the initial

No, not really. An update dig simply goes through all the URLs already
in the database and checks whether they've changed. So if there's a
bad URL in the database, it will hit it, throw up its hands and move
on to the next one. The page that contained the original typo may not
have changed, so it won't be re-parsed and you won't see the report
entry again.
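A rough Python sketch of that update loop, assuming each stored document carries a hypothetical last-modified timestamp from the previous dig. It is not htdig's implementation, just the "revisit, re-parse only if changed" logic described above.

```python
from dataclasses import dataclass


@dataclass
class StoredDoc:
    url: str
    last_modified: int  # hypothetical timestamp saved by the previous dig


def update_dig(stored_docs, current_mtime, parse):
    """Hypothetical update dig: re-parse only pages whose timestamp changed."""
    reparsed = []
    for doc in stored_docs:
        mtime = current_mtime.get(doc.url)
        if mtime is None:
            continue  # unreachable (bad) URL: give up and move on to the next one
        if mtime > doc.last_modified:
            parse(doc.url)  # only here could links (including typos) be discovered
            reparsed.append(doc.url)
    return reparsed


docs = [StoredDoc("http://example.com/a.html", 100),
        StoredDoc("http://example.com/b.html", 100),
        StoredDoc("http://example.com/typo.htm", 100)]
mtimes = {"http://example.com/a.html": 100, "http://example.com/b.html": 200}
print(update_dig(docs, mtimes, lambda url: None))
```

Only `b.html` is re-parsed; the unchanged `a.html` (which might hold the typo link) is skipped, and the bad URL is simply passed over.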

>Ref:'s are also getting lost for bad urls .. can anyone explain why that
>happens ?

As above. Since it doesn't re-parse unchanged files, it doesn't know
which page referred to the bad URL.
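Why the Ref: disappears can be sketched the same way: referrer information only exists if it is recorded while a page is being parsed. The helpers and the report line format below are hypothetical, not htdig's actual data structures or report format.

```python
from collections import defaultdict


def record_links(referrers, page_url, links):
    """Remember which page pointed at each link (only possible while parsing)."""
    for link in links:
        referrers[link].add(page_url)


def report_bad_url(referrers, url):
    """Format a report line; the referrer is unknown if nothing was recorded."""
    refs = sorted(referrers.get(url, ()))
    return f"{url} Ref: {', '.join(refs)}" if refs else f"{url} Ref: (unknown)"


# Initial dig: index.html is parsed, so the typo link's referrer is known.
initial = defaultdict(set)
record_links(initial, "http://example.com/index.html", ["http://example.com/typo.htm"])
print(report_bad_url(initial, "http://example.com/typo.htm"))

# Update dig: index.html is unchanged and not re-parsed, so nothing is recorded.
update = defaultdict(set)
print(report_bad_url(update, "http://example.com/typo.htm"))
```

The first line names `index.html` as the referrer; the second can only say the referrer is unknown.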

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/




This archive was generated by hypermail 2b28 : Wed Apr 26 2000 - 04:16:42 PDT