Subject: Re: [htdig] ... but not changed
From: David Adams (D.J.Adams@soton.ac.uk)
Date: Wed Oct 04 2000 - 06:55:58 PDT
>
> On Tue, 3 Oct 2000, David Adams wrote:
>
> > When, during an update run, htdig says of a page: "retrieved but not
> > changed", how does htdig decide that the page is the same as the last time?
>
> It checks the date it received from the server (if present) against the
> date in the database. If they're the same, it ignores the file.
>
> > An author is maintaining that she added a link to a page and that an update
> > run of htdig failed to follow the new link(s) she had added.
>
> Are these static or dynamic pages? If the server is not returning
> Last-Modified headers, then this could be the problem.
>
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
Not dynamic in the true sense, but SSI on an Apache server. Another
reply to this list gave the vital information:
> No, the XBitHack turns .html files with execute permission into SSI
> files (equivalent to .shtml), and for SSI files, Apache does NOT put
> out a Last-Modified header because SSI generates dynamic content.
It had not occurred to me that an SSI file was "dynamic". I live and learn!
This explains why a significant fraction of pages on our principal server
generate the "retrieved but not changed" message. Just as well we re-index
completely once a week!
I will add
modification_time_is_now: true
to the configuration file and that should fix the problem.
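
For anyone else puzzling over this, here is a rough Python sketch of the date
comparison as I understand it from the replies above. It is only a simplified
model, not htdig's actual code: the function names are made up for
illustration, and I am assuming that a missing Last-Modified header is
recorded as a modification time of 0 unless modification_time_is_now is set.

```python
from email.utils import parsedate_to_datetime


def effective_mtime(headers, modification_time_is_now, now):
    """Modification time (epoch seconds) the indexer would record for a page.

    Assumption: with no Last-Modified header the time is stored as 0,
    unless modification_time_is_now is true, in which case the current
    time `now` is used instead.
    """
    value = headers.get("Last-Modified")
    if value is not None:
        return int(parsedate_to_datetime(value).timestamp())
    return now if modification_time_is_now else 0


def is_unchanged(stored_mtime, headers, modification_time_is_now, now):
    """True if the page would be treated as "retrieved but not changed"."""
    return effective_mtime(headers, modification_time_is_now, now) == stored_mtime


# An SSI page: Apache sends no Last-Modified header.
ssi_headers = {}
# Without the option, every run sees the same (missing) date: "not changed".
print(is_unchanged(0, ssi_headers, False, now=1000))
# With the option, the date differs on each run, so the page is re-parsed.
print(is_unchanged(0, ssi_headers, True, now=1000))
```

Under that assumption, an SSI page always compares equal to its stored date,
so new links are never followed until the option is turned on.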
Thanks again to both you and Gilles for your replies.
--
David J Adams <D.J.Adams@soton.ac.uk>
Computing Services
University of Southampton

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
htdig-unsubscribe@htdig.org
You will receive a message to confirm this.
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>