Re: [htdig] ... but not changed


Subject: Re: [htdig] ... but not changed
From: David Adams (D.J.Adams@soton.ac.uk)
Date: Wed Oct 04 2000 - 06:55:58 PDT


>
> On Tue, 3 Oct 2000, David Adams wrote:
>
> > When, during an update run, htdig says of a page: "retrieved but not
> > changed", how does htdig decide that the page is the same as the last time?
>
> It checks the date it received from the server (if present) against the
> date in the database. If they're the same, it ignores the file.
>
> > An author is maintaining that she added a link to a page and that an update
> > run of htdig failed to follow the new link(s) she had added.
>
> Are these static or dynamic pages? If the server is not returning
> Last-Modified headers, then this could be the problem.
>
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/

Not dynamic in the true sense, but SSI on an Apache server. Another
reply to this list gave the vital information:

> No, the XBitHack turns .html files with execute permission into SSI
> files (equivalent to .shtml), and for SSI files, Apache does NOT put
> out a Last-Modified header because SSI generates dynamic content.

It had not occurred to me that an SSI file was "dynamic". I live and learn!

This explains why a significant fraction of pages on our principal server
generate the "retrieved but not changed" message. Just as well we re-index
completely once a week!
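
One way to find out which pages are affected is to ask the server for
their headers and see whether Last-Modified is present. The following is
only a sketch, assuming Python 3 and its standard urllib module; the
function names are made up for illustration and are not part of htdig.
It also assumes the server answers a HEAD request with the same headers
it would send for a GET.

```python
import urllib.request


def has_last_modified(headers):
    """Return True if the headers carry a non-empty Last-Modified value."""
    value = headers.get("Last-Modified")
    return bool(value and value.strip())


def head_headers(url, timeout=10):
    """Send a HEAD request to url and return the response headers.

    Hypothetical helper: it needs network access and raises on HTTP errors.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers
```

Calling has_last_modified(head_headers(url)) for each suspect URL should
then return False for the SSI pages and True for plain static files, if
the explanation above is right.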

I will add

modification_time_is_now: true

to the configuration file and that should fix the problem.
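
The reasoning behind this can be sketched in a few lines of Python. This
is a simplified model of the comparison described in this thread, not
htdig's actual code: the function names are hypothetical, and the
treatment of modification_time_is_now (substituting the current time
when the server sends no date) is an assumption about what the attribute
does.

```python
from email.utils import parsedate_to_datetime


def effective_mtime(last_modified_header, now, modification_time_is_now=False):
    """Pick the timestamp to compare against the stored one.

    Returns None when the server sent no Last-Modified header and the
    (assumed) modification_time_is_now behaviour is switched off.
    """
    if last_modified_header:
        return parsedate_to_datetime(last_modified_header)
    return now if modification_time_is_now else None


def looks_unchanged(stored_mtime, last_modified_header, now,
                    modification_time_is_now=False):
    """True when the page would be reported as retrieved but not changed."""
    current = effective_mtime(last_modified_header, now,
                              modification_time_is_now)
    return current == stored_mtime
```

In this model an SSI page with no Last-Modified header and nothing
stored for it compares equal on every update run, so it always looks
unchanged. With the option on, the current time never matches the stored
value, and the page is parsed again.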

Thanks again to both you and Gilles for your replies.

-- 
 
David J Adams
<D.J.Adams@soton.ac.uk>
Computing Services
University of Southampton



