Re: [htdig] ... but not changed


Subject: Re: [htdig] ... but not changed
From: Gilles Detillieux (grdetil@scrc.umanitoba.ca)
Date: Tue Oct 03 2000 - 12:02:29 PDT


According to David Adams:
> When, during an update run, htdig says of a page: "retrieved but not
> changed", how does htdig decide that the page is the same as the last time?
>
> An author is maintaining that she added a link to a page and that an update
> run of htdig failed to follow the new link(s) she had added.

The "retrieved but not changed" message occurs when the web server ignores
the "If-Modified-Since" header that htdig sends it, and sends the page
anyway, but htdig sees that the Last-Modified header contains exactly the
same date as it did the last time the document was indexed.
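
To make that decision concrete, here is a rough Python sketch of the
comparison as I described it above. It is only a model, not htdig's actual
code: the function names and the "not modified" and "changed" labels are
made up for illustration, and it assumes the server does send a
Last-Modified header whenever it returns the page.

```python
from email.utils import parsedate_to_datetime


def http_date_to_epoch(value):
    """Convert an HTTP date string (e.g. a Last-Modified value) to epoch seconds."""
    return int(parsedate_to_datetime(value).timestamp())


def classify(status, last_modified, stored_epoch):
    """Model of the update decision (hypothetical helper, not htdig source).

    status        -- HTTP status code of the reply
    last_modified -- Last-Modified header value, ignored for a 304 reply
    stored_epoch  -- modification time recorded at the previous index run
    """
    if status == 304:
        # The server honoured If-Modified-Since and sent no body.
        return "not modified"
    if http_date_to_epoch(last_modified) == stored_epoch:
        # The server sent the page anyway, but the date is identical.
        return "retrieved but not changed"
    return "changed"


stored = http_date_to_epoch("Mon, 02 Oct 2000 10:00:00 GMT")
print(classify(200, "Mon, 02 Oct 2000 10:00:00 GMT", stored))
print(classify(200, "Tue, 03 Oct 2000 09:30:00 GMT", stored))
```

Note that only the date is compared in this model, so an edited page whose
modification time did not change would still be treated as unchanged.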

I would check the modification time on the document, and if it's wrong,
correct it. You may also want to check the clock on the web server
and/or on the system where the file was edited.

Another possibility, but I'm not sure about this one, is that the server
isn't returning a Last-Modified header at all, so the DocTime field is
0 for both the old and new versions. You can confirm this by seeing if
the modification time shows up for this document in htsearch results.
It doesn't if the field is 0. If this is the case, you should set
modification_time_is_now to true.
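
As a sketch of why a missing header could produce this symptom, and of what
modification_time_is_now would change, here is a small Python model. Again
this is an assumption about the behaviour, not htdig's code: the doc_time
function and its "now" parameter are hypothetical, and header lookup is
simplified to an exact-case dictionary key.

```python
import time
from email.utils import parsedate_to_datetime


def doc_time(headers, modification_time_is_now=False, now=None):
    """Model of how a DocTime value might be derived (hypothetical helper).

    headers -- dict of response headers, keyed exactly as "Last-Modified"
    modification_time_is_now -- mirrors the config attribute of that name
    now     -- optional fixed clock value, only here to make runs repeatable
    """
    value = headers.get("Last-Modified")
    if value is not None:
        return int(parsedate_to_datetime(value).timestamp())
    if modification_time_is_now:
        # No header: fall back to the time of retrieval.
        return int(time.time() if now is None else now)
    # No header and no fallback: the field stays 0.
    return 0


old = doc_time({})   # previous run, no Last-Modified header
new = doc_time({})   # current run, page edited but still no header
print(old == new)    # both are 0, so the page looks unchanged
```

With the fallback enabled, each run would record a different time for such
a page, so the two values would no longer compare equal.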

-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

This archive was generated by hypermail 2b28 : Tue Oct 03 2000 - 12:06:17 PDT