Re: [htdig] Problems with GET URLS


Subject: Re: [htdig] Problems with GET URLS
From: Adam Rice (adam@newsquest.co.uk)
Date: Wed Apr 12 2000 - 04:36:38 PDT


Geoff Hutchison wrote:

> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.11
>
> AFAICT, Apache actually uses full MD5 checksums for ETag: headers on
> static files. However, the spec itself says:

Actually (on Unix at least), Apache constructs the ETag: as the
hexadecimal representation of

inode-size-mtime

This is, of course, implementation-dependent: a server can construct the
ETag: any way it wants as long as it satisfies the requirements of the
standard (which, as I understand it, are that changes to the document
will always cause a change to the ETag).
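
To make the inode-size-mtime construction concrete, here is a minimal
sketch in Python. The function names make_etag and etag_for_path are my
own, and the exact formatting (quoting, field order, mtime resolution)
is an assumption; it is not guaranteed to match what any particular
Apache version emits.

```python
import os


def make_etag(inode, size, mtime):
    """Join inode, size and mtime as lowercase hex, in that order.

    The quoted "a-b-c" layout is an assumption for illustration only.
    """
    return '"%x-%x-%x"' % (inode, size, mtime)


def etag_for_path(path):
    """Build the tag for a file on disk from its stat() data."""
    st = os.stat(path)
    # st_mtime is a float; truncate to whole seconds for this sketch.
    return make_etag(st.st_ino, st.st_size, int(st.st_mtime))


if __name__ == "__main__":
    print(etag_for_path("."))
```

Because the tag depends only on stat() data, two files with identical
content get different tags, and the same file served from a different
inode does too.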

I would certainly be glad to have the option to use the ETag as a
measure of uniqueness, but it would have to come with a warning that the
behaviour was not guaranteed by the standard and that there's a small
probability of collisions if your webspace is spread across multiple
disk partitions.

What I really need in my situation (lots of auto-generated documents,
some of which have the same content, but are not character-for-character
copies) is a way to put an identifier in the document itself that will
be used to determine uniqueness. Preferably with the duplicates weeded
out at search time so that the "best" one can be shown in the search
results (with "best" meaning having the most similar URL to the page
htsearch was called from). This is probably too much to hope for, though.
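
Roughly what I have in mind, as a sketch in Python rather than anything
ht://Dig does today. The "doc-id" meta tag, the function names, and the
use of the longest common URL prefix as the "most similar URL" measure
are all assumptions made up for illustration.

```python
import re

# Hypothetical in-document identifier: <meta name="doc-id" content="...">
META_RE = re.compile(
    r'<meta\s+name="doc-id"\s+content="([^"]*)"', re.IGNORECASE
)


def doc_id(html):
    """Return the embedded identifier, or None if the page has none."""
    match = META_RE.search(html)
    return match.group(1) if match else None


def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def dedupe(results, referrer):
    """Keep one URL per identifier, preferring the URL closest to referrer.

    results is a list of (url, html) pairs in ranked order. Pages without
    an identifier are treated as unique.
    """
    best = {}
    order = []
    for url, html in results:
        ident = doc_id(html)
        key = ("id", ident) if ident is not None else ("url", url)
        if key not in best:
            best[key] = url
            order.append(key)
        elif common_prefix_len(url, referrer) > common_prefix_len(best[key], referrer):
            best[key] = url
    return [best[key] for key in order]
```

A search started from http://b.example/search.html would then show the
b.example copy of a duplicated story and drop the a.example one, while
keeping that story's original position in the ranking.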

Adam Rice



