I've just been bitten by this, so here it is: if you are implementing conditional GET and using If-Modified-Since to check your storage for newer records/content, make sure that the time precision is the same on both sides.
Now that I've said it, let me add some details. According to RFC 2616:
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
I'll use a simple example to walk you through the problem: let's say you are serving an RSS feed of your latest published content. You want to make sure that you are not sending the same content over the wire again and again when nothing new has actually been published, so you decide to support conditional GET.
In order to do that you'll set the Last-Modified and ETag headers on the response, and well-behaved aggregators/feed readers will send these values back in the form of the If-Modified-Since and If-None-Match request headers.
The format of the Last-Modified header, specified in RFC 1123, is: Last-Modified: Fri, 10 Apr 2009 09:58:10 GMT (e.g. the Python strftime format is '%a, %d %b %Y %H:%M:%S GMT').
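A quick sketch of the round trip in Python shows where the precision is lost: formatting with strftime silently drops anything below a second, so the value the client echoes back can never carry microseconds.

```python
from datetime import datetime

HTTP_DATE = '%a, %d %b %Y %H:%M:%S GMT'

# A storage timestamp with microsecond precision
last_modified = datetime(2009, 4, 10, 9, 58, 10, 123456)

# Formatting for the Last-Modified header drops the microseconds
header_value = last_modified.strftime(HTTP_DATE)
print(header_value)  # Fri, 10 Apr 2009 09:58:10 GMT

# Parsing the echoed If-Modified-Since value yields second precision only
parsed = datetime.strptime(header_value, HTTP_DATE)
assert parsed == datetime(2009, 4, 10, 9, 58, 10)
assert parsed < last_modified  # the original record now looks "newer"
```

The last assertion is the whole problem in miniature: the same record compares as newer than the timestamp the client sent back.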
As you can see, this timestamp has second precision, and that's exactly what you need to pay attention to. If your storage uses a finer-grained precision for timestamps (milliseconds/microseconds, which is the case for most TIMESTAMP data types), then using the If-Modified-Since timestamp to query for newer records/content may actually return items that have already been served (their timestamps are just milliseconds greater).
So, the next time you use Last-Modified, make sure the time precision of your data storage matches the second precision of the header.
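One simple fix is to truncate the storage timestamp to second precision before comparing. The `is_modified` helper below is a hypothetical sketch of that idea, not code from any particular framework:

```python
from datetime import datetime

def is_modified(stored_ts: datetime, if_modified_since: datetime) -> bool:
    # Truncate the storage timestamp to second precision so both sides
    # of the comparison use the granularity of the HTTP date format.
    return stored_ts.replace(microsecond=0) > if_modified_since

# A record stored 300 microseconds after the timestamp the client cached
# is no longer reported as newer once both sides use second precision:
cached = datetime(2009, 4, 10, 9, 58, 10)
stored = datetime(2009, 4, 10, 9, 58, 10, 300)
assert is_modified(stored, cached) is False

# A record stored a full second later is still correctly detected:
assert is_modified(datetime(2009, 4, 10, 9, 58, 11), cached) is True
```

Alternatively, you can truncate at write time (store timestamps with second precision to begin with), which keeps the comparison logic out of every query.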
Note: You can actually use the ETag value to disambiguate this corner case, but instead of reducing the workload on your server you'll only improve the network-transfer aspect.
Note: You might be tempted to set Last-Modified to the actual time of serving the request. I don't think this is a good idea, as changing the value for every query will prevent your database from caching the results.
5 comments:
Good point. There may be less of a danger in using Last-Modified for conditional GETs than for PUTs and DELETEs. ETags are essential for writes.
Why not use Cache-control: max-age=... ?
My understanding is that both Cache-Control and Expires are directives for caching layers. Basically, both will instruct the intermediary caches to serve the local resource until the max-age is reached and so the server will not be able to send out fresh content until that moment.
Last-Modified and ETag, on the other hand, allow the request to reach the server, so you'll be able to send out new content/records as soon as they become available.
In some cases it is pretty difficult to estimate when new content/records will become available, so you set Cache-Control/Expires instead.
I'd say that for timely content/records the Last-Modified/ETag combination is a better fit, while for non-timely resources Cache-Control/Expires will work better.
What do you think?
You can (and should) combine the validation model and the expiration model - they really complement each other. E.g. a caching intermediary can deliver content it has cached as long as it isn't expired, and thereafter use validation with the backend to get, cache and deliver a fresh copy.
ETags are a better way to drive validation than using Last-Modified.
Max-age is a better way to determine expiry because of exactly the clock problems you mention.
So my suggestion is to go with ETags + max-age (+ public/private depending on the kind of information).
Stefan, my impression is that using ETags is more difficult than Last-Modified. Most of the time, computing a smart ETag will require full request processing, so the only thing you'll save is network traffic. I agree that this is its main purpose, but if you can find a way to also minimize the processing on your server, why not do it?
I think max-age covers another aspect of the problem, so it is orthogonal to the example I've used.
Thanks a lot for your comments.