Conditional GET and Time Precision


I just got bitten by this, so here it is: if you are implementing conditional GET and use the If-Modified-Since header to check your storage for newer 'records'/'content', make sure that the time precision is the same on both sides.

Now that I've said it, let me add some details. According to RFC 2616:

The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.

I'll use a simple example to walk you through the problem: let's say you are serving an RSS feed of the latest published content. You don't want to send the same content over the wire again and again when nothing new has been published, so you decide to support conditional GET.

To do that, you'll set the Last-Modified and ETag headers on the response, and well-behaved aggregators/feed readers will send these values back in the form of If-Modified-Since and If-None-Match request headers.

The format of the Last-Modified header, as specified in RFC 1123, is: Last-Modified: Fri, 10 Apr 2009 09:58:10 GMT (e.g. the Python strftime format is '%a, %d %b %Y %H:%M:%S GMT')
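As a minimal sketch of that format in Python (the `published` timestamp is a made-up example value), the standard library can both produce and parse the header. Note that the microseconds disappear on the way out and do not come back when the value is parsed:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical publication time with microsecond precision, in UTC.
published = datetime(2009, 4, 10, 9, 58, 10, 123456, tzinfo=timezone.utc)

# format_datetime(..., usegmt=True) emits the RFC 1123 style date with a
# literal "GMT" suffix. published.strftime('%a, %d %b %Y %H:%M:%S GMT')
# gives the same text when the locale uses English day/month names.
header_value = format_datetime(published, usegmt=True)
print(header_value)  # Fri, 10 Apr 2009 09:58:10 GMT

# Parsing the header back yields a timestamp with second precision only.
parsed = parsedate_to_datetime(header_value)
print(parsed.microsecond)  # 0
```

The round trip is lossy: `parsed` is slightly earlier than `published`.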

As you can see, this timestamp has second precision, and that's exactly the thing you need to pay attention to. If your storage uses a finer-grained precision for timestamps (milliseconds or microseconds, which is the case for most TIMESTAMP data types), then using the If-Modified-Since timestamp to query for newer records/content may actually return items that have already been served, because their stored timestamp is a few milliseconds later than the truncated value the client sends back.
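The problem can be reproduced without a database. The sketch below uses an in-memory list as a stand-in for the storage; the `records` list and the `truncate_to_second` helper are hypothetical names for illustration only:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical storage: one record saved with microsecond precision.
records = [datetime(2009, 4, 10, 9, 58, 10, 250000, tzinfo=timezone.utc)]

# First request: the server sends Last-Modified for the newest record.
last_modified = format_datetime(max(records), usegmt=True)

# Second request: the client echoes the value back as If-Modified-Since.
if_modified_since = parsedate_to_datetime(last_modified)

# Naive check: the stored .250000 is "newer" than the echoed .000000,
# so the already-served record is found again.
naive_hits = [ts for ts in records if ts > if_modified_since]

def truncate_to_second(ts):
    """Drop sub-second precision so both sides compare at second precision."""
    return ts.replace(microsecond=0)

# Same check with matching precision: nothing new is reported.
fixed_hits = [ts for ts in records if truncate_to_second(ts) > if_modified_since]

print(len(naive_hits), len(fixed_hits))  # 1 0
```

In a real database the equivalent would be truncating the stored column to seconds in the query, or storing timestamps at second precision in the first place; which option fits depends on your storage.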

So, next time you use Last-Modified, make sure that your data storage compares timestamps at the same (second) precision as the header.

Note: You can actually use the ETag value to disambiguate this corner case, but that only improves the network transfer aspect; it does not reduce the workload on your server, which still has to run the query.
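A rough sketch of that ETag fallback, assuming the ETag is simply a hash of the response body (`make_etag` and `respond` are hypothetical helpers, and the comparison is deliberately simplified: it ignores weak validators, `*`, and comma-separated lists in If-None-Match):

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the body (an illustrative choice)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for an already-generated body."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client already has this exact content: send no payload.
        return 304, {'ETag': etag}, b''
    return 200, {'ETag': etag}, body

feed = b'<rss>...</rss>'  # placeholder body, produced by running the query
status, headers, payload = respond(feed)
status_again, _, payload_again = respond(feed, headers['ETag'])
print(status, status_again)  # 200 304
```

Note that `feed` has to be generated before the ETag can be compared, which is why this saves bandwidth but not server work.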

Note: You might be tempted to set the Last-Modified to the actual time of serving the request. I don't think this is a good idea as changing the value for each query will not allow your database to cache the results.