A Word on App Engine Caching

App Engine’s Python documentation is sometimes fuzzy on details around caching and cache headers. Here’s a quick overview for those new to the platform. App Engine apps can serve a mix of static and dynamic content. For each, I’ll discuss App Engine’s default behavior and then describe levers that you can pull to customize your app’s caching behavior.

Dynamic Content

In the Python runtime, App Engine apps are just CGI scripts. Let’s start with the simplest possible one:

print 'Content-Type: text/plain'
print ''
print 'Hello, world!'

(To get going, App Engine also needs an app.yaml configuration file, one section of which maps URLs to scripts. See the App Engine Hello World Tutorial for details.)

After uploading our app to Google’s production servers, we use curl -D - (which dumps the response headers to standard output) to see what headers App Engine adds to responses:

HTTP/1.1 200 OK
Content-Type: text/plain
Vary: Accept-Encoding
Date: Mon, 24 Oct 2011 21:41:39 GMT
Server: Google Frontend
Cache-Control: private
Transfer-Encoding: chunked

Notice the Cache-Control header, which we of course didn’t emit in our script. According to RFC 2616, a value of private

indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache. A private (non-shared) cache MAY cache the response.

In HTTP parlance, a shared cache is one that multiple users can access. So, by default, the headers added by App Engine’s runtime indicate that it’s okay to cache content so long as only one user can access the cache. App Engine adds the same Cache-Control: private header regardless of whether the request is HTTP or HTTPS.
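To make the shared/private distinction concrete, here’s a deliberately simplified sketch (my own illustration, not App Engine code) of how a cache might decide whether it may store a response at all, given its Cache-Control header. The helper names are hypothetical, and the sketch handles only the private and no-store directives, ignoring everything else RFC 2616 says about storability.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(','):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition('=')
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(cache_control, shared):
    """Simplified check: may this cache store the response?

    shared=True models a proxy or edge cache used by many people;
    shared=False models a single user's browser cache.
    """
    directives = parse_cache_control(cache_control)
    if 'no-store' in directives:
        return False  # no cache of any kind may store it
    if 'private' in directives and shared:
        return False  # private responses are off limits to shared caches
    return True


# App Engine's default for dynamic content:
print(may_store('private', shared=False))  # True: the browser cache may keep it
print(may_store('private', shared=True))   # False: proxies must not
```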

In addition to the Cache-Control header, App Engine also sneaks a Vary: Accept-Encoding header into the mix. The RFC has a bit to say about this, too:

The Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation.

In other words, by default, a cache may reuse a fresh App Engine response without revalidation only for a request whose Accept-Encoding header matches that of the request that produced it. This seems like a safe default: a browser that asks for a different encoding (say, plain rather than gzipped) shouldn’t be handed the same bytes.
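One way to picture what Vary asks of a cache is as part of the cache key: a stored response is looked up by its URL plus the values of every request header its Vary names. Here’s a minimal sketch of that idea; `vary_cache_key` is a hypothetical helper, not how any particular cache is implemented.

```python
def vary_cache_key(url, vary_header, request_headers):
    """Build a cache key from the URL plus the request headers named in Vary.

    Header lookups are case-insensitive, as HTTP header names are.
    Returns None for Vary: *, which means no later request can be shown
    to match, so the stored response can't be reused without revalidation.
    """
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary_header.split(',') if n.strip())
    if '*' in names:
        return None
    return (url,) + tuple((n, lowered.get(n)) for n in names)


gzip_key = vary_cache_key('/hello', 'Accept-Encoding', {'Accept-Encoding': 'gzip'})
plain_key = vary_cache_key('/hello', 'Accept-Encoding', {})
print(gzip_key == plain_key)  # False: different encodings, different cache entries
```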

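What if you don’t want the default caching behavior? App Engine lets developers supply their own Cache-Control header. For example, we can require caches to revalidate the response with the server before every reuse (strictly speaking, RFC 2616’s no-cache still permits a cache to store the response; Cache-Control: no-store is the directive that forbids storing it at all):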

print 'Content-Type: text/plain'
print 'Cache-Control: no-cache'
print ''
print 'Hello, world!'

A quick trip to curl -D confirms this behaves as expected.
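Because no-cache and no-store are easy to confuse, here’s a deliberately simplified model of how a single user’s cache might treat a response. `cache_action` is a hypothetical helper of my own; it ignores must-revalidate, shared-cache rules, and the other directives RFC 2616 defines.

```python
def cache_action(cache_control, is_fresh):
    """Return what a (simplified, private) cache does with a response."""
    # Keep only directive names; arguments such as max-age=600 don't matter here.
    directives = {part.split('=', 1)[0].strip().lower()
                  for part in cache_control.split(',') if part.strip()}
    if 'no-store' in directives:
        return 'do not store'             # never written to the cache
    if 'no-cache' in directives or not is_fresh:
        return 'revalidate before reuse'  # stored, but checked with the server first
    return 'reuse stored copy'            # fresh and unrestricted


print(cache_action('no-cache', is_fresh=True))  # revalidate before reuse
print(cache_action('no-store', is_fresh=True))  # do not store
```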

But what about the Vary header?

print 'Content-Type: text/plain'
print 'Cache-Control: no-cache'
print 'Vary: X-Something'
print ''
print 'Hello, world!'

App Engine doesn’t quite let us get away with this. The returned Vary header includes both X-Something and Accept-Encoding. This is true even if you use the special Vary: * form.
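The observed behavior boils down to “Accept-Encoding is always added to whatever Vary you send.” The helper below models just that observation; `merge_vary` is hypothetical, not App Engine’s actual code, and the token ordering and formatting it produces are assumptions. Only which tokens end up in the header reflects what curl showed.

```python
def merge_vary(app_vary):
    """Model of the observed behavior: Accept-Encoding always ends up in Vary."""
    tokens = [t.strip() for t in app_vary.split(',') if t.strip()]
    if 'accept-encoding' not in (t.lower() for t in tokens):
        tokens.append('Accept-Encoding')
    return ', '.join(tokens)


print(merge_vary('X-Something'))  # X-Something, Accept-Encoding
print(merge_vary('*'))            # *, Accept-Encoding
```

Note that RFC 2616’s grammar allows Vary to be either * on its own or a list of field names, so a merged value containing both isn’t strictly well-formed.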

Many of my own App Engine apps aren’t web sites; they’re RESTful APIs for which Cache-Control: private is not always the right choice. I’m a Django kinda guy, so I often write my own middleware to ensure the right caching headers make their way into my API responses.
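Here’s roughly the shape such middleware can take, written in the old-style Django form (a plain class with a process_response(request, response) hook). The class name, the /api/ prefix, and the public, max-age=60 policy are hypothetical placeholders, not recommendations. The sketch relies only on request.path, request.method, response.has_header(), and item assignment on the response, all of which Django’s HttpRequest and HttpResponse provide.

```python
class ApiCacheControlMiddleware(object):
    """Hypothetical old-style Django middleware that sets API cache headers."""

    API_PREFIX = '/api/'               # assumption: API URLs live under /api/
    API_POLICY = 'public, max-age=60'  # assumption: illustrative policy only

    def process_response(self, request, response):
        # Only touch API GET responses, and never override a Cache-Control
        # header that a view set explicitly, so views can still opt out.
        if (request.path.startswith(self.API_PREFIX)
                and request.method == 'GET'
                and not response.has_header('Cache-Control')):
            response['Cache-Control'] = self.API_POLICY
        return response
```

Django’s django.utils.cache.patch_cache_control is worth a look too, if you’d rather merge directives into an existing header than replace it.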

One last detail of note: there is an App Engine bug related to the other side of the Vary header. App Engine allows you to download content from the web via standard Python libraries. The App Engine edge cache currently ignores the Vary header in downloaded responses. This is in violation of the RFCs and can lead to cache poisoning. For certain apps, I imagine that issue 4277 is a show-stopper. Alas: if history is any guide, and despite the severity of the bug, I wouldn’t expect a fix any time soon…
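To see why ignoring Vary is dangerous, consider a toy cache keyed on the URL alone. Both `UrlOnlyCache` and the `origin` function are hypothetical stand-ins, not App Engine’s edge cache: whichever variant is fetched first gets handed to everyone who follows, whatever their Accept-Encoding.

```python
class UrlOnlyCache(object):
    """Toy cache that ignores Vary, keying stored responses by URL alone."""

    def __init__(self, origin):
        self.origin = origin  # callable: (url, accept_encoding) -> body
        self.store = {}

    def get(self, url, accept_encoding):
        if url not in self.store:
            self.store[url] = self.origin(url, accept_encoding)
        return self.store[url]


def origin(url, accept_encoding):
    # The origin honors Accept-Encoding (and would send Vary: Accept-Encoding).
    return 'gzip bytes' if 'gzip' in accept_encoding else 'plain text'


cache = UrlOnlyCache(origin)
print(cache.get('/data', 'gzip'))  # gzip bytes
print(cache.get('/data', ''))      # gzip bytes: poisoned for a client without gzip
```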

Static Content

App Engine allows you to upload a small amount of static content along with your application. Static content is served through a fast, cheaper path that never touches application code.

To include static content, add static_files or static_dir entries to the handlers section of your app.yaml. (See the application configuration documentation for details.)

Let’s fire up curl -D again and check out the headers returned for a sample static file:

HTTP/1.1 200 OK
ETag: "anYKQA"
Date: Tue, 25 Oct 2011 00:42:26 GMT
Expires: Tue, 25 Oct 2011 00:52:26 GMT
Cache-Control: public, max-age=600
Content-Type: text/plain
Server: Google Frontend
Transfer-Encoding: chunked

For static content, App Engine sends a public Cache-Control header (so any cache, shared or private, may hold on to the returned resource) with a short lifespan of ten minutes. (Again, this is true for both HTTP and HTTPS service.) Thankfully, an ETag is also included in the mix.
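The Expires and max-age values agree: 00:52:26 minus 00:42:26 is 600 seconds. Here’s a small sketch of RFC 2616’s freshness-lifetime rule (section 13.2.4), in which a max-age directive takes precedence over Expires minus Date. `freshness_lifetime` is a hypothetical helper; it uses Python 3’s email.utils.parsedate_to_datetime to parse the HTTP dates and deliberately ignores s-maxage and heuristic freshness.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if none is given."""
    # A max-age directive overrides Expires.
    for part in headers.get('Cache-Control', '').split(','):
        name, _, value = part.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires minus Date.
    if 'Expires' in headers and 'Date' in headers:
        expires = parsedate_to_datetime(headers['Expires'])
        date = parsedate_to_datetime(headers['Date'])
        return int((expires - date).total_seconds())
    return None


static_headers = {
    'Date': 'Tue, 25 Oct 2011 00:42:26 GMT',
    'Expires': 'Tue, 25 Oct 2011 00:52:26 GMT',
    'Cache-Control': 'public, max-age=600',
}
print(freshness_lifetime(static_headers))  # 600
```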

App Engine gives you both app-wide and fine-grained control over expiration times for static content, configured in your app.yaml. To set an application-wide expiration time, use the default_expiration element, like so: default_expiration: "7d". To set it for a specific static_files or static_dir handler instead, hang an expiration element off that handler. (See the app configuration documentation for details.)
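Expiration strings are a space-separated series of numbers with units (d for days, h for hours, m for minutes, s for seconds), e.g. "4d 5h". As a sanity check on what a given string means in seconds, here’s a hypothetical helper, `expiration_to_seconds`; it’s my illustration of the format, not App Engine’s parser.

```python
UNIT_SECONDS = {'d': 86400, 'h': 3600, 'm': 60, 's': 1}


def expiration_to_seconds(expiration):
    """Convert an app.yaml-style expiration string such as "4d 5h" to seconds."""
    total = 0
    for token in expiration.split():
        number, unit = token[:-1], token[-1].lower()
        if unit not in UNIT_SECONDS or not number.isdigit():
            raise ValueError('unrecognized expiration token: %r' % token)
        total += int(number) * UNIT_SECONDS[unit]
    return total


print(expiration_to_seconds('7d'))     # 604800
print(expiration_to_seconds('4d 5h'))  # 363600
```

If App Engine maps the expiration straight onto max-age, as the ten-minute default above suggests, "7d" would show up as max-age=604800; curl -D will tell you for sure.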

This sort of control over expiration times is handy, but I often need to go one step further. In my experience, new App Engine code often implies changes to my static JavaScript, CSS, and image resources, so I want my static content to invalidate whenever I push new code. App Engine does update its ETags on push, but that alone doesn’t help: most clients won’t revalidate a cached copy, and so won’t notice the new ETag, until that copy’s max-age has run out.
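Here’s why a changed ETag alone doesn’t help. A client sends a conditional request (If-None-Match) only when it decides to revalidate, which it won’t do while its cached copy is still fresh. The sketch below models that decision; `client_fetch`, its arguments, and the "newTag" value are hypothetical, and it ignores no-cache, Vary, forced reloads, and heuristic freshness.

```python
def client_fetch(cached_etag, cached_age, max_age, server_etag):
    """Simplified model of a browser requesting a static file it may have cached.

    cached_etag is None when there is no cached copy at all.
    """
    if cached_etag is not None and cached_age < max_age:
        # Still fresh: no request reaches the server, so a new ETag goes unseen.
        return 'served from cache'
    if cached_etag is not None and cached_etag == server_etag:
        # Stale, but a conditional GET (If-None-Match) shows nothing changed.
        return '304 Not Modified'
    return '200 OK (new content)'


# Five minutes after a push that changed the file's ETag:
print(client_fetch('"anYKQA"', cached_age=300, max_age=600, server_etag='"newTag"'))
# served from cache
```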

For typical web apps, static content is referenced by dynamic content, so I’ve developed a workable system to change references to static content after every push. Basically, I’ve created a new Django template tag, static_url, that appends a cache-buster query parameter to referenced URLs. This isn’t really cache invalidation; instead, we’re handing out entirely new URLs! The effect is the same. I set the buster to the value of the CURRENT_VERSION_ID environment variable provided by the App Engine runtime, which changes on every push. I built this for Django, but the equivalent should be easy to string together in your favorite framework.
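The core of such a tag is tiny. Below is a framework-neutral sketch of the URL-rewriting part in Python 3; the name static_url follows the tag described above, but its body, the v query-parameter name, and the "dev" fallback for when CURRENT_VERSION_ID isn’t set are my own assumptions.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def static_url(path, param='v'):
    """Append a cache-busting version parameter to a static resource URL.

    The buster comes from CURRENT_VERSION_ID, which changes on every push;
    'dev' is a hypothetical fallback for local runs where it isn't set.
    """
    version = os.environ.get('CURRENT_VERSION_ID', 'dev')
    scheme, netloc, url_path, query, fragment = urlsplit(path)
    # Keep any existing query parameters, replacing a stale buster if present.
    params = [(k, v) for k, v in parse_qsl(query) if k != param]
    params.append((param, version))
    return urlunsplit((scheme, netloc, url_path, urlencode(params), fragment))
```

With a made-up CURRENT_VERSION_ID of 3.354, static_url('/static/app.js') yields /static/app.js?v=3.354. To expose it in templates, one option is registering it with Django’s template.Library().simple_tag decorator.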