App Engine’s Python documentation is sometimes fuzzy on details around caching and cache headers. Here’s a quick overview for those new to the platform. App Engine apps can serve a mix of static and dynamic content. For each, I’ll discuss App Engine’s default behavior and then describe levers that you can pull to customize your app’s caching behavior.
App Engine apps are just CGI scripts. Let’s start with the simplest possible one:
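A minimal sketch of such a script, assuming a plain-text "hello world" response (the helper name `build_response` is my own invention for illustration): a CGI script simply writes its headers, a blank line, and then the body to standard output.

```python
import sys


def build_response(body="Hello, world!"):
    """Return a CGI response: header lines, a blank line, then the body."""
    headers = ["Content-Type: text/plain"]
    return "\n".join(headers) + "\n\n" + body + "\n"


if __name__ == "__main__":
    # The CGI host reads the response from standard output.
    sys.stdout.write(build_response())
```

Note that the script emits only a Content-Type header; anything else that shows up in the response was added by the platform.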
(To get going, App Engine also needs an app.yaml configuration file, one section of which maps URLs to scripts. See the App Engine Hello World Tutorial for details.)
After uploading our app to Google’s production servers, we use curl -D to see what headers App Engine adds to responses:
HTTP/1.1 200 OK
Content-Type: text/plain
Vary: Accept-Encoding
Date: Mon, 24 Oct 2011 21:41:39 GMT
Server: Google Frontend
Cache-Control: private
Transfer-Encoding: chunked
Note the Cache-Control header, which we of course didn’t emit in our script. According to RFC 2616, a value of private
indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache. A private (non-shared) cache MAY cache the response.
In HTTP parlance, a shared cache is one that multiple users can access. So, by default, the headers added by App Engine’s runtime indicate that it’s okay to cache content so long as only one user can access the cache. App Engine adds the same Cache-Control: private header regardless of whether the request is HTTP or HTTPS.
In addition to the Cache-Control header, App Engine also sneaks a Vary: Accept-Encoding header into the mix. The RFC has a bit to say about this, too:
The Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation.
In other words, by default, App Engine responses (provided they’re fresh) can be re-acquired directly from caches so long as the Accept-Encoding header hasn’t changed. This seems like a safe default: if a browser needs a different encoding, it won’t want the same response.
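To make the role of Vary concrete, here is a simplified sketch of how a cache might fold the named request headers into its lookup key. The function name `cache_key` is hypothetical, and real caches typically normalize header values more carefully than this.

```python
def cache_key(url, vary, request_headers):
    """Build a lookup key from the URL plus the request headers named by Vary.

    Returns None for "Vary: *", since such a response can never be matched
    to a later request without revalidation.
    """
    if vary.strip() == "*":
        return None
    # Header names are case-insensitive, so compare them in lowercase.
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


gzip_key = cache_key("/hello", "Accept-Encoding", {"Accept-Encoding": "gzip"})
plain_key = cache_key("/hello", "Accept-Encoding", {"Accept-Encoding": "identity"})
```

Two requests that send the same Accept-Encoding value produce the same key and can share a stored response; here `gzip_key` and `plain_key` differ, so each variant is stored separately.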
What to do when you don’t want the default caching behavior? App Engine allows developers to provide their own Cache-Control header. For example, we can prevent caching entirely:
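A sketch of a script that supplies its own Cache-Control header follows. The `no-store` directive is an assumption on my part (the RFC defines it as forbidding caches from storing the response at all), and `build_response` is a hypothetical helper.

```python
import sys


def build_response(body, extra_headers=()):
    """Return a CGI response with Content-Type plus any extra headers."""
    lines = ["Content-Type: text/plain"]
    lines.extend("%s: %s" % (name, value) for name, value in extra_headers)
    return "\n".join(lines) + "\n\n" + body + "\n"


# Assumed directive: no-store tells caches not to keep the response.
NO_CACHING = [("Cache-Control", "no-store")]

if __name__ == "__main__":
    sys.stdout.write(build_response("Hello, world!", NO_CACHING))
```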
A quick trip to curl -D confirms this behaves as expected.
But what about the Vary header? If we try to supply our own, App Engine doesn’t quite let us get away with it. The returned Vary header includes both our value and Accept-Encoding. This is true even if you use the special Vary: * form.
Many of my own App Engine apps aren’t web sites; they’re RESTful APIs for which Cache-Control: private is not always the right choice. I’m a Django kinda guy, so I often write my own middleware to ensure the right caching headers make their way into my API responses.
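As a rough illustration (not my actual middleware), here is a minimal sketch in the shape of a Django middleware class. The class name and the default policy value are hypothetical; it fills in a Cache-Control header only when the view hasn’t already set one.

```python
class ApiCacheControlMiddleware:
    """Add a default Cache-Control header to responses that lack one."""

    # Hypothetical default policy; choose whatever suits your API.
    DEFAULT_POLICY = "no-cache"

    def __init__(self, get_response):
        # get_response is the next handler in the middleware chain.
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Responses support header access via "in" and item assignment.
        if "Cache-Control" not in response:
            response["Cache-Control"] = self.DEFAULT_POLICY
        return response
```

Because the header is only set when absent, individual views can still opt into a different policy by setting Cache-Control themselves.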
One last detail of note: there is an App Engine bug related to the other side of the Vary header. App Engine allows you to download content from the web via standard Python libraries. The App Engine edge cache currently ignores the Vary header in downloaded responses. This is in violation of the RFCs and can lead to cache poisoning. For certain apps, I imagine that issue 4277 is a show-stopper. Alas: if history is any guide, and despite the severity of the bug, I wouldn’t expect a fix any time soon…
App Engine allows you to upload a small amount of static content along with your application. Static content is served through a fast (and less financially expensive) path that doesn’t need to touch application code.
To include static content, simply add static file or directory callouts to the handlers section of your app.yaml. (See the application configuration documentation for details.)
Let’s fire up curl -D again and check out the headers returned for a sample static file:
HTTP/1.1 200 OK
ETag: "anYKQA"
Date: Tue, 25 Oct 2011 00:42:26 GMT
Expires: Tue, 25 Oct 2011 00:52:26 GMT
Cache-Control: public, max-age=600
Content-Type: text/plain
Server: Google Frontend
Transfer-Encoding: chunked
For static content, we see that App Engine provides a public Cache-Control response (so any cache may hold on to the returned resource) with a short lifespan of ten minutes. (Again, this is true for both HTTP and HTTPS service.) Thankfully, an ETag is also included in the mix.
App Engine gives you both app-wide and fine-grained control over the expiration times for static content. This is configured in your app.yaml. To set an application-wide expiration time, use the default_expiration property, like so: default_expiration: "7d". Or, if you’d like to set the expiration for a specific static file or directory handler (static_files or static_dir), you can simply hang an expiration property off of each. (See the app configuration documentation for details.)
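To get a feel for how a value like "7d" compares with the ten-minute default, here is a small sketch that converts an expiration string into seconds. It assumes the format is a space-separated list of numbers with d, h, m, or s suffixes; the helper name is hypothetical and this is not App Engine’s own parser.

```python
import re

# Seconds per unit suffix (assumed set of units: days, hours, minutes, seconds).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def expiration_to_seconds(value):
    """Convert a string such as "4d 5h" into a total number of seconds."""
    total = 0
    for part in value.split():
        match = re.fullmatch(r"(\d+)([dhms])", part)
        if match is None:
            raise ValueError("unrecognized expiration component: %r" % part)
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


week = expiration_to_seconds("7d")
default = expiration_to_seconds("10m")
```

Here `default` works out to 600, matching the max-age=600 seen in the headers above, while `week` is 604800.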
Even if App Engine issues new ETags on push, most clients won’t make use of this to invalidate the cache.
For typical web apps, static content is referenced by dynamic content, so I’ve developed a workable system to change references to static content after every push. Basically, I’ve created a new Django template tag, static_url, that appends a cache buster query parameter to referenced URLs. This isn’t really cache invalidation; instead, we’re handing out entirely new URLs! The effect is the same. I set the buster to the value of the CURRENT_VERSION_ID environment variable provided by the App Engine runtime; the value changes on every push. I built this for Django but the equivalent should be easy to string together in your favorite framework.
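The heart of the idea fits in a framework-independent function. This is a sketch rather than my actual template tag: the query parameter name `v` and the `"dev"` fallback are assumptions made for illustration.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def static_url(path, version=None):
    """Append a cache-buster query parameter to a static URL."""
    if version is None:
        # CURRENT_VERSION_ID is set by the App Engine runtime; the "dev"
        # fallback is an assumption for running outside of it.
        version = os.environ.get("CURRENT_VERSION_ID", "dev")
    parts = urlsplit(path)
    # Preserve any existing query parameters and add ours at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("v", version))
    return urlunsplit(parts._replace(query=urlencode(query)))


busted = static_url("/static/site.css", version="1.354")
```

With the version shown, `busted` is `/static/site.css?v=1.354`. Wrapping this in a template tag is then a matter of registering the function with your framework of choice.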