A Word on App Engine Caching
App Engine’s Python documentation is sometimes fuzzy on details around caching and cache headers. Here’s a quick overview for those new to the platform. App Engine apps can serve a mix of static and dynamic content. For each, I’ll discuss App Engine’s default behavior and then describe levers that you can pull to customize your app’s caching behavior.
Dynamic Content
App Engine apps are just CGI scripts. Let’s start with the simplest possible one:
print 'Content-Type: text/plain'
print ''
print 'Hello, world!'
(To get going, App Engine also needs an app.yaml configuration file, one section of which maps URLs to scripts. See the App Engine Hello World Tutorial for details.)
After uploading our app to Google’s production servers, we use curl -D to see what headers App Engine adds to responses:
HTTP/1.1 200 OK
Content-Type: text/plain
Vary: Accept-Encoding
Date: Mon, 24 Oct 2011 21:41:39 GMT
Server: Google Frontend
Cache-Control: private
Transfer-Encoding: chunked
Notice the Cache-Control header, which we of course didn’t emit in our script. According to RFC 2616, a value of private
indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache. A private (non-shared) cache MAY cache the response.
In HTTP parlance, a shared cache is one that multiple users can access. So, by default, the headers added by App Engine’s runtime indicate that it’s okay to cache content so long as only one user can access the cache. App Engine adds the same Cache-Control: private header regardless of whether the request is HTTP or HTTPS.
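To make the private/shared distinction concrete, here is a minimal sketch of the storage decision a cache might make from a Cache-Control value. The helper names (parse_cache_control, may_store) are hypothetical, and the logic is deliberately simplified: it looks only at the private and no-store directives and ignores the rest of the RFC 2616 rules.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of lowercase directives.

    Simplified: splits on commas, so quoted arguments containing commas
    are not handled.
    """
    directives = {}
    for part in value.split(','):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition('=')
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def may_store(cache_control, shared_cache):
    """Return True if a cache may store a response carrying this header value.

    Assumption: only the no-store and private directives are considered.
    """
    directives = parse_cache_control(cache_control)
    if 'no-store' in directives:
        return False
    if shared_cache and 'private' in directives:
        return False
    return True
```

With App Engine’s default header, may_store('private', shared_cache=True) is False (a proxy must not keep the response), while may_store('private', shared_cache=False) is True (a browser cache may).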
In addition to the Cache-Control header, App Engine also sneaks a Vary: Accept-Encoding header into the mix. The RFC has a bit to say about this, too:
The Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation.
In other words, by default, App Engine responses (provided they’re fresh) can be re-acquired directly from caches so long as the Accept-Encoding header hasn’t changed. This seems like a safe default: if a browser needs a different encoding, it won’t want the same response.
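One way to picture what Vary asks of a cache: the cache key is the URL plus the values of whichever request headers Vary names. The sketch below illustrates that idea only; vary_cache_key is a hypothetical helper, not part of App Engine or any real cache.

```python
def vary_cache_key(url, request_headers, vary_value):
    """Build a cache key from the URL plus the request headers named by Vary.

    Returns None for 'Vary: *', which means no stored response may be
    reused without going back to the origin server.
    """
    names = [n.strip().lower() for n in vary_value.split(',') if n.strip()]
    if '*' in names:
        return None
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = dict((k.lower(), v) for k, v in request_headers.items())
    selected = tuple((name, lowered.get(name, '')) for name in sorted(names))
    return (url, selected)
```

Two requests for the same URL with different Accept-Encoding values produce different keys, so neither can be answered with the other’s cached response. Headers that Vary doesn’t name have no effect on the key.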
What to do when you don’t want the default caching behavior? App Engine allows developers to provide their own Cache-Control header. For example, we can tell caches never to reuse a response without first revalidating it, which in practice disables caching:
print 'Content-Type: text/plain'
print 'Cache-Control: no-cache'
print ''
print 'Hello, world!'
A quick trip to curl -D confirms this behaves as expected.
But what about the Vary header?
print 'Content-Type: text/plain'
print 'Cache-Control: no-cache'
print 'Vary: X-Something'
print ''
print 'Hello, world!'
App Engine doesn’t quite let us get away with this. The returned Vary header includes both X-Something and Accept-Encoding. This is true even if you use the special Vary: * form.
Many of my own App Engine apps aren’t web sites; they’re RESTful APIs for which Cache-Control: private is not always the right choice. I’m a Django kinda guy, so I often write my own middleware to ensure the right caching headers make their way into my API responses.
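The Django middleware itself isn’t shown here, so the following is a framework-neutral stand-in rather than that code: a small WSGI middleware that overrides Cache-Control for responses under a URL prefix. The class name, the /api/ prefix, and the public, max-age=60 policy are all assumptions chosen for illustration.

```python
class CacheControlMiddleware(object):
    """WSGI middleware that sets Cache-Control on responses under a URL prefix."""

    def __init__(self, app, prefix, cache_control):
        self.app = app
        self.prefix = prefix
        self.cache_control = cache_control

    def __call__(self, environ, start_response):
        # Leave requests outside the prefix untouched.
        if not environ.get('PATH_INFO', '').startswith(self.prefix):
            return self.app(environ, start_response)

        def patched_start_response(status, headers, exc_info=None):
            # Drop any Cache-Control the app set, then add ours.
            headers = [(k, v) for k, v in headers
                       if k.lower() != 'cache-control']
            headers.append(('Cache-Control', self.cache_control))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def hello_app(environ, start_response):
    """Tiny example app to wrap."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, world!']


# Hypothetical policy: API responses may be cached by anyone for a minute.
application = CacheControlMiddleware(hello_app, '/api/', 'public, max-age=60')
```

A Django middleware would do the same job in its response hook. The point is simply that the header gets set in one place instead of in every handler.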
One last detail of note: there is an App Engine bug related to the other side of the Vary header. App Engine allows you to download content from the web via standard Python libraries. The App Engine edge cache currently ignores the Vary header in downloaded responses. This is in violation of the RFCs and can lead to cache poisoning. For certain apps, I imagine that issue 4277 is a show-stopper. Alas: if history is any guide, and despite the severity of the bug, I wouldn’t expect a fix any time soon…
Static Content
App Engine allows you to upload a small amount of static content along with your application. Static content is served through a fast (and less financially expensive) path that doesn’t need to touch application code.
To include static content, simply add static file or directory callouts to the handlers section of your app.yaml. (See the application configuration documentation for details.)
Let’s fire up curl -D again and check out the headers returned for a sample static file:
HTTP/1.1 200 OK
ETag: "anYKQA"
Date: Tue, 25 Oct 2011 00:42:26 GMT
Expires: Tue, 25 Oct 2011 00:52:26 GMT
Cache-Control: public, max-age=600
Content-Type: text/plain
Server: Google Frontend
Transfer-Encoding: chunked
For static content, we see that App Engine provides a public Cache-Control response (so any cache may hold on to the returned resource) with a short lifespan of ten minutes. (Again, this is true for both HTTP and HTTPS service.) Thankfully, an ETag is also included in the mix.
App Engine gives you both app-wide and fine-grained control over the expiration times for static content. This is configured in your app.yaml. To set an application-wide expiration time, use the default_expiration property, like so: default_expiration: "7d". Or, if you’d like to set it for a specific static_files or static_dir handler, you can simply hang an expiration property off of each. (See the app configuration documentation for details.)
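To see how an expiration string relates to the max-age value in the headers above, here is a small sketch that converts one to seconds. It assumes the format is a space-separated list of number-plus-unit tokens with units d, h, m, and s; expiration_to_seconds is a hypothetical helper, not something the App Engine SDK provides.

```python
# Seconds per unit, assuming the d/h/m/s units of app.yaml expiration strings.
UNIT_SECONDS = {'d': 86400, 'h': 3600, 'm': 60, 's': 1}


def expiration_to_seconds(expiration):
    """Convert a string such as '7d' or '4d 5h' into a number of seconds."""
    total = 0
    for token in expiration.split():
        number, unit = token[:-1], token[-1].lower()
        if unit not in UNIT_SECONDS or not number.isdigit():
            raise ValueError('unrecognized expiration token: %r' % token)
        total += int(number) * UNIT_SECONDS[unit]
    return total
```

By this arithmetic the ten-minute default corresponds to "10m", or 600 seconds, which matches the max-age=600 seen earlier. A setting of "7d" works out to 604800 seconds.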
This sort of control over expiration times is handy, but I often need to go one step further. In my experience, new App Engine code often implies changes to my static JavaScript, CSS, and image resources. I want my static content to invalidate when I push new code to App Engine. While App Engine updates its ETags on push, clients that already hold an unexpired copy won’t revalidate it, so the new ETag alone doesn’t invalidate their caches.
For typical web apps, static content is referenced by dynamic content, so I’ve developed a workable system to change references to static content after every push. Basically, I’ve created a new Django template tag, static_url, that appends a cache buster query parameter to referenced URLs. This isn’t really cache invalidation; instead, we’re handing out entirely new URLs! The effect is the same. I set the buster to the value of the CURRENT_VERSION_ID environment variable provided by the App Engine runtime; the value changes on every push. I built this for Django but the equivalent should be easy to string together in your favorite framework.
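As a rough sketch of the idea (not the actual template tag), the core of a cache buster can be a plain function that appends the version to a URL. The query parameter name v, the 'dev' fallback for when the variable is unset, and the optional environ argument are assumptions made for illustration. Registering the function as a Django template tag is left out.

```python
import os


def static_url(path, environ=None):
    """Append the deployed version ID to a static URL as a cache buster.

    `environ` defaults to os.environ; passing a dict makes testing easier.
    """
    environ = os.environ if environ is None else environ
    # CURRENT_VERSION_ID is set by the App Engine runtime; 'dev' is a
    # hypothetical fallback for when it is missing.
    version = environ.get('CURRENT_VERSION_ID', 'dev')
    separator = '&' if '?' in path else '?'
    return '%s%sv=%s' % (path, separator, version)
```

Since the version ID changes on every push, each deploy yields URLs that no cache has seen before, whatever expiration the old ones carried.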