Siri

Siri is a starkly limited, deeply impressive 1.0.

As with every first version that Apple ships, Siri nails a handful of pragmatic use cases and leaves the rest for a later day. Siri’s language model is flexible enough to handle natural, unscripted speech. The few tasks that Siri performs well are typically much faster to perform via speech than via touch.

Apple got enough right that I find myself using Siri just about every day. Good uses include: getting public transit directions, creating a reminder on the go, setting a timer in the kitchen, and impressing geeky friends with queries that I know will get sent to Wolfram Alpha.

I’ve encountered a few weaknesses in 1.0. Siri sometimes has trouble recognizing proper nouns; try as I might, I can’t get bus directions to Vivace Coffee in Seattle or Xoco Restaurant in Chicago. Map queries are restricted to my current location; it’s not possible to ask for venues near a place I’ll be soon. Scheduling appointments and reminders is fine provided I don’t try to schedule interesting recurrences (“Remind me to run every other day.”)

It’s interesting to ponder what a developer-pluggable API for Siri might look like. There appear to be three steps in the Siri pipeline: speech-to-text, intent interpretation, and action. The first two steps are performed in Apple’s cloud. (At peak times, Siri fails with a generic error that suggests it hasn’t even performed speech-to-text.) The third step is performed wherever appropriate: cloud or device. It seems to me that all three steps need to be pluggable. Today, speech-to-text is parameterized with contacts; it’s probably also biased by current location. I don’t know much about Siri’s underlying model, so for me the “intent interpretation” step is the most nebulous and intriguing. And action is interesting in part because, at the moment, Siri has custom tentacles into both key iOS apps (Reminders, Timer, etc.) and web services (Yelp, Alpha).
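
To make the idea concrete, here’s a rough sketch of how a pluggable pipeline could be shaped around those three steps. This is pure speculation on my part, not anything Apple has announced: every type and method name below (`SpeechRecognizer`, `IntentInterpreter`, `ActionHandler`, `AssistantContext`, and so on) is invented for illustration.

```swift
import Foundation

/// Context a plug-in could use to bias its work, mirroring how Siri
/// already seems to use contacts and (probably) current location.
struct AssistantContext {
    let contactNames: [String]
    let currentLocation: (latitude: Double, longitude: Double)?
}

/// Step 1: speech-to-text (performed in Apple's cloud today).
protocol SpeechRecognizer {
    func transcribe(_ audio: Data, context: AssistantContext) -> String
}

/// Step 2: intent interpretation — the most nebulous step from the outside.
struct Intent {
    let domain: String              // e.g. "transit.directions", "reminders.create"
    let parameters: [String: String]
}

protocol IntentInterpreter {
    /// Returns nil when the utterance falls outside this plug-in's domains.
    func interpret(_ utterance: String, context: AssistantContext) -> Intent?
}

/// Step 3: action, performed wherever appropriate — an app on the device
/// (Reminders, Timer) or a web service (Yelp, Wolfram Alpha).
protocol ActionHandler {
    func canHandle(_ intent: Intent) -> Bool
    func perform(_ intent: Intent) -> String    // the spoken/displayed response
}

/// Composes the three pluggable steps into one request.
struct AssistantPipeline {
    let recognizer: SpeechRecognizer
    let interpreters: [IntentInterpreter]
    let handlers: [ActionHandler]

    func handle(audio: Data, context: AssistantContext) -> String {
        let utterance = recognizer.transcribe(audio, context: context)
        for interpreter in interpreters {
            guard let intent = interpreter.interpret(utterance, context: context) else { continue }
            if let handler = handlers.first(where: { $0.canHandle(intent) }) {
                return handler.perform(intent)
            }
        }
        return "Sorry, I can't help with that yet."
    }
}
```

Under a scheme like this, a transit app could register an interpreter for its own domain and an action handler that answers “bus directions to Vivace Coffee” from its own data, rather than waiting for Apple to wire up that tentacle itself.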