Bluesky's Jetstream with just a few lines of Python

Last updated: November 21, 2024

Like every other nerd online, I’ve been playing with Bluesky’s firehose, which emits a stream of all events on the social network.

It turns out that the easiest way to get started is to avoid the raw firehose and instead use Bluesky’s Jetstream service, which emits events in a simple JSON format rather than the complex format used by the firehose itself.

A few lines of Python are all you need:

import json
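from httpx_ws import connect_ws

BSKY_JETSTREAM = "wss://jetstream1.us-west.bsky.network/subscribe"

# Open a WebSocket connection and print each event as it arrives.
with connect_ws(BSKY_JETSTREAM) as ws:
    while True:
        # Each message is one JSON-encoded event.
        print(ws.receive_text())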
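Each message arrives as a JSON string, so `json.loads` turns it into a plain dict you can pick apart. Here's a minimal sketch of doing that in Python, assuming the event layout implied by the jq filters later in this post (`commit.collection` and `commit.record.text`). The `post_text` helper and the sample event are made up for illustration; they aren't part of jetstream.py.

```python
import json
from typing import Optional


def post_text(raw_event: str) -> Optional[str]:
    """Return the text of a post event, or None for any other kind of event."""
    event = json.loads(raw_event)
    # Not every event carries a commit, and deletes carry no record.
    commit = event.get("commit") or {}
    if commit.get("collection") != "app.bsky.feed.post":
        return None
    record = commit.get("record") or {}
    return record.get("text")


# A hand-written sample event (hypothetical values, trimmed to the fields used here).
sample = json.dumps(
    {
        "did": "did:plc:example",
        "kind": "commit",
        "commit": {
            "operation": "create",
            "collection": "app.bsky.feed.post",
            "record": {"text": "hello from Jetstream"},
        },
    }
)

print(post_text(sample))
```

In the loop above, you could call `post_text(ws.receive_text())` and print only the non-`None` results.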

I’ve taken these few lines and wrapped them in a handy script, jetstream.py, that you can grab directly from GitHub.

It’s easy to run the script! The only thing you need to install is Astral’s uv. There’s no need to muck about with Python versions or packaging. (Here’s the magic that makes this work.)

To print everything that’s happening with Bluesky:

$ ./jetstream.py
# ... so much output!

The script offers some handy features. Using the --collection option, you can filter down to specific types of events. For instance, if you only want to see posts:

$ ./jetstream.py --collection app.bsky.feed.post

Or new follows:

$ ./jetstream.py --collection app.bsky.graph.follow
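Jetstream can do this filtering on the server: its subscribe endpoint accepts a repeatable `wantedCollections` query parameter. The sketch below shows how such a URL could be built. I'm assuming that a `--collection` option boils down to something like this; the `jetstream_url` helper is a hypothetical name, not necessarily what jetstream.py does internally.

```python
from urllib.parse import urlencode

BSKY_JETSTREAM = "wss://jetstream1.us-west.bsky.network/subscribe"


def jetstream_url(collections=()):
    """Build a subscribe URL that asks Jetstream for only the given collections."""
    # One wantedCollections=... pair per collection; the parameter may repeat.
    params = [("wantedCollections", name) for name in collections]
    if not params:
        return BSKY_JETSTREAM
    return f"{BSKY_JETSTREAM}?{urlencode(params)}"


print(jetstream_url(["app.bsky.feed.post", "app.bsky.graph.follow"]))
```

Passing the resulting URL to `connect_ws` in place of the bare endpoint should yield only posts and follows.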

Because the script emits a stream of JSON objects, one per line, it’s easy to pipe the output to other commands for further processing:

# Only print the text of each post
$ ./jetstream.py --collection app.bsky.feed.post | jq -r '.commit.record.text'
# Only print posts that mention "skeet"
$ ./jetstream.py --collection app.bsky.feed.post | grep -i skeet --line-buffered

Here’s a bit of command-line nerdery to extract YouTube links from posts:

# You can do this with only grep, or only jq, but this is a nice mix of both:
$ ./jetstream.py --collection app.bsky.feed.post | \
  jq -r --unbuffered '.commit.record.embed.external.uri' | \
  grep -oE 'https://www\.youtube\.com/watch\?v=[a-zA-Z0-9_-]+' --line-buffered
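If you'd rather stay in Python, the same extraction is a dictionary walk plus a regular expression. This is a rough sketch that reuses the regex from the grep command above; the `dig` and `youtube_link` helpers are illustrative names of my own, and the event shape is assumed from the jq path.

```python
import json
import re

# Same pattern as the grep command above.
YOUTUBE_RE = re.compile(r"https://www\.youtube\.com/watch\?v=[a-zA-Z0-9_-]+")


def dig(obj, *keys):
    """Follow a chain of dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def youtube_link(raw_event):
    """Return the YouTube watch URL embedded in a post event, if there is one."""
    event = json.loads(raw_event)
    uri = dig(event, "commit", "record", "embed", "external", "uri")
    if not isinstance(uri, str):
        return None
    match = YOUTUBE_RE.search(uri)
    return match.group(0) if match else None
```

Like `grep -o`, this returns only the matching part of the URI, so trailing parameters such as `&t=42` are dropped.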

You can also filter down to actions taken by specific decentralized identifiers (DIDs):

# Filter to only actions taken by the official Bluesky (@bsky.app) account
$ ./jetstream.py --did did:plc:z72i7hdynmk6r22z27h6tvur

Of course, working directly with DIDs is pretty cumbersome. You can use the --handle option to filter by Bluesky handles instead:

# Filter to only actions taken by the official Bluesky (@bsky.app) account
$ ./jetstream.py --handle @bsky.app
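Since Jetstream filters by DID, a handle has to be resolved to a DID before subscribing. One way to do that is the AT Protocol's `com.atproto.identity.resolveHandle` endpoint. The sketch below uses only the standard library and assumes the public `public.api.bsky.app` host; I'm not claiming this is how jetstream.py resolves handles, and `resolve_url` and `resolve_handle` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed host for unauthenticated lookups; other AT Protocol hosts may work too.
RESOLVE_ENDPOINT = "https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle"


def resolve_url(handle):
    """Build the lookup URL for a handle, tolerating a leading '@'."""
    return f"{RESOLVE_ENDPOINT}?{urlencode({'handle': handle.lstrip('@')})}"


def resolve_handle(handle):
    """Ask the server for the DID behind a handle (makes a network request)."""
    with urlopen(resolve_url(handle), timeout=10) as response:
        return json.load(response)["did"]
```

Calling `resolve_handle("@bsky.app")` should return the same DID used in the `--did` example above.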

You can specify up to 10,000 handles in a single command.

(If you’re interested, I’ve also written some notes on how to convert between DIDs and Bluesky handles with Python.)

I hope this script is helpful! If you have any questions or suggestions, please let me know.