Like every other nerd online, I’ve been playing with Bluesky’s firehose, which emits a stream of all events on the social network.
It turns out that the easiest way to get started is to avoid the raw firehose and instead use Bluesky’s Jetstream service, which emits events in a simple JSON format rather than the complex format used by the firehose itself.
A few lines of Python are all you need:
import json
from httpx_ws import connect_ws

BSKY_JETSTREAM = "wss://jetstream1.us-west.bsky.network/subscribe"

with connect_ws(BSKY_JETSTREAM) as ws:
    while True:
        print(ws.receive_text())
I’ve taken these few lines and wrapped them in a handy script, jetstream.py, that you can grab directly from GitHub.
It’s easy to run the script! The only thing you need to install is Astral’s uv. There’s no need to muck about with Python versions or packaging. (Here’s the magic that makes this work.)
To print everything that’s happening with Bluesky:
$ ./jetstream.py
# ... so much output!
The script offers some handy features. Using the --collection option, you can filter down to specific types of events. For instance, if you only want to see posts:
$ ./jetstream.py --collection app.bsky.feed.post
Or new follows:
$ ./jetstream.py --collection app.bsky.graph.follow
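If you'd rather do this filtering from your own Python code, Jetstream accepts a wantedCollections query parameter on the subscribe URL, repeated once per collection. Here's a minimal sketch of building that URL. The subscribe_url helper is a name I made up for illustration, and I'm only assuming that jetstream.py does something along these lines under the hood.

```python
from urllib.parse import urlencode

BSKY_JETSTREAM = "wss://jetstream1.us-west.bsky.network/subscribe"


def subscribe_url(collections=()):
    """Build a Jetstream URL limited to the given collections.

    Hypothetical helper: each collection becomes its own
    wantedCollections query parameter.
    """
    params = [("wantedCollections", name) for name in collections]
    if not params:
        # No filter requested: subscribe to everything.
        return BSKY_JETSTREAM
    return f"{BSKY_JETSTREAM}?{urlencode(params)}"


print(subscribe_url(["app.bsky.feed.post"]))
# wss://jetstream1.us-west.bsky.network/subscribe?wantedCollections=app.bsky.feed.post
```

Pass the resulting URL to connect_ws in place of the bare BSKY_JETSTREAM constant.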
Because the script emits a stream of JSON objects, one per line, it’s easy to pipe the output to other commands for further processing:
# Only print the text of each post
$ ./jetstream.py --collection app.bsky.feed.post | jq -r '.commit.record.text'
# Only print posts that mention "skeet"
$ ./jetstream.py --collection app.bsky.feed.post | grep -i skeet --line-buffered
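The same two filters are easy to write in Python if you want to go beyond a one-liner. This is a rough sketch that reads the commit.record.text path used in the jq example above. The post_text and mentions helpers are my own illustrative names, and the sample message is hand-written and trimmed to the fields used here, not captured output.

```python
import json
from typing import Optional


def post_text(raw_message: str) -> Optional[str]:
    """Return the text of a post event, or None for any other event."""
    event = json.loads(raw_message)
    commit = event.get("commit") or {}
    if commit.get("collection") != "app.bsky.feed.post":
        return None
    # Deletes carry no record, so fall back to an empty dict.
    record = commit.get("record") or {}
    return record.get("text")


def mentions(raw_message: str, word: str) -> bool:
    """Case-insensitive check of the post text, like `grep -i`."""
    text = post_text(raw_message)
    return text is not None and word.lower() in text.lower()


# Hand-written sample, trimmed to the fields this sketch reads.
sample = json.dumps({
    "did": "did:plc:example",
    "kind": "commit",
    "commit": {
        "operation": "create",
        "collection": "app.bsky.feed.post",
        "record": {"text": "My first Skeet!"},
    },
})

print(post_text(sample))
print(mentions(sample, "skeet"))
```

Note one difference from the grep version: grep matches anywhere in the JSON line, while this only looks at the post text itself.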
Here’s a bit of command-line nerdery to extract YouTube links from posts:
# You can do this with only grep, or only jq, but this is a nice mix of both:
$ ./jetstream.py --collection app.bsky.feed.post | \
    jq -r ".commit.record.embed.external.uri" | \
    grep -oE 'https://www\.youtube\.com/watch\?v=[a-zA-Z0-9_-]+' --line-buffered
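For comparison, here's roughly what that pipeline looks like as a single Python function. It's a sketch, not part of jetstream.py: the youtube_link name is hypothetical, the regular expression is copied from the grep command above, and the sample message is hand-written.

```python
import json
import re
from typing import Optional

# Same pattern as the grep command above.
YOUTUBE_LINK = re.compile(r"https://www\.youtube\.com/watch\?v=[a-zA-Z0-9_-]+")


def youtube_link(raw_message: str) -> Optional[str]:
    """Return the YouTube link embedded in a post event, if there is one."""
    node = json.loads(raw_message)
    # Walk commit -> record -> embed -> external -> uri, tolerating gaps.
    for key in ("commit", "record", "embed", "external", "uri"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if not isinstance(node, str):
        return None
    match = YOUTUBE_LINK.search(node)
    return match.group(0) if match else None


# Hand-written sample, trimmed to the fields this sketch reads.
sample = json.dumps({
    "commit": {
        "collection": "app.bsky.feed.post",
        "record": {
            "text": "watch this",
            "embed": {
                "external": {
                    "uri": "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10"
                }
            },
        },
    }
})

print(youtube_link(sample))
# https://www.youtube.com/watch?v=dQw4w9WgXcQ
```

Like the grep version, this only keeps the part of the URL that matches the pattern, so trailing query parameters such as a timestamp are dropped.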
You can also filter down to actions taken by specific decentralized identifiers (DIDs):
# Filter to only actions taken by the official Bluesky (@bsky.app) account
$ ./jetstream.py --did did:plc:z72i7hdynmk6r22z27h6tvur
Of course, working directly with DIDs is pretty cumbersome. You can use the --handle option to filter by Bluesky handles instead:
# Filter to only actions taken by the official Bluesky (@bsky.app) account
$ ./jetstream.py --handle @bsky.app
You can specify up to 10,000 handles in a single command.
(If you’re interested, I’ve also written some notes on how to convert between DIDs and Bluesky handles with Python.)
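As a quick taste of the handle-to-DID direction, here's a minimal sketch using the AT Protocol's com.atproto.identity.resolveHandle XRPC method. I'm assuming the public.api.bsky.app host as the endpoint, and the helper names are made up for this example. It isn't necessarily how jetstream.py resolves handles.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Bluesky's public API host serving the resolveHandle method.
RESOLVE_ENDPOINT = (
    "https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle"
)


def normalize_handle(handle: str) -> str:
    """Strip whitespace and a leading @, so "@bsky.app" becomes "bsky.app"."""
    return handle.strip().lstrip("@").lower()


def resolve_url(handle: str) -> str:
    """Build the lookup URL for a handle (no network access)."""
    query = urlencode({"handle": normalize_handle(handle)})
    return f"{RESOLVE_ENDPOINT}?{query}"


def resolve_handle(handle: str) -> str:
    """Look up the DID for a handle. This one does hit the network."""
    with urlopen(resolve_url(handle), timeout=10) as response:
        return json.load(response)["did"]


print(resolve_url("@bsky.app"))
# https://public.api.bsky.app/xrpc/com.atproto.identity.resolveHandle?handle=bsky.app
```

Calling resolve_handle("@bsky.app") should return the same DID used in the --did example above, assuming the endpoint is reachable.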
I hope this script is helpful! If you have any questions or suggestions, please let me know.