In my last post I detailed the reasons why I wanted to achieve a distributed PDS, or at least explore whether it was possible. I also explained how I managed to get the database part of the PDS held on a server other than the one running the PDS, and then managed to run 2 PDS instances side by side, semi-successfully.

The problem I hit, though, was the connection to the relays. When each PDS instance sent out a request crawl to the relays, the relays would then attempt to start a websocket connection, but that connection got load balanced, so there was no guarantee that the instance requesting the crawl was the one that got the subscribeRepos request. This resulted in new posts missing from AppViews.

I tried something crazy to get around it, but it didn't work. I gave each PDS instance a subdomain, for example one.pds404.uk, and that was the hostname I made PDS one send when requesting a crawl. I then used Caddy to route the /xrpc/com.atproto.sync.subscribeRepos request for that subdomain to the first instance of the PDS, and did the same for the other one. That did not work. I'm not 100% sure why (if someone knows better than me I'd love to know) but I feel it's because the relays didn't like subscribing to the same PDS domain more than once (even though I used subdomains).

After that I thought fuck it. You may remember me mentioning that Fig had shown me an idea they had for a firehose inverter? Well I thought I would have an attempt at making a proof of concept of that idea. There's no way I'd make a proper polished app like I'm sure they would have done, but all I'm after right now is to prove that a PDS can have multiple instances.

I hacked together something rough and this is the result. It's a simple HTTP server that exposes 2 endpoints.

1: /xrpc/com.atproto.sync.subscribeRepos - this is what relays will connect to via websocket

2: POST /events - this is what the PDS will use to POST events to

The idea is that as long as one PDS requests a crawl for my PDS domain, when the relays then attempt to make the connection, I'll route the /xrpc/com.atproto.sync.subscribeRepos request to the new service using Caddy. Then when a PDS has an event it wants to send to the relays, instead of using the websocket like normal, it will send the data to the new service via the /events endpoint.

The service will receive the event and then send the event data to each websocket connection it has open. Voila!
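The fan-out at the heart of that can be sketched roughly like this. It is not the code from the repo: the Hub type and its method names are made up for illustration, and each open relay connection is modelled as a Go channel instead of a real websocket (a websocket handler would subscribe, then write every message it receives from the channel to its socket). Dropping events for slow subscribers is also just an assumption to keep the sketch short.

```go
package main

import (
	"io"
	"net/http"
	"sync"
)

// Hub keeps track of every open subscriber and fans events out to them.
// (Hypothetical type, named for this sketch only.)
type Hub struct {
	mu   sync.Mutex
	subs map[chan []byte]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[chan []byte]struct{})}
}

// Subscribe registers a new subscriber. It returns the channel events
// arrive on and a cancel function that removes and closes that channel.
func (h *Hub) Subscribe(buffer int) (<-chan []byte, func()) {
	ch := make(chan []byte, buffer)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	cancel := func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		if _, ok := h.subs[ch]; ok {
			delete(h.subs, ch)
			close(ch)
		}
	}
	return ch, cancel
}

// Broadcast offers the event to every subscriber and returns how many
// accepted it. A subscriber whose buffer is full misses the event
// rather than blocking everyone else.
func (h *Hub) Broadcast(event []byte) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for ch := range h.subs {
		select {
		case ch <- event:
			delivered++
		default: // buffer full, skip this subscriber
		}
	}
	return delivered
}

// EventsHandler is the POST /events side: read the raw body sent by a
// PDS instance and broadcast it unchanged.
func (h *Hub) EventsHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	h.Broadcast(body)
	w.WriteHeader(http.StatusAccepted)
}
```

Because the hub doesn't care which PDS instance an event came from, any number of instances can POST to it while the relays only ever see one subscribeRepos endpoint.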

This is the repo for that service if you're interested. It's very rough and 100% not production worthy, but it works. It's also a lot of borrowed code from the Cocoon PDS that I'm working with!

Then I needed to make a change to the PDS code so that instead of sending directly to the websockets, it sends a POST request to the new service. It didn't take much work, as I just copied and pasted a lot of what was already there but made a POST request instead of writing to a websocket. Here's the code for that.

The only thing left to do was to wire up the Caddyfile on my VPS, get things running and off we go! I ran the new service locally on my laptop and made use of Tailscale to route the traffic from Caddy to my laptop. (I love Tailscale for this sort of thing.)

I then made a load of posts on my test account which is hosted on my Pi PDS (2 running instances) and it worked. After each post, I saw it appear in the timeline of not just my test account but also my main account timeline too.

So what I've done is prove that it's possible to have more than 1 instance of a PDS running and have data routed correctly. Now the only downside I'm seeing at the moment is due to latency. It can take quite a few seconds to make a new post. I've not seen much with the reads because I don't do much "reading" of my test account but I'm sure there is extra latency there.

My next task, which I think will take longer, is to come up with a way to make that database connection better. I think Turso has some interesting stuff around local syncing so that I can have a copy of the database running on each machine, so that'll be my first port of call.

This, I feel, will be super interesting as it'll feed into my original idea of having a PDS possibly running from a phone app. Perhaps that instance on the phone will be the "main" instance and replicate data to any other running instances. That way, when creating posts, they will be quickly created on the main PDS since it'll be on the same device, but if my phone dies or I turn it off, other people will still see my content because data will then be fetched from the "secondary" devices which have a copy of the data.

Until next time!