Earthstar October update

October 31, 2022

A mostly eaten cake with a 2022 sign on it. Crumbs.

There are only a few more slices left of the cake that is 2022. But as the saying goes: you can't have your cake and experience linear time.

It's time to start wrapping this year's project up. A quick recap of what's been done so far:

- A ground-up rebuild of Earthstar (this was good, trust me)
- A new, maybe slightly too interactive Earthstar CLI
- A new replica server implementation with support for extensions
  - One of those extensions being able to serve the contents of a share to browsers
- A new core API for syncing a share with a filesystem directory
- Support for adding (potentially very big) arbitrary data to shares with attachments

That last one was one of the 'big fish' of these milestones, and I'm really happy to have it merged (so that I could get to the other big fish).

Protected shares

Earthstar's data model divides universes of data into shares. You could have a share for your family, another for your gaming friends, another for collaborators on your open source project.

Shares are basically a namespace represented by an address. Up to now, a share address would look something like this:

+earthstardevs.b23xue8orl

If you know this address, you can create your own replica keyed to this share and start writing documents to it. And then you can sync that replica with other users' replicas of the same share.

Which basically means a share's address grants write access to that share. Which makes it a very sensitive bit of information.

The random suffix of the share address (e.g. .b23xue8orl) was a way to make a share address harder to guess. But this doesn't help much when the address leaks, e.g. through a screenshot, looking over a shoulder, or other means.

This meant I had to spend time doing funny things like hiding the suffix in UIs. But no longer!
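To make the suffix-hiding concrete, here is a minimal sketch of what that kind of UI helper could look like. The function names (`parseShareAddress`, `maskShareAddress`) are made up for this example and are not part of Earthstar's API. The parsing rule (a leading `+`, then a name, a dot, and a suffix) is an assumption based only on the example address above.

```typescript
// Hypothetical shape for a parsed share address (not an Earthstar type).
interface ShareAddress {
  name: string;
  suffix: string;
}

// Assumes the form "+name.suffix", as in the example address above.
function parseShareAddress(address: string): ShareAddress {
  const dot = address.indexOf(".");
  if (address.charAt(0) !== "+" || dot < 2 || dot === address.length - 1) {
    throw new Error("Not a share address: " + address);
  }
  return { name: address.slice(1, dot), suffix: address.slice(dot + 1) };
}

// Replace every character of the suffix with "*" so it is safe to display.
function maskShareAddress(address: string): string {
  const { name, suffix } = parseShareAddress(address);
  return "+" + name + "." + suffix.replace(/./g, "*");
}

const masked = maskShareAddress("+earthstardevs.b23xue8orl");
// masked is "+earthstardevs.**********"
```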

The next version of Earthstar will introduce protected shares. Obtaining a share address will only confer the ability to sync with other replicas and read that share's documents. To write to a share, you will also need a share secret.

The share address' suffix will become a public key:

+earthstardevs.biwgovmepjonvx3kticeuc7rsk3mee6q25q7osl5dqj3fhl2mbawq

to which you need the corresponding secret to write valid documents to the share.
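The general idea can be sketched with ordinary public-key signatures. This is only an illustration, not Earthstar's actual implementation: it assumes an Ed25519 keypair generated with Node's built-in `node:crypto` module, and the document contents are invented. The public key plays the part of the address suffix, and the private key plays the part of the share secret.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Stand-in for a protected share: the public key would live in the address,
// the private key is the share secret. (Assumption: Ed25519, for illustration.)
const share = generateKeyPairSync("ed25519");

// Someone who knows the address but not the secret has to use their own key.
const stranger = generateKeyPairSync("ed25519");

// An invented document to write to the share.
const doc = Buffer.from("/blog/hello.md: Hello from the trusted group", "utf8");

// For Ed25519 keys the algorithm argument must be null.
const writerSignature = sign(null, doc, share.privateKey);
const strangerSignature = sign(null, doc, stranger.privateKey);

// Anyone holding the public address can check who was allowed to write.
const acceptsWriter = verify(null, doc, share.publicKey, writerSignature); // true
const acceptsStranger = verify(null, doc, share.publicKey, strangerSignature); // false
```

Checking a document only needs the public half, which is why knowing the address is enough to read and sync but not to write.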

This opens interesting new use cases for Earthstar shares where a trusted group publishes work to the broader public, like blogs or podcasts.

It also makes it easier for shares to be hosted, as replica servers only need the public address to replicate a share.

There will be no option for secretless shares in the future.

Efficient synchronisation

In my May + June update, I was excited about range-based set reconciliation, a new method of identifying differences between two sets and reconciling them, and potentially using that to improve Earthstar's syncing. I talked about it like I was going to get it done over a weekend. Happy days.

These past few weeks I've been working with Aljoscha Meyer to create a new JS implementation of this method, and am about to start integrating this new module with Earthstar. Thanks to Aljoscha's wonderful guidance, I've learned a lot about this method and thought it would be interesting to outline it here!

Imagine you have two sets with only some elements in common.

Two sets with four elements, represented by colours. They have only two elements in common.

How do we determine the difference between the two sets while sending as little information as possible between the two peers?

Enter this method's first trick: generating fingerprints.

A new colour derived from mixing all the colours of the set, representing the fingerprint.

So for example, if the fingerprints for both sets match, then we know that both sets are extremely likely to hold the same elements. And all we had to exchange was fingerprints!

This is pretty much like us generating a hash for the contents of two files and comparing them.
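Here is a minimal sketch of the fingerprint idea in code. Both the toy hash and the use of XOR to combine hashes are assumptions picked to keep the example short; they are not necessarily what the module described in this post does, and a real implementation would want a far stronger hash.

```typescript
// Toy string hash (the classic "multiply by 31" scheme). An assumption for
// illustration only: far too weak for real use.
function hashItem(item: string): number {
  let h = 0;
  for (let i = 0; i < item.length; i++) {
    h = (h * 31 + item.charCodeAt(i)) | 0; // keep it a 32-bit integer
  }
  return h;
}

// Mix every item's hash together with XOR. XOR doesn't care about order,
// so two sets holding the same items always get the same fingerprint.
function fingerprint(items: string[]): number {
  let fp = 0;
  for (const item of items) fp ^= hashItem(item);
  return fp;
}

const alice = ["amber", "blue", "red", "teal"];
const bob = ["teal", "red", "blue", "amber"]; // same items, different order
const carol = ["azure", "cyan", "red", "teal"]; // only two items in common

const aliceMatchesBob = fingerprint(alice) === fingerprint(bob); // true
const aliceMatchesCarol = fingerprint(alice) === fingerprint(carol); // false
```

Like mixing paint, the combined value says nothing about which individual items went in, only whether the two mixes came out the same.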

But what if the fingerprints don't match?

Two sets generating their own fingerprints, represented by mixed colours. The final fingerprints of each set do not match.

At this point, we know there's some difference between the two sets. But where?

This is where this method's second trick comes in: we can generate a fingerprint for a specific range within a set.

Using this, we can subdivide the range of a non-matching fingerprint and identify where the difference is:

Each set is now producing two fingerprints, one for each half of it. The first fingerprint of each set does not match, but the second fingerprint does.
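The picture above could be sketched like this. It assumes items are strings compared in plain alphabetical order, uses a toy hash combined with XOR, and splits the range at the arbitrary boundary `"m"`. All of these are choices made for the example, not details of the actual module.

```typescript
// Toy string hash, for illustration only (far too weak for real use).
function hashItem(item: string): number {
  let h = 0;
  for (let i = 0; i < item.length; i++) {
    h = (h * 31 + item.charCodeAt(i)) | 0;
  }
  return h;
}

// Fingerprint of only those items where lower <= item < upper.
function rangeFingerprint(items: string[], lower: string, upper: string): number {
  let fp = 0;
  for (const item of items) {
    if (item >= lower && item < upper) fp ^= hashItem(item);
  }
  return fp;
}

const alice = ["amber", "blue", "red", "teal"];
const bob = ["azure", "cyan", "red", "teal"];

// Split the whole range "a".."z" into two halves at "m" and compare each half.
const firstHalfMatches =
  rangeFingerprint(alice, "a", "m") === rangeFingerprint(bob, "a", "m"); // false
const secondHalfMatches =
  rangeFingerprint(alice, "m", "z") === rangeFingerprint(bob, "m", "z"); // true
```

The matching second half can be left alone, so only the first half needs any more attention.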

We can repeat this approach of comparing fingerprints and drilling down further until we reach a certain threshold where we finally send the items themselves to the other peer.

Two sets each exchanging the two elements they did not have in common.
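Putting the pieces together, here is a rough simulation of the whole back-and-forth. It is a sketch under several assumptions: both peers live in the same process and are called `a` and `b`, items are sorted strings, the hash is a toy, ranges are split at the middle item of `a`, and the threshold is a single item. The real protocol exchanges messages between peers and has many more details.

```typescript
// Toy string hash and XOR fingerprint, for illustration only.
function hashItem(item: string): number {
  let h = 0;
  for (let i = 0; i < item.length; i++) {
    h = (h * 31 + item.charCodeAt(i)) | 0;
  }
  return h;
}

function fingerprint(items: string[]): number {
  let fp = 0;
  for (const item of items) fp ^= hashItem(item);
  return fp;
}

interface Missing {
  aLacks: string[]; // items only b has
  bLacks: string[]; // items only a has
}

// a and b are the sorted items each peer holds within the current range.
function reconcile(
  a: string[],
  b: string[],
  threshold = 1,
  out: Missing = { aLacks: [], bLacks: [] },
): Missing {
  // Matching fingerprints: this range is (very likely) identical, so stop.
  if (fingerprint(a) === fingerprint(b)) return out;

  // Small enough: stop comparing fingerprints and send the items themselves.
  if (a.length <= threshold || b.length <= threshold) {
    for (const item of b) if (a.indexOf(item) === -1) out.aLacks.push(item);
    for (const item of a) if (b.indexOf(item) === -1) out.bLacks.push(item);
    return out;
  }

  // Otherwise split the range in two at a's middle item and drill down.
  const mid = a[Math.floor(a.length / 2)];
  reconcile(a.filter((x) => x < mid), b.filter((x) => x < mid), threshold, out);
  reconcile(a.filter((x) => x >= mid), b.filter((x) => x >= mid), threshold, out);
  return out;
}

const result = reconcile(
  ["amber", "blue", "red", "teal"],
  ["azure", "cyan", "red", "teal"],
);
// result.aLacks is ["azure", "cyan"], result.bLacks is ["amber", "blue"]
```

Note that "red" and "teal" were never sent anywhere: their shared range was dismissed after a single fingerprint comparison.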

Of course this is just scratching the surface, and I've obscured and fudged some details for the sake of brevity. There are some ways you can deviate from the above (e.g. how ranges are subdivided), and some critical rules I have not mentioned at all (e.g. both sets must have a common total order). There are also many details to making this work quickly which I'm omitting, as well as many fun properties this method has which can be used to make it even faster!

Most importantly, the implementation I've built has been created as a JS/TS module for general usage. I'll be hooking this module up to Earthstar soon, and releasing it as open source shortly afterwards for others to use and peruse. I'm very excited to see how much time and energy will be saved by adding this method to Earthstar, and maybe outside of it too.

Andrew Chou's app workshops

Finally, Andrew Chou has been hosting workshops where he builds little apps using Earthstar for data persistence and transport. In the first two sessions he built a chatroom with display names and online indicators in about 200 lines of code (source here).

And that's without any frameworks! These workshops have also secretly been Web API workshops in which we've been using standard browser APIs to build interactive apps. This has made me especially happy, as I've always tried to design Earthstar to exist as just another API alongside fetch, ReadableStream, or getElementById.

In future workshops we'll be experimenting with Earthstar's new attachment capabilities to build some multimedia apps. If you're interested, they'll be hosted on the Earthstar Discord server.

Until next time