Gwil's garden

Keeping myself busy.

Earthstar × NLnet — again!

September 1, 2022

Last year the Earthstar project was awarded a grant from NLnet foundation.

This grant has made an enormous difference to the Earthstar project, making it possible for me to work on this FLOSS project full-time. It's no exaggeration to say that without their support it's hard to imagine how this project would have continued.

<mark>So I'm pleased to share that NLnet foundation will continue to support Earthstar with another grant.</mark>

This project will start at the end of this year, after I've fulfilled the milestones of the current year's project (more on that below).

Without further ado, here's what's planned for next year's project:

2023 NLnet project

Complete encryption

Earthstar should be an appropriate tool for users handling sensitive data. As things stand, document contents can be encrypted or deleted, but other identifying marks such as the author, path name, and timestamp remain. In the wrong hands, these properties can be used against people.

I'll be adding a new document format which will accommodate totally encrypted documents. Querying and syncing these documents will present a significant challenge, as will potentially repurposing our ed25519 keypairs for encryption. I'll be hitting the books for this one.

Betrayal salvage

Earthstar's shares are designed for users who already know and trust each other. But what happens when that trust is broken?

I'll be adding tools to mitigate the damage caused by these betrayals, such as limiting how a betrayer's data is able to sync around a share and, at the most extreme level, tools for migrating users to a new share while leaving unwanted users and their data behind.

LAN discovery service and sync

Earthstar has deliberately never had any kind of peer / share discovery. The only way shares should be discovered is through the deliberate disclosure, from one user to another, of a share's address and where that share's data can be synced.

But there's one context in which peer discovery makes sense to me, which is when users are connected to the same LAN. So I'll be adding a service for discovering Earthstar peers and then syncing with them over a LAN.

Share discovery will remain impossible even over LAN. The only way two peers on the same LAN will sync with each other is if they already have shares in common.

Sync specification

Efficient and powerful sync is vital to making Earthstar useful. We've experimented with two unspecified and undocumented sync methods already, but we are beginning to land on a protocol which will enable efficient synchronisation between peers, based upon a new set reconciliation method.

With the benefit of some experience of this method in the wild, I'll write up a specification for this sync protocol and some of its more popular transports (e.g. HTTP). Obviously this is useful for people building new implementations of Earthstar, but I think seeing a spec of the set reconciliation stuff will be useful to the distributed systems community at large.
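To give a flavour of what set reconciliation means in general, here's a toy sketch in TypeScript. To be clear, this is not Earthstar's protocol or API: it's a generic range-based approach, and every name in it (`fnv1a`, `fingerprint`, `inRange`, `reconcile`, `Diff`) is made up for illustration. The idea is that two peers compare a small fingerprint for a range of items, skip the range if the fingerprints match, and otherwise split the range and try again. Here both sets live in one process; in a real protocol each peer would only see its own items and the fingerprints sent by the other side. The hash is a non-cryptographic FNV-1a, chosen only to keep the example self-contained.

```typescript
// Illustrative only: a toy range-based set reconciliation between two
// in-memory sets of document paths. All names here are hypothetical.

interface Diff {
  missingFromA: string[]; // items B has that A lacks
  missingFromB: string[]; // items A has that B lacks
}

/** 32-bit FNV-1a hash of a string (non-cryptographic, for illustration). */
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

/** Fingerprint of a group of items: XOR of their hashes, so order is irrelevant. */
function fingerprint(items: string[]): number {
  let acc = 0;
  for (const item of items) acc = (acc ^ fnv1a(item)) >>> 0;
  return acc;
}

/** Sorted items falling within [lower, upper); a null upper bound means "no limit". */
function inRange(items: string[], lower: string, upper: string | null): string[] {
  return items
    .filter((item) => item >= lower && (upper === null || item < upper))
    .sort();
}

/** Work out which items each side is missing, comparing fingerprints range by range. */
function reconcile(
  a: string[],
  b: string[],
  lower: string = "",
  upper: string | null = null,
  diff: Diff = { missingFromA: [], missingFromB: [] },
): Diff {
  const aItems = inRange(a, lower, upper);
  const bItems = inRange(b, lower, upper);

  // Same size and same fingerprint: treat the range as already in sync.
  if (
    aItems.length === bItems.length &&
    fingerprint(aItems) === fingerprint(bItems)
  ) {
    return diff;
  }

  // One side has at most one item here: cheap enough to compare item by item.
  if (aItems.length <= 1 || bItems.length <= 1) {
    const aSet = new Set(aItems);
    const bSet = new Set(bItems);
    diff.missingFromB.push(...aItems.filter((item) => !bSet.has(item)));
    diff.missingFromA.push(...bItems.filter((item) => !aSet.has(item)));
    return diff;
  }

  // Otherwise split the range at A's middle item and reconcile each half.
  const mid = aItems[Math.floor(aItems.length / 2)];
  reconcile(a, b, lower, mid, diff);
  reconcile(a, b, mid, upper, diff);
  return diff;
}

const result = reconcile(["/a", "/b", "/c", "/d"], ["/b", "/c", "/d", "/e"]);
console.log(result);
```

In this example the two peers disagree on just one item each, so `missingFromA` comes out as `["/e"]` and `missingFromB` as `["/a"]`, and the range holding only `/c` is skipped on the strength of its fingerprint alone. The appeal of this kind of approach is that the work scales with how much two peers differ rather than with how much they hold.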

Thanks

Massive thanks to NLnet for approving this project, and for their apparent confidence in the Earthstar project and myself. It means a huge amount.

Right now I'm feeling a mixture of excitement, relief... and slight concern I've bitten off more than I can chew. Which is exactly how I felt last time I was writing one of these.

Speaking of which, I'm still chewing pretty hard on the current project.

How it's going with the current project

The 'large blob support' milestone, in which support for syncing large amounts of arbitrary binary data is added to Earthstar, is reaching the end of its long gestation period. Adding this feature has taken some time, as it had a few substantial prerequisites, such as multi-format support. Fortunately these additions will make future work (which will require new formats) more straightforward. I've really been knuckling down on it, which is why there haven't been many updates from me lately (there are only so many times you can blog "yeah it's coming along nicely").

Initial syncing tests with large attachments have been promising. Document sync is now orders of magnitude faster, and the speed of attachment transfers seems to be limited only by whichever is smaller: the bandwidth of your internet connection or that of the server you're syncing with.

In addition to that, all of the test replica servers I've deployed so far are free-tier VMs with constrained CPU and memory, and they've been happy to sync large attachments (>500 MB) at high speed without breaking a sweat. So I'm confident we will be able to keep the material requirements for replica servers low, lower than they were before.

I'm not sure yet when these features will be released. It may make sense to complete the next milestone first, so as to put users through a single breaking change rather than two. I'd also like to see all these new features put through their paces with real-world applications first, and hopefully I'll have some news to share regarding that soon.

Until next time!