It does indeed turn out to be a difficult and subtle problem. We've tried to balance minimizing novelty (always risky in cryptographic systems) with achieving the various security and scaling properties we're looking for. We're very lucky at Ink & Switch to be working with Brooke Zelenka on this one.
Ideally, you would use existing commodity infrastructure, but we have found that none of it is really a good fit for our purposes. Failing that, we have been developing an approach to low-maintenance reusable infrastructure. For now, I would advise running your own but positioning yourself to take advantage of commodity systems as they emerge.
Mechanical merge algorithms can perform better or worse on different kinds of conflicts (editing deleted text is just one of many edge cases), but in the end no CRDT can decide whether your merged text is what you meant to say.
We go into a bunch more detail in the Upwelling paper about the differences between (what we call) semantic and syntactic conflicts in writing: https://inkandswitch.com/upwelling/
Ultimately, my feeling is that serious collaboration is a document review problem as much as anything else. Of course, this is particularly true in journalism and scientific publishing and can be mostly ignored for your meeting notes...
Anyway, if you see this comment, thanks for a nice piece of writing, Alex. Love to see folks wrestling with these problems.
Hi Peter! Thanks so much for the kind words. I hope you noticed that a lot of the article ends up being a motivation for Ink & Switch's work, which we call out directly at the end. I am a big fan! :)
EDIT: Oh, also I meant to link to Upwelling, but forgot what it was called. I settled for a different link instead because I was on deadline.
Robust undo/redo remains an ongoing research project. Leo Stewen's work was presented at PLF 2023 a few days ago. It turns out to be a subtle problem to get completely right, but in my experience you can usually get passable results by letting the editor's default undo behaviour reverse text input.
For applications with more structured document data, you can now use Automerge.diff to produce patches (including inverse patches) between any two points in a document's history. To implement a reasonable undo in this environment, record whatever document heads you consider useful undo points and then patch between them.
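A rough sketch of that heads-based approach in TypeScript; the checkpoint bookkeeping here is mine, not an Automerge API, and only getHeads and diff are real library calls:

    import * as A from "@automerge/automerge"

    // Record heads at the points you consider undo-worthy.
    const checkpoints: A.Heads[] = []

    function recordCheckpoint<T>(doc: A.Doc<T>) {
      checkpoints.push(A.getHeads(doc))
    }

    // Diffing from the current heads back to the last checkpoint yields
    // the "inverse" patches; apply them to your view layer (or translate
    // them into changes) to restore the earlier state.
    function undoPatches<T>(doc: A.Doc<T>): A.Patch[] {
      const target = checkpoints.pop()
      return target ? A.diff(doc, A.getHeads(doc), target) : []
    }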
To expand slightly on why the problem remains unsolved: at the conference there was a robust discussion about what the expected behaviour of "undo" ought to be, even in simple cases.
Ink & Switch is behind it; or, more expansively, mostly Orion Henry, Alex Good, Martin Kleppmann, and myself. As an organization, we have been working on Automerge for about six years now. We also have a wonderful community of other contributors, both in industry and research.
Automerge is not VC-backed software. Indeed, for a number of years Automerge was primarily a research project used within the lab. Over the last year, it has matured into production software under the supervision of Alex Good. The improved stability and performance have been a great benefit to both our community and internal users. Our intention is to run the project as sponsored open source for the foreseeable future, and thus far we have done so thanks to the support of our sponsors and through some development grants.
Ink & Switch's research interests drive a lot of Automerge development but funding from sponsors allows us to work on features that are not research-oriented or to accelerate work that we'd like to do but that doesn't have current research applications. If you adopt Automerge for a commercial project, I'd encourage you to join the sponsors of Automerge to ensure its long-term viability.
The way I think about it is that if the data should always travel together, it should be in one document. For example: if your TODO list always travels as a unit, make it an array of objects in a single Automerge document. On the other hand, if you want to build an issue tracker and be able to link to individual issues or share them individually, then one document per issue is the way to go. Does that help?
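A tiny sketch of the two shapes with automerge-repo (the data shapes are just illustrative):

    import { Repo } from "@automerge/automerge-repo"

    const repo = new Repo({ network: [] })

    // Travels as a unit: the whole TODO list in one document.
    const todos = repo.create<{ items: { title: string; done: boolean }[] }>()
    todos.change(d => { d.items = [{ title: "reply to thread", done: false }] })

    // Linked/shared individually: one document per issue, referenced
    // by URL from an index document.
    const issue = repo.create<{ title: string; body: string }>()
    const index = repo.create<{ issues: string[] }>()
    index.change(d => { d.issues = [issue.url] })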
As for network transports, you can indeed have multiple at once. I usually have a mix of in-browser transports (MessageChannels) and WebSocket connections. I suspect we'll need to do a little adjusting to account for prioritization once people really start to push on this with things like mDNS vs. relay-server connections, but the design should accommodate that just fine.
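For example (the sync-server URL is made up, and I'm spelling the adapter package names from memory, so double-check them):

    import { Repo } from "@automerge/automerge-repo"
    import { BrowserWebSocketClientAdapter } from "@automerge/automerge-repo-network-websocket"
    import { MessageChannelNetworkAdapter } from "@automerge/automerge-repo-network-messagechannel"

    // One in-browser transport (e.g. tab <-> shared worker) plus a
    // WebSocket out to a relay/sync server; the repo syncs over both.
    const channel = new MessageChannel()

    const repo = new Repo({
      network: [
        new MessageChannelNetworkAdapter(channel.port1),
        new BrowserWebSocketClientAdapter("wss://sync.example.com"),
      ],
    })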
As for the docs, my apologies. The "tutorial" was merged into the quickstart as part of extensive documentation upgrades over the last few months. We should update the link in the old blog post accordingly.
So if I smoosh everything in my sorta “collaboration context” together into one document, are there any provisions for delta updates on the wire? Your browser-side storage format sounds like it’s compatible with that approach, but what about clients that are far apart version-wise? Are you storing full relay history and also a snapshot?
I see in your format docs [0] that you store change chunks. Are these exposed in the API for atomicity at all? Are there any atomicity guarantees?
And you discuss backends, but I don’t see any pointers to an S3 or Postgres implementation. Is that something you’re keeping closed source for your business model, or am I just missing something?
I haven’t found anything about authorization. Have you done any work there? I quite like the Firebase model in which you can write simple validation rules that can evaluate against the document itself -- “only allow users who are listed in path `members` to write to this document” or whatever.
The sync protocol does indeed calculate the delta between peers and efficiently catches both sides up.
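If you're curious what that looks like at the lowest level, here's a minimal sketch of one peer's side using Automerge's sync API (automerge-repo drives this loop for you; the `send` callback is a stand-in for your transport):

    import * as A from "@automerge/automerge"

    // Each peer keeps a sync state per connection; messages carry only
    // the changes the other side is missing.
    let syncState = A.initSyncState()

    function flush(doc: A.Doc<unknown>, send: (msg: Uint8Array) => void) {
      const [nextState, message] = A.generateSyncMessage(doc, syncState)
      syncState = nextState
      if (message) send(message) // message is null once the peer is caught up
    }

    function onMessage(doc: A.Doc<unknown>, msg: Uint8Array): A.Doc<unknown> {
      // Applies only the changes we were missing and updates the state
      // used to compute the next outgoing message.
      const [nextDoc, nextState] = A.receiveSyncMessage(doc, syncState, msg)
      syncState = nextState
      return nextDoc
    }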
The backends you see are the ones I use, but the API is a binary-blob key-value store with range queries; supporting other stores should be straightforward.
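The interface is roughly this (spelled from memory, so treat it as a sketch rather than the exact automerge-repo types):

    // Keys are arrays of path segments, e.g. ["docId", "incremental", "0001"].
    type StorageKey = string[]

    interface StorageAdapter {
      load(key: StorageKey): Promise<Uint8Array | undefined>
      save(key: StorageKey, data: Uint8Array): Promise<void>
      remove(key: StorageKey): Promise<void>
      // Prefix range queries are what compaction and incremental
      // change-chunk loading build on; S3 prefix listing or a Postgres
      // index over a key column both map onto this cleanly.
      loadRange(prefix: StorageKey): Promise<{ key: StorageKey; data?: Uint8Array }[]>
      removeRange(prefix: StorageKey): Promise<void>
    }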
Authentication isn’t exactly left as an exercise for the reader, but it is an area of active work. I would say securing access to a URL via whatever mechanism you’re used to should be fine for client-server applications, and peer-to-peer folks seem to mostly have their own ideas.
Automerge is a library that anyone can adopt, and we are a research organization, not a product company.
We have built a variety of projects with Automerge, both publicly and privately, including recently the markdown-with-comments editor we call Tiny Essay Editor (https://tiny-essay-editor.netlify.app/) by Geoffrey Litt.
That said, sponsoring the Automerge team helps us build faster and is always welcome. (Thanks to our current and past sponsors for their support!)
The benchmarks Matt Weidner has been working on are great and outside scrutiny is always welcome, but I should note that I find there's an element of artificiality to them. In particular, testing the performance of the sync system while simulating many users typing into the same document doesn't really measure behaviour we have observed "in the wild". In our research, we've found that editing is usually serial or asynchronous. (See https://inkandswitch.com/upwelling for further discussion of our collaboration research.)
The benchmark that concerns me (and that I'm pleased with our progress on!) is that you can edit an entire Ink & Switch long-form essay with Automerge and that the end-to-end keypress-to-paint latency using CodeMirror is under 10ms (next frame at 100Hz).
While these kinds of benchmarks are incredibly appreciated and absolutely drive us to work on optimizing the problems they uncover, we try to work backwards from experienced problems in real usage as our first priority.
> In our research, we've found that editing is usually serial or asynchronous.
Medium-to-large-size company with a town hall = many people editing a document at the same time. Workshop at a company or a university with a modest size classroom = many people editing a document at the same time. I can't tell you how many times our web-based collaborative code editors would fall over during talks with small audiences we would give back in the days when I led the Scala Center.
Just because one of the benchmarks you have seen (of a multitude of benchmarks) breaks Automerge by stressing it in what we believe is the most stressful scenario possible -- multiple concurrent users, which is sort of the point of concurrency/collaboration frameworks -- does not make it artificial or worth so flippantly discarding.
> long-form essay with Automerge and that the end-to-end keypress-to-paint latency using CodeMirror is under 10ms (next frame at 100Hz)
Not at all what we measured.
I'd just like to register here that Yjs is the framework most widely used "in real usage" (your words), not Automerge (for many reasons, not just performance).
Please accept my unreserved apologies, Heather! No offense is intended. I can speak for everyone working on Automerge when I say that we've very much appreciated Matthew's work and have indeed spent quite a lot of time studying and responding to it. We spoke about it in person last week, in fact.
As for the use-cases, I do not mean to exclude live collaboration from consideration, just to note that it hasn't been our focus or come up often in the use-cases we study. Live meeting notes are definitely a real use-case and I don't dispute the performance results you show.
As for Yjs, it's a wonderful piece of software with excellent performance and a vibrant community, made by exceptional people like Kevin Jahns. We simply have slightly different goals in our work, which undoubtedly reflect where our engineering investments lie.
Indeed, your paper did not measure the same things we look at, and that's why it found new results. Hopefully in time we will join the other systems in performing well on your benchmarks too.
I have been writing a video game using automerge-repo for networking & save files. I researched Yjs and Automerge and felt that Yjs is better suited to an ongoing session like a conference call, whereas Automerge is better suited to network partitions, offline work, etc. That fit my use-case best.
My opinion might be out-of-date as this area is moving quickly, and there are quite a few options out there now.
> there's an element of artificiality to them. In particular, testing the performance of the sync system while simulating many users typing into the same document doesn't really measure behaviour we have observed "in the wild".
I've seen Matt's work and I think it's quite reasonable to benchmark a concurrent datastructure under concurrent load. Placing systems under high load, even just as a limit study, is how we reveal scalability bottlenecks, optimize them, and avoid pathologies. It's part of good engineering.
If your work can produce more representative workloads from the real world, publishing them as new benchmarks would add to the field's knowledge.
> testing the performance of the sync system while simulating many users typing into the same document doesn't really measure behaviour we have observed "in the wild"
We use co-editing far more commonly than serial editing.
Coming from a background of XP (extreme programming, pair programming) and a Pivotal Labs-style approach to co-thinking, even for executive work we require everyone in a meeting (whether at the conference table or remote) to be in the document being shared, and instead of giving feedback, to comment or edit in place.
We care a LOT about how laggy this gets, how coherent it remains, and whether it blows up and has to be restarted, or worse, reverted.
If a firm's culture is to "whiteboard" by having one person at the board and everyone else surfing Hacker News, they might not be exercising this. If a firm's culture is that whiteboards are a shared activity, everyone gathered around holding their own marker, or even just grabbing it from each other, they might need to exercise CRDTs this way.
Put another way, if you "share" in a conference room with an HDMI cable to a TV, or share in Teams or Zoom by window sharing, you may not be a candidate.
If you "share" by dropping a link to the document in a chat, and see by the cursors and bubbles who is following along, you are a candidate.
. . .
In "Upwelling" you describe an introverted and solitary creative process, before revealing a sufficient quality update to others.
That is certainly a valid use case for unspooling thoughts from one brain, and if those are the wilds you are observing, it makes sense that that's what you'd see in the wild.
It is not, however, the most productive approach for inventing solutions to logic puzzles with accuracy and correctness in fewer passes, nor for almost any other "group" activity. So maybe your "not what we see in the wild" should be qualified with "but we're actually not looking for live collaboration, we're looking for post-drafting merge".
That said, the choice of the name "auto-merge" is now much clearer: it advertises your use case right on the tin, if one thinks about it.
So thanks for the Upwelling link, repeated here for convenience: https://inkandswitch.com/upwelling/
Automerge does indeed work with live collaboration, though apparently not currently as efficiently as some other solutions. Everyone working in this space is exploring and looking for solutions that will work for users with slightly differing priorities. In addition to Automerge, consider checking out Yjs, ElectricSQL, Diamond Types, Replicache, vlcn, or any of the other folks. Hopefully one of them will be just right for you.