Lies I was told about collab editing, Part 1: Algorithms for offline editing

352 points by antics 7 months ago

josephg 7 months ago

Hi! Author of Eg-walker & ShareJS here, both of which are referenced by this post.

You write this article like you're disagreeing with me - but I agree completely with what you've said. I've been saying so on HN for years. (Eg, in this comment from 6 years ago[1].)

The way I think about it, the realtime collaborative tools we use today make a lot of sense when everyone is online & editing together. But when users edit content offline, or in long lived branches, you probably want the option to add conflict markers & do manual review when merging. (Especially for code.)

Luckily, algorithms like egwalker have access to all the information they need to do that. We store character-by-character editing traces from all users. And we store when all changes happened (in causal order, like a git DAG). This is far more information than git has. So it should be very possible to build a CRDT which uses this information to detects & mark conflict ranges when branches are merged. Then we can allow users to manually resolve conflicts.

Algorithmically, this is an interesting problem but it should be quite solvable. Just, for some reason, nobody has worked on this yet. So, thanks for writing this post and bringing more attention to this problem!

If anyone is interested in making a unique and valuable contribution to the field, I'd love to see some work on this. Its an important piece thats missing in the CRDT ecosystem - simply because (as far as I know) nobody has tried to solve it yet. At least not for text editing.

[1] Bottom part of this comment: https://news.ycombinator.com/item?id=19889174

chrismorgan 7 months ago

> So it should be very possible to build a CRDT which uses this information to detects & mark conflict ranges when branches are merged. Then we can allow users to manually resolve conflicts.
So you end up with “conflict” as part of your data model. Seems an amusing way of achieving the C (“conflict-free”) of CRDT, though perfectly legitimate and probably the only way.
The next fun challenge is when you get conflicts over conflict resolution!
- WesolyKubeczek 7 months ago
  
  I am partial to conflict resolution via trial by combat. Very ancient technology, and while it doesn’t always work well, it works every time.
- joakims 7 months ago
  
  As crabmusket said [1], I think the C should stand for Commutative, as it apparently did once. Commutative Replicated Data Types, conflict management optional.
  [1] https://news.ycombinator.com/item?id=42349264
  
  chrismorgan 7 months ago
  
  See my response to that comment now—it’s actually largely incorrect. If you look at it through an operations lens, commutativity is what matters and convergence is meaningless, but if you look through a state lens, there’s no such thing as commutativity, convergence is what matters. So choosing another word that describes the overarching property, that conflicts cannot occur, is quite reasonable.
  
  josephg 7 months ago
  
  Even looking through a state lens, I think the commutativity requirement makes sense to think about.
  For example, one of the simplest state CRDTs is the MAX function. The commutativity rule requires that if we merge some set of states A, B, C, ... in any order, we get the same result. This requires the MAX function to be commutative - ie, MAX(a, b) == MAX(b, a). (And also MAX(a, b, c) == MAX(c, a, b), and so on.)
  This must hold for the CRDT to converge & be correct.
  I thought the C stood for commutativity for the longest time after learning CRDTs. I admitted that to Martin Kleppmann the first time I met him. He thought for a solid minute, then laughed and said "yeah, I think that works too!"
antics 7 months ago

Hi Joseph! I am sorry, I was not trying to say your work sucks. I was trying to (1) help practitioners understand what they can expect, and (2) motivate problems like the one you mention at the end.
(1) might seem stupid but I think just evaluating these systems is a challenging enough technical problem that many teams will struggle with it. I just think they deserve practical advice—I know we would have appreciated it earlier on.
- josephg 7 months ago
  
  No need to apologise or change your article or anything. I think it’s great! It’s true that I haven’t written any articles or blog posts about this problem. People absolutely will appreciate more discussion and awareness of this problem, and I’m delighted you’re getting people talking about it.
  I’m motivated by wanting the problem solved, not getting the most praise by random people on the internet. If today that means being cast in the role of “the old guard who’s missing something important” then so be it. What fun.
  
  lifeisstillgood 7 months ago
  
  I just want to congratulate you both for contributing to the sum of human knowledge and understanding without resorting to entrenched positions, a rarity in todays online discourse. Great to read positive attitudes
CalebJohn 7 months ago

> Algorithmically, this is an interesting problem but it should be quite solvable. Just, for some reason, nobody has worked on this yet. So, thanks for writing this post and bringing more attention to this problem!
I'm skeptical that an algorithmic solution will be possible, but I can see this being handled in a UX layer built on top. For example, a client could detect that there's been a conflict based on the editing traces, and show a conflict resolution dialog that makes a new edit based on the resolution. The tricky part is marking a conflict as resolved. I suspect it could be as simple as adding a field to the crdt, but maybe then it counts as an algorithmic solution?
[1] https://josephg.com/blog/crdts-go-brrr/
- crabmusket 7 months ago
  
  That is what josephg was suggesting:
  > it should be very possible to build a CRDT which uses this information to detects & mark conflict ranges when branches are merged
  
  CalebJohn 7 months ago
  
  I should have been more clear in my original comment.
  I don't think that the conflict detection/resolution needs to live inside the CRDT data structure. Ultimately you might want to bake it in out of convenience, but it should be possible to handle separately (of course the resolution will ultimately need to be written to the CRDT, but this can be a regular edit).
  Keeping the conflict resolution in the application layer allows for CRDT libraries that don't need to be aware of human-in-the-loop conflicts, and can serve a wider range of downstream needs. For example, a note app and a version control system might both be plain text, but conflict resolution needs to be handled completely differently. Another example would be collaborative offline vs. online use cases, as noted above, they are very different use cases.
  
  josephg 7 months ago
  
  I’m not sure I agree that that approach would work. There’s two reasons:
  1. The crdt has an awful lot of information at its disposal while merging branches. I think “branch merging” algorithms should ideally be able to make use of that information.
  2. There’s a potential problem in your approach where two users concurrently merge branches together. In that case, you don’t want the merges themselves to also conflict with one another. What you actually want is for the standard crdt convergence properties to hold for the merge operation itself - and if two people concurrently merge branches together (in any order) then it all behaves correctly.
  For that to happen, I think you need some coordination between the merging and the crdt’s internal data structures. And why not? They’re right there.
  A sketch of a solution would be for the merge operation to perform the normal optimistic merge - but also annotate the document with a set of locations which need human review. If all peers use the same algorithm to figure out where those conflict annotations go, then the merge itself should be idempotent. The conflict annotations can be part of the shared data model.
  Another, maybe simpler approach would be for the crdt library itself to just return the list of conflicting locations out of band when you do a merge operation. (Ie, we don’t change the crdt data structure itself - we just also return a list of conflicting lines as a return value from the merge function). The editor takes the list of conflicting locations and builds a ui around the list so the user can do manual review.
- astrobe_ 7 months ago
  
  Do you think that a LLM/"AI" can reliably solve the merging problem?
  
  taejo 7 months ago
  
  > Do you think that a LLM/"AI" can reliably
  No. LLMs definitely have uses where reliability is not a requirement, but that's one requirement which LLMs clearly never meet
gritzko 7 months ago

The author describes the case of overlapping concurrent splices. It is a known funky corner case, yes. If we speak of editing program code, the rabbit hole is deeper as we ideally expect a valid program as a result of a merge. There was a project at JetBrains trying to solve this problem through AST-based merge. After delving into the rabbit hole much much deeper, the guys decided it is not worth it. This is what I was told.
barrystaes 7 months ago

I would simply argue that the “offline” editing is a people-problem and hence van not be solved using automation. People shall find a way to break/bypass the automation/system.
The only “offline editing” that I allow on human text documents is having people add comments. So not editing, no automated merging.
For “offline editing” that I allow on automation (source code) is GIT which intentionally does not pretend to solve the merge, it just shows revisions. The merge is an action supervised by humans or specialised automation on a “best guess” effort and still needs reviews and testing to verify success.
- josephg 7 months ago
  
  Yes I agree. But remember: but git will automatically merge concurrent changes in most cases - since most concurrent changes aren’t in conflict. You’re arguing you want to see & review the merged output anyway - which I agree with.
  Ideally, I want to be able to replace git with something that is built on CRDTs. When branches have no conflicts, CRDTs already work fine - since you merge, run tests, and push when you’re happy.
  But right now CRDTs are too optimistic. If two branches edit the same line, we do a best-effort merge and move on. Instead in this case, the algorithm should explicitly mark the region as “in conflict” and needing human review, just like git does. Then humans can manually review the marked ranges (or, as others have suggested, ask an llm to do so or something.). And once they’re happy with the result, clear the conflicting range markers and push.
  
  hinkley 7 months ago
  
  The tricky thing about “most” is that it means more than half, but people tend to treat it like almost all.
  I would agree that git works more than half the time.
  Merge resolution is a problem so hard that even otherwise capable developers fuck it up regularly. And on a large team made of people who intermittently fuck up, we start getting into and aggregate failure rate that feels like a lot.
  The whole idea with CRDTs was to make something humans couldn’t fuck up, but that seems unlikely to happen. There’s some undiscovered Gödel out there who needs to tell us why.
  
  joakims 7 months ago
  
  I think it's just a UX problem. Once that is solved, which I believe is definitely possible, both CRDTs and git can be made much more user friendly. I'm not saying it's easy, because it hasn't been solved yet, but I don't think the right people have been working on it. UX is the domain of designers, not engineers.
  
  hinkley 7 months ago
  
  I think we are sitting at about 75% on one of those problems that will go asymptotic at 90%.
  And that’s if you swap the default diff algorithm for one of the optional ones. I’ve used patience for years but someone tweaked it and called it “histogram” when I wasn’t looking.
  
  SkiFire13 7 months ago
  
  > Ideally, I want to be able to replace git with something that is built on CRDTs. When branches have no conflicts, CRDTs already work fine - since you merge, run tests, and push when you’re happy.
  How is this different than git's automatic merging? Or another compatible algorithm
  
  josephg 7 months ago
  
  In the happy case? Its no different. But in the case where two offline edits happen at the same location, Git's merging knows to flag the conflict for human review. Thats the part that needs to be fixed.
  
  SkiFire13 7 months ago
  
  It's not clear to me what you want to replace from git then, it seems to me that you rather want to take concepts from git.
  
  josephg 7 months ago
  
  I want a tool with the best parts of both git and a crdt:
  - From CRDTs I want support for realtime collaborative editing. This could even include non-code content (like databases). And some developer tools could use the stream of code changes to implement hot-module reloading and live typechecking.
  - From git I want .. well, everything git does. Branches. Pull requests. Conflict detection when branches merge. Issue tracking. (This isn't part of git, but it would be easy to add using json-like CRDTs!)
  We can do almost all of that today. The one part that's seriously missing is merge conflicts.
satvikpendem 7 months ago

I mentioned Loro in another comment which uses EG Walker, do you think they are an example of what you had mentioned? This comment also seems relevant [0].
Regarding your [1], I had a similar idea and I am beginning to think that only something like an LLM or similar can truly merge conflict free because only they are powerful enough to understand users' intent.
[0] https://news.ycombinator.com/item?id=42343953#42344880
- josephg 7 months ago
  
  Does Loro generate conflict ranges when merging branches, thus allowing manual conflict resolution?
  I’ve heard people suggest using LLMs for years for this - but CRDTs work because the same computation on two peers is guaranteed to produce the same result. LLMs can’t guarantee that. You could probably use an llm in leu of a human manually merging conflicts - but in that case we still need the crdt to first generate those conflict ranges to pass to the llm. Essentially an llm could solve the UX problem, but the underlying algorithm still needs this feature first to allow that to be used.
  
  mkl 7 months ago
  
  It seems that they might, but perhaps not in a good way yet, and they have it as a goal.
  From https://loro.dev/blog/v1.0#next-steps-for-loro:
  > When merging extensive concurrent edits, CRDTs can automatically merge changes, but the result may not always meet expectations. Fortunately, Loro stores the complete editing history. This allows us to offer Git-like manual conflict resolution at the application layer when needed.
  From https://loro.dev/docs/tutorial/time_travel:
  > once we have a more refined version control API in place.
  I'm interested to hear your opinion of their partial adoption of Eg-walker: https://loro.dev/blog/v1.0#leveraging-the-potential-of-the-e...
  
  dmurray 7 months ago
  
  > but CRDTs work because the same computation on two peers is guaranteed to produce the same result. LLMs can’t guarantee that.
  This doesn't seem like a hard requirement. It's a nice property for some academically interesting peer-to-peer trustless algorithm, but in typical software architectures you won't get a situation where Alice receives Bob's edits and Bob receives Alice's edits without them passing through a centralised server that can reconcile them first.
  In any case, LLMs can be run deterministically (temperature=0, or use a fixed random seed and a single core where necessary).
  I don't expect further serious work in the offline text editing space. The next generation will certainly be some form of "guess, then ask an LLM to guess"
  
  josephg 7 months ago
  
  > in typical software architectures you won't get a situation where Alice receives Bob's edits and Bob receives Alice's edits without them passing through a centralised server that can reconcile them first.
  Thats the exact situation these algorithms are all designed to handle. If you require a centralised reconciling server, you’re taking a big step back in functionality - and you no longer have an eventually consistent system.
  But as I say, asking an llm - what exactly? Most of the time concurrent edits are in different regions of a document and you want them to be merged automatically in the obvious way. It’s only when the same text (same line) is edited by two peers that you want to invoke a llm (if at all). If that’s the case you still need the crdt to detect conflicting ranges - you’re just using an llm instead of a human to resolve them. And in that case, CRDTs are still missing the important feature this blog post talked about. You’re just proposing a different UX layer on top of that missing feature.
  And really, you’ll probably still want human review. LLMs have a tendency to do drive-by edits in situations like this, adding or removing commas in prose, changing formatting in code and so on. I don’t trust LLMs to touch my code or my writing without human review.
  
  satvikpendem 7 months ago
  
  Loro is planning on doing so, as the other comment says. Related, is this what you are looking for [0]? The commenter says that their implementation will surface conflicts to the user.
  [0] https://news.ycombinator.com/item?id=42348381
ashoeafoot 7 months ago

What i would love is, if the edit conflict was already baked into the language . Conflict<Implementation1, Implementation2>.
- GTP 7 months ago
  
  What would be the advantage vs having an UI that clearly shows the conflict?
  
  ashoeafoot 7 months ago
  
  It would allow for identification of intent and reasoning over that.
  Is it aggregate, the people adding to the same with no overlaping elements.
  Is it a overlapping set, n-people contributing the same
  Is it a xor set where two people want to split up a implementation and go into different directions ?
  If one deletes and creates a new as replacement , the other continues to work on the element, that could auto map to an intent above.
  And if you know that you can provide much more resonable merges.
kjrfghslkdjfl 7 months ago

[dead]

pvh 7 months ago

Mechanical merge algorithms can perform better or worse on different kinds of conflicts (the specific example of editing deleted text is just one of many edge cases) but in the end no CRDT can decide if your merged text is what you mean to say.

We go into a bunch more detail in the Upwelling paper about the differences between (what we call) semantic and syntactic conflicts in writing: https://inkandswitch.com/upwelling/

Ultimately, my feeling is that serious collaboration is a document review problem as much as anything else. That said, this is particularly true in journalism and scientific publishing and can be mostly ignored for your meeting notes...

Anyway, if you see this comment, thanks for a nice piece of writing, Alex. Love to see folks wrestling with these problems.

antics 7 months ago

Hi Peter! Thanks so much for the kind words. I hope you noticed that a lot of the article ends up being a motivation for Ink & Switch's work, which we call out directly at the end. I am a big fan! :)
EDIT: Oh, also I meant to link to Upwelling, but forgot what it was called. I settled for a different link instead because it was deadline.

sgarland 7 months ago

The other dark side of implementations using CRDTs is the infrastructure load. I wrote about this [0] in depth previously, and Supabase wrote an article [1] a couple of years ago about a CRDT extension for Postgres which I'm happy to discover agrees with my empirical findings.

If you're going to use CRDTs, do yourself a favor and either use Redis or similar (though the amount of memory being consumed is painful to think about), or MyRocks [2] (or anything else based on RocksDB / LevelDB). Whatever you do, do _not_ back it with an RDBMS, and especially not Postgres.

[0]: https://news.ycombinator.com/item?id=40834759

[1]: https://supabase.com/blog/postgres-crdt

[2]: http://myrocks.io

michaelsalim 7 months ago

Thanks for this. Seems really useful since I'm building using Yjs + Postgres myself. Might save the day one of these days!
- satvikpendem 7 months ago
  
  PowerSync has an article on using Postgres with Yjs (and perhaps look into Yrs, the Rust implementation, as well as other Rust crates like Loro and Automerge that are much faster) and they use a table in the database that stores the changes, is that what you are doing too [0]?
  [0] https://www.powersync.com/blog/postgres-and-yjs-crdt-collabo...

MazeChaZer 7 months ago

The observation of this article is spot on! CRDTs are an awesome formal model for distributed data structures, but I was always bothered by the notion that all conflicts must be resolved automatically (hence also the name, conflict-free replicated data type). As the article illustrated, this is a hopeless endeavor. I believe what is needed is a proper structured representation of conflicts, that allows for sharing them and resolving them collaboratively, giving back control to the users and supporting them in the process. One of my favorite papers “Turning Conflicts into Collaboration” [1] makes a compelling argument for this idea.

As part of my ongoing PhD studies, we have developed our own formal model for structured conflict representation, based on lattice theory: “Lazy Merging: From a Potential of Universes to a Universe of Potentials” [2]. Incidentally, it is also a CRDT, but it does not attempt to resolve conflicts automatically. Instead, it represents them within the collaborative documents. Approaching the problem from mathematics allowed us to arrive at a simple conceptual model that can guarantee strong properties, like, e.g., the completeness, minimality, and uniqueness of merges, even after repeated merges of existing conflicts. And merges can be calculates very easily. I always wanted to make a blog post about it, but I never came around to do it.

[1] https://doi.org/10.1007/s10606-012-9172-4

[2] https://doi.org/10.14279/tuj.eceasst.82.1226

crabmusket 7 months ago

Thanks for those citations, they look really interesting!
> hence also the name, conflict-free replicated data type
When I was introduced to CRDTs the acronym stood for commutative replicated data types (eg in the paper by Shapiro et al). I prefer this actually, despite it being harder to pronounce.
A conflict is a complicated idea, and while "conflict free" is a technically correct way of describing the result, it can be misleading as evidenced by this post and your comment.
Commutativity is the property that when Bob applies changes in the order [Bob, Alice] and Alice applies changes in the order [Alice, Bob] that they both end up with the same document. It doesn't imply that the document is somehow "free" of "conflicts" in a sense that may be meaningful at a higher level of abstraction.
- chrismorgan 7 months ago
  
  > the acronym stood for commutative replicated data types (eg in the paper by Shapiro et al).
  https://pages.lip6.fr/Marc.Shapiro/papers/RR-7687.pdf: the paper clearly calls it Conflict-free Replicated Data Types, though it does also define the two styles:
  • State-based Convergent Replicated Data Type (CvRDT)
  • Op-based Commutative Replicated Data Type (CmRDT)
  
  crabmusket 7 months ago
  
  I guess I was thinking of this one? But I haven't gone back to confirm the content https://inria.hal.science/inria-00555588/document/

Rygian 7 months ago

I would wager that, in general, supporting the notion that several different entities are all the authority over a piece of data simultaneously and without live coordination is not solvable. This is a learned lesson for distributed systems, and is readily apparent in the article when considering distributed editing of documents. Same goes for dual input in flight cabins, parenting, and probably any other disparate example one can think of.

beefnugs 7 months ago

It is solvable, but needs more complicated contextual information that many people would not want to bother entering : "this word i just changed only makes sense if it is apart of this whole sentence, which is not necessarily required for the whole paragraph..."
And calling this "solvable" is a funny thing to think about, since huge portions of the earth think the chaos output of LLMs could be anywhere near deciding the final output of computation at this point in time
- Rygian 7 months ago
  
  "It is solvable" is equivalent to "politics are not necessary", which I can't agree with.
  Agreeing on edits requires a shared context, a shared understanding of the goal and requirements of the text being edited, and a shared projection of how readers should understand the text and how they will understand it instead.
  The specific contextual information required for automated editing is dependent on the combined situation of all writers, considering their professional, personal and cultural contexts.
  Assuming that context can't be made available in a systematized way, the machine will choose an edit that is not guaranteed to match the intentions and expectations of the people involved. Instead, it will just act as adding one more writer to the mix.
- theamk 7 months ago
  
  heh, I bet no matter what kind of textual explanation you required, I can provide the situation it does not cover.
  You say this word is only required if it's a part of this whole sentence? OK, the other edit kept the whole sentence, but changed a single other word in it, which happened to be the subject.
  
  eastern 7 months ago
  
  100%.
  The given situation is solvable only by the humans involved. They want different things. Either one of them has authority over the other, or they talk it over.
- hinkley 7 months ago
  
  Idiomatic speech and allusion being two important cases. Turning a line into a literary or pop culture reference require pretty close adherence to the exact phrasing. And two edits are unlikely to achieve that.
fsckboy 7 months ago

Brewer's CAP theorem of distributed data storage. you can have two out of these three:
1. Consistency
2. Availability
3. Partition tolerance

mweidner 7 months ago

One challenge is that the algorithms typically used for collaborative text editing (CRDTs and OT) have strict algebraic requirements for what the edit operations do & how they interact. So even if your server is smart enough to process the "Colour" example in a UX-reasonable way, it's very hard to design a corresponding CRDT/OT, for optimistic client-side edits.

You can work around this by not using CRDTs/OT. E.g., the server processes operations in receipt order, applying whatever UX logic it wants; clients use a rebase/prediction strategy to still allow optimistic edits on top (cf. https://doc.replicache.dev/concepts/how-it-works). Doing this for text editing has some challenges, but they're separate from the CRDT/OT challenges discussed here.

antics 7 months ago

Author here! Just chiming in to say this is a very underrated comment which I fully cosign. :)

zamalek 7 months ago

I think this happens because the mathematical, or causal, or entropic, notion of conflicts has been conflated with semantic conflicts. In the past I have made the same mistake, though inversely and was adamantly informed that I had no clue what I was talking about :)

Things get way nastier when you start considering trees, e.g. yJS operates on JSON documents. From a UI standpoint (where the UI is showing some shallow level and hasn't been expanded to the deeper level) users could never even see edits that have been deleted.

I think that the class of CRDTs that preserve conflicts (IIRC that is when a register can hold multiple values) hold the most promise. Users should then be presented with those conflicts - and it could even be completely visual. Being able to scrub through history also seems like a viable alternative (allowing the user to figure out how a strange thing happened, or how their changes disappeared).

Onavo 7 months ago

> Users should then be presented with those conflicts - and it could even be completely visual. Being able to scrub through history also seems like a viable alternative
I think "Git" would be a wonderful name for this type of CRDT.
- Etheryte 7 months ago
  
  For all things Git is good at, conflict resolution is most definitely not one of them. There are many ways to slice a conflicting diff into plus and minus parts, and somehow Git usually manages to create conflicts that are the least human readable. We live in an era of widely adopted and universal language servers, yet Git pretends that every change is just a char added or removed. There are many tools out there which do a considerably better job at this, for example by diffing the AST, not the text representation.
  
  Wowfunhappy 7 months ago
  
  Hi, I'm a hobbyist coder with lots of personal forks of open source projects. I frequently spend a lot of time merging in changes from upstream. Can you suggest tools that would help with this?
  
  CGamesPlay 7 months ago
  
  https://mergiraf.org/
- satvikpendem 7 months ago
  
  I mentioned Loro in another comment, but they actually do conflict resolution on top of Git trees [0]. Jujutsu is also interesting but I'm not sure if they do any conflict resolution [1].
  [0] https://loro.dev/blog/v1.0#loro-version-controller
  [1] https://github.com/martinvonz/jj
- drdaeman 7 months ago
  
  More like Pijul or Darcs?
  Git is popular, but it's not particularly great at conflict resolution.
- zamalek 7 months ago
  
  The advantage that CRDTs have is they have a complete replay. In theory you could do some clever stuff with that.
joakims 7 months ago

I think Loro is promising in that regard. They're actively working on this problem.

leoc 7 months ago

> In 2009, a surprising amount of the discourse was focused on the algorithms git used to automatically merge changes together.

IIRC Torvalds himself was quite pessimistic about what could be achieved with automatic merging. (And in this he was correct.) He said that Git had rejected the idea that a version-control system could, or should attempt to, "solve the merging problem" by finding a sufficiently-smart algorithm which would do the right things automatically.

> Offline editing is a UI/UX problem

True. There are two deeper root causes here:

1) computing's commitment to cargo-culting old solutions and, relatedly,

2) its devotion to the belief "in general 5-lb sacks are nicer to deal with than 10-lb sacks: therefore I should fit my 10 lbs of POL into one 5-lb sack".

The default vision of "text editor" is "Mosaic textarea", "MacWrite" or something in between, so the quest is usually to bolt merging onto something like that with the minimum possible change. Make it a menu item, or a little dialog box with a few options. If there is some kind of GUI support for merging hidden deep in the menus it's a programmer-UI diff-merger horror that barely does the minimum, or a strikethrough-based view which feels treacherous like navigating a ship through a fog. But in fact in text editing with offline collaboration merging, partly-manual merging at that, is a central part of the process and it needs to be central to the design of the editor. Unfortunately MacWrite is a local maximum which isn't easy to get far away from.

theamk 7 months ago

what are the alternatives, though?
For example, often when someone mentions "cargo-culting" and "old solutions", their next words are "stop editing code as text, edit as a syntax tree". But it has the same problem, just replace "character" with "statement"! Bob added a line to "else" branch of "if" statement, Alice deleted the entire statement, along with "else" branch - what is the smart system to do?
- swiftcoder 7 months ago
  
  I don't think the syntax tree approach is a silver bullet, but it does vastly reduce the kind of annoying things that trip up git merges today. Issues like whitespace conflicts, formatting clashes, reordering of import statements...
  Though we seem to have collectively decided it's easier to solve this by forcing everyone to adopt the same auto formatter configs, which, I guess fair enough
antics 7 months ago

[Author here] I am sorry, I think I phrased the automatic merging point confusingly. I was trying to say that when multiple commits change a file, git will attempt to merge them together, but it MUST flag direct conflicts. Sounds like we agree this is the right approach overall though.

antics 7 months ago

Hi folks! Author here. Happy to answer questions or take feedback. I'll be in meetings for an hour or two but I love talking about this stuff. :) Here or over email if you prefer, alex@moment.dev

jakevoytko 7 months ago

If you keep the offline support, you'll eventually uncover even more fun cases. "I started working on this on an airplane where the wifi was down. But then decided I didn't like the direction it was going and just closed the laptop and took a nap. I spent the next few days working on the document on my desktop. Over the weekend I opened the doc on my laptop and now all of my airplane changes are in the doc and everything is garbled. Help, I didn't mean to merge them!"
Git would never automatically mash your local changes in without your explicit consent. bzr would never have dreamed of it. But things like Google Docs will happily do it.
It's awesome to see all the progress y'all have made! Good luck with early access!
- antics 7 months ago
  
  Hi Jake! Long time no chat! I hope everything is going well. Yep, those are exactly the kinds of cases we seek to give people more control over. It's like you read my mind. :)
mike_hearn 7 months ago

What's the problem with just adopting patch/diff style merging? I mean, offline collaborative text editing is a solved problem for decades if you're going to phrase it as just a UX optimization problem.
- antics 7 months ago
  
  I'm not sure I understand the question fully, but yes, I am a big fan of the diff/review/patch-style tools like git for offline editing, and that is exactly why I call them out in the final paragraphs. The reason I don't agree that it's completely closed-book solved is because I think the git diff UI is not appropriate for most people whoa re not developers.
  
  mike_hearn 7 months ago
  
  Not the exact git GUI, but bear in mind Word has supported offline edit/merge for decades and it's done in a similar way (you see diffs visually and can accept/reject them). Lawyers depend on that workflow heavily.

keizo 7 months ago

I implemented differential sync (https://neil.fraser.name/writing/sync/) mostly because I couldn’t understand anything else and seemed simplest in my grugnotes.com app -- and while the app is pretty janky and not fully real-time, it does get your example merge right regardless of who comes back online first. In the case the deletion comes online first, the 'colour' version is thrown out and not saved in edit history. I’m sure there’s a lot more wrong with it, have no idea what would happen with more than two users, but for my case I’m happy with it. :)

satvikpendem 7 months ago

I had an interest in CRDTs for quite a while now, as I like the local first philosophy of developing software, that works offline in its entirely but can also work online, a sort of progressive enhancement [0]. Recently I've been looking into Loro [1] which seems like it is able to effectively merge disparate offline text edits together, by using some new algorithms that were written about last year, such as the Event Graph Walker [2]. I've been combining this with ElectricSQL [3], which is a sync engine for Postgres. In the past they had their own CRDT right inside Postgres which would then sync tables, but they have rewritten their product to focus primarily on being a sync engine first and perhaps a conflict resolution library second. Therefore, I store the Loro changes binary in a table in Postgres as a blob that I then sync via Electric to all my clients.

Ultimately though, it is as you and others like @zamalek in this thread have said, the mathematical notion of conflict resolution might not actually mean anything semantically, therefore it is difficult to have long running offline edits merge together cohesively; it works with things like Google Docs where the user can see what other users have written in real time, which works for 99% of use cases, and sometimes I wonder whether one really needs such long running offline syncs, as it seems to be a very niche goal to target. Short running offline is nice to have and even necessary, especially for products one expects to work wholly offline, but it is the long term I don't see much value in, as in not collaborating online for weeks or months at a time but still expecting cohesive edit merges.

[0] https://localfirstweb.dev/

[1] https://loro.dev/

[2] https://loro.dev/docs/advanced/event_graph_walker

[3] https://electric-sql.com/

refulgentis 7 months ago

Alas, Loro fails The Color of Pomegranates test as well. (JSON trace, really cool toy they got there: https://pastebin.com/6dSDc6Su)
- satvikpendem 7 months ago
  
  Yes, it is as I mentioned in my 2nd paragraph, mathematical conflict resolution is not semantically relevant for humans much of the time. That is why I don't think automated merge conflict resolution with no human input can really work, that is why Git asks you to resolve merge conflicts before committing again. CRDTs can only really help when users edit disparate pieces of data, and if editing the same piece of data, some set of users need to be there, as in an online rather than offline capacity, to facilitate the merges, such as in Google Docs where if I edit a sentence with a spelling correction and you delete the sentence entirely, I will ask you directly what's up.
  Now, what some Git merge software is doing is using LLMs to coordinate merges, which I think might be useful to do so in plain text editing too, perhaps embedded into these CRDT libraries, but fundamentally, there must be someone to facilitate merges, whether it be a human agent or an AI one. It is impossible to do so with an entirely automated solution simply because the machine does not know the intents of all its users.
- williamstein 7 months ago
  
  The text sync algorithm I wrote succeeds at the "The Color of Pomegranates" test. See https://cocalc.com/wstein/dev/share/files/2024-12-06.md for some details of the exact patches. This algorithm, which we came up with in 2015, is described at https://blog.cocalc.com/2018/10/11/collaborative-editing.htm... It's a different approach than that taken by Yjs, Automerge, and Egwalker, with signficant pros (and cons!) relative to those algorithms. It has been used in production in CoCalc for over 8 years, and was also used by the recently shutdown Noteable Jupyter notebook project.
  
  refulgentis 7 months ago
  
  This is absolutely fascinating and immediately one of my favorite articles of all time, full stop. Along the way, it implicitly illustrates a path to successful engineering x product thinking path that I've never seen written down somewhere and I think is crucial for sustainable impact. Thank you!

major4x 7 months ago

I love this mini article, however, I disagree with the main conclusion that collaborative editing is not an algorithmic but a UI/UX problem. I think that collaborative editing is a semantic problem. To the best of my knowledge (I'm writing this comment without much preparation), all SVN/Git algorithms are based on UNIX diff (Hunt–Szymanski algorithm). UNIX diff (and patch) is purely syntax driven.

Actually, I will make a small deviation here: I think it is a big industry/startup/open source project there, in creating a set of semantic diff algorithms/implementations. For example, due to my present job, I am very interested in collaborative editing of electrical circuits, and layouts for PCB and chips. Altium and KiCad are trying, for exmaple, to store everything in XML/text files and put the text files in Git/SVN and I can tell you a botched C++ program is nothing in comparison to a botched and malformed electrical circuit. So we need diff tools that "know" about a text file, vs rich-text with formatting, vs bitmap vs vector image, vs song, vs English text. Anybody want to start an open source project (DM me or put a comment here).

Anyhow, thanks to the authors on the great insights and let's work on the take home!

gavinhoward 7 months ago

Hey, I am trying to make a version control system with semantic diff. Electrical circuits and layouts are high on my list.
My profile has links to webpages with my contact info.

miltonlost 7 months ago

AI trying to figure out user intention and incorrectly so due to their heuristics or inherent missing context in merges, is why autocorrect has become almost more trouble than it's worth these days. Texts on phones are impossible to start with "well" or "we'll" as it will replace with the other before the second word. Algorithms need to stop saying that what they find more likely must be true; "more" likely is less useful when there's a 51/49% chance or a 4/3/3/3/3/3/3/3/3/3...3% chance. Sometimes the AI will be right; sometimes it will be wrong. I'm far more upset when its wrong than the few times it's right. The same problems with offline editing will raise their head as AI forces its way into our lives attempting to make decisions based on not nearly enough info.

Basic symbol or character-by-character manipulation cannot reveal why the changes occurred, just what happened.

theamk 7 months ago

Turning off automatic autocorrect is one of the first thing I do when i get a new phone. Sure, show me suggestions, but let me decide myself if I want them or not!

taeric 7 months ago

My rephrasing of the problem on coordination-free algorithms is that collaboration is largely a subset of correspondence. For some reason, so many of these discussions try to treat it as a completely separate thing. This invariably results in longer feedback loops for those corresponding with no direct communication of what, specifically, they have done.

dietr1ch 7 months ago

I'm angry that I guessed right. I really don't like how flat the model for text generally is. I don't see text as a [Char], so if the computer does there's a hidden mismatch that people have grown to work around and with (just like searching on the Internet before 2005 felt. You had to know what was gonna work and then got the right to claim how easy it was)

We structure text in an implicit hierarchy given by spacing, margins and a bunch of other subtle things that [Char] doesn't capture, but can encode for other humans. I think that this is were all the problems stem from, and that with right (tm) structure a lot of the operations can be merged trivially with way fewer surprises. Easier said that done for sure, and there will be more weird cases, but I'm guessing they will get closer to being just the conflicts that really need authors to deal with.

joakims 7 months ago

A gramatically aware text CRDT? At least aware of words and sentences? I'd be curious to hear whether that's been tried, and if it solves any issues or produces new ones.

refulgentis 7 months ago

This is so, so, sooo good. I've never been brave enough to say it out loud, but this is 110% my experience.

I imagine it is somewhat a consequence of the divide between engineering and product.

It can resolve all conflicts, and it's such a holy grail to go decentralized / get Google Docs-like editing down to a a package in your favorite language. But, in practice, its intractable for arbitrary data, even just an arbitrary string.

I do wish there was a formal proof that showed / proved this to share in HN discussions re: CRDTs...but hey, this is great! The "it left a u" example is simple, intuitive, and with a charitable listener, I doubt they'd argue we can't figure out a string unambiguously, but we can figure out JSON unmabiguously.

antics 7 months ago

Author here, thanks for the kind words! I think one reason we ended up here is that it is a genuinely difficult technical problem even to analyze the solutions. One of my hopes for this series of posts is that it makes the evaluation process more straightforward, particularly for people who do not have a strong background in distributed systems algorithms.

robertclaus 7 months ago

The most surprising part of this article for someone uninitiated like myself is probably that products/algorithms are claiming this automatic reconciliation is consistently possible. Maybe I've spent too much time resolving code merge conflicts by hand, but this seems intuitively obvious to me...

Feels like https://xkcd.com/1831

williamstein 7 months ago

What they claim is that if all editing stops, then after a period of time everybody will be looking at the SAME document. This is what is meant by "eventual consistency". Achieving this in general is indeed a difficult problem, but (some of) these algorithms do solve it, though it can be tricky to correctly prove that they do. I agree that it is not possible to ensure that the document everybody is looking at is what they actually wanted it to be. However, there are some options for what happens, where some results may be technically correct -- we are all looking at the same thing -- but obviously really bad. This beautiful talk has some examples: https://www.youtube.com/watch?v=x7drE24geUw
- robertclaus 7 months ago
  
  Isn't eventual consistency theoretically trivial...? You just delete the whole document after every transaction. I think it's safe to assume anyone talking about this _means_ the results remain meaningful - which for text/meaning is subjective.
  
  williamstein 7 months ago
  
  Proving that there exists an algorithm that results in an eventually consistent view of the document is trivial. However, that's not what we're talking about. Instead, researchers define a specific algorithm (e.g., involving CRDT's or OT's or something else), then prove that their algorithm results in an eventually consistent state. This reminds me a little of the relationship between proving that there exists an algorithm to factor all positive integers (this is trivial) and proving there exists a subexponential time algorithm to factor integers, which is much less trivial (see https://en.wikipedia.org/wiki/Lenstra_elliptic-curve_factori...).

yaweezy 7 months ago

Really enjoyed reading the post and the comments. And it’s good to see a general consensus on this point. I had a short stint at a legaltech startup and began putting together a framework for resolving drafting edits/conflicts. In the legal context, it was more critical that the user actively signed off on a change to avoid random errors, so my thought was to implement a UI/UX solution to review each change similar to Microsoft Word’s spelling and grammar check. There were some other options I proposed, like showing multiple ways that the change could be implemented. The team decided to focus on solving easier problems and the startup folded not too long after that. But it was a really fun exercise. Glad to see folks are still thinking deeply about it.

klntsky 7 months ago

Looks like a good use case for LLMs!

https://chatgpt.com/share/67538beb-73e8-800a-b602-fe26b131e5...

HMU if interested in building it

antics 7 months ago

Author here! We have experimented with building this, see my comment elsewhere here: https://news.ycombinator.com/item?id=42348651
One of the surprising things is that LLMs regularly "fix" things that no other system can fix. Like if we both add the same sentence to a doc. It's interesting stuff.
With that said I am not sure that this specific LLM is providing the "right" answer. It seems like AN answer! But I think the real solution might be to ask the user what to do.
sausagefeet 7 months ago

Take what you wrote and flip it and you get a different answer:
https://chatgpt.com/share/67542c95-fea8-8008-8749-7b7daf355c...
Here is the resulting semantically meaningful diff that reconciles the edits:
- The color of orange is orange
+ The colour of orange is orange
This reflects the change in spelling from "color" to "colour," which is the only meaningful difference between the two edits.
mkl 7 months ago

Hardly. That just threw away Edit1, failing to follow instructions even in this very simple case.
tags2k 7 months ago

o1 gets the original article question correct: https://chatgpt.com/share/675410c1-22c4-8001-b36a-24425127cc...

Gehinnn 7 months ago

> I think the primary change is that these debates were focused on producing diffs that humans read; now the debate is whether the algorithms can accomplish this result with no human involvement at all.

The better the diff captures the intent of the user change, the easier the diff is to read, AND it also becomes easier to understand merge conflicts, as the conflict becomes a conflict of intents and not characters. Sometimes certain conflicts can even be avoided if the diffs are chosen right (or are expressive enough, e.g. when code moves can be detected).

eps 7 months ago

OP, your site's scrolling is completely broken on an older iPad. Just can't scroll the page at all. It shows only first screenful. Pulling it up just shows whitespace below and bounces back. Reader mode shows the whole page, but then all formatting is gone, so it's also unreadable. Just FYI.

breadwinner 7 months ago

Has anyone tried Microsoft's fluid framework for collaborative editing?

https://github.com/microsoft/FluidFramework

https://fluidframework.com/

ramon156 7 months ago

Just use git (: (Half joking)

antics 7 months ago

Author here! I think you're going to like one of the upcoming posts in this series. :)

Confirm2754 7 months ago

Thank you for your research! I've been working on integrating CRDT into note-taking software recently. Thank you for letting me know what's more important!

jFriedensreich 7 months ago

Totally agree with this article, and this is about collaborative text editing, which is something these solutions and algorithms are actually good at. Looking at everything from crdt fanboys applying this to application data and ignoring the UI/UX aspects about conflict resolution makes my neck hair stand up. This is how we end up with linear giving talks about their "sync engine" and happily deleting other users changes with last write wins.

joakims 7 months ago

I don't think they're ignoring it, they've just been focusing on solving the algorithmic problems before diving into UX problems. Loro, for instance, has recently shifted their focus towards conflict merging following their 1.0 release. It's simply a question of building a solid foundation before making it user friendly.
Yours sincerely,
CRDT fanboy
- jFriedensreich 6 months ago
  
  (Automatic) Conflict merging as crdts usually see it is not the same as a full meaning preserving conflict resolution. There is no way an algorithm can always solve a merge conflict without user input or at least custom application level resolution logic. You cannot add this as an afterthought to some automatic crdt. If you search loros docs there is no result for an api to tell it how to resolve a conflict and this is true for most of these systems. Automerge has something but its not fully exposed and quite usable afaik. Again this is fine for real time text collaboration but not for usage for application data.
  The simple litmustest to understand what i mean is this:
  You rename a todo changing its meaning and concurrently mark it as done. Can you tell the sync engine to always disregard the marking as done in case of changing the text. If there is no api for this to fully control the merge semantics, your system is not suitable for application data.

aetherspawn 7 months ago

Seems like a great application for AI. An LLM could most likely predict how things could be merged with high accuracy in 3 steps:

1. Look at the authors changeset to put a “reason”, or similar, to the edit

2. Ask whether either of the edits makes the other edit redundant

3. Ask the LLM which order to replay the changes (addition or deletion) to preserve the reason of both edits, or otherwise pick the best order

I doubt you need even a 70B model to get “90% good results”, which is all you probably need, because lets be honest offline editing is quite an edge-case to begin with in 2025.

antirez 7 months ago

Now that there are LLMs available, why is this still a problem? You just have to detect the conflicts and show a powerful enough LLM the two versions and tell it to do its best job at merging. This solves exactly the kind of issue described in the post. Oh, bonus point: you don't have to take any metadata.

miltonlost 7 months ago

> You just have to detect the conflicts and show a powerful enough LLM the two versions and tell it to do its best job at merging. This solves exactly the kind of issue described in the post.
How does "LLM tries its best" solve the problem of exactly syncing documents offline? "Tries its best" implies it could fail which is what the problem already is. An LLM only adds a new layer of abstraction, but now the downside is it's impossible to analyze.
Maybe I just have a much higher threshold of "solved" than you do, but anything non-deterministic (without some human judgment a la git) is not a good solve for document syncing.
A "powerful enough LLM" is the equivalent of "and a wizard does it".
- antirez 7 months ago
  
  LLMs at t=0 are predictable (will produce the same output starting from the same input) and do a much better merge work than any other non ML based algorithm.
  Cut & paste all the examples you can come up to Claude and tell me if LLMs are not able to do this kind of merging.
jakelazaroff 7 months ago

If there are only two conflicting updates, sure, maybe that would work. But there might be more than two peers, and their updates might arrive in any order. That's why CRDT merges must be both commutative and associative.
Can we guarantee that an LLM will get exactly the same result merging (A ∨ B) ∨ C as it would merging A ∨ (B ∨ C)? Even when the temperature is 0?
- mweidner 7 months ago
  
  You could avoid the CRDT rules if you only use the LLM on the server. I.e., user comes online and sends their diff to the server, which LLM-merges it into the latest state and then sends that back to all clients.
  This doesn't help you do merges client-side during live collaboration (for showing your optimistic local updates), but there the low latency reduces conflicts anyway, so you can fall back on a semantically-imperfect CRDT.
  
  satvikpendem 7 months ago
  
  If you have a central server, you don't need CRDTs, which are designed to work even in pure peer-to-peer scenarios. Figma is one example of this [0]:
  > Figma isn't using true CRDTs though. CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be. There is some unavoidable performance and memory overhead with doing this. Since Figma is centralized (our server is the central authority), we can simplify our system by removing this extra overhead and benefit from a faster and leaner implementation.
  > It’s also worth noting that Figma's data structure isn't a single CRDT. Instead it's inspired by multiple separate CRDTs and uses them in combination to create the final data structure that represents a Figma document (described below).
  [0] https://www.figma.com/blog/how-figmas-multiplayer-technology...
antics 7 months ago

Author here! This comment is kind of getting dragged here and elsewhere but I actually think it's not completely ridiculous. You can (and we have) presented git-style merge conflicts to an LLM and it will mostly fix them in ways that no algorithm can.
One example of this is if you and I both add a similar sentence in different spots in the document, asking an LLM to merge this will often result in only one of the sentences being accepted. It's not perfect but it's the kind of thing you can't get in any other technology!
With that all said I don't think LLMs REPLACE merge algorithms. For one, to get sensible output from the LLMs you generally need a diff of some kind, either git style or as the trace ouput of something like eg-walker.
satvikpendem 7 months ago

That's exactly how I see it too, true intelligent merging algorithms are, at the limit, going to be basically LLMs because only something that powerful can understand user intent in a way that non-ML algorithms cannot.

smoyer 7 months ago

I miss Google Wave.

worldsayshi 7 months ago

Oh! Now I remember the name of the tool that came before Google wave and (I think) inspired it.
Etherpad.

bvrmn 7 months ago

It's ironic how js-experts could not make examples work on the article page.

bee_rider 7 months ago

The example seems like it would be easier if we’d gone in the direction of allowing more complex commands, like vim does. Imagine if real editors had been developed for the last 30 years or however long, instead of stagnating at vim (which is clearly a nice text editor, but it could be nice to have an editor designed around writing prose). Maybe neovim will save us. Some day.

Bob’s intent is to edit the word color, and inserting a u. But, he is limited to just expressing “put u here,” which is not at all what he wants to achieve it is just a mechanical description of what operations need to occur.

Alice’s intent is to delete the whole sentence, but she’s similarly limited to just saying “delete delete deleted delete delete…” to a bunch of letters.

Ending up with a u is the obvious dumb result of treating language as a pile of characters. The correct behavior is to say: because the world Bob has edited no longer exists, his edit is clearly nonsense, don’t apply it. Which editor does that?

Rygian 7 months ago

Considering that software can only guess the intent if it's not declared explicitly, I'm curious what such an "intent declaration" language would look like.
- satvikpendem 7 months ago
  
  At the limit, it would probably look like an LLM, because it's akin to rules-based AI in the '90s vs neural network AI today. Expert systems had programmers write many rules in order to process information, which is what this "intent declaration" language would also be like, users writing many rules that would be followed. But this approach didn't work because even humans didn't even know all of the rules needed, therefore we turned to the statistical approaches of current neural network AI.
  
  Rygian 7 months ago
  
  We are aligned. Software cannot identify intentions.
  Heck, even person A cannot identify intentions of person B in a systematic way.
  Hence it's a non-solvable problem.
- bee_rider 7 months ago
  
  s/color/colour/g