HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Conflict Resolution for Eventual Consistency • Martin Kleppmann • GOTO 2016

GOTO Conferences · Youtube · 1 HN points · 15 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention GOTO Conferences's video "Conflict Resolution for Eventual Consistency • Martin Kleppmann • GOTO 2016".
Youtube Summary
This presentation was recorded at GOTO Berlin 2016
http://gotober.com

Martin Kleppmann - Researcher at University of Cambridge

ABSTRACT
What do collaborative editors like Google Docs, the calendar app on your phone, and multi-datacenter database clusters have in common?
Answer: They all need to cope with network interruptions, and still work offline. They all allow state to be updated [...]

Download slides and read the full abstract here:
https://gotocon.com/berlin-2016/presentations/show_talk.jsp?oid=7910

RECOMMENDED BOOKS
Martin Kleppmann • Designing Data-Intensive Applications • https://amzn.to/3mk2Roj

https://twitter.com/gotober
https://www.facebook.com/GOTOConference
http://gotocon.com
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Started, but never finished. :(

@nmaro has some pretty good stuff.

The cartoon explainers gives an overview.

But you'd probably want to do RGA algorithm or one of the newer ones.

Martin Kleppmann has a pretty good talk with bunch of paper references:

https://youtu.be/yCcWpzY8dIA

josemaenzo
Oh, I see. Just wondering, How GUN handle a situation like described above when working with Arrays?
He gave several talks on CRDTs which are good intros:

https://www.youtube.com/watch?v=B5NULPSiOGw

https://www.youtube.com/watch?v=yCcWpzY8dIA

lichtenberger
Yes, I saw them. Also his book is one of the best computer science books I've read and he gave me a really good advise for my own Open Source project, regarding the use of BookKeeper as a distributed log (instead of Kafka) :-)
> Indeed, the literature of CRDT does specify a mathematically correct answer. But this does not always line up with what humans would find the most faithful rendering of intent.

This is a very salient point that anyone thinking of using CRDTs to "solve" synchronization in an user-facing application needs to take into consideration. Yes, CRDTs will guarantee that clients converge to an identical, mathematically "consistent" state eventually, but there's no guarantee whether or not that mathematically "consistent" state would make any sense to the application business logic that needs to consume that state, or to the human that needs to reason about the rendered result. That is a completely different can of worms that we'll still have to tackle to build a usable application.

Here's great example to illustrate this from Martin Kleppmann's talk on this topic: https://youtu.be/yCcWpzY8dIA?t=2634

Rest of the talk is also highly recommended for anyone interested in an approachable primer on CRDTs.

The trade-offs to CRDTs mentioned by the author in the context of text-editors make sense, but I would be curious to hear from the Xray team on what their current thinking on the topic is, given that they have collaborative editing as an explicit core objective (which might shift the value prop in favor of using CRDTs relatively speaking since in Xi it seems to be only an aspirational goal), and that their approach to implementation was similar but not quite identical to Xi's:

> Our use of a CRDT is similar to the Xi editor, but the approach we're exploring is somewhat different. Our current understanding is that in Xi, the buffer is stored in a rope data structure, then a secondary layer is used to incorporate edits. In Xray, the fundamental storage structure of all text is itself a CRDT. It's similar to Xi's rope in that it uses a copy-on-write B-tree to index all inserted fragments, but it does not require any secondary system for incorporating edits.

https://github.com/atom/xray#text-is-stored-in-a-copy-on-wri...

cryptonector
This really is the most important thing to get from all of this. In an editor the users can see any bad outcomes and correct them. In, say, a database system, bad outcomes can be much more problematic.
the_duke
> but I would be curious to hear from the Xray team on what their current thinking on the topic is

Xray is dead:

https://www.reddit.com/r/rust/comments/bdf3lx/we_need_to_sav...

For anyone who wants a quick primer on OT / CRDT text algorithms:

https://gun.eco/explainers/school/class.html ( Cartoon Explainer! )

For a detailed deep dive, including RGA:

https://www.youtube.com/watch?v=yCcWpzY8dIA

TLDR:

Local-first software is powered by CRDTs.

If you want to learn more about CRDTs, check out:

- https://github.com/automerge/automerge (author's project, legit)

- https://www.youtube.com/watch?v=yCcWpzY8dIA (deep technical talk)

- https://gun.eco/distributed/matters.html (my Cartoon Explainer)

josephg
OT also works fine for this sort of stuff. OT algorithms are easier to implement ([1] for an implementation I wrote of OT over arbitrary JSON structures). OT just requires a central source of truth / a central authority. For local first stuff depending on how you design it you can have one of those - in the form of your server.

[1] https://github.com/josephg/json1

zzzcpan
OT are inferior to CRDTs in every single way. In 2019 people shouldn't even be looking at OT.
josephg
I disagree. I think OT systems are way simpler to reason about, because they’re just an extension of event sourcing. Also CRDTs type implementations have been trailing OT algorithms in terms of features forever. OT got JSON editing support first, and JSON1 (the OT algorithm I linked above) also supports arbitrary tree reparenting, which as far as I know is missing from all CRDT algorithms. That’s needed to implement apps like workflowy, where you can drag trees around.

CRDT algorithms have documents which grow without bound. With OT, the documents are always minimal and it’s easy to reason about (and implement) trimming operations.

CRDTs are a better tool for distributed applications, but for server client stuff OT works fine.

zzzcpan
Well, CRDTs were born to simplify reasoning. You can't really claim that OT is simpler. And performance of both is implementation specific with similar theoretical bounds. But, of course, CRDTs are much more general and foundational.
marknadal
:) I got arbitrary tree reparenting and mutable (and immutable!) state working in our CRDT, for about 4 years now. No unbounded growth!

Runs in production, having done 1TB p2p data in a day, on $99 hardware! Internet Archive and others use it.

I do agree tho, most CRDT implementations have just as many scaling and compaction problems as any append-only log system.

OT is worthwhile to understand.

josephg
Oh cool! GitHub link?
marknadal
https://github.com/amark/gun

var a = {b: 1, c: {d: 2}, e: 3, f: {g: 4}};

a.c.z = a;

Every object has its own UUID, which makes circular references & sub-objects easy to reference.

a.c = (UUID pointer to c)

a.c.z = (UUID pointer to a)

a.f = (UUID pointer to f)

Now we can

(a.f.z = a.c) && (a.c = null)

c doesn't actually move, just the pointers on `a` and on f.

c now has a new parent (or could have 2 parents, since any graph is allowed).

Since everything is represented on disk/wire as a flat graph, all updates can be well defined as operations on `UUID.property = primitive/pointer` atomic changes in the CRDT.

This means there are only 7 operations to commute: (I hate switch statements, but someone helping me formalize the CRDT wanted it expressed this way)

https://jsbin.com/hedeqoxusa/edit?js,console (click run to see each operation applied)

You see order-of-operation doesn't matter.

Once merge has happened, history does not need to be preserved (no unbounded growth!) but it can if you want log/history.

Merged states can be stored as a flat graph on disk with a radix tree which allows for O(1) lookups on UUID+property pairs regardless of graph size.

There are some caveats though, of course:

Strongly Eventually Consistent, so don't use for banking.

Counter operations still grow-only but can be done in 12 lines ontop (see https://gun.eco/docs/Counter ). Rich text also grow-only.

Happy to expand on anything else, too!

canadaduane
Keep up the awesome work! I've been watching gun on the sidelines, and look forward to its eventual domination ;)
marknadal
Your work originally inspired me to get into this stuff. :)

<3 thanks so much for your amazing contributions to OSS.

CRDTs are great, and I've seen a lot of quality in Christopher's work.

A few other resources for those interested in CRDT subject:

- Our cartoon explainer https://gun.eco/distributed/matters.html

- An excellent talk by Martin Kleppmann https://youtu.be/yCcWpzY8dIA

- Marc Shapiro's in depth talk https://www.youtube.com/watch?v=oyUHd894w18

CRDTs are quite actively discussed in the IPFS community, in particular here: https://github.com/ipfs/research-CRDT/ (lots of great resources if you're interested in CRDTs)

Also check out automerge, a CRDT that aims to be as JSON-like as possible: https://github.com/automerge/automerge

Martin Kleppmann (one of the creators of Automerge) has given several great in-depth talks about it, eg. https://www.youtube.com/watch?v=yCcWpzY8dIA

sagichmal
I was not able to actually find the Javascript implementation of the data type in that repo. I have tried to implement from first principles (i.e. the paper) in my language, but large parts seem to be left under- or un-specified. Did I miss something? I would love to be wrong about this...
lachenmayer
I haven't dug deep into the implementation, but most of the interesting CRDT stuff seems to be in the "backend" directory in that repo: https://github.com/automerge/automerge/tree/master/backend

In particular the "operations" defined in [1] section 4.2 are defined in this file: https://github.com/automerge/automerge/blob/master/backend/o...

[1] https://www.cl.cam.ac.uk/~arb33/papers/KleppmannBeresford-CR...

Yes, we did a whole interactive explainer article on this:

http://gun.js.org/explainers/school/class.html (cartoon version)

Although, from a algorithm side, you may enjoy a talk by Martin (a CRDT researcher) for a more thorough technical overview:

https://youtu.be/yCcWpzY8dIA?t=29m36s

Our team has a model for applying end-to-end encryption to this as well. It isn't a high priority for us to do it ourselves though, so if you are interested in getting involved, we can help explain all the algorithms for implementation!

brokenmachine
I've seen that class.html explainer.

It's a great tutorial but it's a long way from that high-level explanation of how it should work to a functional, working google docs clone!

I know you've already done a huge amount of work but enduser acceptance is what separates what you're doing from all the previous academic protocols out there. Nobody is going to switch from google docs to gundb if they are expected to write all the code themselves. OK, a small percentage, but not many.

Has anyone made anything similar that an enduser can just install and use? Even just a collaborative plain text editor would be a great start.

I would be interested in helping but my javascript skills are rudimentary at best.

I've been interested in dat/hypercore lately. Is there any way that gundb could piggyback on hypercore for storage, or in your opinion would the fact that it's an append-only log be too limiting for gundb?

Neil Fraser went on to apply this work at Google with Google Docs (Operational Transformation, see other comment here).

He was also one of my heroes that inspired my work to build a decentralized (and offline-first) version, that I later applied to graphs with https://github.com/amark/gun .

Some good follow ups are:

- Martin Kleppmann's https://youtu.be/yCcWpzY8dIA?t=29m36s

- Cartoon explainer http://gun.js.org/explainers/school/class.html

May 04, 2018 · 1 points, 0 comments · submitted by amzans
Cool concept, but note:

I just tested this, and it is NOT a google docs alternative - it has no realtime sync (or at least, it wasn't working).

I work extensively on decentralized/P2P distributed systems and realtime P2P sync is non-trivial, you need to use a specialized CRDT for it:

- Here is an interactive animated explainer on how to do a P2P realtime syncing CRDT: http://gun.js.org/explainers/school/class.html

- Here is an excellent more detailed explanation of how such a CRDT works, by Martin Kleppmann: https://www.youtube.com/watch?v=yCcWpzY8dIA&feature=youtu.be...

brokenmachine
Just checked out gun, I have to say the tutorials are very well done. I'm not a javascript guy but I've been enjoying them.

Looks like a great project!

marknadal
Thanks, I super appreciate that. We just released some new ones the other day (not sure if those are the ones you peeked at or not)! Definitely join the chat room, bunch of friendly people there and not JS-only (somebody just started a Python port that is working), would love to chat. :)
brokenmachine
Thanks a lot! I might take you up on that. I was planning to learn a bit of Python anyway.

I love what you're doing. I hate the way corporations are monetizing our private data. People should be able to have both privacy and convenience.

I would be happy to pay for an Evernote/Google Docs clone with an Android app where everything is client-side encrypted and possibly self-hosted so our data remains ours. What I want should be pretty simple really, but there's always the trust issue so I might have to write it myself!

jhunter1016
You're absolutely correct. In this iteration, collaboration is achieved asynchronously. Share a file, receiver can edit, receiver can share back, etc.

Because real-time collaboration is not trivial, as you mentioned, we're making sure we do it right.

marknadal
Would love to help you out on that algorithm, I (and a bunch of other friendly P2P people) hang out at https://gitter.im/amark/gun !

I know it isn't as catchy as Google Docs, but perhaps having the title be "Evernote alternative" or something like that? (IDK if evernote has realtime typing, I doubt they do... or somebody similar that doesn't), so that way people don't get dissapointed?

jhunter1016
I sincerely appreciate that. Joining the Gitter room now :)
I've spent almost a decade working on CRDTs, largely from an industry perspective but also some academic side.

They're great, and not too hard to understand if you take time to think about them.

Here are some animated/interactive cartoon explainers we've made to help teach the concepts:

- How a CRDT version of Operational Transformation (Google Docs) works: http://gun.js.org/explainers/school/class.html

- Why CRDTs are better than centralized alternatives like PAXOS/RAFT: http://gun.js.org/distributed/matters.html

Martin Kleppmann's work (author of Automerge) is outstanding, especially check out: https://youtu.be/yCcWpzY8dIA?t=29m36s

justinpombrio
I have a rather specific question, in case you know. What would be the best CRDT algorithm for editing a document consisting of (i) nested lists, (ii) strings, and (iii) some primitive values?

I'm looking at Kleppermann's JSON work. But it doesn't look like it handles collaborative editing of strings very well: they have to either be treated as an immutable value in a register, which doesn't allow for collaborative editing of the string, or treated as a list of characters, which would have to represent large edits (e.g. deleting a word) as a sequence of character edits, which sounds inefficient. Kleppermann's work also handles maps, which I don't need, which is fine.

For context, I plan to make a tree/structure editor, and am considering representing the document as a CRDT.

rollcat
(not an expert)

Treating the long string like a list of substrings should work, I guess the optimum substring length could be determined experimentally based on real-world interaction patterns.

Hey, David/others I'm very impressed with your work on Y-JS (the CRDT library powering this). I'm the author of gun, which is currently (as far as I know from stats) the most popular CRDT-based data sync (MIT/ZLIB/Apache2 licensed, 6.6K+ stars, https://github.com/amark/gun ).

Usually I'm pretty skeptical of most systems, but I'm rather particularly impressed with yours. I also do not think you are getting the recognition you deserve, because the truly novel work here is Y-JS not IPFS (even though is IPFS is great), yet Y-JS is almost just a footnote on the page which is sad.

The first couple documents (on architecture and stuff) didn't seem to explore the CRDT that is used. I'll have to follow-through on other sub-documents/pages to read more, but wanted to get your thoughts first (and BTW, thanks for all the documentation you have already provided, it is impressive, and I know how hard it is to try and get everything shipped, so the last thing I want to be is an annoying HN person whining about why you didn't already have X or Y or Z finished, and wanted to give you a chance to reply directly).

I was able to rather quickly ( https://www.dropbox.com/s/8uxzq6d47szu4w1/Screenshot%202017-... ) get the document to go out of sync, but I understand this could just be a trivial implementation detail (and often is) NOT a problem with the CRDT, but wanted to check. Given your bachelor thesis, I know you are all too aware of these things, P2P collaborative rich-text is one of the hardest problems out there in the space. Your Y-JS system seems fairly "generalizable", no? Which is unique amongst CRDTs (ours at gun is as well, so I get the nuance). If that is true, usually P2P rich-text then requires a specialized CRDT on top of the generalizable one, is this what you all did?

For instance, Martin Kleppmann did an excellent explanation of explaining theirs here: https://youtu.be/yCcWpzY8dIA?t=16m26s . I was wondering if you did something similar?

We had to implement our own, which I did one of my infamous "cartoon explainers" on here: http://gun.js.org/explainers/school/class.html . However Martin actually brings up an edge case I hadn't thought of after 4 years of analyzing the system! But thankfully he provides a patch on how to fix it. ;)

Anyways, wanted to again repeat I'm very impressed with Y-JS (and say that it is not getting the attention it deserves for this being on top of HN, given that it is what is novel, not the other stuff). But I also wanted to inquire more about the generalizable-ness of Y-JS and then hear about any specific specialized-CRDT you may be using for P2P rich text, because it looks like it could be a breakthrough! Would love to hear more! Excellent work.

pgte
Hi Mark,

Thanks for your interest!

Off the CRDT libraries we analysed, Y.js was the one we picked up as it very modular, so that we could create our own connector and database layer.

These are all open-source: https://github.com/ipfs-shipyard/y-ipfs-connector and the encryption layer wrapping the database adaptor: https://github.com/pgte/y-indexeddb-encrypted/tree/encrypted

About the richtextdetails, the CRDT is based off of Quill.js deltas (https://github.com/quilljs/delta), composed by an array of such operations.

Anyway, this is all interim work, and we plan on making this independent of any specific library by implementing a generic CRDT on top of IPFS DAG and Pubsub APIs. If you're interested you can follow / chime in here: https://github.com/ipfs/research-CRDT/issues/11

daviddias
Hi Mark,

Congratz on developing Gun! I've looked at it before and I was super impressed! The decision to use Y.js instead was done due to its modularity. We added IPFS support by simply publishing a connector https://github.com/ipfs-shipyard/y-ipfs-connector which Kevin Jahns later added to the list of connectors - https://github.com/y-js/yjs/issues/77

We have linked Y.js from the page header, if you click the word CRDT it takes you to Y.js directly. Nevertheless, I agree that we need to make it more obvious How we appreciate all the work that Kevin has put into Y.js :) Do not worry, PeerPad nor our CRDT usage is finished, there will be a lot more! (In fact, PeerPad was built with little less than 4 weeks of actual dev time, there is so much more we can do).

I'm following up by email, I'm interested in testing Gun as well. Meanwhile, join the CRDT discussion on the Research CRDT (https://github.com/ipfs/research-crdt) Repo, would love to have your input there.

Cheers!

gritzko
It's good you welcome contributions to your open platform. In case your next demo is based on my work, please notify me in advance.
https://youtu.be/yCcWpzY8dIA?list=LL9RQYUX_WIjk8SAcyuarJEw About 25 minutes in. And it is called OT, so I stand corrected. It does require a server though. I was confusing it with CRDT.
Is this website related to the popular http://highscalability.com/ ?

Holy cow, this is an intense article, with great definitions for a bunch of detailed variations of distribute system jargon, and even has some nice images to explain the ideas!

My specialty is in eventually consistent data types, which there section is much smaller on, but they have some links to follow up on CRDT stuff.

> are often limited in functionality and impose performance overheads

This is for the most part true, with CRDTs. However we just formalized and proposed in a new paper, with some Stanford colleagues, how to construct a generalizable CRDT. One that actually lets other more case-specific CRDTs be built on top of it, which we use at gunDB, so that way you can get the advantages of the more optimized CRDTs but still have the rest of your data glued together.

Another set of really interesting CRDTs which they didn't mention, are the ones used for making decentralized versions of Google Doc:

- Martin's Kleppmann has an exceptionally great/interesting talk on this: https://www.youtube.com/watch?v=yCcWpzY8dIA&feature=youtu.be...

- And we recently did a "layman" animated explainer of a similar algorithm, that explains it for the non-academically minded, with simple analogies to "distributed systems" that happen in every day life: http://gun.js.org/explainers/school/class.html

I'll definitely be referencing the OP's article in the future though. Even though I'm in this world, I constantly get the vernacular mixed up. And this will be a really great dictionary to use.

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ yaj@
;laksdfhjdhksalkfj more things
yahnd.com ~ Privacy Policy ~
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.