Hacker News Comments on
Timely dataflow in three easy steps!
Frank McSherry · YouTube · 140 HN points · 1 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.
Here's a 15-minute introduction to Timely Dataflow by Frank, our co-founder: https://www.youtube.com/watch?v=yOnPmVf4YWo
⬐ dmos62
There are a few relevant repositories here: https://github.com/TimelyDataflow, including two Rust implementations.
⬐ FridgeSeal ⬐ rustybolt
The readme for the Abomonation repo makes me laugh every time.
In my opinion, dataflow is the only true representation of computations. Unlike normal code, it represents dependencies and parallel computation perfectly. Because of this, it is also a great basis for a hardware implementation.
⬐ BubRoss ⬐ throwaway8291
Graphs are good for seeing dependencies and ordering, but not great for branching and looping.
⬐ DonaldFisk
Although in conventional languages with control flow we're more used to how they're done, both branching and looping can be done straightforwardly in dataflow without introducing any non-dataflow constructs - just graphs with vertices connected by edges.
Branching: http://www.fmjlang.co.uk/fmj/tutorials/Conditional.html
Looping: http://www.fmjlang.co.uk/fmj/tutorials/Iteration.html
Emit and collect: http://www.fmjlang.co.uk/fmj/tutorials/Macros.html
These show the basic idea. There are many other examples throughout the tutorials (http://www.fmjlang.co.uk/fmj/tutorials/TOC.html).
BTW, in case anyone's wondering why there have been no recent updates to the pages on my visual dataflow language, it's because the many improvements I've been making, particularly big changes to the type system, have required a lot more work than I expected. I haven't abandoned work on it, but it will still be some time before it's ready for release.
⬐ BubRoss
Those show that it's possible, not necessarily that it's a good interface for building those parts of programs. Houdini's shader language has had branching done in a data flow graph for a long time. TouchDesigner has a kind of looping construct too. You might want to take a look at these domain-specific interfaces if you are building your own graph; they are well done.
Fundamentally though, the density of text expressions exceeds that of a data flow graph by a huge margin. If what is being done isn't fundamentally a directed acyclic graph, visualizing it as a graph becomes more difficult to absorb than the expressions as text.
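To make the "dependencies and ordering" point from this sub-thread concrete, here is a minimal sketch of a dataflow graph evaluated stage by stage, where every node in a stage could run in parallel. The node names and the graph itself are invented for illustration, and the scheduling uses Python's standard `graphlib` module; it is not how Timely Dataflow or any of the tools mentioned here schedule work.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow graph: each node maps to the set of nodes it depends on.
graph = {
    "parse": {"source"},
    "filter": {"parse"},
    "count": {"parse"},
    "join": {"filter", "count"},
}

def parallel_stages(deps):
    """Group nodes into stages; nodes within one stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are complete
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished so dependents become ready
    return stages

stages = parallel_stages(graph)
print(stages)
```

Note that `prepare()` rejects cycles, which matches the caveat above: this kind of scheduling only works directly when the computation is a directed acyclic graph.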
I looked at the data flow paradigm a couple of years ago. Back then I thought that the difference from just "ordinary" functions was not that big, and for performance (which is important for my data work), you do not want to deviate from the traditional way too much.
Has anyone felt the same, or can anyone provide a real-world problem where data flow actually works better than other solutions?
⬐ j-pb ⬐ thereyougo
You should watch the video. It's not really about the dataflow programming paradigm itself.
This is about timely dataflow, the foundation of differential dataflow. It allows for the efficient incremental computation of results.
It basically solves the entire view maintenance problem from databases in a very elegant and efficient way.
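A rough sketch of what "incremental computation of results" can mean for a maintained view: instead of recomputing a word count from scratch, the view consumes `(record, diff)` changes and emits only the counts that changed. The class and function names here are hypothetical and the model is heavily simplified; it is not the differential dataflow API.

```python
from collections import Counter

class IncrementalWordCount:
    """Hypothetical maintained view: word -> count, updated from (line, diff) changes."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, changes):
        # Each change is (line, diff): diff=+1 inserts a line, diff=-1 retracts it.
        touched = Counter()
        for line, diff in changes:
            for word in line.split():
                touched[word] += diff
        for word, delta in touched.items():
            self.counts[word] += delta
            if self.counts[word] == 0:
                del self.counts[word]  # drop words that no longer appear
        # Report only the words whose count actually changed.
        return {word: delta for word, delta in touched.items() if delta != 0}

def recompute(lines):
    """From-scratch baseline, used only to check the incremental result."""
    return Counter(word for line in lines for word in line.split())

view = IncrementalWordCount()
view.apply([("hello world", +1), ("hello timely", +1)])
delta = view.apply([("hello world", -1), ("hello dataflow", +1)])
print(delta)
print(dict(view.counts))
```

The second batch touches "hello" twice with opposite signs, so it produces no output change for that word; the work done is proportional to the size of the change rather than the size of the input.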
⬐ BubRoss
If your functions are transforming chunks of data into other formats/types, you are already doing what data flow graphs are doing. Generalizing can give much more structured concurrency.
⬐ FridgeSeal
There’s this: https://github.com/mit-pdos/noria
It’s like a cache, except it keeps itself in sync with the database automatically and generates “materialised views” using data-flows based on the queries that get asked of it, and will automatically generate new ones if someone makes a query it doesn’t already have a data flow for. Parts of data flows can also be shared across views.
The paper linked in the GitHub repo goes into detail about the performance gains; it easily outperforms straight database calls and caching setups.
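As a toy illustration of the "cache that keeps itself in sync" idea described above, the sketch below builds a view the first time a query is asked and then pushes each later write into that view instead of invalidating it. The `ViewCache` class, its methods, and the vote-count query are all assumptions made up for this example; this is not Noria's API and it ignores sharing between views.

```python
from collections import defaultdict

class ViewCache:
    """Toy stand-in for a self-updating cache (hypothetical, not Noria's API)."""

    def __init__(self):
        self.votes = []   # base table of (article_id, user) rows
        self.views = {}   # query name -> materialized result

    def query_vote_count(self):
        # The first request builds the view; later requests just read it.
        if "vote_count" not in self.views:
            counts = defaultdict(int)
            for article_id, _user in self.votes:
                counts[article_id] += 1
            self.views["vote_count"] = counts
        return self.views["vote_count"]

    def insert_vote(self, article_id, user):
        self.votes.append((article_id, user))
        # Push the write through to an existing view rather than invalidating it.
        view = self.views.get("vote_count")
        if view is not None:
            view[article_id] += 1

cache = ViewCache()
cache.insert_vote(1, "alice")
first = dict(cache.query_vote_count())   # view is created on demand here
cache.insert_vote(1, "bob")
cache.insert_vote(2, "alice")
second = dict(cache.query_vote_count())  # already up to date, no recomputation
print(first, second)
```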
So many talented teachers out there. I'm glad you shared this video. This guy deserves more views on his videos.
⬐ FridgeSeal ⬐ arendtio
His papers are fascinating as well. The COST paper especially changed how I thought about a lot of problems.
Somehow the 'Hello World' example reminds me of
$ printf 'Hello World' | awk '{print $2}' | tr '[A-Z]' '[a-z]' | wc -w | cat
Just kidding ;-)
⬐ andyferris
This video is really cool. I’ve been following dataflow approaches for a while, including some of Frank McSherry’s (usually enjoyable) articles. None of the comments mention https://materialize.io so I may as well (an open source commercial offering based off these concepts).
Watching this explanation I’m slightly curious whether things like Materialize and Noria are a bit limited, in that this could be a paradigm for an actual functional reactive programming language rather than specifically a “data” thing. It appears to have the structure of nested contexts (loops, scopes, etc.) advocated by structured programming (i.e. “goto considered harmful”). It can reliably calculate an answer at each point in time for each state of input, concurrently and with parallelism, even if there are multiple inputs with their own notion of time (not covered in the video). That’s, like, the holy grail of PLT these days, isn’t it? Or am I missing something?
⬐ pritambaral
> ... an open source commercial offering ...
I just looked at their license[1] and it doesn't appear to be open source at the moment.
[1] https://github.com/MaterializeInc/materialize/blob/master/LI...
⬐ benesch
Yes, we consider ourselves an “open core” company. Timely and differential, the core compute engine, are fully open source projects, but the Materialize layer atop is licensed under the “Business Source License” (BSL).
We think the BSL strikes a good balance between giving back to the community (four years after every release, the code is automatically relicensed under Apache 2) and ensuring we can build a viable business. And you’re free to use Materialize for any purpose in a non-distributed (i.e., single node) deployment without paying for an enterprise license.
⬐ pritambaral
I wasn't complaining about your model; just correcting the parent's usage of the terms.