Hacker News Comments on
Timely dataflow in three easy steps!

Frank McSherry · Youtube · 140 HN points · 1 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Frank McSherry's video "Timely dataflow in three easy steps!".
Youtube Summary
A 15 minute introduction to timely dataflow, intended for people who do not have twenty minutes.

Timely dataflow is a modern data-parallel dataflow framework, but from the outside it can appear challenging. This video attempts to call out the most interesting departures from standard dataflow systems, through a running worked example and extensive use of bright colors. By the end, I hope you have a sense for what makes timely dataflow unique, and how you can learn more if your curiosity is piqued.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Here's a 15-minute introduction to Timely Dataflow by Frank, our co-founder: https://www.youtube.com/watch?v=yOnPmVf4YWo
Apr 09, 2020 · 136 points, 17 comments · submitted by mpweiher
dmos62
There are a few relevant repositories here: https://github.com/TimelyDataflow, including two Rust implementations.
FridgeSeal
The readme for the Abomonation repo makes me laugh every time.
rustybolt
In my opinion dataflow is the only true representation of computations. Unlike normal code, it represents dependencies and parallel computation perfectly. Because of this, it is also a great basis for a hardware implementation.
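As a rough illustration of the point about dependencies and parallelism (not taken from the video, and not the timely dataflow API), a dataflow graph can be sketched as a mapping from each vertex to the vertices it depends on. The vertex names below are hypothetical. Python's standard `graphlib` module can then group the vertices into stages whose members do not depend on each other and so could, in principle, run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow graph: each vertex maps to the vertices it depends on.
graph = {
    "parse": {"read"},
    "count_words": {"parse"},
    "count_lines": {"parse"},
    "report": {"count_words", "count_lines"},
}

def parallel_stages(deps):
    """Group vertices into stages; vertices within a stage are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)                 # unblock downstream vertices
    return stages

stages = parallel_stages(graph)
print(stages)
```

In this sketch the two counting vertices land in the same stage because neither depends on the other, which is the kind of parallelism the graph makes explicit.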
BubRoss
Graphs are good for seeing dependencies and ordering, but not great for branching and looping.
DonaldFisk
Although we're more used to how they're done in conventional languages with control flow, both branching and looping can be done straightforwardly in dataflow without introducing any non-dataflow constructs: just graphs with vertices connected by edges.

Branching: http://www.fmjlang.co.uk/fmj/tutorials/Conditional.html

Looping: http://www.fmjlang.co.uk/fmj/tutorials/Iteration.html

Emit and collect: http://www.fmjlang.co.uk/fmj/tutorials/Macros.html

These show the basic idea. There are many other examples throughout the tutorials (http://www.fmjlang.co.uk/fmj/tutorials/TOC.html).

BTW, in case anyone's wondering why there have been no recent updates to the pages on my visual dataflow language, it's because the many improvements I've been making, particularly big changes to the type system, have required a lot more work than I expected. I haven't abandoned work on it, but it will still be some time before it's ready for release.
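A minimal sketch of the general idea, assuming a toy token-passing model rather than the semantics of the language linked above: a switch vertex inspects each token and sends it along one of two outgoing edges (branching), and one of those edges feeds back into the switch (looping). The function name and token layout are hypothetical, and the `while` loop only plays the role of a scheduler that drains the edges.

```python
from collections import deque

def run_loop_graph(n):
    """Sum 1..n by circulating a token through a switch vertex.

    Tokens are (counter, accumulator) pairs travelling along edges,
    which are modelled here as queues.
    """
    loop_edge = deque([(n, 0)])  # feedback edge into the switch vertex
    exit_edge = deque()          # edge leaving the loop

    while loop_edge:             # scheduler: run while tokens are in flight
        counter, acc = loop_edge.popleft()
        # Switch vertex: pick the outgoing edge based on a predicate.
        if counter > 0:
            # Body vertex, then back around the feedback edge.
            loop_edge.append((counter - 1, acc + counter))
        else:
            exit_edge.append(acc)
    return list(exit_edge)

result = run_loop_graph(4)
print(result)
```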

BubRoss
Those show that it's possible, not necessarily that it's a good interface for building those parts of programs. Houdini's shader language has had branching done in a data flow graph for a long time. TouchDesigner has a kind of looping construct too. You might want to take a look at these domain-specific interfaces if you are building your own graph; they are well done.

Fundamentally though, the density of text expressions exceeds a data flow graph by a huge margin. If what is being done isn't fundamentally a directed acyclic graph, visualizing it with a graph becomes more difficult to absorb than the expressions as text.

throwaway8291
I looked at the data flow paradigm a couple of years ago. Back then I thought that the difference from just "ordinary" functions is not that big, and for performance (which is important for my data work), you do not want to deviate from the traditional way too much.

Has anyone felt the same, or can anyone provide a real-world problem where data flow actually works better than other solutions?

j-pb
You should watch the video. It's not really about the dataflow programming paradigm itself.

This is about timely dataflow, the foundation of differential dataflow. It allows for the efficient incremental computation of results.

It basically solves the entire view maintenance problem from databases in a very elegant and efficient way.
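To make "incremental computation" concrete, here is a minimal sketch of maintaining a view from changes instead of recomputing it from scratch. It is not the timely or differential dataflow API: the class name, the `(word, diff)` update format, and the word-count view are all assumptions chosen for illustration. Each batch costs work proportional to the batch, not to the whole input seen so far.

```python
from collections import Counter

class IncrementalWordCount:
    """Maintains a word -> count view from a stream of (word, diff) updates."""

    def __init__(self):
        self.view = Counter()

    def apply(self, updates):
        """Apply a batch of (word, diff) changes; return the entries that changed."""
        delta = Counter()
        for word, diff in updates:
            delta[word] += diff          # consolidate changes per word
        changed = {}
        for word, diff in delta.items():
            if diff == 0:
                continue                 # an insert and a delete cancelled out
            self.view[word] += diff
            if self.view[word] == 0:
                del self.view[word]      # drop words that no longer occur
            changed[word] = self.view.get(word, 0)
        return changed

wc = IncrementalWordCount()
first = wc.apply([("hello", +1), ("world", +1), ("hello", +1)])
second = wc.apply([("hello", -1), ("timely", +1), ("timely", -1)])
print(first, second, dict(wc.view))
```

The second batch touches only "hello"; the insert and delete of "timely" consolidate to nothing, so the view is never touched for it.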

BubRoss
If your functions are transforming chunks of data into other formats/types, you are already doing what data flow graphs are doing. Generalizing can give much more structured concurrency.
FridgeSeal
There’s this: https://github.com/mit-pdos/noria

It’s like a cache, except it keeps itself in sync with the database automatically and generates “materialised views” using data-flows based on the queries that get asked of it and will automatically generate new ones if someone makes a query it doesn’t already have a data flow for. Parts of data flows can also be shared across views.

The paper linked in the github goes into detail about the performance gains, but it easily outperforms straight database calls and caching setups.

thereyougo
So many talented teachers out there. I'm glad you shared this video. This guy deserves more views on his videos.
FridgeSeal
His papers are fascinating as well. The COST paper especially changed how I thought about a lot of problems.
arendtio
Somehow the 'Hello World' example reminds me of

  $ printf 'Hello World' | awk '{print $2}' | tr '[A-Z]' '[a-z]' | wc -w | cat
Just kidding ;-)
andyferris
This video is really cool. I’ve been following dataflow approaches for a while, including some of Frank McSherry’s (usually enjoyable) articles. None of the comments mention https://materialize.io so I may as well (an open source commercial offering based off these concepts).

Watching this explanation I’m slightly curious whether things like materialize and noria are a bit limited in that this could be a paradigm for an actual functional reactive programming language rather than specifically a “data” thing. It appears to have the structure of nested contexts (loops, scopes, etc) advocated by structured programming (ie “goto considered harmful”). It can reliably calculate an answer at each point in time for each state of input, concurrently and with parallelism. Even if there are multiple inputs with their own notion of time (not covered in the video). That’s, like, the holy grail of PLT these days, isn’t it? Or am I missing something?

pritambaral
> ... an open source commercial offering ...

I just looked at their license[1] and it doesn't appear to be open source at the moment.

[1] https://github.com/MaterializeInc/materialize/blob/master/LI...

benesch
Yes, we consider ourselves an “open core” company. Timely and differential, the core compute engine, are fully open source projects, but the Materialize layer atop is licensed under the “Business Source License” (BSL).

We think the BSL strikes a good balance between giving back to the community—four years after every release, the code is automatically relicensed under Apache 2—and ensuring we can build a viable business. And you’re free to use Materialize for any purpose in a non-distributed (i.e., single node) deployment without paying for an enterprise license.

pritambaral
Wasn't complaining against your model; just correcting the parent's usage of the terms.
Feb 27, 2020 · 4 points, 0 comments · submitted by mpweiher