Hacker News Comments on ElixirConf 2016 - Keynote by José Valim
Confreaks · YouTube · 9 HN points · 2 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.

> In reality, most companies will have datasets that could be ETL'd on my iPhone and a Spark cluster is just overkill.

I like that you said this, and I've seen a similar sentiment before. I was first exposed to it watching an ElixirConf keynote by José Valim [1] about Flow, the data-processing framework they built in Elixir, where he summarized the finding as: "For between 40-80% of the jobs submitted to MapReduce systems, you'd be better off running them on a single machine." That figure comes from the paper Musketeer: all for one, one for all in data processing systems [2].
While I'm no data engineer myself, I do often wonder whether distributing the workload is always better. The anecdote above suggests that a single powerful multicore machine may be the right solution for many jobs.
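To make the single-machine option concrete, here is a minimal sketch (not taken from the keynote or the paper) of a MapReduce-style word count that runs on one machine: a map step counts words in each chunk, and a reduce step merges the partial counts. The function names `count_words`, `merge_counts` and `word_count` are hypothetical; only Python's standard library is used.

```python
from collections import Counter
from concurrent.futures import Executor
from functools import reduce
from typing import Iterable, Optional


def count_words(chunk: str) -> Counter:
    """Map step (hypothetical helper): count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def merge_counts(acc: Counter, partial: Counter) -> Counter:
    """Reduce step (hypothetical helper): fold one partial count into the accumulator."""
    acc.update(partial)
    return acc


def word_count(chunks: Iterable[str], executor: Optional[Executor] = None) -> Counter:
    """MapReduce-style word count on a single machine.

    With no executor the map step runs sequentially in this process;
    passing a concurrent.futures executor (e.g. a ProcessPoolExecutor)
    spreads the chunks across local CPU cores instead of a cluster.
    """
    mapper = executor.map if executor is not None else map
    # A fresh Counter() is the accumulator, so callers' data is never mutated.
    return reduce(merge_counts, mapper(count_words, chunks), Counter())


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(lines))
```

Run as written, the map step is sequential. To use all cores, call `word_count(lines, executor=pool)` inside a `with ProcessPoolExecutor() as pool:` block placed under the `if __name__ == "__main__":` guard, so worker processes can import the module safely. The same code shape scales from one core to many without any cluster, as long as the data fits on the machine.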
Now, that isn't to discount what Prophecy is trying to do. My company just went through a huge re-platforming, moving from on-premises to the cloud, and it is not easy; any company trying to tackle that space is on the right track. But I do wonder if it's overkill for most use cases.
[1] - https://www.youtube.com/watch?v=srtMWzyqdp8
[2] - http://www.cs.utexas.edu/users/ncrooks/2015-eurosys-musketee...
The solution for this in Elixir is to use GenStage actors, which provide backpressure by allowing downstream processes to ask for data to process. Check out this video, which describes it pretty well: https://youtu.be/srtMWzyqdp8?t=1027
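As a rough illustration of demand-driven backpressure, the Python sketch below borrows the names `handle_demand` and `max_demand` from GenStage, but it is not GenStage's API. It is a simplified, synchronous, single-process model under stated assumptions: the consumer asks for at most `max_demand` events, and the producer emits only what was asked for, so nothing piles up between the stages. In real GenStage, stages are separate Elixir processes that exchange demand and events as messages.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class Producer:
    """Hypothetical producer: emits events from a source only when asked."""

    def __init__(self, source: Iterable[int]) -> None:
        self._source: Iterator[int] = iter(source)

    def handle_demand(self, demand: int) -> List[int]:
        # Emit at most `demand` events; nothing is generated ahead of demand,
        # so no unbounded buffer builds up between producer and consumer.
        return list(islice(self._source, demand))


class Consumer:
    """Hypothetical consumer: pulls events in batches of at most `max_demand`."""

    def __init__(self, producer: Producer, max_demand: int) -> None:
        self.producer = producer
        self.max_demand = max_demand
        self.batches: List[List[int]] = []

    def run(self) -> int:
        """Ask for work until the producer is drained; return the number of events handled."""
        handled = 0
        while True:
            # The consumer sets the pace: it only requests what it can process now.
            batch = self.producer.handle_demand(self.max_demand)
            if not batch:  # source exhausted
                return handled
            self.batches.append([x * x for x in batch])  # stand-in for real work
            handled += len(batch)


if __name__ == "__main__":
    consumer = Consumer(Producer(range(10)), max_demand=4)
    print(consumer.run())      # 10
    print(consumer.batches)    # [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

The key design choice is that demand flows upstream and data flows downstream: a slow consumer simply asks less often, rather than having a fast producer push events into an ever-growing queue.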