Hacker News Comments on
Big Data: Principles and best practices of scalable realtime data systems
2 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this book.
Lambda architecture for data processing, as popularized by Nathan Marz et al. [0], has two processing components, the Batch layer and the Stream (speed) layer. At a high level, Batch optimises for quality at the expense of staleness, whilst Stream optimises for freshness at the expense of quality [1].
I believe what GP means by Lambda is that you'd need a system that batch-processes the data that has to be amended or changed (reprocessing older data) but stream-processes whatever is required in real time [2].
An alternative is the Kappa architecture proposed initially by Jay Kreps [3][4], co-creator of Apache Kafka.
---
[0] https://www.amazon.com/dp/1617290343
[1] https://en.wikipedia.org/wiki/Lambda_architecture
[2] https://speakerdeck.com/druidio/real-time-analytics-with-ope...
[3] https://engineering.linkedin.com/distributed-systems/log-wha...
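To make the Batch/Stream split described above concrete, here is a minimal sketch in Python. It assumes a toy event-count workload; the event format and the names `build_batch_view`, `update_speed_view`, and `query` are hypothetical and not taken from the book or from any real framework. The batch view is recomputed from all stored events, the speed view only covers what arrived since the last batch run, and a query merges the two.

```python
from collections import Counter

# Hypothetical event format: (timestamp, key). Illustration only.
master_dataset = [(1, "a"), (2, "b"), (3, "a")]  # events covered by the last batch run
recent_events = [(4, "a"), (5, "c")]             # events that arrived after that run

def build_batch_view(events):
    """Batch layer: recompute the whole view from scratch over all stored events."""
    return Counter(key for _, key in events)

def update_speed_view(view, event):
    """Stream (speed) layer: fold a single new event into the view incrementally."""
    _, key = event
    view[key] += 1
    return view

def query(batch_view, speed_view, key):
    """Merge the stale-but-complete batch view with the fresh speed view."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

batch_view = build_batch_view(master_dataset)
speed_view = Counter()
for ev in recent_events:
    update_speed_view(speed_view, ev)

print(query(batch_view, speed_view, "a"))
```

In this sketch, amending old data means fixing `master_dataset` and rerunning `build_batch_view`, while the speed view is thrown away once the next batch run has absorbed its events.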
⬐ thekhatribharat
Here's a related article: https://medium.com/open-factory/state-of-the-m-art-big-data-...
An excerpt from the article:
Furthermore, the big data tools can be combined using a growing number of data processing architectures — Lambda and Kappa, among others.
⬐ battery_cowboy
Thanks so much for the comment, it was very helpful!
⬐ sologoub
The sources are good and thorough, but very long. Here's an OK summary of the Kappa proposal: https://milinda.pathirage.org/kappa-architecture.com/
In theory this sounds great, but you have to account for processing capacity.
While compute is getting cheaper, one of the key reasons streaming in Lambda sacrifices quality for throughput is compute capacity (as well as timing). If you have to feed already-stored data through the same streaming pipe, you either need a lot of excess capacity, have to be willing to pay for the additional burst, or must accept latency in your results (assuming you can keep up with your incoming workload and not lose data). There is no free lunch.
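A rough back-of-the-envelope sketch of that capacity trade-off, in Python. It assumes a very simplified model in which a replay drains only through whatever capacity is left over after live traffic; the function name `catch_up_seconds` and all the numbers are hypothetical.

```python
def catch_up_seconds(backlog_events, capacity_per_s, incoming_per_s):
    """Time to drain a replay backlog while still keeping up with live traffic.

    Returns None when there is no spare capacity, i.e. the replay never finishes.
    """
    spare = capacity_per_s - incoming_per_s
    if spare <= 0:
        return None
    return backlog_events / spare

# Hypothetical numbers, for illustration only.
backlog = 3_600_000_000  # stored events to push back through the stream
incoming = 50_000        # live events per second

for capacity in (50_000, 60_000, 150_000):
    t = catch_up_seconds(backlog, capacity, incoming)
    label = "never catches up" if t is None else f"{t / 3600:.1f} h"
    print(f"capacity={capacity}/s -> {label}")
```

With these made-up figures, 20% headroom means the replay takes about 100 hours, and tripling capacity brings it down to about 10 hours, which is the "excess capacity, burst cost, or latency" choice in numbers.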
You might want to read this (for free): http://book.mixu.net/distsys/single-page.html
And pay a little to read this book: http://www.amazon.com/Designing-Data-Intensive-Applications-...
And this one: http://www.amazon.com/Big-Data-Principles-practices-scalable...
Nathan Marz brought Apache Storm to the world, and Martin Kleppmann is pretty well known for his work on Kafka.
Both are very good books on building scalable data processing systems.