HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
AWS re:Invent 2019: Beyond eleven nines: Lessons from Amazon S3 culture of durability (STG331-R1)

AWS Events · YouTube · 5 HN points · 2 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention AWS Events's video "AWS re:Invent 2019: Beyond eleven nines: Lessons from Amazon S3 culture of durability (STG331-R1)".
YouTube Summary
Amazon S3 is well known for being designed for eleven nines of data durability. But durability is much more than a formula and a metric. It influences every aspect of how we design, build, deploy, and operate Amazon S3. In this session, learn about some of the practices that Amazon S3 applies under the hood to achieve "durability in depth," and how they can benefit your software systems.
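To make the headline number concrete, here is a back-of-the-envelope sketch. It assumes "eleven nines" means a 99.999999999% chance that any given object survives a year, and that object losses are independent of each other. The talk and the comments below argue that this independence is the hard part, so treat this as the naive baseline rather than a model of how S3 actually computes durability. The function names are hypothetical.

```python
from fractions import Fraction

# Assumption: eleven nines = 99.999999999% annual per-object durability.
ANNUAL_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
# Complement: per-object annual loss probability (exactly 1e-11).
PER_OBJECT_LOSS_RATE = 1 - ANNUAL_DURABILITY


def expected_losses_per_year(num_objects: int) -> Fraction:
    """Expected objects lost per year, assuming independent losses."""
    return num_objects * PER_OBJECT_LOSS_RATE


def years_per_expected_loss(num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(num_objects)


if __name__ == "__main__":
    # Ten million objects: one expected loss every 10,000 years.
    print(years_per_expected_loss(10_000_000))
```

`Fraction` keeps the arithmetic exact, so the tiny 1e-11 rate doesn't pick up floating-point noise.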

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
> I think the biggest takeaway here is system-wide operations are bad for system-wide reliability.

Yes!

System-wide operations, whether they are human-driven operations ("ssh onto that box"), control-plane operations ("remove all the failed servers"), DI operations ("deploy the new code"), or even basic algorithmic things like replication ("put the same state onto all the servers"), are the top causes of correlation that I've seen in the wild. Whether or not this matters to you depends a lot on what you're building and how often you can tolerate failures. But if you're building something that needs high availability, durability, integrity, etc., it's worth paying a huge amount of attention to the things that can introduce correlation in your systems.
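A toy probability model shows why correlation matters so much. The model and its numbers are illustrative assumptions, not figures from the comment or from AWS. Each of N replicas fails independently with probability p. Separately, with probability q, a hypothetical system-wide operation (for example, a bad deploy pushed to every server at once) takes out all replicas together.

```python
def p_loss_independent(p: float, replicas: int) -> float:
    """Probability that every replica fails, each independently with prob p."""
    return p ** replicas


def p_loss_with_common_mode(p: float, replicas: int, q: float) -> float:
    """Hypothetical model: with probability q a system-wide operation fails
    all replicas at once; otherwise they fail independently."""
    return q + (1 - q) * p_loss_independent(p, replicas)


if __name__ == "__main__":
    p, q = 0.01, 1e-4  # made-up rates chosen for illustration
    for n in (3, 5):
        print(
            f"replicas={n} "
            f"independent={p_loss_independent(p, n):.3g} "
            f"with_common_mode={p_loss_with_common_mode(p, n, q):.3g}"
        )
```

With these made-up numbers, going from three to five replicas cuts the independent term by four orders of magnitude (1e-6 to 1e-10). The combined loss probability stays pinned near q (about 1e-4), because once a correlated failure mode exists, adding redundancy buys very little.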

If you're interested in reading more about ways to avoid these, beyond what Joe (the OP, a colleague of mine at AWS) covers in the article:

* Our "Millions of Tiny Databases" paper goes into a lot of detail on another AWS take on reducing correlated failure (https://www.usenix.org/conference/nsdi20/presentation/brooke...).
* Some AWS folks from the S3 team also touch on correlation in this talk: https://www.youtube.com/watch?v=DzRyrvUF-C0&t=2410s
* I've written in the past about the role of software deployments in correlated failure (https://brooker.co.za/blog/2022/01/31/deployments.html), and about how to think about the role of redundancy (https://brooker.co.za/blog/2021/04/14/redundancy.html).

It's not really about the design of S3, but if you're interested in some of the philosophy and thinking behind S3 you might enjoy "Beyond eleven nines: Lessons from Amazon S3 culture of durability" https://www.youtube.com/watch?v=DzRyrvUF-C0
Dec 12, 2019 · 5 points, 0 comments · submitted by sethwm
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.