HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)

Amazon Web Services · YouTube · 6 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Amazon Web Services's video "AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)".
YouTube Summary
Come to this session to learn how Amazon DynamoDB was built as the hyper-scale database for internet-scale applications. In January 2012, Amazon launched DynamoDB, a cloud-based NoSQL database service designed from the ground up to support extreme scale, with the security, availability, performance, and manageability needed to run mission-critical workloads. This session discloses for the first time the underpinnings of DynamoDB, and how we run a fully managed nonrelational database used by more than 100,000 customers. We cover the underlying technical aspects of how an application works with DynamoDB for authentication, metadata, storage nodes, streams, backup, and global replication.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Jan 20, 2022 · schwarzmx on DynamoDB 10 years later
This one is pretty good for DynamoDB: https://youtu.be/yvBR71D0nAQ
Jan 20, 2022 · mjb on DynamoDB 10 years later
We don't have a paper on DynamoDB's internals (yet?), but here's a talk you might find interesting from one of the folks who built and ran DDB for a long time: https://www.youtube.com/watch?v=yvBR71D0nAQ

And Doug Terry talking through the details of how DynamoDB's transaction protocol works: https://www.usenix.org/conference/fast19/presentation/terry

If we did publish more about the internals of DDB, what would you be looking to learn? Architecture? Operational experience? Developer experience? There's a lot of material we could share, and it's useful to hear where people would like us to focus.

pow_pp_-1_v
All of it - architecture, operational experience, best practices etc.
ldrndll
Just want to second this. All of the above sounds really interesting to me!
If people really want to know how DynamoDB works, this is a good tech talk: https://www.youtube.com/watch?v=yvBR71D0nAQ
Here are a couple videos from reInvent 2018:

Jaso talking about DynamoDB internals https://www.youtube.com/watch?v=yvBR71D0nAQ

Marc talking about Lambda internals https://www.youtube.com/watch?v=QdzV04T_kec

Saying DynamoDB is built on top of InnoDB is a pretty big oversimplification of a much more complex distributed system[1], and for all we know they could have switched out the low-level storage engine on the backend for something like RocksDB or WiredTiger.

The Aurora storage subsystem is much more limited in terms of horizontal scalability and performance, they probably chose it because it was a better/quicker fit.

1. https://youtu.be/yvBR71D0nAQ

evil-olive
Yeah, I used to work on DynamoDB, so I know it's more complicated (much more complicated than that video makes out). Their code quality was atrocious: 2,000-5,000-line Java classes in inheritance hierarchies 3 or 4 levels deep, and no unit tests, only "smoke tests" that took 2 hours to run and were so prone to race conditions that the common advice was to close everything else on your machine, run them, then leave them alone while you went to meetings.

There was work underway at the time I left to replace InnoDB with WiredTiger. It seemed to be very slow going, and I suspect WiredTiger being acquired by MongoDB (formerly 10gen) had a part in it. They also had only 1-2 engineers on the project of ripping out MySQL and replacing it, working in a long-lived branch that constantly dealt with merge conflicts from the more active feature development happening on mainline.

Aurora, simply by virtue of being newer and learning from DDB's mistakes (in the same way DDB learned from SimpleDB and the original Dynamo), probably has better extension points for supporting MySQL, Postgres, and Mongo in a sane way.

talawahdotnet
Interesting, how long ago was that? I would be curious to know whether the WiredTiger switch ever happened, and what that support relationship looks like now, given the contentious relationship between MongoDB and AWS. The old WiredTiger Inc. website[1] still lists AWS as a customer.

Then again, the relationship between AWS and Oracle is even more contentious, and Aurora MySQL is one of AWS's most popular products, so I don't think they are terribly worried about building on competitors' technologies.

1. http://www.wiredtiger.com/

evil-olive
3+ years ago, so it's entirely possible that things have changed since I left. I don't have any more recent information on the state of the system.

At least when I was there, the strong focus was always on adding new features (global & local secondary indexes, change streams, cross-region replication, and so on) to keep up with the Joneses (MongoDB et al).

Meanwhile, a bunch of internal Amazon teams were taking a dependency on it instead of being their own DBAs, and those teams didn't care that much about the whiz-bang features, they just wanted a reliable scale-out datastore that someone else would get paged about when some component failed.

Adding features at a breakneck pace while keeping up umpteen-nines reliability and handful-of-milliseconds performance meant that tech debt and non-user-facing improvements, including WiredTiger, all got sidelined. Around the time I left, our pager load was around 200 pages per week. That's one page every 50 minutes, 24/7, if you're keeping score at home.
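The pager figure is easy to sanity-check. A minimal sketch, assuming the roughly 200 pages are spread evenly over a 7-day week (real pages tend to cluster, so this is only an average interval):

```python
# Back-of-the-envelope check of the pager figure quoted above.
# Assumption: ~200 pages per week, spread evenly around the clock.
PAGES_PER_WEEK = 200
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in a week

minutes_between_pages = MINUTES_PER_WEEK / PAGES_PER_WEEK
print(f"one page every {minutes_between_pages:.1f} minutes")
```

The result, about 50.4 minutes, matches the "one page every 50 minutes" in the comment.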

talawahtech
Given the scale and popularity of DynamoDB and the distributed nature you would think that they could hire multiple teams just to work on improving it, but I guess it isn't as simple as that.

I would love to get a behind the scenes look at the process of gradually improving the components of DynamoDB with better technologies, while still maintaining reliability and performance.

manigandham
According to this post [1] the WiredTiger project seems to have been cancelled after the acquisition.

1. https://news.ycombinator.com/item?id=13170746#13173927

DynamoDB is different from the system described in their published Dynamo paper, which is mostly about designing a highly available key/value system that can run on top of any other datastore. The paper does mention MySQL as a storage option.

The recent MySQL info comes from this 2016 thread on DynamoDB storing empty strings: https://news.ycombinator.com/item?id=13170746

Confirmed by this comment specifically: https://news.ycombinator.com/item?id=13173927

The latest 2018 re:Invent deep dive on DynamoDB doesn't reveal anything definitive, but what it describes still fits with MySQL powering the storage nodes: https://www.youtube.com/watch?v=yvBR71D0nAQ
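The Dynamo paper's point that the local datastore is pluggable (its authors list Berkeley DB and MySQL among the options), and the idea raised above of swapping InnoDB for WiredTiger, both come down to the replication layer coding against a narrow storage interface. The following is a hypothetical illustration, not Amazon's code: `StorageEngine`, `InMemoryEngine`, `SQLiteEngine`, and `StorageNode` are invented names, and SQLite merely stands in for an engine such as InnoDB.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageEngine(ABC):
    """Hypothetical interface for a node's local persistence layer."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...


class InMemoryEngine(StorageEngine):
    """Keeps everything in a dict; data is lost when the process exits."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


class SQLiteEngine(StorageEngine):
    """Stand-in for a relational engine, using the stdlib sqlite3 module."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB NOT NULL)"
        )

    def get(self, key: bytes) -> Optional[bytes]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, key: bytes, value: bytes) -> None:
        # A later put for the same key overwrites the earlier value.
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )


class StorageNode:
    """Request handling depends only on the interface, so engines can be swapped."""

    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine

    def handle_put(self, key: bytes, value: bytes) -> None:
        self.engine.put(key, value)

    def handle_get(self, key: bytes) -> Optional[bytes]:
        return self.engine.get(key)


# Same node logic, two interchangeable engines.
for engine in (InMemoryEngine(), SQLiteEngine()):
    node = StorageNode(engine)
    node.handle_put(b"user#1", b"alice")
    print(type(engine).__name__, node.handle_get(b"user#1"))
```

Keeping the interface this narrow is what makes an engine swap feasible in principle; in a real system the hard part is migrating data and matching performance and durability behavior, which a sketch like this does not capture.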

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.