Hacker News Comments on
AWS re:Invent 2018: Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321)
Amazon Web Services · YouTube · 6 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.
This one is pretty good for DynamoDB: https://youtu.be/yvBR71D0nAQ
We don't have a paper on DynamoDB's internals (yet?), but here's a talk you might find interesting from one of the folks who built and ran DDB for a long time: https://www.youtube.com/watch?v=yvBR71D0nAQ
And Doug Terry talking through the details of how DynamoDB's transaction protocol works: https://www.usenix.org/conference/fast19/presentation/terry
If we did publish more about the internals of DDB, what would you be looking to learn? Architecture? Operational experience? Developer experience? There's a lot of material we could share, and it's useful to hear where people would like us to focus.
⬐ pow_pp_-1_v
All of it: architecture, operational experience, best practices, etc.
⬐ ldrndll
Just want to second this. All of the above sounds really interesting to me!
If people really want to know how DynamoDB works, this is a good tech talk: https://www.youtube.com/watch?v=yvBR71D0nAQ
Here are a couple of videos from re:Invent 2018:
Jaso talking about DynamoDB internals: https://www.youtube.com/watch?v=yvBR71D0nAQ
Marc talking about Lambda internals: https://www.youtube.com/watch?v=QdzV04T_kec
Saying DynamoDB is built on top of InnoDB is a pretty big oversimplification of a much more complex distributed system[1], and for all we know they could have switched the low-level storage engine on the backend to something like RocksDB or WiredTiger.
The Aurora storage subsystem is much more limited in terms of horizontal scalability and performance; they probably chose it because it was a better/quicker fit.
⬐ evil-olive
Yeah, I used to work on DynamoDB, so I know it's more complicated (much more complicated than that video makes out: their code quality was atrocious, with 2,000-5,000-line Java classes in inheritance hierarchies 3 or 4 levels deep; no unit tests, only "smoke tests" that took 2 hours to run and were so prone to race conditions that the common advice was to close everything else on your machine, run them, then leave them alone while you went to meetings).
There was work underway at the time I left to replace InnoDB with WiredTiger. It seemed to be very slow going, and I suspect WiredTiger being acquired by 10gen (MongoDB) had a part in it. They also had only 1-2 engineers on the project of ripping out MySQL and replacing it, working in a long-lived branch that constantly dealt with merge conflicts from the more active feature development happening on mainline.
Aurora, simply by virtue of being newer and learning from DDB's mistakes (in the same way DDB learned from SimpleDB and the original Dynamo), probably has better extension points for supporting multiple engines (MySQL, Postgres, Mongo) in a sane way.
⬐ talawahdotnet
Interesting, how long ago was that? I would be curious to know whether the WiredTiger switch ever happened, and what that support relationship looks like now, given the contentious relationship between MongoDB and AWS. The old WiredTiger Inc. website[1] still lists AWS as a customer.
Then again, the relationship between AWS and Oracle is even more contentious, and Aurora MySQL is one of AWS's most popular products, so I don't think they are terribly worried about building on competitors' technologies.
⬐ evil-olive
3+ years ago, so it's entirely possible that things have changed since I left. I don't have any more recent information on the state of the system.
At least when I was there, the strong focus was always on adding new features (global & local secondary indexes, change streams, cross-region replication, and so on) to keep up with the Joneses (MongoDB et al.).
Meanwhile, a bunch of internal Amazon teams were taking a dependency on it instead of being their own DBAs, and those teams didn't care that much about the whiz-bang features, they just wanted a reliable scale-out datastore that someone else would get paged about when some component failed.
Adding features at a breakneck pace while keeping up umpteen-nines reliability and handful-of-milliseconds performance meant that tech debt and non-user-facing improvements, including WiredTiger, all got sidelined. Around the time I left, our pager load was around 200 pages per week. That's one page every 50 minutes, 24/7, if you're keeping score at home.
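A quick check of the pager arithmetic above, as a minimal Python sketch. It assumes pages are spread evenly around the clock (real pager load is bursty, so this is only an average), and the function name is hypothetical:

```python
# Convert a weekly page count into the average gap between pages,
# assuming pages arrive evenly around the clock (24/7).
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def minutes_between_pages(pages_per_week: int) -> float:
    """Average number of minutes between pages for a given weekly page count."""
    return MINUTES_PER_WEEK / pages_per_week


print(f"{minutes_between_pages(200):.1f} minutes")  # 50.4 minutes
```

So "one page every 50 minutes" is the 50.4-minute average rounded down.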
⬐ talawahtech
Given the scale and popularity of DynamoDB and its distributed nature, you would think that they could hire multiple teams just to work on improving it, but I guess it isn't as simple as that.
I would love to get a behind-the-scenes look at the process of gradually improving the components of DynamoDB with better technologies while still maintaining reliability and performance.
⬐ manigandham
According to this post [1], the WiredTiger project seems to have been cancelled after the acquisition.
DynamoDB is different from the system in their published Dynamo paper, which is mostly about designing a highly available key/value system that can be run on top of any other datastore. The paper does mention MySQL as a storage option.
The recent MySQL info comes from this 2016 thread on DynamoDB storing empty strings: https://news.ycombinator.com/item?id=13170746
Confirmed by this comment specifically: https://news.ycombinator.com/item?id=13173927
The latest re:Invent 2018 deep dive on DynamoDB doesn't reveal anything either way, but it still fits if MySQL is powering the storage nodes: https://www.youtube.com/watch?v=yvBR71D0nAQ
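The original Dynamo paper describes a pluggable local persistence component (it lists Berkeley DB and MySQL among the engines in use), which is what lets the replication and partitioning layer run on top of different datastores. Below is a minimal Python sketch of that idea, assuming a simple get/put interface. All class names are hypothetical, SQLite stands in for MySQL/InnoDB, and none of this is Dynamo's or DynamoDB's actual code:

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageEngine(ABC):
    """Hypothetical local-persistence interface a storage node codes against."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...


class InMemoryEngine(StorageEngine):
    """Dict-backed engine: fast, but loses all data when the process exits."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


class SQLiteEngine(StorageEngine):
    """Relational engine; a stand-in here for something like MySQL/InnoDB."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB NOT NULL)"
        )

    def get(self, key: bytes) -> Optional[bytes]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def put(self, key: bytes, value: bytes) -> None:
        # Upsert so a repeated write to the same key overwrites the old value.
        self._conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self._conn.commit()


class StorageNode:
    """Replication/partitioning logic would live here, unaware of the engine."""

    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine

    def write(self, key: bytes, value: bytes) -> None:
        self.engine.put(key, value)

    def read(self, key: bytes) -> Optional[bytes]:
        return self.engine.get(key)


if __name__ == "__main__":
    # The same node logic runs unchanged on either engine.
    for engine in (InMemoryEngine(), SQLiteEngine()):
        node = StorageNode(engine)
        node.write(b"user#1", b"alice")
        print(type(engine).__name__, node.read(b"user#1"))
```

With this shape, swapping engines (say, InnoDB for WiredTiger or RocksDB) means adding a new StorageEngine subclass without touching StorageNode; the comments above suggest the real-world migration is much harder than the interface makes it look.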