HN Books @HNBooksMonth

The best books of Hacker News.

Hacker News Comments on
Database Internals: A Deep Dive into How Distributed Data Systems Work

Alex Petrov · 5 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "Database Internals: A Deep Dive into How Distributed Data Systems Work" by Alex Petrov.
Amazon Summary
When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.

This book examines:

- Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each
- Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log
- Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns
- Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
I know two DB bibles which explain how a DB engine actually works. It's much more interesting reading than another SQL manual. But it's not awfully practical if you have no particular interest in DB technology. And both are pretty dense.

* https://www.amazon.com/Database-Systems-Complete-Book-2nd/dp
* https://www.amazon.com/Database-System-Concepts-Abraham-Silb...

There's a very cool not-quite-alternative: https://leanpub.com/how-query-engines-work. It covers a fair chunk of DB technology but not storage. Definitely check out their repository at https://github.com/andygrove/how-query-engines-work/tree/mai...

A companion to DDIA would be https://www.amazon.com/Database-Internals-Deep-Distributed-S... (especially its treatment of LSM trees, which is hard to come by elsewhere).

The book referenced on the blog post seems interesting:

Database Internals: A Deep Dive into How Distributed Data Systems Work https://www.amazon.com/_/dp/1492040347

Sure, I mostly went through this playlist https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_... (I haven't seen all the videos yet).

Also some parts of this book https://www.amazon.com/Database-Internals-Deep-Distributed-S... were very useful when working on the file structure.

I also just recently saw this project https://cstack.github.io/db_tutorial/ which builds a database in C. I have not gone through it but it seems like quite a good resource.

For the overall design the CMU playlist is the most helpful.

zd123
Thank you!!
+10 for this. I will elaborate further, hoping this gives you a good starting template.

- Programming: Learn two languages: Python and C

- Algorithms and Data Structures: Implement each data structure in the two languages above and implement a few algorithms of each type.

- Computer Architecture: For the excellent book referred to, implement all assignments in any one language. Go ahead and burn the design onto an FPGA and get the computer running on real hardware.

- OS: Having done ECS above, you should be in good shape to write your own OS: there are xv6, Xinu, Minix, and many others to choose from. Again, have your OS running at least in a VM.

- Computer Networking: Write your own HTTP server in C.

- Math for CS: I would say focus on learning the math essential for games, some linear algebra, and leave it there. When you encounter a relevant field, such as AI or games, you should be in a position to pick up more math if required.

- Databases: Recently a book has been published on database internals, which is strongly recommended. Work through this book.

- Languages and Compilers: Learn a Lisp and write a Lisp interpreter (this should introduce you to some FP concepts); then working through Concepts, Techniques, and Models of Computer Programming should give you a good foundation.

Whether you are a student or working full time, the items above are time-consuming but well worth it if you put in the effort. Be creative and ensure that you publish all your work as part of your portfolio. Good luck!!!

[1] https://www.amazon.com/Database-Internals-Deep-Distributed-S...

avremel
How would you compare Database Internals to Designing Data Intensive Applications?

[1] https://www.amazon.com/Designing-Data-Intensive-Applications...

deepaksurti
The book you refer to is really about system design for applications that handle large data volumes. OTOH, the book I refer to talks about how database software can be developed from the ground up, thus helping you understand the internals.
avremel
That book does cover many implementation details of a database. However, it sometimes does so at a high level and, as you mention, specifically in the context of distributed systems.
8589934591
Thank you for your reply. It was very helpful. I will incorporate your suggestions into my learning path.

I had difficulty implementing data structures in C, but not in Python. In Python I was able to think in terms of classes and attributes, but I found it difficult to do the same in C since there is no concept of classes. I am still trying to learn pointers properly so that I understand how to implement data structures and algorithms effectively.

I came across the book you have recommended and it is a very nice book. I would recommend that along with Designing Data Intensive Applications.

Thank you.

I have three things for you

1. Designing Data-Intensive Applications

2. Database Internals https://www.amazon.com/Database-Internals-deep-dive-distribu...

3. Andy Pavlo's database course videos at CMU and guest lecture series https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA

HN Books is an independent project and is not operated by Y Combinator or Amazon.com.