Hacker News Comments on
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

Ralph Kimball · 2 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition" by Ralph Kimball.
Amazon Summary
Updated new edition of Ralph Kimball's groundbreaking book on dimensional modeling for data warehousing and business intelligence! The first edition of Ralph Kimball's The Data Warehouse Toolkit introduced the industry to dimensional modeling, and now his books are considered the most authoritative guides in this space. This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more.
Authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence.
Begins with fundamental design recommendations and progresses through increasingly complex scenarios.
Presents unique modeling techniques for business applications such as inventory management, procurement, invoicing, accounting, customer relationship management, big data analytics, and more.
Draws real-world case studies from a variety of industries, including retail sales, financial services, telecommunications, education, health care, insurance, e-commerce, and more.
Design dimensional databases that are easy to understand and provide fast query response with The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
Yup, exactly. Kimball's stuff is the best. You can achieve on a modest machine what would otherwise take an incredibly expensive, horizontally scaled MPP database using modern techniques. It does require a lot more planning and forethought, to be sure.

https://www.amazon.com/dp/product/1118530802 and possibly https://www.amazon.com/dp/0764567578/

Spark, etc., are great, but honestly if you're just getting started I would forget all about existing tooling that is geared towards people working at 300-person companies and I would read The Data Warehouse ETL Toolkit by Kimball:

https://www.amazon.com/gp/product/1118530802/

I learned from the second edition, but I've heard even better things about the third. As you're working through it, create a project with real data and re-implement a data warehouse from scratch as you go. It doesn't really matter what you tackle, but I personally like ETLing either data gathered by crawling a single site[0] or a weekly Wikipedia dump. You'll learn many of the foundational reasons for all the tools the industry uses, which will make it very easy for you to get up to speed on them and to make the right choices about when to introduce them. I personally tend to favour tools that have an API or CLI so I can coordinate tasks without needing to click around, but many others like a giant GUI so they can see data flows graphically. Most good tools have at least some measure of both.

[0] Use something like Scrapy for Python (or Mechanize for Ruby) with CSS selectors, and use the extension Inspector Gadget to quickly generate CSS selectors.
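
To make the "re-implement a data warehouse as you go" exercise concrete, here is a minimal sketch of a star schema loaded from crawled data, using only Python's built-in sqlite3. Everything named in it is an assumption for illustration: the sample rows, the table names (dim_section, dim_date, fact_page) and the helper functions are hypothetical and do not come from the book or from the comment above.

```python
import sqlite3

# Hypothetical rows, shaped like what a crawler might emit:
# (url, site section, crawl date, word count).
CRAWLED = [
    ("https://example.com/a", "blog", "2024-01-01", 500),
    ("https://example.com/b", "blog", "2024-01-01", 300),
    ("https://example.com/c", "docs", "2024-01-02", 900),
]


def build_warehouse(rows):
    """Load crawled rows into two dimension tables and one fact table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dim_section (
            section_key  INTEGER PRIMARY KEY,   -- surrogate key
            section_name TEXT UNIQUE
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,       -- surrogate key
            iso_date TEXT UNIQUE
        );
        CREATE TABLE fact_page (
            url         TEXT,
            section_key INTEGER REFERENCES dim_section(section_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            word_count  INTEGER                 -- the additive measure
        );
    """)
    for url, section, iso_date, words in rows:
        # Insert each dimension member once, then look up its surrogate key.
        db.execute("INSERT OR IGNORE INTO dim_section (section_name) VALUES (?)", (section,))
        db.execute("INSERT OR IGNORE INTO dim_date (iso_date) VALUES (?)", (iso_date,))
        section_key = db.execute(
            "SELECT section_key FROM dim_section WHERE section_name = ?", (section,)
        ).fetchone()[0]
        date_key = db.execute(
            "SELECT date_key FROM dim_date WHERE iso_date = ?", (iso_date,)
        ).fetchone()[0]
        db.execute(
            "INSERT INTO fact_page VALUES (?, ?, ?, ?)",
            (url, section_key, date_key, words),
        )
    return db


def words_by_section(db):
    """Typical dimensional query: join the fact to a dimension and aggregate."""
    return db.execute("""
        SELECT s.section_name, SUM(f.word_count)
        FROM fact_page AS f
        JOIN dim_section AS s USING (section_key)
        GROUP BY s.section_name
        ORDER BY s.section_name
    """).fetchall()
```

Calling words_by_section(build_warehouse(CRAWLED)) returns one total per section. Swapping the sample list for real crawler output is where the planning questions (grain, keys, slowly changing attributes) start to show up.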

dionidium
I interned for a data warehousing team when I was in college (a random assignment) and this is the book everybody there lived by and recommended.
andyzweb
Second this, and also The Data Warehouse Lifecycle Toolkit.
Kagerjay
I own all 3 Kimball books; they are fantastic.
mipmap04
Probably the best live training I've ever attended for data warehousing.
pacuna
Your link points to The Data Warehouse Toolkit, not the ETL one.
3pt14159
Thanks. I linked to the right book but wrote the wrong title: I was originally going to recommend the ETL one, but changed my mind when I remembered what content was in which book.
endlessvoid94
Agreed. This book will give you a fantastic way to think about ETL strategy rather than simply pointing you to the latest library.

Some of the recent popular toolkits / services aren't "real" ETL -- they simply move data from one place to another. This is obviously a crucial part of ETL, but it's not the hard part. And without an understanding of data warehousing such as from this book, it will not be easy to discern the difference.

(This is based on many conversations with people on both sides of the table.)
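
One way to picture the difference between moving data and doing ETL is the sketch below. It is only an illustration under stated assumptions: the sample orders and the function names move_only and transform are hypothetical, and real warehouses handle far more cases than this.

```python
from decimal import Decimal

# Hypothetical source rows with the usual mess: inconsistent spacing and
# casing in names, and amounts stored as strings.
RAW_ORDERS = [
    {"customer": " Acme Corp ", "amount": "10.50"},
    {"customer": "acme corp", "amount": "4.50"},
    {"customer": "Globex", "amount": "7.00"},
]


def move_only(rows):
    """'Moving data': copy rows as-is, mess included."""
    return [dict(row) for row in rows]


def transform(rows):
    """The 'T' in ETL: conform customer names, assign surrogate keys,
    and convert amounts to integer cents."""
    customer_keys = {}  # conformed name -> surrogate key
    facts = []
    for row in rows:
        # Collapse whitespace and casing so variants map to one customer.
        name = " ".join(row["customer"].split()).lower()
        key = customer_keys.setdefault(name, len(customer_keys) + 1)
        facts.append({
            "customer_key": key,
            "amount_cents": int(Decimal(row["amount"]) * 100),
        })
    return customer_keys, facts
```

With move_only, the two spellings of Acme Corp remain two distinct customers downstream. With transform, they share one surrogate key, so totals per customer come out right.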

HN Books is an independent project and is not operated by Y Combinator or Amazon.com.