Hacker News Comments on
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition
· 2 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this book.
Yup, exactly. Kimball's stuff is the best. On a modest machine you can achieve what modern techniques would need an incredibly expensive, horizontally scaled MPP database to do. It does require a lot more planning and forethought, to be sure.
https://www.amazon.com/dp/product/1118530802 and possibly https://www.amazon.com/dp/0764567578/
Spark, etc., are great, but honestly, if you're just getting started I would forget all about existing tooling that is geared towards people working at 300-person companies and read The Data Warehouse ETL Toolkit by Kimball: https://www.amazon.com/gp/product/1118530802/
I learned from the second edition, but I've heard even better things about the third. As you're working through it, create a project with real data and re-implement a data warehouse from scratch as you go. It doesn't really matter what you tackle, but I personally like ETLing either data gathered by crawling a single site[0] or a weekly Wikipedia dump. You'll learn many of the foundational reasons for all the tools the industry uses, which will make it very easy for you to get up to speed on them and to make the right choices about when to introduce them. I personally tend to favour tools that have an API or CLI so I can coordinate tasks without needing to click around, but many others like a giant GUI so they can see data flows graphically. Most good tools have at least some measure of both.
[0] Use something like Scrapy for Python (or Mechanize for Ruby) with CSS selectors, and use the extension Inspector Gadget to quickly generate those selectors.
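As a rough illustration of the kind of from-scratch project suggested above, here is a minimal sketch of a star schema loaded with Python's built-in sqlite3 module. The table names (dim_page, dim_date, fact_crawl), the sample records in RAW_PAGES, and the helper functions are all hypothetical; none of them come from the comment or from Kimball's book, and real crawler output would take the place of RAW_PAGES.

```python
import sqlite3
from datetime import date

# Hypothetical raw records, standing in for whatever a crawler would emit.
RAW_PAGES = [
    {"url": "https://example.com/a", "crawled": "2024-01-01", "text": "one two three", "links": 2},
    {"url": "https://example.com/b", "crawled": "2024-01-01", "text": "four five", "links": 0},
    {"url": "https://example.com/a", "crawled": "2024-01-08", "text": "one two three four", "links": 3},
]


def create_schema(conn):
    """Create two dimension tables and one fact table (assumed layout)."""
    conn.executescript(
        """
        CREATE TABLE dim_page (
            page_key INTEGER PRIMARY KEY,   -- surrogate key
            url      TEXT NOT NULL UNIQUE   -- natural key from the source
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- e.g. 20240101
            iso_date TEXT NOT NULL,
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        CREATE TABLE fact_crawl (
            page_key   INTEGER NOT NULL REFERENCES dim_page(page_key),
            date_key   INTEGER NOT NULL REFERENCES dim_date(date_key),
            word_count INTEGER NOT NULL,
            link_count INTEGER NOT NULL
        );
        """
    )


def page_key_for(conn, url):
    """Return the surrogate key for a URL, inserting the dimension row if new."""
    conn.execute("INSERT OR IGNORE INTO dim_page (url) VALUES (?)", (url,))
    return conn.execute(
        "SELECT page_key FROM dim_page WHERE url = ?", (url,)
    ).fetchone()[0]


def date_key_for(conn, iso_date):
    """Return a YYYYMMDD integer key, inserting the date row if new."""
    d = date.fromisoformat(iso_date)
    key = d.year * 10000 + d.month * 100 + d.day
    conn.execute(
        "INSERT OR IGNORE INTO dim_date VALUES (?, ?, ?, ?)",
        (key, iso_date, d.year, d.month),
    )
    return key


def load(conn, raw_pages):
    """Transform each raw record into measures and load one fact row for it."""
    for row in raw_pages:
        conn.execute(
            "INSERT INTO fact_crawl VALUES (?, ?, ?, ?)",
            (
                page_key_for(conn, row["url"]),
                date_key_for(conn, row["crawled"]),
                len(row["text"].split()),  # measure derived during transform
                row["links"],
            ),
        )
    conn.commit()


def words_per_page(conn):
    """Example analytical query: total words crawled per page."""
    return conn.execute(
        """
        SELECT p.url, SUM(f.word_count)
        FROM fact_crawl f JOIN dim_page p USING (page_key)
        GROUP BY p.url ORDER BY p.url
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
load(conn, RAW_PAGES)
print(words_per_page(conn))
```

The point of the sketch is the split between dimensions (what was crawled, and when) and facts (the numbers measured on each crawl). That planning step is separate from the mechanics of moving rows around, and it is the part a toy project like this lets you practice.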
⬐ dionidium: I interned for a data warehousing team when I was in college (a random assignment) and this is the book everybody there lived by and recommended.
⬐ andyzweb: second this and also The Data Warehouse Lifecycle Toolkit
⬐ Kagerjay: I own all 3 Kimball books, they are fantastic
⬐ mipmap04: Probably the best live training I've ever attended for data warehousing.
⬐ pacuna: your link points to the data warehouse toolkit, not the ETL one.
⬐ 3pt14159: Thanks, I linked to the right book, but I wrote the wrong title because I was originally going to recommend that one but changed my mind when I remembered what content was in what book.
⬐ endlessvoid94: Agreed. This book will give you a fantastic way to think about ETL strategy rather than simply pointing you to the latest library. Some of the recent popular toolkits / services aren't "real" ETL -- they simply move data from one place to another. This is obviously a crucial part of ETL, but it's not the hard part. And without an understanding of data warehousing such as from this book, it will not be easy to discern the difference.
(This is based on many conversations with people on both sides of the table.)