HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
01 - Course Introduction & Relational Model (CMU Databases Systems / Fall 2019)

CMU Database Group · Youtube · 253 HN points · 4 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention CMU Database Group's video "01 - Course Introduction & Relational Model (CMU Databases Systems / Fall 2019)".
Youtube Summary
Prof. Andy Pavlo (http://www.cs.cmu.edu/~pavlo/)
Slides: https://15445.courses.cs.cmu.edu/fall2019/slides/01-introduction.pdf
Notes: https://15445.courses.cs.cmu.edu/fall2019/notes/01-introduction.pdf

15-445/645 Intro to Database Systems (Fall 2019)
Carnegie Mellon University
https://15445.courses.cs.cmu.edu/fall2019/

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Obligatory recommendation for the CMU Databases Systems lecture series by Andy Pavlo. https://www.youtube.com/watch?v=oeYBdghaIjc&list=PLSE8ODhjZX...

Here's the lecture on tree indexes: https://www.youtube.com/watch?v=JHZFc4hMGhk

Agreed. I finished watching them at the start of this year and it's significantly helped me in using and making choices about databases. I've also since started implementing my own database for learning off the back of them. Couldn't be more grateful for those courses being public.

link if anyone is interested: https://youtu.be/oeYBdghaIjc

I'd start with CMU's "Intro to Database Systems", their lectures are on youtube. Highly recommended both for the depth and how Andy Pavlo presents the topic. https://www.youtube.com/watch?v=oeYBdghaIjc&list=PLSE8ODhjZX...
wolfoftheweb
What do you recommend for aggregating and scraping the data? I’ve been working with PyCharm and BeautifulSoup4.

Also, any suggestions for the best ways to apply the data to a website if the data is being refreshed daily? I’ve been using csv files to pass the variables into a Wordpress theme / post but it seems like building something from scratch would be more efficient in the long term.

skovorodkin
This amazing course is not about using database systems, but about making them.
Oct 22, 2019 · 245 points, 27 comments · submitted by adamnemecek
teddyh
First sentence:

“First I want to talk about how Oracle is helping us out this semester with course development.”

Closes tab

CalChris
He doesn't mention IBM System R at all.
lmwnshn
He does. [0]

[0] https://15445.courses.cs.cmu.edu/fall2019/slides/02-advanced... slide 3

HappyJoy
Do so at your own loss. Great lectures.
simplegeek
He is a great teacher. And this looks like a great course. The project, which is to create a database in C, looks most interesting to me.

I really don't have a strong background in C so I just wish I could create the project in Python since that's what I mostly use. Does anyone know a database course which uses Python to create a database?

gregorygoc
Both languages are Turing complete so their language of choice should be transparent to anyone taking the course.
lepetitpedre
I love that he explains what's really under the hood. It's needed in a world where the trend is the opposite.
vs2
He is giving the course from the bathtub in his hotel
abledon
that's how you know it's legit.
graovic
Yes, why not? It seems to work fine
efa
Better than from the toilet
perennate
Last time he pretended to get pepper sprayed: https://www.youtube.com/watch?v=m72mt4VN9ik&t=540s

Edit: fake pepper spray @ 13:00

pipu
Andy Pavlo's video lectures are very good. Watched both introductory and advanced lectures a year or so ago. Recommended. Previous videos can be found on the channel.
barbecue_sauce
Some of the best and most detailed information on database internals available on YouTube. Also great at integrating the current state of the commercial database industry with trends in academic database research.
ramboldio
Watch Lecture 3. The one where they introduce the 'course DJ' (!) https://youtu.be/1D81vXw2T_w
ramboldio
They have a "course DJ" starting from lecture #3 ! https://youtu.be/1D81vXw2T_w?t=25
albinary10
Excellent course!

I have some difficulty following because I'm missing prerequisites, but I like this type of course anyway.

MikeRyu
Yes, coming from a professor who is full of himself http://www.cs.cmu.edu/~pavlo/blog/2016/04/should-you-email-a...
cozos
I took a database class in college and it mostly taught SQL syntax (which IMO can be learned from StackOverflow and such) and all the different intricacies of database normalization (3rd normal form, etc).

Having worked as a programmer for the past couple years I can't help but feel that the stuff I learned at college isn't very relevant and I wish that I knew more about the actual database internals; query execution strategies, concurrency control and stuff like that. This seems like a good place to start.

dmitryminkovsky
This is a great book/website https://use-the-index-luke.com/
gigatexal
This is why I am grateful I spent my first five years as a DBA: SQL was just a tool, and what I really learned was why things worked or didn’t work, why queries were fast or slow, and how best to define a schema for a given use case (that’s key: schema design should fit the business use case, not some arbitrary “beautiful” theoretical model).
joker3
The database course I took covered that a bit. We used https://www.amazon.com/Database-Management-Systems-Raghu-Ram.... It's not the most up-to-date text (although it was when I was in school), but it's probably still the best source for the basics of relational database theory and implementation.
ivanjaros
they are not that complicated. you just need a format for storing records where you store the size of the value and the value itself in byte form so you can read it back, and then it is ALL about indices. and that is really it. you can try to build a db by yourself from scratch or use some key-value db as a base. the magic is usually not in storage or indexing but in query parsing, processing and optimization. that is essentially what makes dbs different. and then, of course, how they handle transactions. in my experience a simple write lock (mutex) is the standard, though you can get a bit more advanced and use more granular locking so your transactions, if they do not overlap, are way faster instead of stopping the world.
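
The record format described above (store the size, then the bytes, then look things up through an index) can be sketched in a few lines of Python. This is a minimal illustration, not how any real DBMS lays out its pages: the `TinyStore` class, the 4-byte length header and the single lock are all assumptions made for the example.

```python
import io
import struct
import threading

# Hypothetical record layout: 4-byte key length, 4-byte value length,
# then the raw key bytes followed by the raw value bytes.
HEADER = struct.Struct(">II")


class TinyStore:
    """Append-only record log plus an in-memory index of key -> offset."""

    def __init__(self, fileobj=None):
        # Any seekable binary file works; BytesIO keeps the sketch self-contained.
        self.log = fileobj if fileobj is not None else io.BytesIO()
        self.index = {}
        # One coarse lock, i.e. the "simple write lock" approach.
        self.lock = threading.Lock()

    def put(self, key: bytes, value: bytes) -> None:
        with self.lock:
            offset = self.log.seek(0, io.SEEK_END)
            self.log.write(HEADER.pack(len(key), len(value)) + key + value)
            # Newer records shadow older ones by overwriting the index entry.
            self.index[key] = offset

    def get(self, key: bytes):
        with self.lock:
            offset = self.index.get(key)
            if offset is None:
                return None
            self.log.seek(offset)
            key_len, value_len = HEADER.unpack(self.log.read(HEADER.size))
            self.log.seek(key_len, io.SEEK_CUR)  # skip over the stored key
            return self.log.read(value_len)
```

Everything the reply below lists (query optimization, finer-grained concurrency, recovery, security) is deliberately absent here, which is roughly the gap between a toy and a real system.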
mr_overalls
> they are not that complicated

Yes, when you're only interested in creating a toy/proof-of-concept that doesn't consider: feature-completeness, optimization, concurrency, security, performance, transaction management, stability, etc.

I suppose you could say that about _any_ complex product. "Google Search is just a simple web search box, connected to a text search engine with a little bit of linear algebra on the back end. It's really not that complicated."

oneepic
I've been told the best way to learn those concepts is to literally build a database. Not on your own, necessarily...I think there are guided tutorials for that sort of thing.
bytematic
Well it's a CS class not a Software Engineering class
roenxi
Naming them "Database" courses is a bit of a trick; they usually don't actually study databases. They study data, models of data, algorithms where accessing data is very costly and a little bit of concurrency.

This is all useful and it does make sense to think about it at the same time, but in practice:

* Concurrency requires a lot more focus than it gets as a side topic of one course

* Cost of disk accesses isn't necessarily important in database practice, eg, if using an in-memory database.

I think database courses are fantastic and I enjoy talking about models of data, but anyone who goes into one thinking they are going to learn about actual databases has been misled. Even the SQL syntax is only really being taught as an example of how the relational data model can be instantiated - 'learning SQL' isn't academically interesting.

If it were me, I'd break most intro-to-databases up into 3 - one course on concurrency, one course on data models and fold the algorithms into an existing algorithms course.

atomicity
Yeah, I also agree that databases are a lot more about database internals (or DBMSs).

Not sure if I would agree with splitting databases into separate courses. I think it's a lot more useful to learn about general concepts like concurrency and memory/storage management by understanding how other systems (like OSs) handle it. For concurrency, systems courses should give students a taste of the various concurrency techniques. Then the more theoretical courses like parallel computing can give a more unified and mathematical view.

I think the current state of courses already meshes pretty well:

Real Systems        | Theory of Computer Systems
====================|
Networks            ⇘
--------
Operating Systems   ⇒ Concurrency/Parallel/Distributed Computing
--------
Databases           ⇗⇘
--------------        Programming Languages
Compilers           ⇒⇗

My main gripe about databases courses is that they are often way too out-of-date:

* No mention of LSM trees, which are probably a bit more important than ISAM.

* Tons of time spent on 2PL, deadlocks, and strict serializability, without any mention that mainstream database systems generally default to Read Committed and use MVCC.

* ARIES - At least the course I took spent so much time on outdated cost optimization, concurrency control techniques and then just mentioned a single, very complicated yet important durability technique at the end. This isn't even that young of a technique anymore (1992) and there are tons of variants which are probably a bit more important to understand than all of the 2PL variants.
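
To make the MVCC point in the list above concrete, here is a toy sketch of multi-version reads in Python. It is only an illustration under simplifying assumptions: the `MVCCStore` class and its method names are hypothetical, every write commits immediately, and there is no garbage collection of old versions. Real systems are considerably more involved.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: writers append versions, readers pick by timestamp."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical timestamps 1, 2, 3, ...
        self._versions = {}  # key -> list of (commit_ts, value), oldest first

    def commit_write(self, key, value):
        # Each write creates a new version instead of overwriting in place.
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        # A reader's snapshot is just a timestamp; later commits stay invisible to it.
        return next(self._clock)

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


db = MVCCStore()
db.commit_write("x", "old")
snap = db.snapshot()
db.commit_write("x", "new")
print(db.read("x", snap))           # the earlier snapshot still sees "old"
print(db.read("x", db.snapshot()))  # a fresh snapshot sees "new"
```

Readers never block writers in this scheme, which is the main contrast with lock-based 2PL. Taking a fresh snapshot for every statement, as the last line does, loosely approximates Read Committed behaviour, while holding one snapshot for a whole transaction gives snapshot-style reads.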

lmwnshn
The previous offering of this course had students implement some of ARIES [0].

[0] https://15445.courses.cs.cmu.edu/fall2018/project4/

atomicity
This looks awesome! This covers all of my complaints and even has some core material that I don't have but probably should have 'cached' in the back of my mind.

Props to Pavlo and the course TAs for the great work! A lot of DB courses don't really highlight how dynamic the storage & database systems area is right now but this course does.

Oct 02, 2019 · 8 points, 0 comments · submitted by adamnemecek
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.