Hacker News Comments on
01 - Course Introduction & Relational Model (CMU Databases Systems / Fall 2019)
CMU Database Group · Youtube · 253 HN points · 4 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.
Obligatory recommendation for the CMU Databases Systems lecture series by Andy Pavlo. https://www.youtube.com/watch?v=oeYBdghaIjc&list=PLSE8ODhjZX...
Here's the lecture on tree indexes: https://www.youtube.com/watch?v=JHZFc4hMGhk
Agreed. I finished watching them at the start of this year and it's significantly helped me in using and making choices about databases. I've also since started implementing my own database for learning off the back of them. Couldn't be more grateful for those courses being public.
Link if anyone is interested: https://youtu.be/oeYBdghaIjc
I assume you are referring to this course https://www.youtube.com/watch?v=oeYBdghaIjc&list=PLSE8ODhjZX... and the follow-up advanced course https://www.youtube.com/watch?v=SdW5RKUboKc&list=PLSE8ODhjZX...
I'd start with CMU's "Intro to Database Systems"; the lectures are on YouTube. Highly recommended both for the depth and for how Andy Pavlo presents the topic. https://www.youtube.com/watch?v=oeYBdghaIjc&list=PLSE8ODhjZX...
⬐ wolfoftheweb
What do you recommend for aggregating and scraping the data? I’ve been working with PyCharm and BeautifulSoup4.
Also, any suggestions for the best ways to apply the data to a website if the data is being refreshed daily? I’ve been using csv files to pass the variables into a Wordpress theme / post but it seems like building something from scratch would be more efficient in the long term.
⬐ skovorodkin
This amazing course is not about using database systems, but about making them.
⬐ teddyh
First sentence:
“First I want to talk about how Oracle is helping us out this semester with course development.”
Closes tab
⬐ CalChris ⬐ simplegeek
He doesn't mention IBM System R at all.
⬐ lmwnshn ⬐ HappyJoy
He does. [0]
[0] https://15445.courses.cs.cmu.edu/fall2019/slides/02-advanced... slide 3
Do so at your own loss. Great lectures.
He is a great teacher. And this looks like a great course. The project, which is to create a database in C, looks most interesting to me.
I really don't have a strong background in C so I just wish I could create the project in Python since that's what I mostly use. Does anyone know a database course which uses Python to create a database?
⬐ gregorygoc ⬐ lepetitpedre
Both languages are Turing complete so their language of choice should be transparent to anyone taking the course.
I love that he explains what's really under the hood. It's needed in a world where the trend is the opposite.
⬐ vs2
He is giving the course from the bathtub in his hotel
⬐ abledon ⬐ pipu
that's how you know it's legit.
⬐ graovic
Yes, why not? It seems to work fine
⬐ efa
Better than from the toilet
⬐ perennate
Last time he pretended to get pepper sprayed: https://www.youtube.com/watch?v=m72mt4VN9ik&t=540s
Edit: fake pepper spray @ 13:00
Andy Pavlo's video lectures are very good. I watched both the introductory and advanced lectures a year or so ago. Recommended. Previous videos can be found on the channel.
⬐ barbecue_sauce ⬐ ramboldio
Some of the best and most detailed information on database internals available on YouTube. Also great at integrating the current state of the commercial database industry with trends in academic database research.
Watch Lecture 3. The one where they introduce the 'course DJ' (!) https://youtu.be/1D81vXw2T_w
⬐ ramboldio
They have a "course DJ" starting from lecture #3! https://youtu.be/1D81vXw2T_w?t=25
⬐ albinary10
Excellent course! I have some difficulty following because of missing prerequisites, but I like this type of course anyway.
⬐ MikeRyu
Yes, coming from a professor who is full of himself http://www.cs.cmu.edu/~pavlo/blog/2016/04/should-you-email-a...
⬐ cozos
I took a database class in college and it mostly taught SQL syntax (which IMO can be learned from StackOverflow and such) and all the different intricacies of database normalization (3rd normal form, etc).
Having worked as a programmer for the past couple years I can't help but feel that the stuff I learned at college isn't very relevant and I wish that I knew more about the actual database internals: query execution strategies, concurrency control and stuff like that. This seems like a good place to start.
⬐ dmitryminkovsky
This is a great book/website https://use-the-index-luke.com/
⬐ gigatexal
This is why I am grateful I spent my first five years as a DBA: learning SQL was just a tool, but I also learned why things worked or didn’t work, why queries were fast or slow, and how to best define a schema for a given use-case (that’s key: schema design should fit the business use-case, not some arbitrary “beautiful” theoretical model)
⬐ joker3
The database course I took covered that a bit. We used https://www.amazon.com/Database-Management-Systems-Raghu-Ram.... It's not the most up-to-date text (although it was when I was in school), but it's probably still the best source for the basics of relational database theory and implementation.
⬐ ivanjaros
they are not that complicated. you just need a format for storing records where you store the size of the value and the value itself in byte form so you can read it back, and then it is ALL about indices. and that is really it. you can try to build a db by yourself from scratch or use some key-value db as a base. the magic is usually not in storage or indexing but in the query processing and parsing and optimization. that is essentially what makes dbs different. and then, of course, how they handle transactions. in my experience a simple write lock (mutex) is the standard, though you can get a bit more advanced and use more granular locking so your transactions, if they do not overlap, are way faster and not stop-the-world.
⬐ mr_overalls ⬐ oneepic
> they are not that complicated
Yes, when you're only interested in creating a toy/proof-of-concept that doesn't consider: feature-completeness, optimization, concurrency, security, performance, transaction management, stability, etc.
I suppose you could say that about _any_ complex product. "Google Search is just a simple web search box, connected to a text search engine with a little bit of linear algebra on the back end. It's really not that complicated."
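A minimal sketch of the record layout ivanjaros describes above: each value is stored as its size followed by its bytes, and an index maps keys to where their records start. The 4-byte big-endian length prefix, the in-memory `bytearray` standing in for a data file, and the names `TinyStore`, `put`, and `get` are all assumptions made for illustration; none of them come from the thread or the course.

```python
import struct

# Assumed record header: a 4-byte big-endian unsigned length.
LEN = struct.Struct(">I")


class TinyStore:
    """Append-only log of length-prefixed values plus a key -> offset index.

    Hypothetical toy: no persistence, locking, or recovery.
    """

    def __init__(self):
        self.log = bytearray()  # stands in for a data file on disk
        self.index = {}         # key -> offset of that key's newest record

    def put(self, key, value: bytes) -> None:
        offset = len(self.log)
        # Write the size first, then the raw bytes, so the value can be read back.
        self.log += LEN.pack(len(value)) + value
        self.index[key] = offset  # an overwrite simply points at the newer record

    def get(self, key) -> bytes:
        offset = self.index[key]
        (size,) = LEN.unpack_from(self.log, offset)
        start = offset + LEN.size
        return bytes(self.log[start:start + size])


store = TinyStore()
store.put("a", b"hello")
store.put("b", b"")
store.put("a", b"hello, world")
```

The older record for `"a"` stays in the log as garbage, and nothing here touches concurrency, durability, or query processing, which is roughly the gap the reply above points at.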
I've been told the best way to learn those concepts is to literally build a database. Not on your own, necessarily... I think there are guided tutorials for that sort of thing.
⬐ bytematic
Well it's a CS class not a Software Engineering class
⬐ roenxi
Naming them "Database" courses is a bit of a trick; they usually don't actually study databases. They study data, models of data, algorithms where accessing data is very costly and a little bit of concurrency.
This is all useful and it does make sense to think about it at the same time, but in practice:
* Concurrency requires a lot more focus than it gets as a side effect of one course
* Cost of disk accesses isn't necessarily important in database practice, e.g., if using an in-memory database.
I think database courses are fantastic and I enjoy talking about models of data, but anyone who goes into one thinking they are going to learn about actual databases has been misled. Even the SQL syntax is only really being taught as an example of how the relational data model can be instantiated - 'learning SQL' isn't academically interesting.
If it were me, I'd break most intro-to-databases up into 3 - one course on concurrency, one course on data models and fold the algorithms into an existing algorithms course.
⬐ atomicity
Yeah, I also agree that databases are a lot more about database internals (or DBMSs).
Not sure if I would agree with splitting databases into separate courses. I think it's a lot more useful to learn about general concepts like concurrency and memory/storage management by understanding how other systems (like OSs) handle it. For concurrency, systems courses should give students a taste of the various concurrency techniques. Then the more theoretical courses like parallel computing can give a more unified and mathematical view.
I think the current state of courses already meshes pretty well:
Real Systems       | Theory of Computer Systems
===================|===========================
Networks          ⇘
-------------------
Operating Systems  ⇒ Concurrency/Parallel/Distributed Computing
-------------------
Databases         ⇗⇘
-------------------  Programming Languages
Compilers         ⇒⇗
My main gripe about databases courses is that they are often way too out-of-date:
* No mention of LSM trees, which are probably a bit more important than ISAM.
* Tons of time spent on 2PL, deadlocks, and strict serializability, without any mention that mainstream database systems generally default to Read Committed and use MVCC.
* ARIES - At least the course I took spent so much time on outdated cost optimization and concurrency control techniques, and then just mentioned a single, very complicated yet important durability technique at the end. This isn't even that young of a technique anymore (1992) and there are tons of variants which are probably a bit more important to understand than all of the 2PL variants.
⬐ lmwnshn
The previous offering of this course had students implement some of ARIES [0].
⬐ atomicity
This looks awesome! This covers all of my complaints and even has some core material that I don't have but probably should have 'cached' in the back of my mind.
Props to Pavlo and the course TAs for the great work! A lot of DB courses don't really highlight how dynamic the storage & database systems area is right now but this course does.