HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by Lukas Eder

Devoxx · Youtube · 293 HN points · 5 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Devoxx's video "How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by Lukas Eder".
Youtube Summary
SQL is the only ever successful, mainstream, and general-purpose 4GL (Fourth Generation Programming Language) and it is awesome!

With modern cost based optimisation, relational databases like Oracle, SQL Server, PostgreSQL finally keep up to the promise of a powerful declarative programming model by adapting to ever changing productive data without performance penalties. Thousand-line-long, complex SQL statements can be run in far below a millisecond against billion-row strong tables if database developers know their ways around the SQL language - and the best news is: It's not that hard!

In this talk, I'll show how the SQL database will constantly outperform any hand written data retrieval algorithm - or in other words - how SQL, being a logic language, is the best language for business logic.
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Aug 23, 2022 · lukaseder on Parsing SQL
>Totally agree. Lukas Eder has some nice presentations about it.

Thanks for the shout out! For the record, that's probably the referenced talk: https://www.youtube.com/watch?v=wTPGW1PNy_Y

I would suggest that learning SQL (the syntax) isn't the crux--it's just syntax you learn over time based off need, just like learning python--what you need is motivation. I'd suggest watching this video to get a flavor of how much leverage using the database can give you: https://www.youtube.com/watch?v=wTPGW1PNy_Y If the database can make you more productive and more efficient and make your life easier, you now have incentive to use it more.

Another thing that took my use of databases to the next level, was using PostGraphile (https://www.graphile.org/postgraphile/). It takes a PostgreSQL database and provides a GraphQL endpoint for it. Use that as your only backend. You can use PostgreSQL row-level security to do access control (who can see or modify what). You can use PL/pgSQL to write business logic for higher level calculations. You can use constraints and triggers to do data validation. Each of these areas has an initially steep learning curve, and wonky syntax, but once you get over them you can implement many backends extremely quickly with them. And using PostGraphile where you only have SQL available (well, you can also add javascript extensions, but try to limit those) will be a great forcing function that will require you to scour stackoverflow and the excellent PostgreSQL docs to learn many of the useful corners of SQL you'd never learn otherwise.

rmbyrro
Interesting, didn't know about PostGraphile. Having recently learned about EdgeDB [1] launch here in HN [2], seems the latter is going to each the former's lunch.

[1] https://www.edgedb.com/

[2] https://news.ycombinator.com/item?id=30290225

nefitty
Thank you for writing this. This is one of the most helpful comments I've read in my 7 years on HN.

A lot of this high-level engineering stuff feels so opaque. You outlined a clear heuristic: find a motivation to learn the tool, add artificial limits if necessary for your learning goals, then scout relevant resources to help you get unstuck. It seems simple when written out, but I always find myself trying to overload on uncontextualized information first, then feel confused about why I still don't know how to "make a neural net" or what have you.

I use to think that too. This is a good talk about how smart SQL query engines are. It is an hour though https://youtu.be/wTPGW1PNy_Y
> further, afaik there's no assignment nor iteration in SQL

That was until CTEs were introduced in SQL:1999

> it's not a programming language at all. It's a.. query language.

The two are not mutually exclusive. SQL is used to tell a computer what to do, and it is very powerful at it: https://www.youtube.com/watch?v=wTPGW1PNy_Y

May 11, 2019 · jodrellblank on Joins In Steps
This talk ("How Modern SQL Databases Come up with Algorithms that You Would Have Never Dreamed Of by Lukas Eder" https://www.youtube.com/watch?v=wTPGW1PNy_Y) is not particularly gripping, but he does claim that modern database engines will ignore indexes if they estimate it will be faster to read all the data than it is to access the index first then read the relevant data. And that Oracle can have multiple execution plans and switch between them mid-query if the first one isn't working out as expected.

Could this be a rare case of the mythical sufficiently smart compiler having more information at runtime than the programmer has at coding time, and that if you could hint which index to use, the trade off would be that you'd instead be troubleshooting queries which end up slower because of it?

kec
> Could this be a rare case of the mythical sufficiently smart compiler having more information at runtime than the programmer has at coding time [...]

Mythical? Isn't this the core concept behind optimizing JITs?

Apr 06, 2019 · 292 points, 121 comments · submitted by truth_seeker
ThJ
This talk didn't hook me, but then again, it looks like it's aimed at developers who jumped on the NoSQL bandwagon and now need to be told why SQL is clever.

I feel that much of what we do with map, reduce, sort and filter on arrays in modern JS is similar to SQL. Unfortunately, there is no query optimisation, because that would break your procedural code.

What I'd like to see more of is domain-specific languages embedded in general-purpose languages, like React with JSX, or just a better bridge between the two. We have these two worlds of in-memory data structures on one side, and databases on the other, that we keep separate. NoSQL was an attempt at bringing them together, but it put far too little emphasis on complex queries and filters.

It would be nice if we could just define data structures and their relations once and not have to do it again. It would be nice if foreign keys neatly mapped to object references in OOP and everything stayed live, so if you change a reference, that happens instantly in the database too, unless you wrap it in a transaction block.

Now pull this off while still retaining the ability to open an SQL REPL to test out statements, or manipulate field definitions.

I think that's what everyone wants, really. A seamless way for code to interact with data on all levels, with no perceptible boundary once you've opened a session to your DB server.

dustingetz
How about something like this? http://www.hyperfiddle.net/
None
None
jstimpfle
> map, reduce, sort and filter on arrays in modern JS is similar to SQL

maybe that, but then again, what we do with custom data structures that are required for certain things to be halfway efficient, is much harder or impossible to do in SQL, correct?

lugg
Look up some of the old YouTube vids on datomic / rich Hickey.

"Database as a value" had me jaw dropping and then crying most of the way through..

dustingetz
+1 for "Database as a Value", this talk changed my world
rgoulter
At around 22:10, the speaker summarises equivalencies between common SQL clauses (SELECT, GROUP BY, etc.) and what Java 8 calls a "stream API" (map, collect, etc.). https://blog.jooq.org/2015/08/13/common-sql-clauses-and-thei...

Neat. I hadn't thought of it that way before.

miguelrochefort
In C#, the LINQ operators on IEnumerable and IObservable are named after the SQL causes (Select, Where, GroupBy, OrderBy, Join, Union, etc).
Falell
More than that, as part of LINQ they built a whole "expression" engine into the language to make LINQ to SQL more powerful than simple method chains could be. What looks like function calls is actually converted into expression tree at the language level before conversion to SQL statements when communicating with a DB backend.

The method-chains-vs-expression-tree semantics don't matter much when doing typical in-memory map/filter/reduce on lists but make LINQ to SQL and other more complex use cases much more powerful.

jayrwren
When .NET introduced LINQ this is 90% of what the .NET world talked about for the next 3yrs.
mjburgess
I teach most (dynamic) functional programming constructs by starting with SQL.

Python comprehensions, etc. are mostly just SQL -- likewise most languages where "map" is part of the collections API.

SQL motivates the utility of functional programming for data-oriented applications.

collyw
Have you got any material online to look at? I am curious to see the sort of stuff you are talking about ("Python comprehensions, etc. are mostly just SQL" doesn't make a lot of sense to me right now. I see them as an inline for loop).
mjburgess
ages = [ row['age'] for row in table ]
afandian
I wouldn't compare a comprehension to a loop because a loop doesn't typically have an output. If a loop does produce an output it's through side effects. List comprehensions are the same as maps, and generator comprehension even more so, as they can compose functions not values.

And a map can be the same as a projection as found in sql.

dragonwriter
> I wouldn't compare a comprehension to a loop because a loop doesn't typically have an output.

Loops in lots of languages have outputs. It's kind of necessary in expression-oriented languages.

But it's true that comprehensions are different than Python loops in that way (and are, in fact, equivalent to maps with joins and filters, like an SQL SELECT statement.)

afandian
Per my other reply [0] maybe my definition of loop, based on C, was too narrow.

I draw a distinction with map, recursion, lazy sequences etc

[0] https://news.ycombinator.com/item?id=19591233

taeric
Loops in lisp would like a word with you. :)

The difference is that many languages have constructs that weren't value producing. It is why some have the ternary operator, but lisp just had if statements.

So, to that end, most loops do produce a value. And there are several common ways they do it. In many languages, you have to give the details of how they work. In some, you can only those details and the building if the output is a bit more declarative.

afandian
I was coming from the Lisp perspective! Or at least Clojure...

Maybe it's a language issue. I would say a 'loop' has a jump and a condition. So that's 'for', 'while' and 'do' in C, JS, Python, Pascal, Java...

Lots of languages including Lisp have recursion, maps etc but I wouldn't call that looping. Clojure even has TCO recursion but it only returns one value, not a sequence (unless you accumulate the whole thing)

taeric
But even the LOOP macro has returned values for a long time. It isn't even hard to see how the new stream/collect API if Java is just approaching what LOOP has been doing for decades.
afandian
OK agreed there's a construct called loop that returns a single value. My original point was that list comprehensions are closer to sql projections than loops.
taeric
Apologies, I meant my point more as amusement than winning an argument.

I am just now reading Practical Common Lisp, and amusingly one of the first things they do is build a simple query language. Didn't even use loop. So, to your point, the imperative commands of looping can be far removed from what people today call comprehensions. That said, I don't think it is inherent. Just a quirk of history.

afandian
No point scoring inferred. In fact, it's a great illustration of different language and technology communities taking words and ideas and implementing them differently, crucially, with different degrees of specificity.
dfox
Java's streams implement the same idea as Common Lisp's streams, which were supposed to be more Common Lisp-like replacement for LOOP (eg. composable and with syntax that does not involve string comparison of symbols). That idea was droped from the final ANSI standard.
lispm
SERIES, not streams.
thesz
https://www.schoolofhaskell.com/school/to-infinity-and-beyon...

See TransformListComprehension section. Haskell uses tricks from SQL while still having nice syntax.

http://ekmett.github.io/discrimination/ - a library to perform what SQL engines do for optimization of relational queries, online and in Haskell.

mnazim
A very good way to understand the fact "class are for encapsulating behaviour, not data". But we don't learn until we fall.
bladelessninja
I have strong .net background with mostly pretty legacy code and I'm amazed that this talk is so "wow" for some people. Till this point I wasn't aware people has such problems like described in this talk. Maybe I'm just getting old, but SQL was my first idea to solve problem. I think for most reporting related tasks simple SQL will do. I think some people just get caught in their bubble and forgot the basics: choosing right tool for a task. Of course SQL has its issues: e.g. it is problematic to test it and to manage db model changes (code first aproach wins there, hands down), but for data extensive tasks it is the way to go. This whole talk just remind me some of legacy code I was investigating one time, where developer just though he can implement everythin on his own better than standard library. Probably you need some distance when looking on particular problem to see more just "how I would just loop through all this data an calculate this".
Sulong
May be the topic of talk should be how to compare apples and oranges for half of time in the talk.
StreamBright
From the business point of view there is only apple.
lukaseder
Yes that's what this talk is about.

A lot of people recognise apples and oranges for what they are. The talk is meaningless to those people.

A lot of other people don't / can't make the distinction (yet). For those people, the talk is an eye opener.

RandyRanderson
I'll save ppl some time:

* first 20 mins: Writing a contrived report query that's easy in SQL is easy in SQL.

* but: most real reports require many different fields collated into many parts of say a pdf or xls. Once you take this into account writing custom code to do the entire report is often the only way to do it. :<

* you often can't do some of what the author suggests bc in a "real" org you have db zealots that only allow access to their precious through stored procs and a surprisingly large number of meetings

* trying to do some non-trivial where clauses? Not in SQL - so you have to use an external lang. I call this "Works in my presentation" syndrome.

* he acts surprised that a lang that uses a db through a SQL api can't be as fast/efficient/terse as a system that uses SQL and has direct access to the db/storage.

Stopped after that.

PS Seems like a funny guy tho - liked his style! :>

lukaseder
"Funny guy" here.

Can't comment on db zealots. It must suck when that happens.

But disagree entirely on your other parts. Haven't run into such cases yet, wrote all my reports in SQL (and something like XSLT for presentation logic). I've mainly used Oracle.

asah
This is quite possibly the best talk I've ever seen. No joke.
skullborg
I'm liking it so far as well - I enjoy SQL though
ipsocannibal
One of the biggest problems with embedding SQL directly into your application is the dreaded vendor lock in. In my experience each RDBMS vendor supports its own special flavor of SQL. Once you've baked that flavor into your code base it becomes non-trivial to change your SQL backend if necessary. From what I understand Amazon has spent years trying to untangle the rats nest of embedded Oracle SQL dependencies within its codebase so it can extract itself from Oracle vendor lock in. So before you bake that SQL string into your code think carefully about what SQL functionality you are using and ask yourself if its representation is specific to your current RDBMS vendor.
tatersolid
Pure ANSI SQL gets you 90% of what most apps need and is portable between MSSQL and Postgres. MySQL/MariaDB are far less ANSI-SQL-compliant, but usually only function calls change when porting.

Nobody sane uses oracle for new development.

If you chose to code your app to MongoDB or Dynamo or whatever you are 100% locked in unless you do a rewrite of your whole data access layer.

To top it off, in the real world you’ll still need a separate SQL DB for reporting and analytics.

jayd16
I really do think a lot of ORM use is because devs are afraid to use SQL or at least afraid of how others devs might use it. This is true whether devs want to admit it or not.

That said, SQL doesn't lend itself to easy composability and you can end up with "4000 line monsters" if you really tried to put as much business logic into the query as possible.

Is anyone working on SQL extensions, ORM, or new 4th Gen language that could actually support something like 4000 lines of SQL in a maintainable way?

spullara
Not designed for developers but sigmacomputing.com let's you write 4000 lines of SQL using a purely visual intetface. I have built queries with it and then used them in code though.
btschaegg
> [...] and you can end up with "4000 line monsters" if you really tried to put as much business logic into the query as possible.

In my experience so far, this has usually been the actual problem. If you are not actively trying to "cleanly" separate your business logic from your storage, you tend to end up maintaining a mess. People correctly identify the problems with the mess, but incorrectly attribute it to SQL instead of architectural problems. So they switch to an ORM and wonder why it doesn't improve anything a little while later.

And that's not to say it is easy to come up with a useful way to interface between storage and the business logic -- on the contrary. It's just that the naive approach (strongly coupling the two everywhere) is kind of a worst-case scenario. Also, there are certainly situations where ORMs are very useful; it's just that a messily integrated ORM isn't any better than messily integrated SQL.

The thing is: Your business logic needs to be provided with ways of triggering certain operations (along with certain guarantees). I haven't seen cases where a single operation would take 4000 lines of "interesting" SQL though (ignoring very long lists of columns :) ). So, if the SQL is hard to maintain: Is this a problem of the SQL, or is this a byproduct of unclear (or ever changing) interface requirements? Sometimes, taking a couple of steps back and considering alternative approaches can yield much better results than optimizing a "wrong" solution.

(I guess this could be considered a form of the "X Y Problem": http://xyproblem.info/).

jokoon
SQL feels opaque, and its syntax is one of a particular kind (can't remember how it's called), which makes it difficult to learn and understand how it's parsed, because it's not very well compartmentalized. If the language was better structured like other programming languages are, that'd be better, and jooq looks like something that should be standard across all languages.

Usually when I'm using any technology, being able to understand how it works is my most important priority, because it prevents me from using it improperly. I have a hard time understanding what those "algorithms" really are after all. I heard that databases engines use backtracking, but I'm not really sure that's what it is talked about here. Maybe databases don't use BT?

Using another language that parses every time you do something doesn't feel very fast. It is fast for many applications, but I don't think it is for all of them.

In the end, it's the same old combat, either choose peak performance or development delays with good enough performance.

Although I have to admit that for GIS (geographic information system), databases with embedded R-trees and other things are very much welcome, since implementing those algorithms from the ground up is way too hard.

dahfizz
I'm not sure that parsing is going to be a bottleneck when you consider that DBMS read from the disk.
thesz
Prepared statements would not have been invented if parsing and checking was not a significant issue.
jhayward
The main point of prepared statements is that expensively optimized query execution plans can be re-used, including incorporating things learned from data patterns by previous queries. I've never considered them to be a crutch on parsing.
jayrwren
reminds me of the presentations coming out of the .NET world in 2005 after they launched LINQ. Nearly identical points being made.
ilitirit
Is SQL no longer featured as part of most Comp Sci/IS courses any more?
watsocd
I am teaching a college course on web development later in the spring and teaching raw SQL will be one of the main segments.

I will discuss a few ORMs to make the students aware they are out there but we will not teach one in the course.

collyw
God help us if it's not.
Izkata
It was for mine (2006-2010), but significantly more limited than you'd expect. It was essentially bare minimum, focused on theory, and could be summarized as "SELECT .. FROM .. INNER JOIN .. WHERE .."

Most of the focus was on things like database normalization instead of practical usage. I recall upon starting my first job, a co-worker gave me a crash course in what exactly a LEFT JOIN was (though at this point it had been ~3 years since the college database course, which had been so lackluster I hadn't used one during those years).

didibus
Pretty funny talk, but also informative.
continuational
The problem with SQL is that SQL that works today will be too slow tomorrow. Fixing it involves trying to please the query planner, all while making your query less and less comprehensible.

As a contrasting example, the experience you get with ElasticSearch is that queries have predictable performance even as your data changes. In that way it's much, much nicer. On the other hand, you lose joins, which is a huge downside.

Seb-C
This is actually more a problem of using MySQL than a problem with SQL itself in my experience.
anticensor
Butcher a SQL adapter over Elastic output and call it a day.
michaelcampbell
https://github.com/NLPchina/elasticsearch-sql/

https://www.elastic.co/products/stack/elasticsearch-sql

bladelessninja
You miss the point. ElasticSearch just do some gardening for your data automaticaly, where in pure SQL you need to do it yourself, but there are tools for it.
ak39
So in 20 years we've gone from vilifying SQL statements in code, witch-hunted folks who did this and instead invented and propped up clunky ORM implementations, ignored sage advice on the famous "object-relational impedance mismatch" (which the author claims he doesn't "believe" exists - it's not a choice btw) and now, after 20 years of agent orange in the bloody Vietnam of Computer Science, we are presenting why SQL was better all along - showing off windowing functions as examples? This is irony that took 20 years to sink in.

Am I understanding this guy's presentation correctly? (I left around the 20 min mark of the 50 min presentation).

thegeomaster
Recently I tried using an ORM "properly" for a bigger application. I tried for days to fix the performance problems, which were mainly caused by serialization/deserialization overhead because of humongous generated queries where I didn't need 90% of the info.

I finally gave up, and now most of the ORM usage in the codebase is just to communicate my custom queries to the underlying database and translate to/from domain entities. I feel like this is the sweet spot―give me a thin abstraction over SQL and do the gruntwork of converting domain entities to tables and vice-versa, but nothing else, thanks.

I'm still unhappy about the performance (when everything is in the working set, query takes 2ms and returns a couple of rows but the whole API call, which doesn't do much except massage the response to some kind of JSON, takes 30ms!), but that's on me for my ORM/language/runtime choice, I suppose.

balfirevic
> Recently I tried using an ORM "properly"

Which ORM was it, if I may ask?

ak39
"I feel like this is the sweet spot―give me a thin abstraction over SQL and do the gruntwork of converting domain entities to tables and vice-versa, but nothing else, thanks."

Something like iBatis/nBatis? (Haven't used this since 2004)

lukaseder
jOOQ!
coleifer
jOOQ is legit. I found it very helpful to read jOOQ as I was writing my own library in python. There's a huge difference between a query-builder and something like hibernate or sqlalchemy.

Query-builders make it easy to compose queries, leaning on your programming language to enforce syntactic correctness. When you're dealing with SQL directly, composition means string concatenation. A query builder gives you a data-structure that represents a query.

I think it's helpful to distinguish between query-building and object mapping when the "orm-vs-sql" thing comes up.

lukaseder
What's your library?
coleifer
"peewee" - https://github.com/coleifer/peewee
lukaseder
Very cool (and how could I miss it?)
michaelcampbell
*Batis is still a thing, and is closer to what they were asking for, but you're still mapping your domain to hand-coded SQL. That's not bad, necessarily; I just don't know if his "abstraction over SQL" means you write it and your domain code uses that abstraction, or it's provided by the framework.
michaelcampbell
> Recently I tried using an ORM "properly"

No, you weren't.

> ...queries where I didn't need 90% of the info.

lugg
What language are you using?

Most languages plain sql drivers have something like a cast result row into object as you cursor over the set.

E.g. PHP has fetchObject https://www.php.net/manual/en/pdostatement.fetchobject.php

nabla9
Assumption that only private consumers consume fashion, trends and "Pepsi generations" is wrong.

Corporate management, consulting and engineering subfields are made of people who don't check out from their personal needs when they work. Hype cycles, fashions, scenes and subcultures exist in software engineering just like they exist in corporate management and consulting. People want new and change.

SQL is the best for most database needs, but not for everything. It may seem boring if there is no marketing push that connects to emotions. SQL was established and boring for new software engineers. The emergence of "Web 2.0" in early 2000s started hype cycle for no-SQL. The 'scene' was so emotionally engaging that enterprises were abandoning SQL databases because "they don't scale" and because that's what their developers wanted to work with.

ak39
Your analogy with fashion is useful, except it doesn't convey the wanton waste of money and time and brain cells this diversion took us through - we have come out poorer and it seems with significant loss of memory (intellectual and academic memory). Fashion choices are mostly innocuous ... this whole anti-SQL crusade was more akin to measles anti-vaccers who eventually forget the original controversies, somehow survive the ensuing epidemic only to one day stumble upon the very same measles vaccine they were against. They then present it to the infected leper colony as something new!
foobar_
Honestly the only counter argument I can think of is stored procedures, which can get quite hairy. I would still say that stored procedures are still a better hell than ORM.

The situation is so bad, I would actually like learn more about fashion and fabric than deal with another trend. Where is the objectivity ?

lukaseder
I like this analogy given that the relational model solved all the problems of networked/hierarchical models, which preceded it. Yet, NoSQL aimed to regress towards pre-SQL times
lugg
I'm curious about pre SQL database times. What sort of solutions were out there? What did the DBs look like before SQL was a thing?

NoSQL fixes a lot of the mismatch with code models but imo doesn't really solve much else and lacks the expressive power of SQL when it comes to reporting software.

I suspect most developers don't realise how much they are not the main user or consumer of SQL. (Business excel spreadsheets)

nabla9
Navigational databases predated SQL databases (hierarchal or network/graph based). SQL was seen as huge improvement over navigational databases.

Basically navigational database is just set of links to records. Filesystem can be treated as hierarchal database. DOM model is modern version of hierarchal model.

collyw
Key value stores, Berkely-db was a popular one as I understand.
jhayward
The main model was CODASYL [1], a hierarchical navigational database standard. There were also various vendor-supplied things like IBM IMS for simple indexed storage/retrieval.

[1] https://en.wikipedia.org/wiki/CODASYL

collyw
I have recently joined a project where the database is mongodb. One thing That I have noticed is that its far more difficult to get a mental model of the application without a formal schema. In the same way that tests can act as a from of documentation, I now see that a database schema can act as a form of documentation as well. Something I haven't heard mentioned in any of the criticisms of no-sql. (The mid term plan is to move back to SQL for this project).
codetrotter
“Show me your flowcharts, and conceal your tables, and I shall continue to be mystified; show me your tables and I won’t usually need your flowcharts: they’ll be obvious.”

—Fred Brooks, The Mythical Man Month

nabla9
Usually the data and the model are more important and live longer (several decades) than the software itself. Traditionally you design the model for the data and interactions and then build the software around it.

If you do fast prototyping this is not needed, you just need some quick and dirty persistence. Unfortunately the prototype is always the 1.0 version to ship.

lukaseder
Presenter here. Yes that's pretty much it. There are many more convincing reasons than window functions in the second part which focuses on the execution of a 4GL, not the development with it.

Just to be clear: the talk uses hyperbolic language to make a point being that SQL is perfect for data processing, not for everything

ak39
Hi Luka,

Your presentation was excellent. I'm not attacking you. I agreed with everything you said (at least the parts that I watched). My _rant_ (or attack), instead is with the _phenomenon_ we were so strongly and needlessly gripped in for two whole decades. A cult phenomenon that had all the hallmarks of religious indoctrination.

You said: "Just to be clear: the talk uses hyperbolic language to make a point being that SQL is perfect for data processing, not for everything"

I agree. SQL isn't useful for all cases. But those cases are few.

lukaseder
I know you didn't attack me, just wanted to make sure the pendulum won't swing too far to the opposite side :)
gigatexal
Give me a raw sql statement or give me death. I like the way you think.
michaelcampbell
> Just to be clear: the talk uses hyperbolic language to make a point being that SQL is perfect for data processing, not for everything.

So, there's the point.

gigatexal
I should add this is for when working with a relational database where raw SQL is the best option. For other things say like data science python and pandas would be my go to
lugg
I dunno. I think there is quite a lot of us that have consistently expressed the stupidity that is wrapping a powerful query language in a mediocre, limited, buggy, half baked, shoe horned into OO shit what was I saying?

Oh yea, ORM.

I've always veiwed it as a legitimate (though always over engineered) solution to wrap some repetitive database operations in a consistent manner.

The biggest issue with ORM isn't the solution they came up with, half baked and buggy as they are, but the problem they are actually employeed to "solve": developers afraid of SQL. Whether they can't write it, can't learn it, or just straight hate the string representation in their code, it's the main reason people sign on for ORM.

Then they prop it up with all manner of baseless reasoning.

Theb there is the leakyness. ORMs are usually quite fluid in their language, they make it easy to represent the database in code. The only issue is that it isn't there. It's a separate thing. When you see database code it should be obvious and very far away from business logic. What does ORM do? Mix it all up On purpose. Have fun with all the mocks.

sfilipov
I don't disagree overall but I'm using SQLAlchemy (Python) on a daily basis and it works quite well.

I agree that if you are building ORM with the intention of hiding so many SQL details that the developers don't need to learn SQL, you'll run into problems.

SQLAlchemy on the other hand starts simple, but is flexible enough that you can do inner/outer joins, you have different loading strategies etc. It is quite nice if you are proficient enough in SQL.

When writing complicated select queries, I'd start by writing the raw SQL then spend my time representing it in SQLAlchemy. It's a bit less than double the work sometimes, but working with the SQLAlchemy objects in Python is quite nice as long as you pull them in an efficient manner. SQLAlchemy just feels like a well thought out ORM.

lugg
Never heard of SQLAlchemy. The core part of the lib looks quite nice, the problem is its design is driven by the needs of its ORM.

If you define your database properly typed, defaulted, and not null unless an update field that sort of layer isn't needed at all.

Using the correct native types on your function params or class structures means you'll never run into type mismatch or cast failures loading or unloading from the db.

So I just don't get why the need for ORM. The only thing that needs mapping is the table schema to your data type object/structs.

ORMs don't offer anything of value their and often make things you need to do in your job harder.

leejo
I'm sick of the argument that those who advocate for the use of an ORM have a dislike of SQL, or simply don't understand it. Perhaps this is true of newer developers who have "grown up" on ORMs, but for the most part advocating for an ORM is simply wanting to employ the "don't repeat yourself" thing and having enough battle scars to know that trying to compose your own SQL statements results in vulnerabilities, bugs, and a mess.

> If you define your database properly typed, defaulted, and not null unless an update field that sort of layer isn't needed at all.

Which is a laudable goal but not realistic if you've worked on any backend that has even a hint of organic growth or legacy. I know of at least two books on the subject of dealing with anti patterns in and the refactoring of database schemas.

Database schemas are the mortgage of technical debt, they can take literally decades to pay off if you don't get them "right" in the first place, or you pivot, or you're growing quickly, or you have a large team. Using an ORM allows you to abstract away and isolate a lot of the technical debt, create single entry points, and then fix those.

And in the part where using the ORM's syntax would obfuscate or complicate compared to SQL, then just write your own SQL query anyway and put this in a virtual view.

Sure, you can create your own entry points that handle the madness and domain knowledge, with your own SQL handling to put the pieces together that return a nicely mapped object or data structure. Then what have you done? You've written your own ORM.

lugg
Vulnerabilities from writing SQL? ORMs do not solve this. Prepared statements and parameterised queries solve this.

Bugs? You're shitting me, from what? Untested queries?

A mess? Clean it up. You can make a mess with an ORM too.

> Which is a laudable goal but not realistic if you've worked on any backend that has even a hint of organic growth or legacy.

Except it is realistic and I have cleaned up a lot of "organic growth" and "legacy code" like this.

> I know of at least two books on the subject of dealing with anti patterns in and the refactoring of database schemas.

Yea.. where do you think I got these ideas?

> Database schemas are the mortgage of technical debt, they can take literally decades to pay off if you don't get them "right" in the first place, or you pivot, or you're growing quickly, or you have a large team.

Database design is not hard and your schemas/changes should be peer reviewed like any code. It's also not static, yes, some, SOME, very few situations are essentially unchangeable.

But most database design is easily extended without causing breaking change.

Breaking change is more problematic but not in any way more or less than any other breaking change.

Then again, I'm kind of curious how any of these problems are solved by an ORM.

leejo
> Database design is not hard ... I'm kind of curious how any of these problems are solved by an ORM

Database design is indeed not hard (for the vast majority of use cases), but we can't predict the future. I'll give you an example from our app.

We've recently had to update the schema to support multiple different tax amounts for transactions. This is several different tables in the old schema, which all now have an extra child table each, and a lookup table to store the tax rates, to write/read those tax amounts for the transactions (basically a 1 .. n table for each parent table). Pretty basic stuff.

Anyway, historically, i.e. when the app was written 15 years ago, it was in a single market with a single tax rate on transactions. The original developers didn't foresee that the platform would turn into a multi-billion turnover level system, that would expand into different countries which have their own complex tax rules. It made sense that you have an amount column and a tax_amount column. Now it doesn't make sense, because hell no i want this done right and we're not going to add more columns we're actually going to model and normalise this correctly and move the tax amount(s) into child tables.

Because the older code had all of the interaction with the database not so abstracted, poorly factored if you will, we had to update dozens (i forget how many) of queries to add all the new joins and of course it needs to be back/forwards compatible, and we have to roll this out piece by piece because (as i said above) this is a multi-billion turnover level app.

The existing queries are in the region of 50+ lines of SQL with a dozen or so joins in some case. That's not actually complex SQL, it's pretty mundane and trivial SQL - SELECT a few columns, JOINing a few other tables, WHERE stuff matches conditions. There's the occasional IF, CASE, COALESCE, statement in these queries as well, and the odd UNION here and there. Again, pretty simple stuff. There are a few nice surprises from it being an old schema, polymorphic relations for example.

The changes amounted to hundreds of lines of SQL additions, the equivalent in actual "getting stuff done" code additions, making sure this is all tested so there are no regressions in moving from the old tax column in the parent table to getting the tax amounts out of the new tables, and weeks of testing and logging to find any edge cases or bugs.

In the newer model code that wraps all this stuff in an ORM? A few lines of code, because we can just compose a role into the model classes giving them access to a tax_amounts method that returns the details of each transaction's multiple tax amounts + rates + other stuff. The ORM knows about the new tables, the queries run by it add in all the necessary joins for us. The model classes are correctly factored to compose the Taxes role.

Now you can argue that if the original code had been better factored then we wouldn't have been in the same situation, and it would have been easier, but we would still have had to update multiple queries. This is where the ORM shines and it's why I'm tired of writing trivial SQL having done so for twenty years against half a dozen different database engines. I really don't want to have to write SELECT a few columns, JOINing a few other tables, WHERE stuff matches conditions type queries again. Unless of course I'm sat in the database terminal client, which happens more times a day than I can COUNT.

lugg
Not really sure how ORM solves anything here.

That said, your old and new solutions both sound wrong to me.

Tax is a line item.

To me it's as much a line item as a product, service, shipping, handling, booking fee, or any other thing that gets summed to form the total order / transaction value.

You then get to ref optionally in the way that makes sense to map attribution correctly.

Tax belongs to an item? No problem. Two taxes, one item, no problem! One tax multiple items? No problem!

leejo
> Not really sure how ORM solves anything here. That said, your old and new solutions both sound wrong to me.

The new solution does what you talk about, I guess I didn't explain it very well.

The use of an ORM is abstracting away a lot of the technical debt in the schema and isolating it so we can concentrate on fixing that debt without the worry that it continues to spread. It's much looser coupling than continuing to litter the codebase with manually written SQL.

Edit: reading more of the replies to your original comment - there is btschaegg's comment that having a narrow scope is what you should aim for[1]. There's also a reply (first one) on stackoverflow that talks about not leaking the abstraction[2]. Finally the key to our use of an ORM is that the ORM is not our model, we do not leak the abstraction, and we can drop down to manual SQL when we need it but that is still in a narrow scope.

[1] https://news.ycombinator.com/item?id=19592042

[2] https://softwareengineering.stackexchange.com/questions/3045...

lugg
Thanks for pointing me at btschaegg's comment.

> The use of an ORM is abstracting away a lot of the technical debt in the schema and isolating it so we can concentrate on fixing that debt without the worry that it continues to spread.

Is there a good reason you're abstracting over technical debt instead of just paying it down?

I mean I get it, its convenient, but now you have at least two sources of truth. One for your database, one for your application, and then god knows how many for the rest of the company.

If you fix your underlying schema things will break. Good. You might see it.

> It's much looser coupling than continuing to litter the codebase with manually written SQL.

How is SQL litter? Its as much litter as fopen.

How is it much looser coupled? To what? The data? Why would you even want that?

Loose coupling is only a benefit when the cost of breaking change is significantly higher than the cost of an extra layer of abstraction and indirection.

My core problem with ORM's is that I'm not sold it's a net positive ROI. It has benefits, sure, but it comes at a steep price I'm not prepared to pay.

> [2] https://softwareengineering.stackexchange.com/questions/3045...

Kind of agree with a lot of it but it breaks down pretty rapidly.

> Instead, with a properly designed database and an OLTP use case, ORMs are golden. Most of the grunt work goes away and with tools such as DBIx::Class::Schema::Loader, I can go from a good database schema to working Perl code in minutes.

I don't actually care about grunt work, writing code is like 5% of my time spent. I also question that its even a true statement. ORM's tend to lead to lots of very customized queries that are never abstracted. It also leads to a lot of duplicate code that is hard to detect.

> In reality, what happens is that people leak their ORM code all over their controllers (and views) and when they hit scalability issues, they start blaming the ORM rather than their architecture. The ORM gets a bad rap (I see this repeatedly for many clients). Instead, hide that abstraction so that when you've genuinely hit ORM limits, you can choose appropriate solutions for your problem rather than let code be so tightly coupled to the ORM that you're hog-tied.

So the suggestion is to not only use an ORM to abstract the DB+SQL away from your application but to also then abstract your ORM away from your code. What's next? Abstracting your AbstractORM into an AbstractAbstractORM? Can you see the problem here? If I have to abstract the DAL from my code base (and I do) why not just do it once?

...

> As Rob Kinyon made clear above, reporting tends to be a weakness in ORMs. This is a subset of a larger problem where complicated SQL or SQL which spans multiple tables sometimes doesn't work well with ORMs. For example, sometimes the ORM forces a join type I don't want and I can't tell how to fix that. Or maybe I want to use an index hint in MySQL, but it's not easy. Or sometimes the SQL is just so darned complicated that it would be nicer to write the SQL rather than the abstraction provided.

So if I blame ORM's its my architectures fault, but when you do it its just "other limitations". I think if the underlying SQL is easier to write than the abstraction code then the abstraction failed, or to my original point is just straight up half baked and incomplete.

(lots of "yous" I know you didn't write this, assume I'm being pretty general, none of this is personal)

leejo
> If you fix your underlying schema things will break. Good. You might see it.

I wish it were that easy, I really do.

Demiurge
Why is it such a 'one vs other' argument? This is like arguing against levels of APIs, a completely moot argument without proper context for usage.'Wrapping' doesn't prohibit you to go 'lower' if you want.
lugg
Why is it one vs other? It's not, you can run them both. The question is why you would. What does an ORM solve for you?

This is the only mapping I need from an ORM:

    result = (object) row

    row = (array) result
SQL does everything an ORM does better.

I'm not sure how you can call moving from a low level programming language to a domain specific language like SQL "going lower". It's not and ORMs are usually so bad they're a regression in what you can do.

Also, when was the last time a change to the SQL spec broke your app?what about the last times you had to migrate ORM code?

ken
> Also, when was the last time a change to the SQL spec broke your app?what about the last times you had to migrate ORM code?

That's kind of an arbitrary categorization. An ORM is just a library. If you asked me when the last time I had to update my code because a library changed interfaces, I'd have an answer for you (less than a month ago, I'm sure). I'm not going to stop using libraries, though.

I've had to update my programs several times because my compiler was updated, too. That may make me choosier in the future about which compiler I pick, but I'm not going to stop using compilers altogether.

HelloNurse
The arguments is against a specific level of API that is particularly bad. Expecting simple means to obtain objects out of tables is unreasonable, and attempts to that purpose must cheat one way or the other; in the opposite direction pretending that a database can offer "persistence" for objects is naive and highly constraining.
cmiles74
I think things have very much moved away from the "one vs. the other" stance. Back in the late 90's, I think people really did think that they could spend time on an ORM tool and never have to think about the database.

In my own work, the ORM is there for lots of boring stuff, like fetch the data and push it into an edit form and then push the contents of the edit form back into the database. When it comes to q query of any consequence I end up writing SQL and then presenting a data set to the ORM, something that it can understand.

Maybe it's my age, but I'm still not a big stored procedure guy. But you look at some of the interesting things you can do with SQL, it seems silly to try and interact with it only through someone else's ORM tool.

rdiddly
I laughed at this because it's so true... yet I still use ORMs gleefully so I can remain in my pathetic little comfort zone when the queries get especially complex. (Although ironically, my particular ORM of choice tends to fall apart when queries get complex, and you have to do clumsy iterations etc. Also after viewing this presentation I start to wonder how many of the built-in optimizations are being countermanded/ruined by the ORM.)
wswope
Could you share some good practices for separating db code and business logic? I feel like when I don't use an ORM, there winds up being inline SQL all over the codebase.
_bohm
I'm a big fan of the pattern introduced by Yesql for Clojure [0], also ported to Python as anosql [1]

[0] https://github.com/krisajenkins/yesql [1] https://github.com/honza/anosql

WhatIsDukkha
I call this pattern "dotsql". (I've made private golang and rust crates for it).

You avoid inline sql like the plague and your editor can have a nice dedicated sql buffer with all the bells and whistles.

I wish it was a more common pattern, it just makes soooo much more sense.

wswope
Wow, I wish I'd learned about anosql a long time ago! I love the elegance of it - thank you for the rec!
btschaegg
I don't think there's a one-size-fits-all solution to this, since different domains require different guarantees from your storage medium. For example, if you need to combine multiple actions with a certain level of transaction safety, defining an interface for that can get tricky.

Overall, I tend to agree with Uncle Bob's "Clean Architecture" [1][2] approach when it comes to DBs, though: The DB usually (rembember that there is an exception to every rule!) has no business of being the "center" of an application. It should provide you with functionality, not dictate how your business logic works.

If you can pull out your DB and give it a sensible interface, you still might end up having a lot of inline SQL queries, but they are not littered all over your codebase; they live inside a narrow scope (a plugin, say). Also, the SQL schema and queries in this case often are quite boring -- the tricky bit is coming up with a good interface that provides all the guarantees you need to uphold (complicated transactions maybe?). And remember that you need these guarantees because of how your business domain works, not because of your DB. Perhaps you can get rid of some of them if you organize your storage (i.e. DB) in a specific way?

[1]: http://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-ar... [2]: https://youtu.be/o_TH-Y78tt4?t=2565

Edit: Also note that there's no reason not to use a ORM in this way, too (if it suits you). The important part is the same: The ORM has no business dictating how your business logic should be implemented, and if you try to remove it from your core application, you will end up doing the same things. Now, if your ORM makes the implementation easier for you, go ahead and use it. But don't expect things to change if you put ORM-generated code instead of inline SQL at the center of your application.

ken
> the problem they are actually employeed to "solve": developers afraid of SQL. Whether they can't write it, can't learn it, or just straight hate the string representation in their code, it's the main reason people sign on for ORM.

This sounds awfully like a straw man. I've worked on several projects that used an ORM, and it was never because developers couldn't write SQL.

ORMs solve a real problem: SQL isn't composable. That's it.

Sooner or later, every non-trivial program needs to compose arbitrary pieces of one query with arbitrary pieces of another, so you'll either use an ORM (i.e., native data structures instead of raw strings), or essentially write your own ad-hoc one.

It has nothing to do with lack of ability. The staunchest ORM advocate I've worked with was also the first to pull out the debugger and single-step the MySQL driver to track down a bug (that one was nasty -- it was indeed in the driver).

I use an HTML generator, and a compiler, too, and it's not because I'm "afraid" of assembly language or that I "can't learn" HTML. These abstractions have real value.

karmakaze
> ORMs solve a real problem: SQL isn't composable. That's it.

How so? I've always found the opposite. I can build up a compilex query referencing many tables in SQL without impediment but am unable to do the same with an ORM.

platz
OP means combining two different queries or clauses into a bigger query/clause

You cannot easily "modify" a sql statement.

There are type-safe DSLs in certain languages (if they have an expressive enough type system) that let you do this, but they are uncommon

1stranger
They're not that uncommon. You can do this with SQLAlchemy in Python or jOOQ in Java.
lugg
Nested queries? Joins? Stored procedures? Views? This is literally everything SQL was made for.

Modify a query? Literally just code.

Like what do you people think SQL is? Select * from table?

platz
We are talking about programmatically constructing a query at runtime by stictching together bits of sql, in a way that which bits of sql you use are actually dependent on the runtime state, and in a way that is type-safe.

You could do something similar with either dynamic queries in a stored procedure, but without a DSL those dynamic queries are brittle, and the runtime composition using them is hard and error prone.

karmakaze
The metaprogrammability using an ORM has never been wroth the loss of expressiveness in my experience. If I needed it, I'd probably not be using the procedural language the the ORM uses.
lugg
Right, how often do you actually need to do this?

In my experience query building to this degree is usually a sign of over optimising for code reuse where it isn't really needed.

The times it does make sense, it's usually a search problem that is better solved by something like elastic search.

That isn't always feasible so what you end up with is a naive approach and just build the damn string. Or if it's needed you decompose it into the parts that are needed elsewhere table_headers() is usually all you need to stay DRY.

It's not as pretty as an ORM, it shouldn't be used for every little query "because you might need it in the future". But the cases it is needed are rare, so does it really matter?

Like you've got to adjust your frame of reference. Your database schema is static, it's not going to completely change out from under you at runtime, it's just a non problem.

flukus
>Right, how often do you actually need to do this? In my experience query building to this degree is usually a sign of over optimising for code reuse where it isn't really needed.

I need to very often and it's rarely about code reuse. You requirements like "on the client list screen we want to be able to search by client id or client name and we want to be able to sort by last login time or name"? What about writing an API, that's 90% handling things like this.

And then there's the other cases ORM's solve, like representing data as something more than a 2D table.

lugg
That problem is a query building problem you probably want to look into graphql.

ORMs sometimes come with query builders that help things but in my experience it's not worth having to clusterbomb my codebase for it.

platz
Implementing graphql, which you can only apply in a limited number of client-server scenarios, to solve a relatively trivial problem, sounds like the clusterbomb or a good 2,000 lb mk-84 general purpose bomb to me e.g. graphql is not a panacea
None
None
karmakaze
Thanks for the explanation. I've never found adding to an SQL query to be any effort. The only place I'd put any value into a DSL is that it doesn't miss including all the conditions for a join.
vp8989
You could use views. Although personally I dislike adding this layer of obfuscation. Generally devs are very underskilled in SQL (relative to it's importance) and ORMs hiding the SQL IMO is one of the reasons why.

There is great value in having the entire SQL easily visible in your source code. If there is a performance issue with a specific query my app is running, I can very quickly pull it out and into a database IDE and see what's going on using the more detailed tools available (REPL, execution plan/statistics usage analysis etc).

If the query is composed from a bunch of ORM function calls, I have to step through the program to first generate it.

Reducing friction around debugging allows your devs to become more capable in fixing things or making them work better, and SQL is no exception to that.

ako
Views are not a layer of obfuscation. They are a layer of abstraction...
jayd16
>it was never because developers couldn't write SQL.

>ORMs solve a real problem: SQL isn't composable. That's it.

But really though, these are at odds. The question is never "how can we compose this SQL." ORMs are always about writing less SQL.

nojvek
ORMs are about abstractions. We use django, and django gives us great abstractions so we can create more value customers rather than do sql kungfu.

When we find something can be better expressed as SQL because of a complicated join from many tables, we use SQL.

ORM is just another tool in your toolbox. They are not mutually exclusive.

jayd16
Mixing paradigms can often cause issues with ORM caching and transacting so in a sense they are often exclusive, although you can usually just throw more code around to make it work.
lugg
That's not a straw man and I specifically said ORMs had their design reasons? My complaint is that they got popularised because of their perceived ease of use because of a perceived difficulty with SQL.

Anyway..

> ORMs solve a real problem: SQL isn't composable. That's it.

It's a DSL. It's literally the most composeble thing you could build as a solution. Hell it's so composable people decided they could build ORMs on top of it.

SQL was created to solve the problem of interfacing, composing and reasoning about tabular data.

ORMs were built because?

I feel like your thought that SQL isn't composable is due to a reluctance to create a function that executes a new query?

Or are you really looking for a way to compose queries over objects? Because SQL has that too..

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ yaj@
;laksdfhjdhksalkfj more things
yahnd.com ~ Privacy Policy ~
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.