Hacker News Comments on
Joe Celko's Trees and Hierarchies in SQL for Smarties (The Morgan Kaufmann Series in Data Management Systems)
·
4
HN comments
- This course is unranked · view top recommended courses
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this book.If this topic interests people, take a look at Joe Celko's book called Trees and Hierarchies: https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...
⬐ WalterGRAvailable on what was formerly O'Reilly Safari: https://learning.oreilly.com/library/view/joe-celkos-trees/9...
Why not add that functionality directly to SQLite via stored procs*https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...
⬐ kevasCan’t edit on my phone and meant to toss this in the comment too: https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...⬐ pstuartA more recent effort: https://cgsql.dev/⬐ kevasThanks⬐ claytongulickI just read through the docs of this, what an amazing project.I'm considering doing a js template string implementation for node.. cql`...` type thing with an internal compilation cache.
I don't know much about this but Joe Celko has a whole book on representing trees in SQL: http://www.amazon.co.uk/Hierarchies-Smarties-Kaufmann-Manage...The obvious first question would be: why are you representing your data as a graph? Can you represent it better as sets of predicates (ie: relations)?
Using views should not reduce the number of queries: a view is just a query with a name. If you can do it by combining views you can do it by combining queries.
A good book on the subject is Joe Celko's Trees and Hierarchies in SQL for Smarties.http://www.amazon.com/Hierarchies-Smarties-Edition-Kaufmann-...
He spends a chapter on each of the models outlined in this post: adjacency, path, and nested set models.
⬐ platzI've done nested set before; it's interesting - very fast for queries, but requires a lengthy insert/update cost. Also, team members were absolutely clueless as to what was really going on. I'm not sure nested sets are really much faster than what modern rdbms's can provide today.⬐ rickmode⬐ mceachenOne just needs to be aware that nested sets eventually hit a limit.I experimented with a variation by Vadim Tropashko using something called "Farey fractions" [1]. These represent the intervals as a 2x2 matrix of four integers rather than two floating point values.
The numbers are effectively limited to 32-bit values in the matrix since a multiplication is required (resulting in 64-bit intermediate results).
It was very interesting, but couldn't model a file system hierarchy well. It could roughly 10^32 items in the best case, but a very small number (hundreds) in edge cases.
For example, it maxes out at a depth of 17 with 100 items at each level, or a depth of 34 with 10 items each. This might be fine for modeling some hierarchies, but definitely not a file system. The edge case is if the fraction extends along one edge. So if we have 10 items per level, and add a child at the leftmost edge each time, we create the most costly fractional subdivision. Do this 35 times and you hit an math overflow.
[1] Check out the Chapter 5 and Errata links: http://vadimtropashko.wordpress.com/“sql-design-patterns”-bo...
⬐ tempVariableIn addition to expensive insert/update, it is necessary to keep the left and right boundaries across ALL of your entries in perfect order. If the boundaries get out of whack, fixing the tree is a nightmare scenario.I worked on a multi-tenant application with distinct trees present in one table and with one tree per table and so on. Fun fun fun!
⬐ alistairbayleyI think the nested intervals model, a refinement of nested sets, solves this slow update problem: http://www.rampant-books.com/art_vadim_nested_sets_sql_trees...But I've never had to use it, so I am just guessing.
Same article, different site: https://communities.bmc.com/docs/DOC-9902
And a paper: http://www.sigmod.org/publications/sigmod-record/0506/p47-ar...
Here's a comparison of the different approaches in a matrix: http://vadimtropashko.wordpress.com/2008/08/09/one-more-nest...
If you're storing a tree in an RDBMS, please look into the closure table algorithm rather than adjacency, nested set, or materialized paths.If you're using Rails (3.2 through 4.1), try this: http://mceachen.github.io/closure_tree/
Like it says in the README:
* Fetch your whole ancestor lineage in 1 SELECT. * Grab all your descendants in 1 SELECT. * Get all your siblings in 1 SELECT. * Fetch all descendants as a nested hash in 1 SELECT. * Find a node by ancestry path in 1 SELECT. * 2 SQL INSERTs on node creation * 3 SQL INSERT/UPDATEs on node reparenting
None of the other approaches above have even remotely similar performance characteristics. If your tree is small (tens of nodes), you won't care. If it's bigger, you will.
⬐ ToddI wasn't aware of closure trees before, thanks. The presentation that you link to by Bill Karwin, along with a few other resources are in the comments by coleifer and chdir.