Hacker News Comments on "The NOSQL Store that Everyone Ignored"
Postgres Conference · YouTube · 97 HN points · 0 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.

⬐ bob1029
I noticed a lot of people noting that they enjoy using some hybrid of JSON with their SQL. I do the same thing and I think this is an incredibly productive way to manage fast-changing things.

One trick I learned along the way is to apply gzip to your JSON columns. For us, we got over 70% reduction in column size and query performance actually increased. I presume that is because it is faster to read fewer IO blocks than more, especially when doing table scans (which you shouldn't do, but they happen).

The only caveat with this is if you want to query against something in the JSON, but I think this is where the balancing act comes into play regarding what is in (compressed) JSON and what is in a column.
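A minimal sketch of the trick described above, assuming the gzip compression happens in the application layer; the table and column names are hypothetical:

```sql
-- Hypothetical schema: the application gzips the JSON document and
-- stores the compressed bytes in a bytea column; frequently queried
-- fields stay as plain, indexable columns.
CREATE TABLE orders (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,        -- queried/indexed directly
    created_at timestamptz NOT NULL,
    doc_gz     bytea NOT NULL          -- gzip-compressed JSON payload
);
CREATE INDEX ON orders (user_id);
```

The trade-off is exactly the caveat mentioned: fields inside `doc_gz` cannot be queried in SQL without decompressing in the application.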
⬐ NicoJuicy
In .NET you can use Marten DB. You can mark properties as queryable, and they are then duplicated as normal db columns. It works quite nicely actually.
⬐ fabian2k
In Postgres you would generally use JSONB columns; those are stored as TOAST and are automatically compressed. That compression isn't very good, but there is some work on adding other compression algorithms there.

⬐ tpetry
PostgreSQL 14 will add lz4 compression for TOAST data. But if I remember correctly the first ~700 bytes are always stored inline and only the rest is compressed. But I could be wrong on this.

The performance of JSONB is also surprisingly great. I played around with around 1M rows of data for a project using JSONB, and compared to traditional SQL, the complex queries were 500-800ms slower, which was totally reasonable for the use case.

⬐ eyan
Great data point! Thank you!

⬐ sgt
Been doing this for years now and it has proven itself to be a great solution. We use a hybrid approach: traditional columns plus an entity/content-type JSONB column.

⬐ mateuszf
Same here. Have only good experiences with this setup.

⬐ hodgesrm
JSON blob + columns is the recommended approach for handling semi-structured data in ClickHouse. It's easy to add columns on the fly, since you just give the expression to haul out the data using a DEFAULT clause. ClickHouse applies it automatically on older blocks without rewriting them. For new blocks it materializes the data as rows are inserted.

⬐ the_gipsy
A colleague put some data into a JSON column and I absolutely loathed it. It couldn't be queried well, and he didn't type it in the models in the code either. However, I later found that there are two kinds of JSON: a "bad" kind that we had to patch up with lots of validation against some spec that emerged, and another kind that was absolutely right to be free form, because it was never accessed by us. The user simply inputs it freely and reads it out somewhere else, with freedom to evolve it as desired.
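The lz4 option mentioned above can be sketched as follows (PostgreSQL 14+; table and column names are hypothetical):

```sql
-- PostgreSQL 14+: switch a JSONB column's TOAST compression to lz4.
ALTER TABLE documents
    ALTER COLUMN payload SET COMPRESSION lz4;

-- Only newly written values use lz4; existing rows keep their old
-- compression until the values are rewritten (e.g. by VACUUM FULL).
```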
⬐ mlk ⬐ jf22
Clob

⬐ the_gipsy
Aha, I only knew them as blobs or JSON

⬐ bonzini
Yeah, I have a similar experience from a project that had moved from MongoDB to Postgres. In the conversion they left a free-form JSON field with information filled in by plugins, which was messy because it was impossible to query efficiently. The plugins also had some code to format the free-form data as HTML, which suffered from code duplication but provided some ideas for building a common schema for a new table and for the corresponding REST API endpoints. That worked great, but we still left a free-form JSON field in the new table, and also kept the JSON field in the original table to store the configuration for the plugin.

In our case the "user" that just threw JSON data in and read it out was the plugins. You just have to strike the right balance between core parts of the schema, expressed as tables and relations, and auxiliary data that can use JSON.
⬐ darkhorse13
There is a .NET project that uses this feature to create a NoSQL db experience. I've been using it for a few years and it's been great.

⬐ NicoJuicy
Came here to say that too. Marten DB is really good!

⬐ nomilk
As someone who does indie ruby stuff, often under time constraints, I find jsonb great for storing messy data that looks unnecessary but might be useful down the track. jsonb is just another type as far as postgres/rails is concerned, so it's fast to implement (just like any other rails migration), fast to query, and easy to access using key/value (similar to the ruby hash).

⬐ jrochkind1
If you are using Rails and jsonb, you may be interested in https://github.com/jrochkind/attr_json (by myself), or https://github.com/madeintandem/jsonb_accessor (an alternative).

⬐ bullfightonmars
We wrote a similar library that hooks into ActiveRecord. https://github.com/data-axle/superstore

⬐ jrochkind1
Interesting, thanks, I hadn't seen this one before! So many of us inventing the same wheel! The docs on yours are kind of limited; I don't really understand how it maps to your postgres tables or what schema it's expecting. But I plan to find time to check it out just to compare and contrast with mine! Thanks!

⬐ bullfightonmars
Yeah. We built this for our own use and haven't really documented it extensively. We use it in conjunction with an Elasticsearch ORM.

⬐ jrochkind1
Again, interesting, thanks! I hadn't seen that Elasticsearch ORM either! The more popular one of course is: https://github.com/elastic/elasticsearch-rails

It can be hard to know what open source is out there!
⬐ paulryanrogers
For speculative storage like that my preference is for files, ideally on something like S3. It's harder to analyze, but IME the need is infrequent. Storage is usually significantly cheaper than a DB.

⬐ stevesimmons
Does anyone know of performance (and usability) comparisons between, say, Postgres JSONB and Azure CosmosDB?

⬐ fisf ⬐ debug-desperado
We use Azure SQL (basically MSSQL) to do something similar -- this might be a more apt comparison. We deal with largish amounts of JSON event data, and unpack specific fields to computed columns. This allows us to use clustered indices, standard analytical tools, etc., while leaving the raw data as it is.

Performance is mostly fine for analytics (time-series aggregation and such). It is also a very flexible approach.
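The unpacking described above can be sketched in T-SQL; the table, column, and JSON path names are hypothetical:

```sql
-- Azure SQL / SQL Server: extract a JSON field into a persisted
-- computed column so it can be indexed like a normal column,
-- while the raw JSON stays untouched in the payload column.
ALTER TABLE events
    ADD event_type AS JSON_VALUE(payload, '$.type') PERSISTED;

CREATE INDEX ix_events_event_type ON events (event_type);
```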
⬐ jf22
Those are two completely different tools with different use cases.

⬐ exceptione
Well, tell that to the MS shops... I think it is a very good question; I would love to see some benchmarks.

⬐ stevesimmons
To expand on my particular use case for Azure CosmosDB: I have a Python web app built with FastAPI. I find it super convenient to dump Pydantic data models of our primary domain objects to JSON and save them into Cosmos, tagged with the model name. Then on rereading the Cosmos records, my model registry automatically imports the required Pydantic model and reconstitutes the live Python object.
Had a great win this last week with JSONB. Basically I needed an aggregated materialized view for performance, but still needed access to the non-aggregated row data. Think aggregate by account ID, where the query may access any given month of data for that account. It's cheating a bit to use a non-relational structure, but the correlated subqueries I had to use before were 20x slower.

⬐ lysecret
I don't know why anyone would use, say, MongoDB when Postgres JSON exists.

⬐ Izkata ⬐ paulryanrogers
A terrifying example from work:

"Hey cool, mongodb just accepts whatever you give it! We don't need to worry about writing database migrations!"

...Luckily we convinced them this was a bad idea and we ended up using Postgres with a handful of JSON columns instead. Up until that point their only experience had been with MySQL, so they had no idea JSON columns existed.
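The aggregated-view-plus-row-data idea described above might look something like this (schema and names are hypothetical):

```sql
-- Pre-aggregate per account and month, while keeping the raw rows
-- available as a JSONB array in the same materialized view.
CREATE MATERIALIZED VIEW account_monthly AS
SELECT account_id,
       date_trunc('month', created_at) AS month,
       sum(amount)                     AS total,
       jsonb_agg(to_jsonb(t))          AS row_data
FROM transactions t
GROUP BY account_id, date_trunc('month', created_at);
```

A query can then aggregate by account without correlated subqueries, and still reach into `row_data` for per-row detail when a specific month is needed.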
⬐ paulryanrogers
Does PostgreSQL have the standard operators yet? Last I checked they only had the Pg-specific ones, which predate the SQL standard. (Same for SQL/PSM, sadly.)

⬐ mdaniel
Do you have a link to the standard operators you mean? I know they support [SQL/JSON path functions](https://www.postgresql.org/docs/12/functions-json.html#FUNCT...), but as for the operators, I only know of their own, which predate the standard.

⬐ simonw
PostgreSQL 14 is currently in beta and adds JSON subscripting: https://www.postgresql.org/docs/14/datatype-json.html#JSONB-...

⬐ fabian2k
I find this approach incredibly useful if any parts of your data don't have a uniform schema. The one thing you need to look out for is developers putting stuff into JSON that belongs in proper columns. The Postgres JSON support is very powerful, but plain old relational queries are faster and often much easier to write.
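A quick sketch of the PostgreSQL 14 subscripting syntax next to the older arrow operators (table and column names are hypothetical):

```sql
-- PostgreSQL 14+ subscripting (returns jsonb):
SELECT data['address']['city'] FROM users;

-- Equivalent pre-14 arrow-operator form:
SELECT data->'address'->'city' FROM users;

-- Subscripting also works on the left-hand side of an UPDATE:
UPDATE users SET data['address']['city'] = '"Oslo"' WHERE id = 1;
```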
⬐ tda
I have created an app with such a hybrid approach, as part of the model was very dynamic. The idea was that I would slowly migrate the stable bits to proper columns, but in practice that never happened. Partly because it is not really broken, and the performance is actually quite good. You can add partial indexes etc. to improve things where needed.

But writing analytical queries is definitely more complicated, so the cost of not having a "normal" data model adds up over time. To allow business analysts to work with the data I ended up creating a bunch of views on the data without JSONB fields as a quick fix. So though the approach worked, I am not sure I would do it again.
⬐ kroolik
What you did, in fact, was create read models specific to a use case. This is completely normal and allows one to optimize for different users: the app's schema can be different from the one for analytics.

⬐ fabian2k
I'm using the JSON mostly for parts that really have no fixed schema, e.g. fields that vary by customer. The alternative would be something like EAV, and JSON columns are far superior here.

Performance is perfectly reasonable with JSON, but there are many more ways to make it slow if you're not careful. Postgres has to read the entire JSON content even if you only access a small part of it; that alone can kill performance if your JSON blobs are large. The lack of true statistics can also be an issue with JSON columns. They're still easily fast enough for many purposes, they just have a few more caveats than plain columns that you should be aware of.

There are some other aspects that are not that intuitive, but also not a big deal once you know them. For example, you can use a jsonb_path_ops GIN index to speed up queries for arbitrary keys and values inside your JSONB column, but you need to use the @> operator so that the index can be used. If you write queries like WHERE myjson->>'foo' = 'bar' it can't use the index.
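The index described above, sketched with hypothetical table and column names:

```sql
-- GIN index with the jsonb_path_ops operator class: smaller and often
-- faster than the default class, but it only supports containment (@>).
CREATE INDEX idx_docs_data ON docs USING gin (data jsonb_path_ops);

-- Can use the index:
SELECT * FROM docs WHERE data @> '{"foo": "bar"}';

-- Cannot use the index (arrow-operator equality bypasses it):
SELECT * FROM docs WHERE data->>'foo' = 'bar';
```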
⬐ jrochkind1
How would you describe what "belongs" in a proper column? Not just anything that doesn't have a uniform schema?

⬐ bob1029
For us, a "proper" column is usually something we want to index against. Things like timestamps, user ids, etc. Anything that would be useful for top-level management of whatever has the JSON tacked onto it.

⬐ conradev
You could pull those out and index them using generated columns, though. That can make things a lot easier on the application side, where the code just needs to insert the JSON blob.

⬐ jrochkind1
You don't even have to "pull those out and index them using generated columns" to index with postgres jsonb. Although it's a bit more confusing to figure out the proper index and make sure your queries hit it, jsonb data is indexable on its own.

⬐ megous
Also for leftovers when scraping website data. Trendy SPAs usually have JSON endpoints, so I usually use some ID column, maybe a timestamp column (to ease incremental syncing later on), some primary content columns, and the whole scraped JSON object is put into a `data JSONB` column in case some of it is needed in the future. :) Works amazingly.
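The two indexing routes discussed above, sketched side by side (table, column, and key names are hypothetical):

```sql
-- Option 1: a stored generated column (PostgreSQL 12+), which also
-- gives plain-column ergonomics while the app only inserts the blob.
ALTER TABLE events
    ADD COLUMN user_id bigint
    GENERATED ALWAYS AS ((payload->>'user_id')::bigint) STORED;
CREATE INDEX ON events (user_id);

-- Option 2: an expression index directly on the JSONB, no extra column.
CREATE INDEX ON events ((payload->>'user_id'));
```

With option 2, queries must use the exact indexed expression (`payload->>'user_id'`) for the planner to pick up the index.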