HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Lucas Cavalcanti & Edward Wible - Exploring four hidden superpowers of Datomic

ClojureTV · Youtube · 134 HN points · 6 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention ClojureTV's video "Lucas Cavalcanti & Edward Wible - Exploring four hidden superpowers of Datomic".
Youtube Summary
This session will explore four common problems, and the unique and surprising tools Datomic provides to solve them elegantly:

HTTP caching - How to generically generate and validate Last-Modified and If-Modified-Since headers
Audit trail - How to extend Datomic’s immutable transaction log to include arbitrary audit-related metadata
Mobile database sync - Trivial implementation of an incremental update API for high-latency/low-bandwidth clients
Authorization - Easily determine resource ownership, and centrally isolate users from data they are not allowed to see

These problems have certainly been solved before using other databases, but Datomic provides features that make the proposed implementations concise, generic, and purely functional.
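The first of these patterns, conditional GET validation, can be sketched generically. A minimal Python sketch follows; the handler shape and names are illustrative, not taken from the talk:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def handle_get(last_modified, if_modified_since):
    """Generic conditional-GET check: return (status, extra_headers).

    last_modified: tz-aware datetime of the resource's newest change
    if_modified_since: raw If-Modified-Since header value, or None
    """
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so truncate before comparing.
        if last_modified.replace(microsecond=0) <= client_time:
            return 304, {}  # client's copy is still fresh
    return 200, {"Last-Modified": format_datetime(last_modified, usegmt=True)}

last_mod = datetime(2015, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
status, headers = handle_get(last_mod, None)                     # first fetch
revalidated, _ = handle_get(last_mod, headers["Last-Modified"])  # revalidation
```

In the talk's setting the last-modified time falls out of the entity's newest transaction time, which is what makes the approach generic; here it is simply a parameter.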

Lucas Cavalcanti is the Lead Software Engineer of Nubank, an early stage Brazilian Internet bank built as a service oriented architecture leveraging Clojure and Datomic. Lucas is a functional programming enthusiast, and proponent of best practices in software development, with a vast experience in real production applications written in Java, Scala, Ruby and now Clojure. He holds a BS in Computer Science from the University of Sao Paulo.

Edward Wible is the CTO of Nubank, an early stage Brazilian Internet bank built as a service oriented architecture leveraging Clojure and Datomic. Prior to co-founding Nubank, Edward worked in technology-focused private equity (Francisco Partners) and management consulting (The Boston Consulting Group). He holds an AB in Computer Science from Princeton University and an MBA from INSEAD.
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
1. High-performance read/write of Scylla/Cassandra with high availability[1]. It has some limitations for OLTP workloads and requires careful planning. Postgres without Citus is not really HA, and even then companies often invent custom sharding/cluster-management systems.

2. A powerful graph query language like Cypher. It might not perform well in real life, but my personal experience left me amazed[2]. There are a number of issues with SQL that could be addressed[3], but the current standard is far too prevalent.

3. Zero impedance mismatch between database and application representation. In a database like Smalltalk's GemStone it is a really seamless experience for a developer to write database code and application code[4]. To some extent, MongoDB's success can be attributed to this aspect.

4. Datomic's temporal capabilities[5]. It is hard to maintain temporal tables in Postgres. There are some use cases where you really want to query at a point in time. Strictly speaking not an OLTP feature, but I can see this being useful in many scenarios.

  [1] https://www.youtube.com/watch?v=Fo1dPRqbF-Q
  [2] https://www.youtube.com/watch?v=pMjwgKqMzi8&t=726s
  [3] https://www.edgedb.com/blog/we-can-do-better-than-sql
  [4] https://www.youtube.com/watch?v=EyBkLbNlzbM
  [5] https://www.youtube.com/watch?v=7lm3K8zVOdY
Edited: independence -> impedance
zozbot234
Postgres comes with the building blocks for both sharding and HA out of the box, and they're extensively discussed in the docs. You don't need proprietary add-ons except as a pure convenience.
jjirsa
Sharding is not the same as natural clustering, because eventually you’ll need to reshard and then you’ll be writing a lot more code.
zem
Don't underestimate the importance of convenience. I'm convinced one of the reasons MySQL had so much more mindshare than Postgres back in the day was that it was far easier to get up and running, even if Postgres might have been easier to use once everything was set up correctly.
BarryMilo
I don't even remember choosing MySQL when I started (15 years ago). It was just so dominant, we didn't question it.

Nowadays I would still use it because I assume it is the dumbest database system and that's exactly what I need for my 1-5 user app.

5e92cb50239222b
It still has some features over PostgreSQL that pushed me to choose it (actually MariaDB) for a new project about a year ago, namely multi-master replication. Yes, I know, terrible database and horrible feature, but it really helped in that particular domain.

I couldn't find anything decent for Postgres (there are some commercial solutions, but the customer refuses to pay for software), while MariaDB/MySQL have that built in, with some differences in implementation.

bostik
That's funny, I must have been an outlier then.

I've been using Postgres since 1998, and I tried getting MySQL up first. There was more documentation available for the latter, so it should have been simple. Failed. It just didn't work.

Out of frustration I then tried Postgres, because I just wanted a decent database for my project. It was surprisingly easy, I only had to learn about pg_hba.conf to get to a functional state. Everything else was in place out of the box.

I've been a happy user ever since. MySQL may have had the mindshare (thanks to the prevalence of LAMP) but everything outside the magic happy path was confusing and fragile.

ahoka
No one cares what it was like in 1998; that's the whole point of this discussion. Postgres devs kept burying their heads in the sand for many years, saying that high availability is somehow not the task of the database. In reality, only the datastores need to be HA/durable in an otherwise stateless architecture.
jasfi
Postgres has many HA solutions, and it's getting better all the time. Postgres has good performance; benchmarks need to show the various trade-offs systems make so that informed decisions can be made. Feel free to post a link.

The graph model can be made available through extensions. See AGE: https://age.incubator.apache.org/ They plan to support openCypher.

The JSONB type allows for No-SQL like development if that's what you really want.
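For a self-contained taste of that JSONB-style workflow, here is the same idea using SQLite's JSON1 functions (standing in for Postgres's jsonb operators, since SQLite needs no server):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
# Store a schemaless document as JSON text.
conn.execute("INSERT INTO docs (body) VALUES (?)",
             (json.dumps({"user": "ana", "tags": ["clojure", "datomic"]}),))
# Query inside the document, with no columns declared up front.
row = conn.execute(
    "SELECT json_extract(body, '$.user') FROM docs "
    "WHERE json_extract(body, '$.tags[0]') = 'clojure'"
).fetchone()
```

With Postgres you would write `body->>'user'` against a jsonb column instead, and get GIN indexing on top.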

zeroc8
Just because you can use JSONB doesn't mean it's as easy to work with as MongoDB. So if you really need a JSON store, just use Mongo.
pas
> independence mismatch

you probably mean impedance mismatch, right?

https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...

vaughan
> Zero impedance mismatch

I see the problem here being that too many manual optimizations need to be done when implementing a schema.

You start with a logical schema (ERD diagram) and then implement it via a physical schema with denormalizations added for efficiency: usually because the relational model has scaling limits with the number of joins, because db migrations are too difficult, or to handle unstructured data without introducing a ton of new tables. The db should do the denormalization automatically, allowing the user to interface with their logical schema directly.

Another reason is we can't use SQL in the browser - we have to go through many caching layers and API layers which complicate things.

cwp
+1 to the application/database mismatch. GemStone is really amazing. I never got to do commercial work with it, but damn is it a great environment. I've never seen anything like it.
charlysl
> Zero impedance mismatch between database and application representation

Does this include information hiding/encapsulation (to prevent saved objects' internal representation from being exposed)?

Traditional databases don't have an encapsulation mechanism AFAIK, which is one of the reasons for impedance mismatch.

This is important because it is good practice for client code to make no assumptions about the internal representation, accessing data only via a public interface.

If it happens to be exposed by the database, the clients can use it in their queries. If the internal representation changed later on, such clients would be broken.

Of course, this can be solved by only allowing data access via, say, well-designed RESTful APIs (that don't expose internal details), but this would still provide no guarantees.

How about another reason for impedance mismatch, that of storing objects that belong to a class hierarchy?

vaughan
> good practice for client code to make no assumptions about the internal representation

If your internal representation and API start to differ then it adds complexity fast. It's far better to have as close to a 1-1 mapping for your backend and frontend data models as possible.

charlysl
> It's far better to have as close to a 1-1 mapping for your backend and frontend data models as possible.

I agree with this, but only if qualified as abstract data models.

> If your internal representation and API start to differ then it adds complexity fast

I don't think so, and this is why I introduced the abstract: what is often called the logical model, the business model, the content, or the interpretation. But then there is also the implementation: the concrete, internal representation.

A typical example where the internal representation might evolve independently is where you avoid premature optimization. You didn't want to spend too much time making your MVP fast, but if successful you may revisit your internal rep, without touching the API, because there was nothing wrong with it.

If you had exposed the internal rep in the API, you would have tied your hands, because changing the internal rep to improve performance would change the API too, breaking your clients (and your tests too, which, just like the API clients, should not depend on the internal rep).

Exposing the internals would make it more complex, because you would then have to reason how internal changes could impact clients.

takeda
> Traditional databases don't have an encapsulation mechanism AFAIK, which is one of the reasons for impedance mismatch.

They actually do: those are views and functions.
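The point about views as an encapsulation mechanism can be made concrete. A minimal sketch (SQLite via Python for self-containment; Postgres adds updatable views and stored functions on top of this):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Internal representation: price stored in cents, name split across two columns.
conn.execute("CREATE TABLE items_internal "
             "(id INTEGER PRIMARY KEY, name_first TEXT, name_rest TEXT, price_cents INTEGER)")
conn.execute("INSERT INTO items_internal VALUES (1, 'Blue', 'Widget', 1250)")
# Public interface: a view. The internal layout can change without breaking clients.
conn.execute("""
    CREATE VIEW items AS
    SELECT id, name_first || ' ' || name_rest AS name, price_cents / 100.0 AS price
    FROM items_internal
""")
row = conn.execute("SELECT name, price FROM items WHERE id = 1").fetchone()
```

Clients querying `items` never see `price_cents` or the split name columns, which is exactly the encapsulation being discussed.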

The real problem with impedance mismatch is that SQL is declarative (you say what you want and the database figures out how to get it) while most programming languages are imperative (you say what should be done).

The issue is that you have two very different languages. For one you have a powerful IDE with type checking, autocompletion, and refactoring capabilities; the SQL is often sent as a string and doesn't get these benefits. The various ORMs are attempts to use an imperative, object-oriented language to access data that lives behind a declarative, relational language.

I think JetBrains is addressing the problem the right way. They added DataGrip functionality to their IDEs, PyCharm for example. If you connect the IDE to a database and let it download the schema, it will detect SQL statements in strings and offer the same capabilities for them as you have with the primary language.

At that point the impedance mismatch no longer feels like a mismatch. You simply have two languages: one to obtain the data you need and another to process/present it. You can get the database to return the exact fields your program needs, and even the object mapping starts feeling unnecessary.

Why is data stored relationally? Because that's the most efficient general way to store it, and the way it is stored allows multiple applications to access the same data differently.

For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

Ultimately the data is the most important thing a business has, and it stays, while the applications that use it come and go.

charlysl
> For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

This point is very important, and well explained in Stonebraker's paper "What Goes Around Comes Around". What is most interesting is that he is actually talking about half-century-old, pre-relational IBM IMS databases, but they had exactly the same issue, hence the paper's title. Codd invented the relational model after watching how developers struggled with the very problem you mentioned.

Stonebraker famously quipped that "NoSQL really stands for not-yet-SQL".

He also addresses the impedance mismatch issue in the "OO databases" section; there is actually a lot more to it, and he gives it all an insider's historical perspective.

Scarbutt
Your view of what impedance mismatch is doesn't sound accurate. It's not about declarative vs. imperative, or syntax, or strings, etc.

It's about data modeling: one models data using relations, the other models data hierarchically (using maps, arrays, objects, etc.). They are two different ways to structure your data, hence the impedance mismatch.

takeda
Perhaps I was using the wrong word. I meant that traditionally things felt a bit off when working with SQL, because often it wasn't code, just strings that you were sending.

Because of that, developers started to abstract that with code and objects that were then populated with data.

With IDEs understanding SQL, that's no longer necessary. I can construct a specific SQL query to get the exact structure my program needs. Even if it is hierarchical, I can use the various jsonb aggregation functions. That's a game changer for me.
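The jsonb-aggregation idea can be sketched with SQLite's JSON1 functions (`json_group_array` is the analogue of Postgres's `jsonb_agg`; the schema here is invented for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Rich');
    INSERT INTO books VALUES (1, 'A'), (1, 'B');
""")
# Aggregate child rows into a JSON array so the query itself returns
# the hierarchical shape the program wants -- no ORM mapping step.
row = conn.execute("""
    SELECT json_object('name', a.name, 'books', json_group_array(b.title))
    FROM authors a JOIN books b ON b.author_id = a.id
    GROUP BY a.id
""").fetchone()
doc = json.loads(row[0])
```

One round trip returns a nested document, which is the "exact structure my program needs" being described.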

bbsimonbb
Not just JetBrains. PgTyped does this for Postgres and TypeScript, and QueryFirst for C# against SQL Server/Postgres/MySQL.
takeda
I did not know that, I first encountered that in PyCharm.

I'm glad there's more.

Edit: actually what you mentioned is slightly different. This is what I'm talking about: https://youtu.be/_FlpiNno088?t=2863

wruza
> For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

Honestly, SQL has this problem too, but it presents itself not in the way you store, but in the way you query. There are simple schemas and complex ones, and irrespective of that there are obvious query sets and unplanned ones (i.e. written at runtime as part of the data analysis process). SQL and its autoplanning is required only for complex+unplanned cases, in my opinion. In all other cases I know my data and I’d better walk through the indexes myself rather than writing 4-story queries to satisfy the planner. At the end of the day, nested loops through the indexes is what RDBMS does. There is no declarative magic at the fetch-and-iterate level.

IOW, it would be nice to have an "extql": direct access to indexes and rows, sort of the way EXPLAIN works, and to skip SQL completely.

  function get_items(store_id) {
    for (var item in items.id) {
      var res = item.{name, code}
    var item_id = item.id
      res.price = prices.any({item_id, store_id})?.price
      if (!res.price) continue
      res.props = todict(props.all({item_id}).{name, value})
      yield res // or collect for a bigger picture
    }
  }
This query could be an equivalent of "select from items inner join prices on (store_id, item_id) left join props on (item_id)", but saving space for many props and being much more programmable. Also, it would be nice to have the same engine (sql+extql) on the "client" side, where the inverse problem exists: all your data is NoSQL, with no chance to walk indexes or declare relations.
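For reference, the SQL the comment describes runs fine as a plain three-way join. A sketch with an invented schema matching the pseudocode's tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items  (id INTEGER PRIMARY KEY, name TEXT, code TEXT);
    CREATE TABLE prices (item_id INTEGER, store_id INTEGER, price REAL);
    CREATE TABLE props  (item_id INTEGER, name TEXT, value TEXT);
    INSERT INTO items  VALUES (1, 'mug', 'M1'), (2, 'pen', 'P1');
    INSERT INTO prices VALUES (1, 7, 3.5);
    INSERT INTO props  VALUES (1, 'color', 'blue');
""")
# Items priced at store 7, with their props; items lacking a price drop out,
# mirroring the `continue` in the pseudocode above.
rows = conn.execute("""
    SELECT i.name, i.code, p.price, pr.name, pr.value
    FROM items i
    JOIN prices p ON p.item_id = i.id AND p.store_id = ?
    LEFT JOIN props pr ON pr.item_id = i.id
""", (7,)).fetchall()
```

The trade-off the comment raises stands: the planner decides the loop order here, whereas "extql" would hand that loop to the programmer.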
takeda
> Honestly, SQL has this problem too, but it presents itself not in the way you store, but in the way you query. There are simple schemas and complex ones, and irrespective of that there are obvious query sets and unplanned ones (i.e. written at runtime as part of the data analysis process). SQL and its autoplanning is required only for complex+unplanned cases, in my opinion. In all other cases I know my data and I’d better walk through the indexes myself rather than writing 4-story queries to satisfy the planner. At the end of the day, nested loops through the indexes is what RDBMS does. There is no declarative magic at the fetch-and-iterate level.

The thing is that what worked at a specific time can change. For example, say you have a simple join of two tables, A and B: you search by a column in table A to get a value from a column in table B. If both tables are large, it makes sense to look up in A by an index, then use the foreign key and an index to find the row in table B.

Now if A and B have few rows, even if there is an index on both of them, it is actually faster to just scan one or both tables.

It might actually be more beneficial to ensure that tables are properly analyzed, have the right indices, and that query-planner preferences are tuned.
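The "analyze and tune rather than fight the planner" point can be illustrated with SQLite standing in for Postgres (ANALYZE gathers the statistics; the declarative query itself never changes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, val TEXT);
    CREATE INDEX a_b_id ON a(b_id);
    INSERT INTO b VALUES (1, 'x'), (2, 'y');
    INSERT INTO a VALUES (10, 1), (11, 2);
""")
conn.execute("ANALYZE")  # collect the statistics the planner decides with
# The query never changes; index walk vs. table scan is the planner's call,
# made from the gathered statistics.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT b.val FROM a JOIN b ON b.id = a.b_id WHERE a.id = 10"
).fetchall()
row = conn.execute(
    "SELECT b.val FROM a JOIN b ON b.id = a.b_id WHERE a.id = 10"
).fetchone()
```

In Postgres the equivalents are `ANALYZE` plus `EXPLAIN`, and the planner preferences mentioned above are the `enable_*` and cost settings.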

If you need to override the query planner, you don't have to write convoluted queries; you can just use this[1] extension. Though if things aren't working right, it is either lack of data, misconfiguration, or a bug.

[1] http://pghintplan.osdn.jp/pg_hint_plan.html

Have you seen this presentation about NuBank's usage of datomic https://youtu.be/7lm3K8zVOdY ?
I do remember Datomic and I think it's a great tool but I fell out of love with the Clojure ecosystem and JVM-based languages as a whole and don't think I'll be getting back into it/them.

I do remember wanting to check out Datomic (I believe after seeing a talk on how it was being used at a bank in South America?[0]), but I found it unreasonably hard to find and download/experiment with the community edition -- compare this to something like Postgres which is much more obvious, more F/OSS compliant (I understand that they need to make money) and Datomic doesn't really look that appealing to me these days.

At this point in my learning of software craftsmanship I can't do non-statically-type-checked/inferred languages anymore -- I almost never use JS without TypeScript, for example. Typed Clojure was in relatively early stages when I was last actively using Clojure, and I'm sure it's not bad (probably way more mature now), but it's a staple in other languages like Common Lisp (the declare form IIRC). The prevailing mood in the Clojure community seemed to be against static type checking, and I just don't think I can jive with that anymore.

Thinking this way, right now Datomic wouldn't be a good fit for me personally, but I believe it is probably a high-quality paradigm.

[EDIT] - I found the talk: https://www.youtube.com/watch?v=7lm3K8zVOdY

[0]: https://www.datomic.com/nubanks-story.html

pgt
Also look at Magic, an experimental typed JVM Lisp with immutable namespaces: https://github.com/mikera/magic
pgt
Yes, Datomic is the killer app for Clojure [^1]. Have a look at Datascript[^2] and Mozilla's Mentat[^3], which is basically an embedded Datomic in Rust.

Hickey's Spec-ulation keynote is probably his most controversial talk, but it finally swayed me toward dynamic typing for growing large systems: https://www.youtube.com/watch?v=oyLBGkS5ICk

The Clojure build ecosystem is tough. Ten years ago, I could not have wrangled Clojure with my skillset - it's a stallion. We early adopters are masochists, but we endure the pain for early advantages, like a stable JavaScript target, immutable filesets and hot-reloading way before anyone else had it.

Is it worth it? Only if it pays off. I think ClojureScript and Datomic are starting to pay off, but it's not obvious for whom - certainly not for every organisation.

React Native? I tore my hair out having to `rm -rf ./node_modules` every 2 hours to deal with breaking dependency issues.

Whenever I try to use something else (like Swift), I crawl back to Clojure for the small, consistent language. I don't think Clojure is the end-game, but a Lisp with truly immutable namespaces and data structures is probably in the future.

[^1]: In 2014 I wrote down "Why Clojure?" - http://petrustheron.com/posts/why-clojure.html [^2]: https://github.com/tonsky/datascript [^3]: https://github.com/mozilla/mentat

Nov 28, 2015 · 132 points, 20 comments · submitted by espeed
psidium
Been using NuBank for about 8 months and I must say, it is the best bank experience I've ever had. They put great effort into every aspect of their business. From the mobile app to the customer experience, it's perfect! (Now that I've promoted you please raise my limit amount - haha jk)
johansch
Is it about more than a web frontend?

Also: 47 upvotes before my (first) comment is... weird. Content spam?

hellbanner
Up votes on content without comments aren't unusual if the reader likes the link or wants more people to see it. Could be spam, of course :)
johansch
I would buy it on some very accessible text post - but come on, a video only post with 47 upvotes and no comments? Something is fishy here.

Or maybe people just upvote anything that has the string "clojure" in it? :)

tim333
Also, with 40-minute videos, some people feel they should watch the thing before commenting, which delays stuff.
nodesocket
Interesting video and concepts with Datomic, but a few questions.

* Why did Nubank decide to attack the Brazil market instead of US? There have been attempts here in the US; Simple (https://www.simple.com), Standard Treasury (http://standardtreasury.com/), Final (https://getfinal.com/), and Coin (https://onlycoin.com/).

* Isn't reinventing the engineering wheel (Datomic, and an unpopular language like Clojure) a potentially huge distraction from the ultimate goal of building a bank? The problem space is big and hard enough without rolling/maintaining/testing your own database engine and an obscure language framework.

hcarvalhoalves
I'll try to address:

* Perhaps surprisingly, Brazil is an interesting market. We have a connected young generation, and most industries' current players are bad.

* I would argue it's about avoiding reinventing the wheel. Datomic gives you easy scale-out and auditing, and Clojure gives you a flexible, pragmatic functional language with good async/threading primitives running on established platforms (the JVM and web browsers). I guess going off the beaten path is not necessarily more work if you're taking a shortcut.

andrewchambers
Datomic seemed like an amazing choice for banks to me.
fabioyy
they are a credit card issuer (Mastercard brand), not a bank.
dang
Ok, we s/bank/credit card issuer/'d the title.
patkai
Datomic sounds really awesome. I'm wondering which other DBMSs it can be related to? (assuming there is nothing new under the sun :))
hcarvalhoalves
I'm not aware of any, although I'm not sure Datomic qualifies as a traditional DBMS. It's more like a querying API + a master process (transactor) backed by a traditional store/DB (Postgres, Dynamo, etc), so you can achieve the same by putting some existing solutions together, but it's implementing a bit more than a DBMS alone.
fiatjaf
This is not a bank and you know it.
alecco
Why would you use a slow un-benchmarkable database? There are plenty of good DBs out there for this job.
andrewchambers
I don't know any other db that lets you put arbitrary constraints in transactions and run queries as if you were in the past.
trurl
LogicBlox.
hcarvalhoalves
Disclaimer: I work at Nubank - nice to see it posted here. This video is now a year old, and the technology choices made at the start are still paying dividends.

Datomic in particular is a pretty smart product backed by solid concepts (RDFs/triplestores, event sourcing, immutability, querying engine on the client side, datalog), something companies end up re-inventing in-house further down the line, so it's nice to be aware of the architecture even if not intending to use it.

Scarbutt
I didn't watch the video yet; what DB backend does Nubank use for Datomic?
tim333
"Datomic’s support for multiple storage backends has also been useful to Nubank, as personally identifying information (PII) must be encrypted at rest, something that is easily achieved using Amazon’s PostgreSQL (RDS) with EBS volume encryption for some services, while using DynamoDB to back other, non-PII services."
goldfeld
I'm Brazilian and have been working with Clojure for a few years, and I used Datomic on a project earlier this year. Are you planning on hiring? Sorry for the thread hijack.
hcarvalhoalves
Cool! We are always hiring - you can see current open positions here: https://nubank.workable.com/
nadam said 10 seconds, not minutes.

we are using the starter version combined with DynamoDB in production and we found the payment structure very clear; no obfuscation whatsoever, unlike a Microsoft or Adobe pricing matrix ;) (it's made by the same guy who made the very open-source Clojure programming language, btw, and he is very much against obfuscation)

anyway, it's a competitive advantage for those guys who are building a bank on top of it in brazil (https://www.youtube.com/watch?v=7lm3K8zVOdY). we feel we can avoid writing a lot of authorization and audit-log code by using datomic; maybe you can save such work too.

It's worthwhile watching Nubank's talk on how they manage these things with Datomic on the backend [0] - providing a "complete" (filtered) database to the client, HTTP caches for queries based on tx-time, and syncing mobile data via transactions-since. It takes care of said concerns nicely - I'm just implementing this stuff today, however, so I'm not sure how well it'll work in practice.

[0] - https://www.youtube.com/watch?v=7lm3K8zVOdY
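The transactions-since sync mentioned above reduces to a high-water-mark protocol. A toy sketch (the names and the in-memory log are hypothetical, not Datomic's actual API):

```python
# The client reports the last transaction id it has seen; the server
# returns everything newer plus the new high-water mark.
def changes_since(log, last_seen_tx):
    """log: list of (tx_id, fact) pairs with strictly increasing tx_id."""
    newer = [(tx, fact) for tx, fact in log if tx > last_seen_tx]
    new_basis = newer[-1][0] if newer else last_seen_tx
    return newer, new_basis

log = [(1, "alice joined"), (2, "bob joined"), (3, "alice renamed")]
delta, basis = changes_since(log, 1)   # client had seen up to tx 1
```

Because the log is append-only and tx ids are monotonic, the response is an incremental delta, which is what makes it a fit for high-latency/low-bandwidth clients.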

Jan 12, 2015 · 2 points, 0 comments · submitted by cipher0
Jan 02, 2015 · dmix on Clojure 2014 Year in Review
Not mentioned in the article but most interesting to me this year is how these guys are "building a bank from scratch in Brazil with Datomic and Clojure": https://www.youtube.com/watch?v=7lm3K8zVOdY

The future looks bright for the language, especially with applications like this.

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.