HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
JOTB19 - A race of two compilers: GraalVM JIT versus HotSpot JIT C2. by Ionut Balosin

J On The Beach · Youtube · 114 HN points · 1 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention J On The Beach's video "JOTB19 - A race of two compilers: GraalVM JIT versus HotSpot JIT C2. by Ionut Balosin".
Youtube Summary
Do you want to check the efficiency of the new, state-of-the-art GraalVM JIT compiler against the older but most widely used JIT C2? Let’s have a side-by-side comparison from a performance standpoint on the same source code.

The talk reveals how the traditional just-in-time compiler (JIT C2) from HotSpot/OpenJDK internally manages runtime optimizations for hot methods, in comparison to the new, state-of-the-art GraalVM JIT compiler on the same source code, emphasizing the internals and strategies each compiler uses to achieve better performance in the most common situations (or code patterns). For each optimization there is Java source code and the corresponding generated assembly code, to prove what really happens under the hood.

Each test is covered by a dedicated benchmark (JMH), timings and conclusions. Main topics of the agenda: scalar replacement, null checks, virtual calls, lock coarsening, lock elision, lambdas, and vectorization (a few cases).

The tools used during my research study are JITWatch, the Java Microbenchmark Harness (JMH), and perf. All test scenarios will be launched against the latest official Java release (version 11).
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Jun 18, 2019 · apta on Go Creeping In
The golang GC is not tunable, at least compared to the JVM GCs. It is tuned for latency, not throughput, so if you're writing high-throughput code you can't really do much. You can try to write code that reduces allocations, but you can do that in Java or C# too (even more so once Java gets value types).

Not to mention that the golang gc cannot handle GBs (let alone TBs of heap space) the way the JVM routinely does.

The golang runtime cannot devirtualize interface implementations the way the JVM does[1] or .NET does.

I'm sure there are more examples to think of.

[1] https://youtu.be/lunJmMBkqLo?t=1072
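
As a concrete illustration of the allocation-reduction point above, here is a minimal Java sketch (hypothetical names) contrasting a hot path that allocates on every call with one that reuses a caller-supplied scratch buffer, keeping garbage out of the loop regardless of which runtime's GC is underneath:

```java
import java.util.Arrays;

public class AllocationDemo {
    // Allocating version: creates a fresh array on every call, producing garbage.
    static int sumOfSquaresAllocating(int[] input) {
        int[] squares = new int[input.length];
        for (int i = 0; i < input.length; i++) squares[i] = input[i] * input[i];
        return Arrays.stream(squares).sum();
    }

    // Reusing version: the caller supplies a scratch buffer,
    // so the hot path allocates nothing at all.
    static int sumOfSquaresReusing(int[] input, int[] scratch) {
        for (int i = 0; i < input.length; i++) scratch[i] = input[i] * input[i];
        int sum = 0;
        for (int v : scratch) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        int[] scratch = new int[data.length];
        System.out.println(sumOfSquaresAllocating(data));       // 30
        System.out.println(sumOfSquaresReusing(data, scratch)); // 30
    }
}
```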

Thaxll
Go GC can easily handle GBs of heap, especially if you're careful with pointers.
apta
I benchmarked etcd with 10-50GB heap space and it didn't do a good job at all; lots of CPU time was spent in GC.
todd8
The performance of the etcd system is complex[1], there’s raft consensus, replication, security, snapshots, networking, and basically a lot going on.

Perhaps your benchmarks really do reflect problems that Go has with GC, but perhaps some other part among the numerous moving parts is causing the poor benchmark results. I'm not sure we can conclude from your results that Go has GC problems when the heap is large.

[1] https://github.com/etcd-io/etcd/blob/master/Documentation/op...

apta
From what I recall, having a large-ish number of watchers caused memory usage to go up significantly. The GC started thrashing as CPU usage went up, and performance dropped with GC cycles.
Jun 14, 2019 · 114 points, 60 comments · submitted by pjmlp
Bootvis
I got quite good results running some Sudoku solving code in GraalVM (FastR). It was faster than an R/Rcpp hybrid:

http://bootvis.nl/fastr-sudokus/

Later I ran into some errors which are supposed to be fixed in the development branch but I haven't tested.

andersson42
Just-in-time compilation: I think it should be called "continuous profile-guided compilation" instead, which better describes the awesomeness that happens...
chrisseaton
Some people differentiate between just-in-time compilation and dynamic compilation. The former is what, for example, .NET does (or did, last time I checked): it literally compiles code as it would ahead of time, just at the last second before executing it for the first time. The latter is what, for example, Graal does: compiling based on runtime conditions, possibly multiple times with different results as the program executes.
lewurm
> [...] which is what for example .NET does.

.NET is the platform. There are different implementations for it doing different things.

JIT compilation is still different from AOT even without profile-guided optimizations. Simple example: in AOT code you can't easily embed pointers, which is often solved with indirection (e.g. something like the GOT in ELF).

chrisseaton
> There are different implementations for it doing different things.

And are they now speculative? They weren't for the first 15 years or so.

lewurm
CoreCLR started to implement Tiered Compilation: https://github.com/dotnet/coreclr/blob/master/Documentation/...

It's currently experimental, with no profile-guided optimizations _yet_.

pjmlp
.NET Framework supports PGO since .NET 4.5.
pron
JITs do more than just profile-guided optimizations. Their secret weapon is speculative optimizations that mean they don't need to work hard (and often fail) to prove the soundness of certain optimizations. They're allowed to guess and be wrong.
amelius
If they do speculative optimizations, then doesn't that open the door to Spectre-like vulnerabilities, but now at the compiler level?
chrisseaton
This isn't speculation as in speculative execution, it's speculation as in speculating that a condition is true that cannot be proved to be true, so it's not the same thing.

An example of this kind of speculation is speculating that there will only ever be one thread in a system, and removing locks. If that speculation ever proves to be wrong - a second thread is created - the locks are put back into the system.

That doesn't relate to Spectre: when the speculation is reversed, the whole program is first brought to a safe halt. It isn't fine-grained enough to be useful for Spectre.

Unrelated, it is true that compilers need to be aware of Spectre-like vulnerabilities, and Graal does include experimental support for that.
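
A minimal Java sketch of the kind of code this speculation targets (an illustration of the idea only, not Graal's actual mechanism): while the runtime has only ever seen one thread, it may compile `increment` with the lock removed, and restore the lock after a safe halt if a second thread ever appears.

```java
public class Counter {
    private long count = 0;

    // Under single-thread speculation, a runtime like the one described above
    // can compile this without taking the monitor at all. If a second thread
    // starts, the program is brought to a safepoint, the compiled code is
    // thrown away, and a version with real locking is installed instead.
    public synchronized void increment() { count++; }

    public synchronized long get() { return count; }

    public static void main(String[] args) {
        Counter c = new Counter();
        for (int i = 0; i < 1_000; i++) c.increment();
        System.out.println(c.get()); // 1000
    }
}
```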

amelius
Ok, but in a multi-user system (e.g. a webserver), if user 1 triggers a (de-)optimization, then user 2 can tell that a previous user was in that code path. Now I don't know how to extract useful information from that fact, but it shows that at least some information spills over the user boundaries.
chrisseaton
Oh I see what you mean - Spectre-like rather than specifically Spectre. Yes I suppose specialisation (more generally than speculation) could leak information, in the same way as cache status can leak information.
pvg
Wait, ignore soundness? How does that work/provide advantage?
zlynx
A function could be compiled assuming the passed argument is always 2. All of the code for other values is just left out.

As long as the compiled code has a check for values that are not 2 this code works great. It isn't correct though.

At least, that's my interpretation.
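
That interpretation can be sketched in plain Java (hypothetical method names; a real VM does this in machine code, not source): the compiled method keeps only the observed case behind a cheap guard, and the guard's failure path stands in for deoptimization.

```java
public class SpecializationSketch {
    // What the JIT conceptually emits after only ever observing x == 2:
    // a guard plus the specialized fast path. The fallback stands in for
    // an uncommon trap that re-enters the interpreter and recompiles.
    static int compute(int x) {
        if (x == 2) {
            return 4;          // specialized: 2 * 2 folded to a constant
        }
        return deoptimize(x);  // any other value: take the slow, general path
    }

    static int deoptimize(int x) {
        // In a real VM this would deoptimize; here we just run the general code.
        return x * x;
    }

    public static void main(String[] args) {
        System.out.println(compute(2)); // 4, via the fast path
        System.out.println(compute(3)); // 9, via the fallback
    }
}
```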

pvg
Maybe I'm getting tripped up in the terminology here, but to me this case is still a JIT jittin': you look at runtime data and decide it's worth it to crank out a special-case optimization for the input of 2. You produce that optimization, which is sound, along with a check to make sure it is applied only in the special case. You get to defer other optimization. The advantage here still seems to come from the runtimeness of things rather than from being clever about soundness, and there's really no guessing about soundness. So perhaps that's not it.
chrisseaton
> to me this case is still a JIT jittin'

I think the point is that some JITs never do this kind of optimisation - they just produce the same code an AOT compiler would, but at runtime. Such as the .NET JIT.

pjmlp
I guess you need to update your knowledge regarding the several .NET JITs in use.
chrisseaton
As I cautioned in another comment

> (or did, last time I checked)

Do implementations of .NET JITs now do speculative optimisations or dynamic compilation? They didn't see the need for it for about 15 years.

pjmlp
15 years ago there was no RyuJIT (which replaced the JIT you learned from), MDIL (Windows 8/8.1), .NET Native (UWP), IL2CPP (Unity), or the research ones from Singularity and Midori.

As for the need for it, they have been trying to make C# more relevant for the kinds of workloads C++ serves, and to place among the top entries at TechEmpower.

So .NET has been getting Modula-3 like low level handling of value types within a GC environment, RyuJIT is now tiered, supports SIMD and some automatic vectorization.

.NET Framework 4.6 got the first version of what is the .NET way of doing AppCDS.

There are a couple of blog posts regarding RyuJIT improvements with each release after its introduction.

chrisseaton
So which of these implementations does speculation? I remember when RyuJIT came out it still wasn't speculative - has that now changed?

If you read the blog posts, they always talk about speculation being something they may try in the future. I've not seen anything where they say they went ahead and implemented it.

pjmlp
Here is some information

Background JIT overview, which is a kind of PGO for the .NET Framework:

https://msdn.microsoft.com/en-us/magazine/mt683795.aspx

And I think this goes into line with what you are discussing,

https://github.com/dotnet/coreclr/pull/21270

I also agree that many things remain to be done in line with what Graal is capable of.

pron
Yeah, seems like they started exploring speculation last December. It's not only Graal that's based on speculative optimizations, though; it's C2 as well, and from the very beginning. .NET's compiler is at least a decade behind C2 at this point. But because MS has usually opted for giving the frontend language more control, they don't need a state-of-the-art compiler as much as the JVM does.
pron
Yeah, I think this is the relevant bit: https://github.com/dotnet/coreclr/blob/master/Documentation/...

Seems like they started trying speculative optimizations about six months ago. Speculative optimizations are not only the foundation of Graal but also of C2, BTW.

kjeetgill
Maybe I'm only familiar with "the main one" and mono... Are there other .NET VMs?

If I recall correctly, it will do constant folding, but won't speculate that a certain parameter is essentially constant at runtime even though it wasn't at compile time.

An easy example is a config loaded from a file as the server boots but never changes for the lifetime of the process. That won't constant fold without speculation.
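
A minimal sketch of that scenario (hypothetical field and values): the field is written once at startup and is not declared `final`, so an AOT compiler cannot fold it, while a speculating JIT can treat reads of it as a constant behind a guard.

```java
public class ConfigFolding {
    // Loaded once at startup (stands in for reading a config file) and never
    // reassigned afterwards, but not declared final, so an AOT compiler must
    // emit a real memory load for every read.
    static int batchSize;

    static int process(int items) {
        // A speculating JIT that has only ever observed batchSize == 64 can
        // compile this as `return items / 64;` (a cheap shift-based division)
        // and deoptimize if the field is ever written again.
        return items / batchSize;
    }

    public static void main(String[] args) {
        batchSize = 64;
        System.out.println(process(640)); // 10
    }
}
```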

pjmlp
There is the old style JIT, RyuJIT introduced with .NET Framework 4.6, MDIL, .NET Native, Mono, .NET CF, IL2CPP, and the research ones from Singularity and Midori.

So while it is hard to state what each AOT/JIT compiler is capable of, naturally they aren't 100% all the same.

kjeetgill
Ah! I should read more about the world. I was recalling from a conversation I had with a .NET engineer at JVMLS last year.
pvg
I don't think that's the point the comment I'm replying to is making, or at least, it's not the point I'm asking about.

Edit: Your example in the other comment about the locks is the sort of thing I'm asking about. There, an optimization is made which is sound under some specific conditions and then unmade when those conditions change.

marvy
I think that is indeed the point pron was making, or at least similar. You can't actually ignore soundness, but JVMs sometimes go farther than I'd expect. (Example: don't check for null, just handle SIGBUS if null is "very rare")
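
To make that example concrete, here is a sketch of the code shape it applies to. The signal handling happens inside the VM (on Linux the trap HotSpot catches for a null dereference is typically SIGSEGV), so from Java all you observe is the `NullPointerException`:

```java
public class ImplicitNullCheck {
    static class Node {
        int value;
        Node(int v) { value = v; }
    }

    // In the compiled code for this method HotSpot typically emits no explicit
    // null test: it just loads n.value and, if n is null, lets the OS deliver
    // a signal, which the VM's handler turns into a NullPointerException.
    static int read(Node n) {
        return n.value;
    }

    public static void main(String[] args) {
        System.out.println(read(new Node(42))); // 42
        try {
            read(null);
        } catch (NullPointerException e) {
            System.out.println("NPE handled");
        }
    }
}
```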
pvg
Yes, on re-reading the thread again it might be entirely (or almost) about language. As in, it's really something along the lines of 'The power of the JIT approach comes from runtime information and dynamism. But you can also be a 'just' a JIT without making use of any of that'. And I'm getting stuck on 'secret weapon [...] soundness' and imagining some unfathomable-to-mortals ninja something.
ddragon
For example a JIT compiler can, based on the runtime information + inference, discard multiple branches of a function that are effectively pointless and possibly even just inline the results if given those inputs the function is deterministic.

Of course, the compiled function will not actually work with different but still valid arguments, but that's not really a problem: as the function is called, the JIT compiler will simply determine that the already compiled version won't work for those inputs and compile a new version for the new types just before. A pure ahead-of-time compiler couldn't optimize so aggressively, since that would lead to an exponential explosion of possible input combinations, most of which will very likely never happen.

pvg
I put it in a bit more detail in a sibling comment but to me, there's no 'guessing about soundness and being wrong' in that particular scenario.
ddragon
I guess it depends on the perspective and interpretation of soundness. If a JIT compiler AOT-compiles your entire program but infers that a function implementing logic that works for every number (as you defined it using the Number interface within the rules of its type system) will only see 32-bit integers, then it will compile code that does not uphold the property that was established. The fact that it will stop execution the moment it reaches an invalid path and correct it doesn't change that.
pvg
It could be and that would be uninteresting. But it's not hard to come up with a (contrived, limit-casey) optimization approach that does actually make guesses about soundness.

Let's say you wanted to optimize a short instruction sequence with a small domain of inputs. You could try to generate all (or at least, zillions) of similarly-sized possible instruction sequences and check them for soundness and performance. Now you're really making soundness guesses. Do real JITs actually make that sort of soundness guess (not that kind of attempt at optimization, obviously)?

ddragon
A JIT compiler could detect that an instruction sequence (or function) is pure (for some range of valid inputs) and auto-memoize them for performance gains. But if you want the JIT to evolve compiled representation by profiling some fitness measurement (performance) and condition (soundness), that will most likely not happen any time soon. The JIT compiler has to balance compile time execution with runtime execution, if it wastes 5s to generate a program that runs in 2s when it could waste 1s to generate a program that would run in 4s then it would not be a good compiler at all. And above all else, JIT compilers, even the notoriously aggressive ones, still need to have some degree of predictability. If the user of the language can't predict the performance of the language, then they can't reliably improve their code performance.

I'm mostly talking about production-ready stuff; such work is certainly a fun playground. The Julia JIT (one of the notoriously aggressive JITs, for good and bad) allows users to add new context-aware behaviors to the compiler at runtime [1], and people have used it, for example, to experiment with auto-parallelization and with manipulating the code generated by the compiler. That was basically what got me into the language. So you could probably make a library that injects some weird risky optimization that abuses the type system.

[1] https://docs.google.com/presentation/d/1IiBLVU5Pj-48vzEMnuYE...

pron
All the time, and BTW, I didn't say JITs sacrifice soundness but that they don't require proof of soundness. That's different as I'll show.

Let me give you two common examples: virtual calls and branches. A JIT will speculatively devirtualize and inline a virtual call at a particular callsite if it has only encountered one or a small number of concrete instances, even if it can't prove that those are the only instances that can be encountered at that callsite. This is still sound because the JIT will emit a trap that will trigger if an unknown target is ever encountered, in which case it will deoptimize the compilation, go back to the interpreter and then compile again under new assumptions. Another example is branch elimination. If a JIT only ever encounters the program taking one side of a branch, it will only compile that branch (and introduce a trap), even if it can't prove that only that side will ever be taken.

pvg
Thanks! I did (eventually) figure it out, I initially misread it as something like:

1. Jettison soundness

2. ???

3. Performance profit.

Which seems like witchcraft, then again JITs are full of witchcraft. But it's also not what you wrote. I've now come to understand the two chief weapons of the JIT remain surprise, fear, ruthless efficiency and an almost fanatical devotion to the Pope.

the8472
Another advantage they have is that they can focus their optimization cycles on hot code.

AOT compilers can't afford running optimization passes in a loop (inline, optimize, inline, optimize, ...) until they reach a fixed point because that would blow up compile times if that were applied to the whole program.

astrange
This almost never matters. You just start at the leaves and go up and then you're done. Most people aren't interested in complicated superoptimizations, because a predictable compiler is more important.
pron
That's not true. A lot of very simple and useful optimizations are very hard to prove correct (e.g. devirtualization) and so can't be done with AOT compilers. It doesn't matter to people using languages that require the programmer to carefully control the compiler -- like C/C++ or Rust -- but it matters a great deal to languages that offer a smaller number of more general abstractions. It is virtually impossible to compile, say, JavaScript efficiently with an AOT compiler, but when compiled with a JIT it can have excellent performance.
repolfx
From what I know, GraalVM EE (Enterprise Edition) does do loop vectorisation.

This will lead to an interesting problem if they want to replace C2 with Graal. Are they willing to regress performance for some open-source-only users, even if it's a performance win for others?

pjmlp
OpenJDK for Java 12 also does it, as Intel has contributed AVX optimizations to it.
kasperni
Yes, this will be very interesting. Also, since GraalVM is pretty modular, will someone provide a free/open-source version with loop vectorisation and other goodies?
astrange
That shouldn't be interesting in Java. Most programs can't be autovectorized, especially in a language without any SIMD constructs. If you try to guess what the compiler thinks vectorization looks like, you'll probably get it wrong.
apta
Java will be exposing parallelization constructs by means of project Panama: http://cr.openjdk.java.net/~vlivanov/panama/vectors/vectors....
grashalm
Contrary to popular belief, automatic loop vectorization is not as important to most Java workloads as one might think. It gets a lot of visibility because it causes significant peak performance differences in micro-benchmarks.

In the end, you should not trust standard benchmarks and definitely not micro-benchmarks. Do perform tests with your own workload.
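
For reference, this is the kind of kernel such micro-benchmarks tend to measure: a unit-stride loop with no cross-iteration dependence, the shape C2's superword pass can auto-vectorize (whether it actually does depends on the JVM version, flags, and CPU):

```java
public class VectorizableLoop {
    // y[i] = a * x[i] + y[i] over contiguous arrays: the classic candidate
    // for auto-vectorization, and exactly the shape that dominates
    // vectorization micro-benchmarks but is rare in typical Java workloads.
    static void axpy(float a, float[] x, float[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 4f};
        float[] y = {1f, 1f, 1f, 1f};
        axpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y)); // [3.0, 5.0, 7.0, 9.0]
    }
}
```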

imtringued
But this is only true because the HotSpot JVM just isn't meant to be used for high-performance code. In theory, with a good enough JIT, it should be possible to achieve high performance even in Java. One day Java might have value types, and then these optimizations will be able to shine.
pron
Here is the same speaker comparing C2 and clang: https://youtu.be/0yrBuPiGk8I
eby
What's the current status of Graal's Python implementation in terms of reaching actual usability? The README on its GitHub repo [1] doesn't inspire much confidence but whenever I check that wording hasn't changed.

I've been following Graal for quite some time, both as a former PLDI guy but also for my day job. I work in bioinformatics software (mostly cancer genomics research) and our group has a ton of (mostly legacy) code in Java and R, but most of the newly-minted grads coming in lean towards Python.

As one of the guys pulling all this stuff together, the Graal "polyglot" multilingual VM concept is of tremendous interest as you can imagine. It would be great to be able to package the legacy stuff interoperably with the new stuff no matter the language, even setting aside the bonus of better performance. But it has basically no practical use to us without Python (+ packages!) due to the direction and language inclinations of the group.

Is there anything new happening on that front? Or anything we could do to help it along? Is there a more detailed status page anywhere? Any sense of when this might land in a truly usable form, or what's the holdup?

I'm a bit surprised that the progress with R (with packages) is so far along but the progress with Python (with packages) seems stagnant (at least according to that README). No offense meant to the team, but that's the appearance. Is it the GIL?

[1] https://github.com/graalvm/graalpython, which calls it "early-stage experimental" and "very likely that any Python program that requires any packages at all will hit something unsupported".

sjcoles
I had decent luck with Nuitka[1] as long as the project is 100% Python. The executables are large but have been mostly portable IME (some glibc problems can arise, though).

Largest project I compiled was only ~1000 lines but used external deps of pymysql, jinja2, ldap3 along with the stdlib's shutil, tempfile, pathlib, and the base os lib without issues. It takes ~30 minutes to compile on a decently powerful machine though (8650u and 32gb of ram). Most of this time was spent on pymysql and jinja2's compilation.

[1] https://nuitka.net/

eby
Thanks, but I don't see what that has to do with Graal or multi-language interoperability which is the key thing here. We have substantial code in Java, R and Python that could all benefit from being able to call one another from within the same process.

An alternative Python compiler by itself frankly buys us very little. Perhaps Jython, if it weren't targeting 2.7.x.

wiradikusuma
A bit OOT, has anyone tried Scala on Graal? How's it?
Recurecur
If Graal and Scala interest you, then you should be aware of the Scala Native project:

http://www.scala-native.org/en/v0.3.9-docs/ https://github.com/scala-native/scala-native

It provides manual memory management as needed, making it suitable for even hard real time applications.

luu
It seems pretty good. Twitter has some talks on this where they claim significant performance improvements: https://www.youtube.com/watch?v=PtgKmzgIh4c.

I've heard that the Twitter JVM team has a road show where they've talked to some other large Scala users about the performance improvements. Initially, people are highly skeptical of the claims, but after trying Graal on their internal workloads, they generally see similar results.

Here's a paper which has some explanations for why you might expect Graal to improve Scala performance: http://aleksandar-prokopec.com/resources/docs/graal-collecti...

mey
Is it clear what the licensing is/will be for Graal?

Edit: GraalVM CE describes it on the download page https://www.graalvm.org/downloads/

saxonww
Since it's not immediately clear from that page - the GraalVM CE distribution is built from source hosted on GitHub (https://github.com/oracle/graal), and at the moment the LICENSE file there says:

  This is a release of GraalVM Community Edition 1.0.
  GraalVM Community Edition 1.0 consists of multiple modules. 
  The software as a whole, as well as the JVMCI and VisualVM 
  modules, are released under version 2 of the GNU General 
  Public License with the “Classpath” Exception.
chii
They don't really clearly list the difference between the CE and the EE version.

what is "Improved performance and smaller footprint"? If i use the CE, does it produce worse code somehow?

exabrial
I'm curious if GraalVM can do the same rockstar party tricks on Java 11? He uses Java 12, but doesn't go into huge detail why he chose that rather than the LTS.
chrisseaton
> doesn't go into huge detail why he chose that

12 is the latest release - that's probably why.

pron
12 is the current JDK version. OpenJDK has no notion of LTS (e.g. you will see no mention of 11 being an LTS on the project page: https://openjdk.java.net/projects/jdk/11/ nor any special designation compared to JDK 12's page), and all versions are equal.

Java's LTS means something quite different from LTS in other projects. LTS is a service offered by companies for arbitrary JDK versions of their choice (e.g. Azul offers extended support for versions other than those Oracle does); there is nothing special about the development, testing, effort or focus put into those versions. In addition, people can choose to maintain OpenJDK update projects, as Red Hat does for 11. Anyway, JDK 12 is simply the current JDK version, and there is no reason to use an old version for a technical discussion -- there is nothing more stable about it, nor any other technical difference -- even if companies offer extended support for it. (I work on OpenJDK at Oracle)

twic
> OpenJDK has no notion of LTS

But two of the main maintainers, Oracle and Red Hat, absolutely do.

> LTS is a service offered by companies to arbitrary JDK versions of their choice

And that arbitrary version is 11.

It's technically accurate but substantially disingenuous to suggest that 11 is not Java's current LTS.

pron
> But two of the main maintainers, Oracle and Red Hat, absolutely do.

BTW, Oracle contributes ~90% and Red Hat ~5%.

> It's technically accurate but substantially disingenuous to suggest that 11 is not Java's current LTS.

Either way that has little significance here. It is not the most popular version of Java in use today (that would be 8u2XX) nor is it any more production-ready or stable than 12. You can say that we're interested in results for the current version of Java or in the most popular one. I don't understand why it would be particularly interesting to discuss JDK 11, which is neither.

kjeetgill
I'm really not sure where your confusion is coming from.

The post itself is interesting on two fronts: as a new emerging technology but also as a leverageable tool. As an OpenJDK developer I understand your interests are more about the former.

Many of us are using Oracle or Red Hat LTS builds, running either 8 or 11 for "reasons". It's pretty natural to want to know whether these new changes to the platform apply to a given version without asking.

yogthos
As a side note, GraalVM is quite usable for real world stuff nowadays. Here's an example of running a Clojure web service with Graal that provides JSON endpoints, talks to the database, and does session management: https://github.com/yogthos/graal-web-app-example

The same app can be run on the JVM or compiled statically using Graal. The JVM version takes around 100 megs of RAM and has a significant startup time. The Graal version weighs in at 7 megs and starts up instantly.

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.