HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Compiling and Optimizing Scripting Languages

Google TechTalks · Youtube · 3 HN points · 4 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Google TechTalks's video "Compiling and Optimizing Scripting Languages".
Youtube Summary
Google Tech Talks
March 18, 2009

ABSTRACT

Presented by Paul Biggar, Department of Computer Science and Statistics, Trinity College, Dublin.

Scripting languages offer unique challenges to compiler writers. Challenges to compilation include undefined and changing language semantics, and run-time code generation. However, optimizing compilers face greater challenges still. Scripting languages offer many run-time features which are difficult to optimize, including run-time typing, run-time aliasing, run-time class and function definitions and run-time code generation. I discuss these problems, and a great number of their solutions, in relation to phc (phpcompiler.org), our optimizing ahead-of-time compiler for PHP.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Nov 05, 2020 · ook on Why Dark didn't choose Rust
http://www.tara.tcd.ie/handle/2262/77562 - "Design and implementation of an ahead-of-time compiler for PHP" is the PhD thesis in question.

A Google Tech Talk related to this work is on YouTube: https://www.youtube.com/watch?v=kKySEUrP7LA

You can find multiple other papers on Google Scholar where Paul was an author, all related to programming languages, compilers, etc.

There's actually a not-too-terrible talk going over some of the challenges of compiling PHP, and how initial promising gains turn out to be very difficult (as OP said) to maintain as you try to encompass the full language. Some great examples of disappointments and challenges!

https://www.youtube.com/watch?v=kKySEUrP7LA

markonen
Pretty sure pbiggar knows about the talk since he's the person giving it!
pbiggar
Totally forgot about it :)
Sep 03, 2012 · agumonkey on PHP gets Generators
I meant mine obviously ;)

For the curious, here's the talk (1 hour) about ahead of time compilation of PHP:

http://www.youtube.com/watch?v=kKySEUrP7LA

pbiggar
What's the link to your research?
agumonkey
I was trying to be funny, insinuating your talk was so extensive that my hair fell.
I think it's important to ask everything in these threads. I often find interesting questions answered in these threads (on reddit too). For JIT fans I recommend online-stalking Mike Pall, here on Hacker News and on reddit.

Well, I think if you have a task like that, a team should look into what other people were doing. Self, LuaJIT, Parrot, some Smalltalk systems, and Strongtalk were all already done or on the way in 2008. In a talk about phc (https://www.youtube.com/watch?v=kKySEUrP7LA) Paul Biggar talks about why AOT might be better than JIT. His reasoning was that JITs suffer from bad startup times and PHP is a language where the common use case is scripts that only run for a short time.

Anyway, the static compiler certainly wasn't a bad way to get some speed fast.

Getting some types in there is probably the best thing you can do :) Where can I get some more information about tracelets? I have never seen literature about them. Did you guys invent that yourselves?

I have been thinking about something similar. If you have a language that is dynamic but is suited for static compiling (Dylan, for example), one could write a compiler that does all kinds of optimization and then spits out a bytecode that is suited for the JIT. Then at runtime you let the JIT do the rest. I'm not sure if this is such a good idea, because the static compiler would take away some opportunities for the JIT to do an even better job. Thoughts, anybody?

That's the problem I always have. For every language I think about how well it would work with PyPy. Somebody started to work on a Clojure implementation in PyPy and I want to start hacking on that. Never having done Python is holding me back a little.

More Questions:

For how long and with how many people have you been working on HHVM?

Why is the bytecode stack-based? From what I read, stack-based codes aren't really suited to JIT on a register machine. Does the JIT compile everything to SSA first? I think that's what HotSpot does (not sure, does anybody know?).

kmavm
HHVM is a follow-on effort to the static compiler. We got started long after HPHPc was in production; I started playing around in early 2010, and three of us (Jason, myself, and Drew Paroski) started earnestly putting in full-time work in summer of 2010.

I made up the term "tracelet." Do not use it when trying to sound intelligent. It roughly means "typed basic block," though the compilation units aren't quite basic blocks, and what we're doing isn't quite tracing, and ... so we thought it would be less confusing to just make up a name. Think "little tiny traces."

The bytecode is stack-based for a couple of different reasons, but it will probably remain stack-based because of compactness. Since instructions' operands are implicit, a lot of opcodes are 1, 3, or 5 bytes long. Facebook's codebase is large enough that its sheer compiled bulk can be problematic.
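
As a rough illustration of why implicit operands keep instructions short, the sketch below encodes `(a + b) * (c + d)` in two made-up formats. Everything here is an assumption for illustration: the opcode names, the opcode numbers, the 2-byte immediate width, and the three-operand register form are hypothetical and are not HHVM's actual encoding.

```python
import struct

# Hypothetical opcode tables; the numbers are arbitrary.
STACK_OPS = {"PushLocal": 0x01, "Add": 0x02, "Mul": 0x03}
REG_OPS = {"Add": 0x01, "Mul": 0x02}


def encode(program, opcodes):
    """Encode each instruction as a 1-byte opcode plus 2-byte immediates."""
    out = bytearray()
    for op, *immediates in program:
        out.append(opcodes[op])
        for value in immediates:
            out += struct.pack("<H", value)  # assumed 16-bit little-endian immediate
    return bytes(out)


# Stack form: Add and Mul take their operands from the stack,
# so they carry no immediates and are 1 byte each.
stack_program = [
    ("PushLocal", 0), ("PushLocal", 1), ("Add",),
    ("PushLocal", 2), ("PushLocal", 3), ("Add",),
    ("Mul",),
]

# Register form: every instruction names its destination and both sources.
register_program = [
    ("Add", 4, 0, 1),
    ("Add", 5, 2, 3),
    ("Mul", 6, 4, 5),
]

stack_bytes = encode(stack_program, STACK_OPS)
register_bytes = encode(register_program, REG_OPS)
print(len(stack_bytes), len(register_bytes))
```

Under these assumed widths the stack form needs more instructions but fewer bytes. The exact ratio depends entirely on the operand widths chosen, so this shows only the direction of the effect.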

The translator doesn't turn the stack into SSA, instead turning it into a graph where the nodes are instructions and the edges are their inputs/outputs. You can sort of squint at this system and call it SSA where the edges are the "single assignments."
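
One way to picture the graph described above is to run the bytecode against a symbolic stack that holds producing instructions instead of values. This is only a minimal sketch: the opcode names and the `ARITY` table are hypothetical, and nothing here is taken from the HHVM translator itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    arg: object = None
    # Edges: the instructions whose outputs this instruction consumes.
    inputs: list = field(default_factory=list)


# Hypothetical opcodes: name -> (values popped, values pushed)
ARITY = {"PushInt": (0, 1), "Add": (2, 1), "Mul": (2, 1), "Print": (1, 0)}


def build_graph(bytecode):
    """Turn stack bytecode into instruction nodes linked by input edges."""
    stack = []  # symbolic stack: holds producer nodes, not runtime values
    nodes = []
    for op, arg in bytecode:
        pops, pushes = ARITY[op]
        # Pop the producers of this instruction's operands, in push order.
        inputs = [stack.pop() for _ in range(pops)][::-1]
        node = Node(op, arg, inputs)
        nodes.append(node)
        if pushes:
            stack.append(node)
    return nodes


# (2 + 3) * 4, then print the result
program = [
    ("PushInt", 2), ("PushInt", 3), ("Add", None),
    ("PushInt", 4), ("Mul", None), ("Print", None),
]
nodes = build_graph(program)
for n in nodes:
    print(n.op, n.arg, [i.op for i in n.inputs])
```

Each value is produced by exactly one node and referenced through edges, which is why the result can be read as a loose form of SSA.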

nickik
Let me give you a description of the compiler, and then you can tell me if I understand everything correctly.

You have an interpreter that, for every basic block, starts to record a trace. If you walk into a basic block (or code block) for the Xth time with the same type, you compile it for that type and set a type guard. Now every time the interpreter goes through there again, it does a type check and then either keeps interpreting or jumps into the compiled block. So if you have a basic block in a loop, you have to type-check every time you go through the loop.

kmavm
Pretty close. Once we've translated to machine code, the interpreter doesn't do type checks; it blindly enters the translation cache for anything that has a translation, and the guards live in the tracelets themselves. The tracelets directly chain to one another in machine code, so they need embedded guards.
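
A toy model of that arrangement might look like the following, with Python callables standing in for machine-code translations. All names here (`Tracelet`, `translation_cache`, `execute`, `interpret_add`) are hypothetical. What happens on a guard miss (trying another translation for the same location, then falling back to the interpreter) is an assumption of this sketch and is not something the comment above states.

```python
path_log = []  # records which path handled each call, for illustration only


def interpret_add(a, b):
    """Generic slow path: works for any operand types."""
    path_log.append("interpreter")
    return a + b


class Tracelet:
    """A translation specialized for one combination of operand types."""

    def __init__(self, name, guard_types, body):
        self.name = name
        self.guard_types = guard_types
        self.body = body  # stands in for specialized machine code
        self.next = None  # assumed: another translation to try on a guard miss

    def run(self, *args):
        # The guard is embedded in the translation itself.
        if tuple(type(a) for a in args) == self.guard_types:
            path_log.append(self.name)
            return self.body(*args)
        if self.next is not None:
            return self.next.run(*args)
        return interpret_add(*args)  # no translation matches: fall back


translation_cache = {}  # source location -> first tracelet for that location


def execute(location, *args):
    # No type check here: anything with a translation is entered blindly.
    tracelet = translation_cache.get(location)
    if tracelet is None:
        return interpret_add(*args)
    return tracelet.run(*args)


int_add = Tracelet("int_add", (int, int), lambda a, b: a + b)
str_concat = Tracelet("str_concat", (str, str), lambda a, b: a + b)
int_add.next = str_concat
translation_cache["add@line1"] = int_add

print(execute("add@line1", 1, 2))      # handled by int_add
print(execute("add@line1", "a", "b"))  # int guard misses, str_concat handles it
print(execute("add@line1", 1.5, 2.5))  # both guards miss, interpreter handles it
print(path_log)
```

The point of the sketch is where the check lives: `execute` never inspects types, and each translation decides for itself whether it applies.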
nickik
I see. Thx for answering all my questions. Good luck with outperforming the static compiler :)
May 11, 2009 · 3 points, 0 comments · submitted by nreece
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.