Hacker News Comments on
Compiling and Optimizing Scripting Languages
Google TechTalks · YouTube · 3 HN points · 4 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.

http://www.tara.tcd.ie/handle/2262/77562 - "Design and implementation of an ahead-of-time compiler for PHP" is the PhD thesis in question. A Google tech talk related to this work is on YouTube: https://www.youtube.com/watch?v=kKySEUrP7LA
You can find multiple other papers on Google Scholar where Paul was an author, all related to programming languages, compilers, etc.
There's actually a not-too-terrible talk going over some of the challenges of compiling PHP, and how initially promising gains turn out to be very difficult (as OP said) to maintain as you try to encompass the full language. Some great examples of disappointments and challenges!
⬐ markonen Pretty sure pbiggar knows about the talk since he's the person giving it!
⬐ pbiggar Totally forgot about it :)
I meant mine, obviously ;) For the curious, here's the talk (1 hour) about ahead-of-time compilation of PHP:
⬐ pbiggar What's the link to your research?
⬐ agumonkey I was trying to be funny, insinuating your talk was so extensive that my hair fell out.
I think it's important to ask everything in these threads. I often find interesting questions answered in these threads (on Reddit too). For JIT fans I recommend online-stalking Mike Pall, here on Hacker News and on Reddit. Well, I think if you have a task like that, a team should look into what other people were doing. Self, LuaJIT, Parrot, some Smalltalk systems, and Strongtalk were all already done or on the way in 2008. In a talk about phc (https://www.youtube.com/watch?v=kKySEUrP7LA), Paul Biggar talks about why AOT might be better than JIT. His reasoning was that JITs suffer from bad startup times, and PHP is a language where the common use case is scripts that only run a short time.
Anyway, the static compiler certainly wasn't a bad way to get some speed fast.
Getting some types in there is probably the best thing you can do :) Where can I get more information about tracelets? I have never seen literature about them. Did you guys invent that yourselves?
I have been thinking about something similar. Suppose you have a language that is dynamic but is suited to static compilation (Dylan, for example). One could write a compiler that does all kinds of optimization and then spits out a bytecode that is suited for the JIT. Then at runtime you let the JIT do the rest. I'm not sure if this is such a good idea, because the static compiler would take away some opportunities for the JIT to do an even better job. Thoughts, anybody?
That's the problem I always have. For every language, I think about how well it would work with PyPy. Somebody started to work on a Clojure implementation on PyPy and I want to start hacking on that. Never having done Python is holding me back a little.
More Questions:
How long, and with how many people, have you been working on HHVM?
Why is the bytecode stack-based? From what I read, stack-based codes aren't really suited to JITting on a register machine. Does the JIT compile everything to SSA first? I think that's what HotSpot does (not sure, does anybody know?).
⬐ kmavm HHVM is a follow-on effort to the static compiler. We got started long after HPHPc was in production; I started playing around in early 2010, and three of us (Jason, myself, and Drew Paroski) started earnestly putting in full-time work in the summer of 2010.

I made up the term "tracelet." Do not use it when trying to sound intelligent. It roughly means "typed basic block," though the compilation units aren't quite basic blocks, and what we're doing isn't quite tracing, and ... so we thought it would be less confusing to just make up a name. Think "little tiny traces."
The bytecode is stack-based for a couple of different reasons, but it will probably remain stack-based because of compactness. Since instructions' operands are implicit, a lot of opcodes are 1, 3, or 5 bytes long. Facebook's codebase is large enough that its sheer compiled bulk can be problematic.
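The compactness argument can be seen in a toy stack machine. This is an illustrative sketch only (the opcodes and encodings are made up, not HHVM's): because operands live on an implicit value stack, pure operators like ADD need just one byte, and only instructions that name a local or a constant carry an immediate.

```python
# A toy stack-machine interpreter (illustrative only; these are not
# HHVM's real opcodes). Operands are implicit on a value stack, so
# operators encode in a single byte; only instructions that reference
# a local carry a 1-byte immediate.

PUSH_LOCAL, STORE_LOCAL, ADD, MUL, HALT = range(5)

def run(code, locals_):
    stack, pc = [], 0
    while True:
        op = code[pc]; pc += 1
        if op == PUSH_LOCAL:            # 1 opcode byte + 1 immediate
            stack.append(locals_[code[pc]]); pc += 1
        elif op == STORE_LOCAL:
            locals_[code[pc]] = stack.pop(); pc += 1
        elif op == ADD:                 # operands implicit: 1 byte total
            b, a = stack.pop(), stack.pop(); stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop(); stack.append(a * b)
        elif op == HALT:
            return locals_

# c = (a + b) * a, with a=2, b=3  ->  c = 10
code = bytes([PUSH_LOCAL, 0, PUSH_LOCAL, 1, ADD,
              PUSH_LOCAL, 0, MUL, STORE_LOCAL, 2, HALT])
print(run(code, [2, 3, 0])[2])  # -> 10
```

A register-based encoding would need explicit destination and source operands on every arithmetic instruction, which adds up over a very large compiled codebase.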
The translator doesn't turn the stack into SSA, instead turning it into a graph where the nodes are instructions and the edges are their inputs/outputs. You can sort of squint at this system and call it SSA where the edges are the "single assignments."
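That "squint and it's SSA" construction can be sketched by abstractly interpreting the stack: each stack slot holds the node that produced its value, so every pop becomes a data edge. This is a hypothetical illustration of the idea, not HHVM's actual IR.

```python
# Sketch: turn stack bytecode into a dataflow graph by abstract
# interpretation of the stack. Each slot holds the producing node, so
# popping an operand wires an input edge. (Illustrative, not HHVM's IR.)

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, list(inputs)
    def __repr__(self):
        return f"{self.op}({', '.join(n.op for n in self.inputs)})"

def build_graph(instrs):
    stack, nodes = [], []
    for op, *args in instrs:
        if op == "push_local":
            node = Node(f"local{args[0]}")
        else:                            # binary op: pops become edges
            b, a = stack.pop(), stack.pop()
            node = Node(op, a, b)
        nodes.append(node)
        stack.append(node)               # the node "produces" a value
    return nodes

# (a + b) * a  ->  mul node whose inputs are the add node and local0
g = build_graph([("push_local", 0), ("push_local", 1), ("add",),
                 ("push_local", 0), ("mul",)])
print(g[-1])  # -> mul(add, local0)
```

Each node is defined exactly once and its consumers point straight at it, which is why the edges behave like the "single assignments" of SSA.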
⬐ nickik Let me give you a description of the compiler, and then you can tell me if I understand everything correctly. You have an interpreter that for every basic block starts to record a trace. If you walk into a basic block (or code block) the X-th time with the same types, you compile it for those types and you set a type guard. Now every time the interpreter goes through there again, it does a type check and then either keeps interpreting or jumps into the compiled block. So if you have a basic block in a loop, you have to type-check every time you go through the loop.
⬐ kmavm Pretty close. Once we've translated to machine code, the interpreter doesn't do type checks; it blindly enters the translation cache for anything that has a translation, and the guards live in the tracelets themselves. The tracelets directly chain to one another in machine code, so they need embedded guards.
⬐ nickik I see. Thanks for answering all my questions. Good luck with outperforming the static compiler :)
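The guard-at-entry idea described above can be sketched in miniature: a type-specialized translation checks its own assumptions on entry, so a dispatcher can jump straight into it, and on guard failure it falls back to a generic path. This is illustrative pseudocode for the concept, not HHVM internals.

```python
# Sketch of an embedded type guard: the specialized "tracelet" carries
# its own check, so callers enter it blindly; a failed guard diverts to
# a generic fallback (standing in for the interpreter).
# Illustrative only - not HHVM's actual machinery.

def make_int_add_tracelet(fallback):
    def tracelet(a, b):
        # Embedded guard: this translation was specialized for ints.
        if type(a) is int and type(b) is int:
            return a + b              # fast, type-specialized path
        return fallback(a, b)         # guard failed: generic path
    return tracelet

def generic_add(a, b):                # stands in for the interpreter
    return a + b

add = make_int_add_tracelet(generic_add)
print(add(2, 3))        # -> 5    (guard passes)
print(add("a", "b"))    # -> ab   (guard fails, generic path handles it)
```

Because specialized blocks chain directly to one another in machine code, the guard has to live inside each block rather than in the dispatcher, exactly as kmavm notes.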