Hacker News Comments on
Compilers: Principles, Techniques, and Tools
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this book.
I'm starting with Language Implementation Patterns, which is more practice-focused than the Dragon Book. Most of the examples are in Java, which you can then convert into your current implementation language.
Writing games is a really fun way to learn to program. The best way to learn how is to find an open source game that you like, figure out how to build it, then start modifying it to your personal taste.
Learning the basics of writing compilers will be surprisingly helpful for all kinds of programming tasks. The Dragon Book is the best:
Too bad the prices are so high for it. The original version is much cheaper:
Learning calculus is a great way to train your mind to think better.
⬐ JaDogg
Thank you for the reply. This is valuable and actionable advice.
There is a more recent edition of "Compilers: Principles, Techniques, and Tools" worth considering: https://www.amazon.com/Compilers-Principles-Techniques-Tools....
You'll want the Dragon Book as a reference, not necessarily to get started with, but to build the semantic tree on which you will need to pin concepts. https://www.amazon.com/Compilers-Principles-Techniques-Tools...
If you aren't yet committed to any language, you can start building a parser with PyParsing. It's really easy. http://pyparsing.wikispaces.com/
If you want to take a quick (albeit expensive) class on it, Dave Beazley offers one: http://www.dabeaz.com/chicago/compiler.html
⬐ danblick
Hmm. This might be kind of blasphemous, but I think I'd recommend against the dragon book? As I remember, the emphasis in that book is on syntax-directed translation. I might argue that the less you're thinking about syntax and lexing/parsing crap the better. (For that reason: ignore other answers that tell you to learn about a particular parser generator (e.g. ANTLR). That's just fluff.)
Really, if you're coming to programming language design with the thought "I'm going to make an imperative, object oriented language" (with parallelism as an afterthought), you're doing it wrong. The world has enough of those already and you're going to invent something worse than what's already there.
Probably, instead of inventing a new language (say Matlab for matrix operations or Prolog for logical reasoning), you'd be better off implementing a library that handles the same concepts and embeds into another language (which is really what happened with Tensorflow or MapReduce to think of two examples).
(Grune's book "Parsing Techniques" is a great reference on parsing crap, but the secret is that if you design your grammar to be LL(1) you can parse your language using recursive descent: you only need a fancy parser if you designed a more complicated grammar (why'd you do that?))
Recommended book: The Reasoned Schemer. It's a cute (maybe too cute) book that shows how to implement a logic programming language (~datalog) using Scheme as a base language. The Wizard book (Structure and Interpretation of Computer Programs) also has really cool examples that I think 60% of programmers I've worked with in industry don't fully appreciate.
⬐ iainmerrick
This might be kind of blasphemous, but I think I'd recommend against the dragon book?
I agree! It's hard to criticize a book rightly regarded as a classic, but I think it solves the wrong problems, or at least emphasizes the wrong areas.
if you design your grammar to be LL(1) you can parse your language using recursive descent
Yep, that's the way to do it.
⬐ wenc
I read the question as the OP wanting to go through a didactic exercise in building a programming language from the ground up rather than building a DSL, in which case lexing and parsing is a fundamental concept to cover, among others. I agree with keeping it LL(1), but it may also be worth it for the OP to understand what LL(1) means. I agree parsing is not at all the most important thing to learn in language implementation, but it seems to me almost a necessary thing to cover in order to complete one's education in the subject. To omit it would be akin to omitting a discussion of limits in a study of calculus.
⬐ ww520
Only the first couple of chapters deal with lexing and parsing. The Dragon book covers a lot more than just syntax analysis: semantic analysis, type checking, run time environment design, intermediate language design, code generation, code optimization, and more. It doesn't have the newer stuff like JIT or the Hindley–Milner type system, but those can be picked up separately by reading the papers.
The OP asks for resources to build a language. The Dragon book is a very good book to cover the whole process.
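To make the LL(1) + recursive descent advice above concrete, here is a hypothetical sketch (not from any book mentioned in the thread) of a recursive-descent parser-evaluator for arithmetic, with one function per nonterminal and one token of lookahead:

```python
import re

# A toy LL(1) grammar for arithmetic:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

def tokenize(src):
    # numbers and single-character operators; whitespace is skipped
    return re.findall(r"\d+|[+\-*/()]", src)

def parse(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() in ("+", "-"):      # one token of lookahead decides
            op, rhs = eat(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = eat(), factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor():
        if peek() == "(":
            eat("(")
            value = expr()
            eat(")")
            return value
        return int(eat())

    return expr()

print(parse("2 + 3 * 4"))    # 14
```

This is also an example of the syntax-directed translation the Dragon Book emphasizes: the parser evaluates as it parses instead of building a tree first.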
I strongly suggest you build something esoteric and fun first. This should be a "bare minimum" VM or interpreter. Here's one of mine that I wrote like 8 years ago: https://github.com/dvx/zeded/
Don't start off with YACC/Bison as they hide a lot of stuff under the hood. It's cool learning things from scratch. The most commonly-suggested book on compilers is known as the Dragon Book and if you want to take this endeavour seriously, you should really get a copy.
⬐ jimktrains2
For the dragon book, the first edition is significantly cheaper. Is there anything in the new version making it worth over 10x as much?
⬐ CalChris
The later chapters on ILP, software pipelining, and optimizing for parallelism and locality are completely new. Basically, chapters 8-11 have been either rewritten or written from scratch. On the other hand, the 2nd edition dropped the Want to write a compiler and A look at some compilers chapters.
I used the dragon book (2nd edition) - https://www.amazon.com/Compilers-Principles-Techniques-Tools...
That proof isn't a problem in the book; it's mentioned, with a hint for those inclined to derive it themselves, in the running text.
In 2006 a second edition of the dragon book was released: https://www.amazon.com/Compilers-Principles-Techniques-Tools...
There are many different styles and paths to learning 'Computer Science'.
But if what you are after is insight into how a computer works I found that I had my 'ah-ha' moment while learning C, Assembly (intel), and writing a compiler. I did have to have a slight basis in computer architecture, but that compiler project I worked on made everything click.
(Side note on writing a compiler: Read, Decode, Execute. There are no shortcuts around that series of steps.)
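The read/decode/execute loop above can be sketched as a bare-minimum stack VM. This is a hypothetical toy instruction set invented purely for illustration:

```python
# A bare-minimum stack VM illustrating the read/decode/execute loop.
# Hypothetical opcodes: ("PUSH", n), ("ADD",), ("MUL",), ("HALT",).

def run(program):
    stack = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]   # read + decode
        pc += 1
        if op == "PUSH":          # execute
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack

# (2 + 3) * 4
print(run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]))  # [20]
```

Everything in a real interpreter is elaboration on this loop: more opcodes, a smarter dispatch, and a richer value representation.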
If you are looking for a book I would recommend the 'Dragon Book'
I found a paper copy of the international version for cheap (like $10 US if I remember) that was amazing.
For anyone interested in compiler writing and looking for a good resource to start, probably one of the best is the "Dragon Book":
I highly recommend it, but it's heavy stuff. There are probably simpler guides out there that just cover the basics.
⬐ Ocerge
Indeed it is the standard, but I'm pretty sure I have a permanent twitch in the corner of my eye from my university compilers course and that book. It's probably not something I could ever do for fun, but it's extremely important and useful stuff to know. I'm just too stupid to really grok a lot of the concepts required to write a compiler.
⬐ cbd1984
Jack Crenshaw, of "Let's Build A Compiler" fame, has an interesting critique of works like these:
The main point for our discussion is last:
> Desire for Generality
> We have been concentrating on the use of a recursive-descent parser to parse a deterministic grammar, i.e., a grammar that is not ambiguous and, therefore, can be parsed with one level of lookahead.
> In practice, these issues turn out to be considerably less important. Modern languages tend to be designed to be easy to parse, anyway. That was a key motivation in the design of Pascal. Sure, there are pathological grammars that you would be hard pressed to write unambiguous BNF for, but in the real world the best answer is probably to avoid those grammars!
Here's a bit more of what he said, heavily snipped:
> Limited RAM Forcing Multiple Passes
> All the early compiler writers had to deal with this issue: Break the compiler up into enough parts so that it will fit in memory. When you have multiple passes, you need to add data structures to support the information that each pass leaves behind for the next. That adds complexity, and ends up driving the design.
> Batch Processing
> In a mainframe compiler as well as many micro compilers, considerable effort is expended on error recovery ... it can consume as much as 30-40% of the compiler and completely drive the design. The idea is to avoid halting on the first error, but rather to keep going at all costs, so that you can tell the programmer about as many errors in the whole program as possible.
> Large Programs
[Basically comes down to "This is 1980s Micro Land. We don't have mmap or virtual memory in general. The simple way is to keep everything in RAM and encourage small subroutines."]
> Emphasis on Efficiency
[This is an interesting point. He says that we have fast enough CPUs that compilers can emit sub-optimal code and it doesn't matter.]
> Limited Instruction Sets
[Eh. I don't agree that limited instruction sets make compilers more complicated. Even in his design, code generation was a rather trivial part of the entire program.]
Main link:
⬐ kevin_thibedeau⬐ gamapuna
Many of these issues are still relevant to embedded systems with limited resources.
⬐ sklogic
If you really want to compile something on an embedded system, you'd be much better off with something like Forth anyway.
⬐ nickpsecurity
Or a macro-assembler. I think HLA's, Hyde's, and the general concept are underappreciated for this use case.
I am working my way through Alex Aiken's Coursera compiler course, almost 3/4 done and really liking it so far. There was another resource posted on HN earlier which takes you through building a compiler for a lisp using C. http://www.buildyourownlisp.com/contents
⬐ sklogic
Please stop recommending the Dragon Book already. It is not just heavy, it is mostly outdated and irrelevant.
⬐ nickpsecurity⬐ astrange
There's quite a few decent alternatives. I often suggest Wirth's Compiler Construction and the Oberon sources because they're straightforward lessons plus give experience with Wirth's style of simple, safe, efficient languages. Then they can improve the Oberon System or compilers for personal projects and improvement.
That said, I typically recommend compilers get written in an ML or a LISP, given it's so much easier to do in those languages. Ocaml and Racket are my main recommendations for modern work. One can also use techniques for deriving imperative code from functional programs if one wants to re-implement that compiler in a specific imperative language on a machine. The constructs are even straightforward enough for macro assembly, for those who want to understand it down to bare metal.
⬐ urs2102⬐ troydj
I definitely agree with ML and OCaml, but why do you recommend Lisp for compiler work? I love working with Lisps, but how do you deal with dynamic typing bugs in compiler work? I prefer OCaml/Haskell/ML for their really strong static type checking for compiler work. Just curious though...
⬐ sklogic
ML got ADTs and static type guarantees, but Lisp got macros, which allow you to implement things like http://andykeep.com/pubs/np-preprint.pdf - which is significantly less boilerplate than in ML. An ideal language for compilers would combine both properties, but to make that happen, the expression problem must be solved first.
See a relevant discussion here from not long ago:
⬐ nickpsecurity
A commenter elsewhere said Racket had ADTs with #lang plai, but also with #lang racket via subtypes of struct type with the match macro. Said it was the same style of programming.
So that's static typing (Typed Racket) and macros for sure. Possibly ADTs, depending on whether you object to the claim above. Should cover your list of requirements.
⬐ sklogic
The problem here is that a single variant tag does not uniquely define a type, as all such type systems expect.
⬐ nickpsecurity
You think it could be modified to handle that?
LISP is easy to parse, transform, build clean-slate, verify, and do fast. My first 4GL tool was a BASIC w/ macros compatible with C and C++. When converted to LISP, it was vastly easier to do transforms and macros, plus a productivity boost with incremental, per-function compilation. Turns out, Julia did something similar by converting the source into ASTs that are LISP expressions, with the compiler being femtolisp.
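The claim that LISP is easy to parse is easy to demonstrate: a complete s-expression reader fits in a dozen lines. A hypothetical sketch in Python (not from any project mentioned here):

```python
# Read s-expressions into nested Python lists: atoms become ints or
# symbol strings, parenthesized forms become lists.

def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # drop the closing ')'
        return form
    try:
        return int(tok)
    except ValueError:
        return tok     # a symbol

def parse(src):
    return read(tokenize(src))

print(parse("(+ 1 (* 2 3))"))  # ['+', 1, ['*', 2, 3]]
```

Since the result is already a tree, "parsing" is essentially free, and all the effort goes into transforms over that tree, which is the point being made above.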
Far as safety, you can build checks into your code for protection or use a typed subset of LISP (e.g. Typed Racket). Shen goes further by embedding a whole sequent calculus for custom, per-app/module type systems.
So, not saying LISP is ideal for compilers, but it does have some ideal and good attributes. Ocaml is my default given it combines conciseness, correctness, decent performance, ease of learning, and some helpful libs.
Yes, the functional side of things is interesting and rarely covered in standard compilers courses.
There is an old but good book, often overlooked: "Functional Programming" by Anthony J. Field and Peter G. Harrison (1988). Despite the title, it's more about various compilation techniques.
Also an old but still relevant one (it is missing the STG, but otherwise full of interesting stuff): http://research.microsoft.com/en-us/um/people/simonpj/papers...
It is also worth following this blog: https://wingolog.org/
⬐ nickpsecurity
I got the 2nd link here before, maybe from you. Might be old but a very thorough treatment of about every aspect of functional compilation. I have it saved in djvu and pdf in case I meet someone starting in that area. :)
The other two are new to me. Thanks for those.
I wouldn't go so far as to say that the Dragon Book is outdated and irrelevant. (I'm assuming you're referring to the 2nd edition from 2006.) Unless you're focusing on back-end optimization and code generation techniques (something a new compiler writer typically does NOT do), the bulk of the theory and material you'd cover in a first-semester compiler course is fairly static.
But if a person is merely looking to bang out a compiler without getting overwhelmed with how to convert NFAs to DFAs for lexing, etc., some good alternative books are:
A Retargetable C Compiler: Design and Implementation, by Hanson and Fraser (http://www.amazon.com/Retargetable-Compiler-Design-Implement...). This book constructs and explains the code for a full C compiler with a recursive descent approach (no flex/lex or bison/yacc). I have some experience augmenting this compiler, so I can vouch for the book's ability to clearly convey its design.
Compiler Design in C, by Allen Holub (http://www.holub.com/software/compiler.design.in.c.html). Downloadable PDF at that link as well. A book from 1990 in which Holub constructs his own version of lex and yacc, and then builds a subset-C compiler which generates intermediate code.
Practical Compiler Construction, by Nils Holm (http://www.lulu.com/shop/nils-m-holm/practical-compiler-cons...). A recent book which documents the construction of a SubC (subset of C) compiler that generates x86 code on the back end.
⬐ sklogic
It is actually an outdated view to split a compiler into dedicated monolithic front-end and back-end parts. The more modern approach is very much the opposite: a nearly continuous, very long sequence of very simple transforms, rewriting the code seamlessly all the way down from the front-end (i.e., the parser) to a back-end or multiple back-ends. And this approach is very alien to anything you'd find in the Dragon Book.
As for parsing, as I already said elsewhere in this thread, all the techniques from the Dragon Book are not practical any more and are not used in modern compilers. There are far better ways, which are not covered in the book, and they're far simpler, not even deserving a dedicated book at all.
⬐ fahadkhan
Please could you recommend an alternative?
⬐ landmark2⬐ spooningtamarin
Modern Compiler Implementation in C
⬐ fao_⬐ Ace17
By A. Appel? I heard that the ML version is better since much of the code apparently isn't very idiomatic (it was supposedly translated directly from the ML book). I would probably recommend both that and the Dragon Book, as they cover roughly the same material but in a slightly different manner.
Dick Grune - Modern Compiler Design
⬐ xigency
Engineering a Compiler, 2nd edition, by Keith D. Cooper and Linda Torczon
The dragon books are better used as references by those who are already familiar with all of the high-level concepts in compiler writing.
I learned compilers this way and yes, it's extremely hard. Things got so high-level that I forgot what the real problem was and kept wondering why the algorithm I wrote wasn't working.
⬐ hitlin37
Since technology changes before a book is out, how does that hold in the compiler domain? Are there any modern compiler books that explain around llvm/gcc as an example compiler?
⬐ sklogic
Not sure if there are any books around LLVM or gcc, but there are a lot of up to date texts.
Grune et al., "Modern Compiler Design",
Appel, "Modern Compiler Implementation in ML",
and many more.
P.S. Because of a slow-ban, adding another one to this answer. This is the best source on SSA:
⬐ sgeisenh
I'll just add this on: https://www.cs.cmu.edu/~fp/courses/15411-f14/schedule.html
The lecture notes are simple and excellent, though at a very high level of abstraction. Most of the notes are full of references to the original material, which is super nice.
⬐ barrkel
It's fairly heavy on lexing and parsing theory, especially around finite automata, pushdown automata, etc. It's the kind of book you might want to read if you're reimplementing yacc.
Modern code generation has moved on a bit, so I wouldn't dig too deeply into the latter third or so of the book.
All in all, for a hobby compiler, it would be a poor choice: heavy on unnecessary theory on the front end and outdated on back-end stuff.
Let's Build a Compiler by Jack Crenshaw is one of my favourite guides for the hobbyist, although it's a bit dated now since it used Turbo Pascal and targeted 16-bit x86.
⬐ sklogic
It does not make any sense to reimplement yacc in the 21st century. There are far more powerful and yet simpler parsing techniques, rendering all that automata stuff useless and outdated. Take a look at PEG, Pratt parsing, and GLR.
⬐ sparkie
Pfft. Who wants to parse in 2015? We all know that constructing your language as a JSON schema is the way forward. Your JSON parser is the only parser you'll ever need.
⬐ ORioN63
Or a lisp.
⬐ wtetzner
Or even better, GLL.
⬐ sklogic
It's an interesting one, I'll add it to my collection.
Although, to be fair, PEG (Packrat) does support left recursion, and it can also be combined with Pratt for the binary expressions very efficiently, at O(n).
⬐ wtetzner
> Although, to be fair, PEG (Packrat) does support left recursion
Unfortunately it sometimes generates the wrong parse: http://tratt.net/laurie/research/pubs/papers/tratt__direct_l...
⬐ sklogic
And all the edge cases are related to the binary expressions - and you should be using Pratt for them instead.
Left-recursive PEG must be reserved for things like method, structure field and array access, which are OK with that algorithm.
⬐ wtetzner
Ah, gotcha. I struggled with PEG and left recursion for a while before finding GLL. Do you have any resources on how to combine PEG and Pratt?
⬐ sklogic
There must have been some papers on it, but I lost track. Will try to dig out something.
Meanwhile, you can take a look at my implementation of such a mix: https://github.com/combinatorylogic/mbase/tree/master/src/l/...
An example of a grammar built on top of it: https://github.com/combinatorylogic/clike/blob/master/clike/... - see the `binary` nodes there; they're translated into Pratt while all the others are Packrat.
⬐ nickpsecurity
Best features of both types of parsers? Hell yeah! Never even heard of this one despite all the research updates I did a few years ago. Thanks for the link.
Strong disagreement. The Dragon Book is super expensive, spends all its time discussing the easy part (parsing) as if it was complicated, then spends no time discussing modern algorithms in code generation.
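For reference, the Pratt technique recommended above for binary expressions is tiny. A hypothetical sketch in Python (the tokenizer and precedence table are invented for illustration, not taken from the linked implementations):

```python
# Minimal Pratt (precedence-climbing) parser for binary expressions,
# producing nested tuples as an AST.
import re

TOKEN = re.compile(r"\d+|[+\-*/()]")
PRECEDENCE = {"+": 10, "-": 10, "*": 20, "/": 20}  # higher binds tighter

def tokenize(src):
    return TOKEN.findall(src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        # prefix position: a number or a parenthesized sub-expression
        tok = self.next()
        if tok == "(":
            left = self.parse_expr(0)
            assert self.next() == ")"
        else:
            left = int(tok)
        # infix position: fold in operators while they bind tightly enough
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_bp:
            op = self.next()
            right = self.parse_expr(PRECEDENCE[op] + 1)  # +1 => left-assoc
            left = (op, left, right)
        return left

def parse(src):
    return Parser(tokenize(src)).parse_expr()

print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
```

The whole algorithm is the one `while` loop: precedence is data, not grammar rules, which is why it composes so easily with a Packrat parser handling the rest of the language.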
Working from an example like LLVM, or reading Appel's or Wirth's books, is better.
⬐ isolate
Also Muchnick's book is good for the middle end.
⬐ gnuvince
The Red Dragon Book is great (I like it better than the newer purple edition), however for a gentler intro I really like Crafting a Compiler by Fischer et al.: the first chapter does a great job of introducing the role of each part of a "traditional" compiler, and chapter two implements a complete compiler for a tiny calculator language. The rest of the book follows the traditional progression (scanner, parser, type checking, IR generation, etc.). It's a bit expensive, which is my only reservation when recommending it for a university class (although students are resourceful, if you know what I mean ;))
This will get you started on the compilers side of things.
You can also use http://lmgtfy.com/?q=sicp for a deeper understanding of interpreted languages and language structure.
⬐ code_chimp
You can usually find an International Edition of the dragon book on eBay for around 25 USD. The cover may be different, but the content is the same.
⬐ sitkack
International editions are printed on horrible paper and often have mistakes for who knows what reason. I would steer clear of them for the most part.
⬐ sdegutis
The first edition can be had for $10 on amazon. http://www.amazon.com/gp/offer-listing/0201100886/ref=dp_olp...
⬐ paulddraper
That's not my experience.
I usually find them to be cheaper, in paperback, and sometimes with different exercise problems.
⬐ sitkack
Cool. I'll take another look. I got burned a couple of times buying books on half dot com that were being resold from India. Quality was horrid.
The dragon book is mentioned often, but usually only as "this book covers these things," which isn't really a recommendation, only a comment. Is the book actually good? Can you recommend it?
⬐ archimedespi
It is good. I had a dated copy, but a lot of what it talked about was still applicable to learning how to implement a basic programming language.
⬐ wbsun
Yes, it is a great book. Trust me. If you are scared by the lexing and parsing algorithms, skip them, because it is highly unlikely you will write them on your own. You don't need to refer to other sources to implement your own language with it.
I try to work through hard technical material. Personally, I enjoy both technical books and MIT OCW lectures. There were a number of courses in school that I was interested in and didn't have time for, so I've been looking at the online equivalents.
"Working through" means doing exercises and projects. Reading or watching material without applying it doesn't help.
I aim to spend an hour a day on this. It doesn't always happen, but it's a reasonable enough goal that I can find time for it most days. Occasionally, I'll take a full day to study on the weekend.
Some specific recommendations:
Books: SICP (https://mitpress.mit.edu/sicp/), K&R, The Art of Computer Programming (if you have lots of time), On Lisp (http://www.paulgraham.com/onlisp.html), Learn You a Haskell, Types and Programming Languages, CLRS, The Dragon Book (Compilers, http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811)
OCW courses: 6.172 (High performance engineering), 6.046 (Algorithms)
This is just what I've been interested in and is by no means comprehensive. Outside of CS, math is great to learn if you haven't studied it formally.
I've found it's best to pick a topic you know enough about to be motivated to study it, but haven't done serious work in.
⬐ nemesisrobot
Just out of curiosity, how much of TAOCP have you worked through in terms of exercises? I'm currently working through Concrete Mathematics, and while it's a very good book, I'm also finding it very challenging, so I'm a little hesitant to even think about working through TAOCP...
⬐ karamazov
I've only used TAOCP for reference. Working through it is on my to-do list, but I'm not sure when I'll get there.
The problems are ranked by difficulty, so you might be able to work your way up to the harder ones.
1> Do you have a "Real" CS degree?
If not, doing a good portion of the exercises in some books on [compilers](http://www.amazon.com/Compilers-Principles-Techniques-Tools-...), [DFAs/State Machines](http://www.amazon.com/Introduction-Theory-Computation-Michae...), Algorithms (http://www.amazon.com/Introduction-Algorithms-Thomas-H-Corme...) and theoretical programming (https://mitpress.mit.edu/sicp/full-text/book/book.html) can give you some common foundational lenses with which to see these articles
2> Learning the history of your field
Nothing informs the current state of the field more than how we got here! Learn the foundation of your field from people who lived it. The podcast [Debug](http://www.imore.com/debug) is Guy English (Creator of Napkin and other apps) along with Rene Ritchie interviewing people about the history of Cocoa and CocoaTouch
I found [this episode about AutoLayout and Ken Ferry illuminating](http://www.imore.com/debug-33-ken-ferry-auto-layout-passbook...).
3> Go through early versions. Few systems START complex. Look at OLD books for EARLY versions of systems, and why old PHP made such silly choices becomes obvious (hint: they weren't that silly for what it was doing). Read books and commentary through the timeline. Understand the history of what's happening to the language, and then you'll understand why you are where you are.
4> Go DOWN the stack. Understand the bottom of Objective C by reading [the open source implementation of Core Foundation and more](http://www.gnustep.org/). Also available elsewhere (and I think somewhere on Apple's Site still).
5> Do what you shouldn't! Don't ship it, but really use those implementation details to do something awesome and amazing. You'll learn tons about what's happening.
PS: To the mods, those aren't affiliate links
There is not much point in writing a parser without using a parser generator. Now, if you want to know how to make a parser generator, I recommend starting here:
If you want lots of detail, there is always this:
⬐ vidarh
I couldn't disagree more. Parser generators are ok for trivial grammars with limited error handling. They're generally horrible for writing compilers.
(Given your mention of a career, I'm going to assume he is at the stage where he can write programs to do stuff, but wants to move to the next level.)
To get better at programming, you need to do more than learn to program (languages, semantics, frameworks, etc.): you need to learn to think like a programmer. In this, there aren't many shortcuts: it requires study and practice.
Here are some great subjects to look into to get him started:
- design patterns (http://en.wikipedia.org/wiki/Design_Patterns)
- functional programming (although this is language agnostic, I personally would suggest working through "Scala by example" http://www.scala-lang.org/sites/default/files/linuxsoft_arch...)
- meta-programming (here are some videos--with free samples--on metaprogramming in Ruby http://pragprog.com/screencasts/v-dtrubyom/the-ruby-object-m...)
- compilers ("the dragon book" http://www.amazon.com/dp/0321486811/?tag=stackoverfl08-20)
- testing (unit, functional, integration, etc.)
One thing that will probably help him advance a LOT is learning a language that does things completely differently than the one he's using. If he'd like to try that, this book looks good (haven't used it myself but the choice of languages is pretty good): http://pragprog.com/titles/btlang/seven-languages-in-seven-w...
Also, there are some great books on this list: http://stackoverflow.com/questions/1711/what-is-the-single-m...
I'm sure there's a lot more to be said on the subject, but that's a start off the top of my head. What he should start looking into really depends on his specific weaknesses and/or preferences.