HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Emulating a 6502 system in JavaScript • Matt Godbolt • GOTO 2016

GOTO Conferences · Youtube · 23 HN points · 3 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention GOTO Conferences's video "Emulating a 6502 system in JavaScript • Matt Godbolt • GOTO 2016".
Youtube Summary
This presentation was recorded at GOTO Chicago 2016. #gotocon #gotochgo
http://gotochgo.com

Matt Godbolt - Low-level Latency Geek, DRW

ABSTRACT
It's said you should never meet your heroes. They're wrong! This is the story of Matt meeting and getting to know one of his heroes: the 6502 microprocessor. It powered the Apple IIe [...]

Read the full abstract here:
http://gotocon.com/chicago-2016/presentation/Emulating%20a%206502%20system%20in%20Javascript

https://twitter.com/gotochgo
https://www.facebook.com/GOTOConference
http://gotocon.com
#JavaScript #JS #6502
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
This is one of those gotcha that trap senior devs way more than junior devs. Growing up with C being my first language, I intuitively associate switch with a jmp table (ie extremely fast). Even though I knew it’s unlikely to be as performant in JS, it never occurred to me that it would actually be slow, until I did a deep dive into low level JS optimization.

Related: Matt Godbolt’s talk on emulating 6502 in JS. He also mentions the poor performance of switch for opcodes: https://youtu.be/7WuRq-Wmw5o

dehrmann
In my first job out of school, I was writing driver code in C. In one of my projects, I had a code block that wasn't working, but all it did was a switch on ints. On a hunch, I tried ifs, and it worked. Turns out I found a bug in the in-house GCC fork. I'm sure my manager thought I was an idiot--I was a new grad saying my project isn't working, and I think it's a compiler bug.
georgecmu
What processor was that for? I remember (1999-2000) we found out the hard way that GCC for Hitachi H8s had that exact bug.
dehrmann
I think it was a generic mips CPU.
throwawaysleep
Did you stick to it? I admittedly would have conceded I was wrong in that situation.
dehrmann
I did! The compiler team agreed that it was broken and gave me a flag to disable jump tables. Surprisingly, that didn't cause any noticeable performance regressions.
carlmr
We always assume compilers to be infallible, and for the most part this is right. But special compilers for special targets or certified compilers for special uses, I've found to often have compiler issues. Why did you have a GCC-fork?
dehrmann
I wish I could remember. It wasn't a secret sauce thing, so all I can think of is you get improved support and faster bugfixes.
teruakohatu
> would actually be slow, until I did a deep dive into low level JS optimization

Can you explain why this is?

Intuitively it would seem like optimizing a switch statement would be really low-hanging fruit. I would have thought it would be optimized into something almost exactly like what the author hand rolled.

TAForObvReasons
You can switch on arbitrary types in JS and you can add expressions:

    switch(3) { 
      case true: break;
      case "ab" + "c": break;
      case Math.random() >= 0.5 ? "heads" : new Error("tails") : break;
    }
The optimizer will have to be persuaded that the switch and each case is integral before optimizing with a jump table. On the other hand, Arrays typically have a fast path for integer indices.
shultays
But in js you can have maps of arbitrary types as keys right? So jump table could just use that
mattashii
that's not quite true, because arbitrary expressions can change meaning due to e.g. overloading of functions.

E.g. Random.random() will have to be called each time that case branch is evaluated, because that function could be overloaded or change its returned value.

teruakohatu
That explains it, but shouldn't a switch with string or numerical constants (or an expression made up of constants) just be optimized away anyway?
jcfields
The article quotes a Stack Overflow post with the conditions under which V8 will optimize the switch statement:

> @LGB actually in V8 (JS engine used by google chrome) you need to jump through a lot of hoops to get switch case optimized: All the cases must be of same type. All the cases must either be string literals or 31-bit signed integer literals. And there must be less than 128 cases. And even after all those hoops, all you get is what you would have gotten with if-elses anyway (I.E. no jump tables or sth like that). True story.

In the case of this emulator, it wasn’t being optimized because it had more than 128 cases.

teruakohatu
> In the case of this emulator, it wasn’t being optimized because it had more than 128 cases.

Now that makes sense. Thank you.

syntheweave
I don't know. Maybe it's because I've spent too long thinking about such things, but interpreter dispatch routines are one of those areas of code where micro-optimizations are very pernicious and one should expect to review the landscape each time to rediscover best practice. See all the ways of compiling and executing Forth words for examples.
phkahler
>> I intuitively associate switch with a jmp table

Me too, but I try to keep the set of values dense and use a mask (op & 0xff) to help the compiler know it doesn't need to do bounds checking. Verify these changes help of course.

ChrisRR
I'm the same. I've always worked in C and would've naturally assumed it would've executed as a jump table.
fenomas
> the poor performance of switch for opcodes

I think you've slightly misunderstood TFA. The issue there wasn't that the switch statement was slow, per se - it was that V8's optimizing compiler didn't yet support such statements, so V8 was bailing out of its JIT step and executing the function via a (dog-slow) interpreter. In other words the function would have run painfully slow even if it returned before reaching the switch statement.

It's also maybe worth adding that gotchas like this were super common way back in the early days TFA describes, but AFAIK they are no longer a concern now. Back when V8's optimizer was primitive it used to bail out for all kinds of reasons - because it saw a "try()" statement, because a function was too long, or even just because it got confused. Nowadays it's a different world - I do a lot of JS perf work, but it's probably been two years since I've seen V8 deopt anything for any reason.

hinkley
> because a function was too long,

Famously, function inlining decisions were based on the size of the function, not the size of the AST of the function, which lead to situations where removing inline comments allowed a function to be faster.

madacol
What is TFA?
cleerline
The Freaking Article
dotancohen
The _Fine_ Article.
ruuda
V8 did not have an interpreter in 2013.
nickelpro
Of course it did, how do you think a JS engine runs code that it doesn't have any type information about without an interpreter?
Dylan16807
With a bunch of dynamic dispatch.

Interpreters and basic compilers are remarkably similar, and it's easy to transform between them if you do some extra busy work. Once you have your final AST, instead of outputting bytecode instructions, output the implementation of each instruction. Or vice versa.

You have code that handles different types in the interpreter? Great, bake it into the compiled code.

The transform from bytecode interpreter to compiler is always pretty simple, but the other way around gets more difficult the smarter the compiler is. So yes it is very possible to have a baseline javascript compiler and no interpreter.

nickelpro
I guess I'm differing on this point. By "baking in" the dynamic dispatch the final compiled code is, in my mind, an interpreter. Which is how I always thought of the "Full Codegen" step of the old v8 architecture.
Dylan16807
Well let's say you wrote C++ where most of your data is in a super union of float, string,*, bool, int, map*, etc, with gobs of repetitive dynamic dispatch whenever you touch one of those unions. That wouldn't mean your code is being interpreted, would it?
ruuda
I guess the lines can be a bit blurry especially when the compiled code still relies heavily on dynamic dispatch, but V8 generated machine code for the target CPU, marked the page as executable, and then it would call that just generated code. I think most people would call that a JIT-compiler.

It wasn’t a very advanced compiler in the sense that it generated very inefficient machine code. But it still ran circles around a tree-walking or bytecode interpreter. Over time the compiler became more advanced and multi-tiered, but at the cost of startup time, and mobile was taking off, where memory was more scarce than on desktop. x86 or ARM code is a lot bigger than a specialized bytecode that is optimized for space efficiency. So eventually the Ignition interpreter was added. But unlike some of the other Javascript engines that started as an interpreter and later added native code generation, V8 started out generating native code and only later added a bytecode interpreter.

ReadTheLicense
Could you please explain what do you mean by type information? I thought you never have type information in JavaScript... - except the base types (bool, number, string, object, array, function), but you always have this information...
fenomas
Basically, modern JS engines are fast because they do a lot of inferring about types. They do this both statically (by inspecting the code at compile time) and dynamically (by watching the code as it executes), and they do it for objects as well as literals.

For a hand-waving theoretical example, if you have code like:

    var a = { foo:1 }
    var b = { foo:2 }
    var c = { foo:3 }
    doSomething(a)
    doSomething(b)
    doSomething(c)
then modern JS engines can see that the "doSomething" function always gets called with arguments of the same type signature, and if it's a hot function they might optimize the function around that information.

If you want to know more, searching on "V8 hidden classes" should turn up relevant articles.

dchest
The bytecode interpreter appeared in 2016 https://v8.dev/blog/ignition-interpreter Before that, it compiled everything into machine code.
ncmncm
I think you also misread the article. It clearly said that an optimized switch would still run as a series of "if" tests, and that when he got the compiler to generate that code, it wasn't faster.
fenomas
TFA doesn't cover any cases where switch statements got optimized. It starts with the relevant code getting deopted, then the author tries a fix but the code is still deopted, then he removes the switch statements.

(Mind you I'm not saying switch statements are fast - they may well be slower than the alternatives even after being optimized. I'm just pointing out that TFA isn't about the performance of switch statements, it's about avoiding deopts.)

fps_doug
TFA references a post that states that optimized switch statements in V8 are still just if-elseif-else and not jump tables, at least back when it was written. They might be faster than unoptimized JS, but certainly not as fast as jump tables. The author didn't even bother to split up the functions/switch blocks further when they learned about the requirements to get V8 to actually optimize a switch, as it seemed not worth it.
whizzter
The if-else behaviour comes from the EcmaScript spec, check the sibling comment.
fenomas
Sure, none of that is in doubt. But in my experience the performance cost of hot code getting deopted is so massive as to overwhelm virtually any other factor, so realistically, fixing the deopt is what solved TFA's problem. Of course getting rid of the switch statements was presumably an additional speedup, but since TFA doesn't compare those cases I guess that's neither here nor there.
whizzter
The EcmaScript switch spec basically says that each case statement in turn is evaluated until one becomes equal or default is encountered, so to follow the spec the basic case is equal to a bunch of if-else's.

To optimize the runtime first has to figure out that all case values are constants (and insert guards to de-opt if it doesn't hold anymore), so in essence the machinery for running this code optimized has to be very conservative (and probably why they placed arbitrary limits that really aren't suited for emulators).

https://tc39.es/ecma262/multipage/ecmascript-language-statem...

Feb 20, 2020 · leoc on BBC Micro Bot
Here's a JSBeeb talk by Godbolt: https://www.youtube.com/watch?v=7WuRq-Wmw5o .
1 Martin Thompson busting myths about hardware and explaining why it's important to know. Mechanical sympathy makes you better, because you know how the code actually runs on the machine and interacts with different layers of memory

https://www.youtube.com/watch?v=MC1EKLQ2Wmg

2 Matt Godbolt (the man behind GCC explorer) - Emulating a 6502 system in Javascript

Great talk about BBC micro and much more

https://www.youtube.com/watch?v=7WuRq-Wmw5o

3 Matt Adereth - Clojure/typing

History of keyboards and a custom keyboard written in Clojure

https://www.youtube.com/watch?v=uk3A41U0iO4

I like the 3 for their content and how each speaker presented the background and their project/hack/ideas.

Highly recommend

voltagex_
Your second link reminded me of https://www.youtube.com/watch?v=nLv_INgaLq8 (which definitely builds on some of Matt's work).

https://www.youtube.com/watch?v=zBkNBP00wJE - C++17 for the Commodore 64.

Probable language warnings for my other suggestions:

* Rescuing Prince of Persia from the Sands of Time https://www.youtube.com/watch?v=FnEWBtCnFs8 (was this talk ever given elsewhere?)

* And You Shall Know Me By My Trail of Documentation - https://www.youtube.com/watch?v=fEgHdHdUDaA

* The History and Evolution of Computer Viruses - https://www.youtube.com/watch?v=s2g9lgYrYJM

Aug 22, 2016 · 2 points, 0 comments · submitted by tambourine_man
Aug 22, 2016 · 21 points, 2 comments · submitted by mattgodbolt
squeakynick
Thanks for bringing back so many fond memories of the Beeb. Does you mother know you're doing this?

Anyone else remember ILTDN and what it stands for?

mattgodbolt
Convenient clickable link to the emulation site itself: https://bbc.godbolt.org/
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ yaj@
;laksdfhjdhksalkfj more things
yahnd.com ~ Privacy Policy ~
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.