HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Simple Code, High Performance

Molly Rocket · Youtube · 161 HN points · 6 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Molly Rocket's video "Simple Code, High Performance".
Youtube Summary
Kickstarter link: https://www.kickstarter.com/projects/annarettberg/meow-the-infinite-book-two

This was a presentation I gave to the University of Twente in early 2021. It's a case study of how simple, straightforward coding can turn several thousand lines of code and 10's of seconds of runtime into a few dozen lines of code and a sub-second runtime. It attempts to provide concrete examples of why software is often an order of magnitude (or more) slower than it should be.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
The timing of this is great! I just watched Casey Muratori's lecture, "Simple Code, High Performance"[1] and its follow-up videos[2, 3]. I highly recommend watching it. It is about Software Optimization, and gives practical and high-level understanding of topics like CPU Architecture, SIMD etc. [1]https://www.youtube.com/watch?v=Ge3aKEmZcqY [2]https://www.youtube.com/watch?v=8VakkEFOiJc [3]https://www.youtube.com/watch?v=1tEqsQ55-8I
nickelpro
Casey is great but at times a bit... zealous? I distinctly remember him talking about how worthless having a file system is on a server, or for that matter an operating system.

His work stands at one extreme of how to architect a program, and is an extremely valuable reference, but he states his opinions as facts and that can be detrimental for the beginners he markets some of this material to.

suyjuris
> I distinctly remember him talking about how worthless having a file system is on a server, or for that matter an operating system.

If I recall correctly, his statement can be more charitably summarised as noting that many of the functions of an OS or FS are not useful in the specific use case he had, which was running a single program serving a website. He would therefore prefer to not have the additional complexity, attack surface and performance overhead. I do not think that this is very controversial? There are of course trade-offs involved, and he did mention that he was planning on running Linux as a way to get device drivers and something booting with reasonable effort, at least initially.

justin66
Filesystems being a worthless abstraction on, for example, a database server is the rationale for setting up a DBMS to use raw devices. This is not a popular approach and hasn't been for some time (I don't think any of the popular OSS databases have ever offered this approach, and AFAIK even Oracle is moving away from offering it) but it's not as though this is a radical idea.

edit: the Linux direct I/O feature that sort of obsoletes this use of raw devices was apparently developed in part by Oracle, which I guess might be interesting

https://lwn.net/Articles/348719/

sirwhinesalot
He's not wrong on the server end if your goal is maximum performance. That's the whole point of unikernels and they can lead to some impressive speedups. More so when you consider how many servers are running single apps inside docker containers in some Linux VM running on a hypervisor which is usually Xen.

Massive inefficiencies that could all be cut down to a unikernel running on the hypervisor directly. Whether that's a good idea or not in practice is a different story, but the performance improvements and energy savings are real.

That said, his issue is less the strength of his convictions (which, as far as performance goes, are at least on-point) than his delivery. He has very little patience and is not looking to debate even when it might be beneficial to himself or to others.

nickelpro
I don't at all mean to say he is wrong about performance. Casey is brilliant, his approach to writing software is optimal or near optimal when viewed through the lens of performance.

And yet this is not the only lens through which to view software development. Correctness, expressiveness, portability, developer hours. There are many possible things to optimize for, and Casey barely gives these an ounce of weight compared to performance. The reason to target a program towards a platform with an operating system and a file system is because writing these things takes time, using them affords flexibility, and minimizes bugs from your bespoke code. Casey does not view these as problems.

Casey writes near-C for bare metal or, when not on bare metal, directly against the platform API, and if you only learned from him, that's the only approach you would view as valid. When talking about his approach to writing code he constantly derides, sometimes very harshly, those who would think about program architecture any other way.

I don't think the Rustacean evangelists' obsession with memory safety is the last word in software development, or Linus's assessment of languages other than C, or Jonathan Blow's and Casey's orientation with performance above all else. These are ideas, opinions, and there's some value in representing them as such.

sirwhinesalot
100% agreed. I think when it comes to performance, the biggest takeaway from Casey/Jonathan is to just have the computer do less work. If you can get away with just poking at a big buffer then that's what you should do. It keeps the code both simple and efficient, no need for 30 layers of abstractions.

But beyond that, IMO, it's fine to use higher-level constructs and target higher-level "platforms" (i.e. electron) to improve portability, developer productivity, etc. Even within that setting you can avoid doing things that pessimize your performance, and that's often enough for a massive improvement.

Dogma is the bane of software-development, be it abstraction at all cost, immutability at all cost, or performance at all cost, etc.

webmaven
> Dogma is the bane of software-development, be it abstraction at all cost, immutability at all cost, or performance at all cost, etc.

You seem very sure of that.

sirwhinesalot
I do not claim to be enlightened, I am as flawed as my peers ;)
Blue noise is useful in games for generating more natural looking scenes. Here's an example of how to algorithmically place grass in the game "The Witness".

https://youtu.be/Ge3aKEmZcqY?t=965

So if you use white noise, it looks unnaturally clustered with patches. This is the part that discusses the noise.

https://youtu.be/Ge3aKEmZcqY?t=1385

For one real application of blue noise in particular, see https://youtu.be/Ge3aKEmZcqY?t=1350 (22:30 is a good starting point). Here Casey explains a grass planting algorithm for games, where white noise wouldn't be suitable.
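The contrast between white noise and blue noise mentioned in these comments can be sketched in a few lines. This is a minimal illustration using Mitchell's best-candidate sampling, one common way to approximate blue noise; it is not claimed to be the method used in The Witness or shown in the video. The function names, the candidate count `k`, and the unit-square domain are assumptions made for the example.

```python
import random


def white_noise(n, rng):
    """n independent uniform points in the unit square (tends to clump)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def best_candidate(n, rng, k=10):
    """Approximate blue noise: each new point is the candidate (out of k)
    that lies farthest from all points placed so far."""
    points = [(rng.random(), rng.random())]
    while len(points) < n:
        best, best_d2 = None, -1.0
        for _ in range(k):
            cand = (rng.random(), rng.random())
            # Squared distance to the nearest existing point.
            d2 = min((cand[0] - p[0]) ** 2 + (cand[1] - p[1]) ** 2
                     for p in points)
            if d2 > best_d2:
                best, best_d2 = cand, d2
        points.append(best)
    return points


def min_spacing(points):
    """Smallest pairwise distance; small values indicate clumping."""
    return min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
               for i, a in enumerate(points) for b in points[i + 1:])
```

Comparing `min_spacing` for the two generators at the same point count gives a rough, single-number view of the clustering that the comments describe: the best-candidate points keep their distance from each other, while uniform points do not.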
iamwil
Haha, I guess you and I just saw the same video.
Nov 07, 2021 · DrBazza on An Epic future for SPJ
> Speak for yourself; most phones I see still take 2+ seconds to open some apps.

Why is that though? I've written code since the 8 bit days of BBC micros and ZX Spectrums.

Casey Muratori's (who has been on here a lot lately) talk nails it [1] - developers just don't realise how fast their code could and should run. And the fact that we have layers upon layers of frameworks and no one cares.

[1] https://www.youtube.com/watch?v=Ge3aKEmZcqY&t=8638s

There are no tricks, you just have to give up mountains of levels of abstraction accumulated over the years.

https://www.youtube.com/watch?v=Ge3aKEmZcqY

Oct 30, 2021 · 141 points, 82 comments · submitted by archagon
Zababa
I wonder how much performance is gained by having the code written by just one person. At work we have a reasonably large (16 MLOC) codebase. At this point I don't think it's possible for one person to do everything, especially while adding new features. So what we do is that we create relatively clean interfaces, and have people code mostly inside those interfaces. But every time we go through one of those interfaces, there's a performance hit.
kjksf
In this case: nothing.

The fast renderer was written by one person in a few days. It's clearly a "few thousand loc" codebase, not "millions loc".

The 100x slower renderer was also written by one person (or at most a few).

It really is the difference between programmers.

It's also a good example that 10x programmers do exist, at least in certain situations.

The Microsoft programmers are not dumb. They certainly are not cheap.

And yet here we have one guy who can write code that is 10-100x faster, in just a few days.

Zababa
I don't think that's exactly true. It's not the same thing to write a piece of code as part of a whole system, vs rewriting that piece of code as an isolated part.

> The 100x slower renderer was also written by one person (or at most a few).

The difference between one person and a few is massive in my experience.

> It really is the difference between programmers.

> It's also a good example that 10x programmers do exist, at least in certain situations.

> The Microsoft programmers are not dumb. They certainly are not cheap.

> And yet here we have one guy who can write code that is 10-100x faster, in just a few days.

I think it's a difference between incentives too. Microsoft programmers are not here to optimize for performance, but to push features, performance being one of them in some cases. For example, search in File Explorer is still painfully slow, probably because they have some other things to do. Many people could make it faster in a few days I guess. But these people aren't inside Microsoft being pressured to do other things.

Software made to demonstrate how fast something can be is made in a very different environment and with very different incentives compared to almost all commercial software. For pretty much anything out there, someone can take it, isolate it and make it run 10-100x faster I think. The thing is, in most organizations you don't have that kind of time.

ratww
> It's not the same thing to write a piece of code as part of a whole system, vs rewriting that piece of code as an isolated part.

But, AFAIK, the new Terminal was new. Sure, it's 3 years old already, but Microsoft still didn't have the hypothetical "this is part of an old system" constraint when they started writing it.

> Microsoft programmers are not here to optimize for performance, but to push features, performance being one of them in some cases

Microsoft's marketing copy for the new Terminal cites "fast, efficient" in the first paragraph. It also cites GPU optimisation. Clearly performance was a goal.

Also, as has been discussed ad nauseam in other HN threads, Casey also added some extra features that didn't exist in MS's New Terminal. The "busy doing features" excuse also doesn't apply.

Attummm
Looks like you don't know the full story behind it. There is quite a backstory to this.

The guy in the video called what you're doing now "the excuse parade".

Instead of finding solutions, your time and energy goes into making excuses.

Watch the full video.

Zababa
> The guy in the video called what you're doing now "the excuse parade".

> Instead of finding solutions, your time and energy goes into making excuses.

Excuses for what exactly? I don't remember writing that code. I'm explaining incentives that lead to slow code. I usually try to write the best code I can in the time I have, but if one day the problem is that my code is too slow, I won't be making excuses. I chose at the time to not spend very much time on performance to ship faster, and that was a conscious choice on my part.

ratww
The "excuse parade" mentioned by the grandparent poster is another name for what psychology calls "rationalisation".

Microsoft fucked up about performance, period. The reason for that is not because the Terminal provided too many features, or because they wanted the code to be "readable". The manager didn't even claim the team didn't have time to do it, he flat out said it would be impossible without taking a few years to research.

Time and time again we see this kinda thing happening in software and people jump into made-up rationalisations because they're in denial about the root cause of issues.

Zababa
I think there's a difference between most software being slow because it's not a priority, and some specific parts like that terminal being slow because people have absolutely no clue and are in denial.
ratww
But the fact that higher performance is not a priority is something that is held together by those excuses and rationalisations.

People complain about performance all the time, both users and people inside their teams, but the thinking you're espousing is so widespread that developers clamouring for optimisations are just shut down.

End-users not having to wait 5 or 10 seconds unnecessarily can be a boon in productivity in some industries that use Enterprise Software. But we can't rely on having Casey Muratori going to a supermarket whose POS is slow and rewriting the software on a weekend and publicly shaming the supermarket chain on Twitter. Change has to come from within. Even accepting reports that the software is slow would be a good start.

The problem is not that the inefficiency exists or that it can take too long to fix. The problem is that no team ever really cares to stop and say "ok, this is inefficient, how long will it take to optimise? Will we get gains from that?".

Instead of asking and researching, people just do as you do: they rationalise by saying "I can't prove that it can't be faster and I can't prove that it's not costing us or our users money, but it is my belief and no proof will make me change it".

This is an industry-wide problem. It's anti-scientific posturing that's leading to widespread software slowness, programmed obsolescence and excessive spending on hardware.

Zababa
> Instead of asking and researching, people just do as you do: they rationalise by saying "I can't prove that it can't be faster and I can't prove that it's not costing us or our users money, but it is my belief and no proof will make me change it".

How exactly do you get that from what I said? That's a strawman of my position. I've been really clear about it multiple times: our users are asking most of the time for new features more than they are asking us for better performance. That's it. That's the beginning and the end of the issue at my job. Sometimes, we have a specific thing that's too slow for users, and they will ask us to make it faster, and we will make it faster. Most of the time, they ask us for new features. We add them, while trying to make code that's reasonably fast, maintainable, understandable, localised, and that does what the users want.

> End-users not having to wait 5 or 10 seconds unnecessarily can be a boon in productivity in some industries that use Enterprise Software. But we can't rely on having Casey Muratori going to a supermarket whose POS is slow and rewriting the software on a weekend and publicly shaming the supermarket chain on Twitter. Change has to come from within. Even accepting reports that the software is slow would be a good start.

You seem to think that I'm working at Microsoft and am one of the people that told Casey that he was wrong. I'm not. My point is that most people are like me, try to do things well but have to balance many incentives. And a few people, like the ones Casey interacted with, are just plain wrong. But these people aren't the majority and aren't the sole reason software is slow.

I've said it before and I'll say it again: having to wait 5-10 seconds is tolerated because exporting the data and doing the thing in Excel yourself would be slower. Imagine a task that takes 5 minutes manually, 10 seconds unoptimized and 1 microsecond optimized. It's a shame that in most of these cases the software will take 10 seconds. But it's still a huge boost compared to the 5 minutes of doing it manually. Even if the software is not the best, it still has a huge value. My point is that most of the time, at least in my industry, users will want more tasks going from 5 minutes to 10 seconds, than tasks going from 10 seconds to 1 microsecond.

Now, if one day our users want a task to go from 10 seconds to 1 microsecond, or even 1 second, and we tell them that it's impossible and would require a PhD, we are wrong, period. That would be an expression of our limitations as programmers. I completely agree with you on that point. However if we tell them "We would love to do this, but you're a minority wanting that, so it wouldn't make business sense to do that", I think we are honest and doing our jobs. There's a chance that we are wrong, and that focusing on that would bring way more business value than we think. And if that happens, we should increase the business value of other cases like that. But outside of that, I think we operate rationally and in good faith.

> But the fact that higher performance is not a priority is something that is held together by those excuses and rationalisations.

> People complain about performance all the time, both users and people inside their teams, but the thinking you're espousing is so widespread that developers clamouring for optimisations are just shut down.

They do, but they ask for new features even more. Honestly, I would love my job more if most of my work was optimisation instead of new features. I'm not in love with the software we develop, and I find performance work more interesting. But our software saves our users a lot of time, and continues to do so with new features, so we develop new features. I try to be sensitive to performance, stuff like avoiding O(n²) for an array intersection, or avoiding running the same loop three times in a row. But I don't have all the time I want to dedicate to that. I also don't have control over everything, some parts are held by other teams which are a bit territorial, and since their stuff isn't well tested, it's hard to go in and make changes.
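As an illustration of the array-intersection point above, here is a minimal sketch in Python. The function names are hypothetical, and the second version assumes the elements are hashable; it is one common way to avoid the quadratic behaviour, not a description of the commenter's actual code.

```python
def intersect_quadratic(a, b):
    """O(len(a) * len(b)): 'x in b' scans the whole list b for every x."""
    return [x for x in a if x in b]


def intersect_linear(a, b):
    """O(len(a) + len(b)) on average: build a set once, then probe it."""
    seen = set(b)  # assumes elements are hashable
    return [x for x in a if x in seen]
```

Both functions keep the order and duplicates of `a`, so the faster version is a drop-in replacement whenever the hashability assumption holds.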

All of that to say that you seem to see evil everywhere by focusing on a few bad examples, while most people are actually trying to do not terrible software, but don't really have the time to do so.

ratww
Nobody here is saying you're personally responsible for slow software, or that you're personally pushing for software to be slow. But I'm saying that rationalisations about slow software are partially to blame. Preconceived assumptions keep being proven as incorrect, but developers keep repeating them.

I also never assumed anywhere you work at Microsoft or that you blamed Casey for anything, I'm just using his case as an example. The gist of that sentence is that software can't be made faster if the only way to get that to happen is via public shaming, like Casey did.

About having to wait for users asking for speed increases: I feel like this is a bit of a strawman in itself, because even when users complain, it's rare that product managers follow through.

But even discounting that: users not asking doesn't mean that software can't be made faster, that it won't be a competitive advantage, or that users even know it's possible.

Both features and optimisations should not be driven by user votes or a manager's hunch. Teams should see the data, analyse usage and the market, predict outcomes, and predict how difficult the work is. And implementation should be iterative. Software made by continuously slapping spaghetti against the wall is the #1 cause of teams being too busy.

Also, about the cost of optimisation: it's (more often than not) nowhere near as big as people assume. It was demonstrated by Casey and in other cases. And no, it's also not something only super programmers can do.

Zababa
> But I'm saying that rationalisations about slow software are partially to blame. Preconceived assumptions keep being proven as incorrect, but developers keep repeating them.

As I said, I think that it's important to show people how fast software can be. Once it's done, either you agree that your software could be faster, or you're acting in bad faith. I think we both agree on that point.

> About having to wait for users asking for speed increases: I feel like this is a bit of a strawman in itself, because even when users complain, it's rare that product managers follow through.

That's how it works at my company. I can't really say how everyone else works, but it would make sense to act this way.

> But even discounting that: users not asking doesn't mean that software can't be made faster, that it won't be a competitive advantage, or that users even know it's possible.

> Both features and optimisations should not be driven by user votes or a manager's hunch. Teams should see the data, analyse usage and the market, predict outcomes, and predict how difficult the work is. And implementation should be iterative. Software made by continuously slapping spaghetti against the wall is the #1 cause of teams being too busy.

I mean, sure, but most people aren't at the level where they can decide everything they do. Most developers follow what managers/product owners tell them to do. Asking the developers to say no to the managers and work on something else is a bit easy to do, and will lead to zero consequences in the real world because that's not something people can do.

> Also, about the cost of optimisation: it's (more often than not) nowhere near as big as people assume. It was demonstrated by Casey and in other cases.

It was demonstrated by Casey in one specific case. I assume that most software is more complex than that. At work we have lots of moving parts and not one specific hot path, which makes it hard to do a big optimisation like the one he did. It's not like we have one process taking 80% of the CPU. Again, that doesn't mean that it's impossible. It would just take time. Time that we can't really take.

> And no, it's also not something only super programmers can do.

I never said that. I think everybody can optimize code. You need some basic knowledge like "use a profiler instead of only relying on intuition", "use good metrics", and things like "I could use another hash here" or "SIMD would make sense there" or "My problem seems to be a union-find-type problem. Where can I find an optimal algorithm for it?" but then the final limiter seems to be time.
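To make the "use a profiler instead of only relying on intuition" advice concrete, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The workload functions (`slow_sum`, `fast_sum`, `workload`) and the helper `profile_report` are hypothetical names invented for the example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum of squares 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """Same result via the closed form (n-1)n(2n-1)/6."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    return slow_sum(200_000), fast_sum(200_000)


def profile_report(func, top=10):
    """Run func under cProfile and return the stats table as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

Printing `profile_report(workload)` lists each function with its call count and cumulative time, which shows where the time actually goes rather than where it is assumed to go.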

> The gist of that sentence is that software can't be made faster if the only way to get that to happen is via public shaming, like Casey did.

Maybe public shaming isn't the right word, but "making noise" is important. If our clients don't ask for speed, we'll continue delivering features that other clients ask for (as long as it's reasonable). I think we should push back against people like the ones that told Casey you need a PhD for this. But we should also push back against bad incentives. Maybe the person said that a PhD was needed because there is no place for failure, ignorance or errors in their team, and saying that you need a PhD is the only "escape" if that makes sense. I don't think I'm a bad programmer, but in almost all domains, there are people that know a lot more than me. An important part of programming is to have the humility that some other people will do way better than you. But often people from big companies seem to react as if they're gambling with their job if they admit that someone else did better. Being able to accept a better solution is also an important skill for a developer, I think. In that case we should ask for the developers to do better, and shame the companies that don't let their developers accept better solutions from outside.

ratww
> "Asking the developers to say no to the managers"

I never said we should. "Teams" includes everyone. And engineers do have cachet to push for things.

> "I never said that"

I never said or implied you did, it was a general statement.

> "It would just take time. Time that we can't really take". "If our clients don't ask for speed"

Here's what I've been saying: not taking time to estimate how much time it would really take (or even if it's needed) is nowhere near as bad as saying that "a PhD is needed for that", but it is bad. Also, acting reactively is also bad. Maybe your software is already good enough, but not knowing is also an issue.

kjksf
It's funny because Casey made this video partly to push back on people like you who come up with limitless excuses for why slow code is "actually the right way to do it".

I guess it didn't work.

Are you really arguing that the code is 10-100x slower because it was written by 2 people and not one?

And if the reason we write slow code is to save programmer time, as your other argument goes, why is it ok to put 2 people to write code that can be written by 1 guy?

That excuse doesn't make it better, it just makes it look doubly bad for Microsoft. Not only they can't write fast code, they spend twice as much time doing so!

Zababa
> It's funny because Casey made this video partly to push back on people like you who come up with limitless excuses for why slow code is "actually the right way to do it".

I'm not saying it's "actually the right way to do it", I'm saying that it's how it's done in real life. That kind of argument reminds me a lot of Robert Martin that is always saying that the fix to software is to have everyone be more disciplined. It's true but it's also absolutely useless. It's like saying "people should be less mean to each other". Yes, they should, we've been saying that for thousands of years. Does that come with anything? A new tool to show how fast your code can be? It's nice to show to people how fast the code can go, especially since the students didn't seem to think that it was possible. But then what?

> And if the reason we write slow code is to save programmer time, as your other argument goes, why is it ok to put 2 people to write code that can be written by 1 guy?

You measure time in man hours, not just in men.

anandoza
3 hour video with no timestamps, a minimal description, and comments turned off?
tigerwash
Yeah, thought so too.

At least adding some major timestamps in the description would be great.

Attummm
I think many saw it already. He is a great programmer, and his analysis of the initial problem and of his situation with Microsoft is on point.
devnull3
This was good. I like the presentation.

---------

This can be a good interview question for Google to screw people over.

I was asked to write code to generate a maze (100x100) with the constraint that it should be neither easy nor hard. This was for an L6 position.

omegalulw
That's very interesting. I am curious to hear what your answer was :D

Here's my thoughts on what a solution would need:

1. At least one path from source to target.

2. Some quantification of hardness. No of steps in optimal path? Number of turns in optimal path? Number of paths? Some weighted combination of all three?

Some preliminaries:

Finding the optimal path and finding the number of paths can both be done in O(N^2) using DP. Finding the number of turns is then trivial.

Now the Algos for 1:

Algo for 1: Naive backtracking, i.e., randomly generate paths until there are no paths. Evaluate each maze using the heuristic and output best one. Run for some fixed time t. This is exp time.

Another algo for 1. Generate a path as follows: select K points on the grid, with the start and end being the first and last; then find optimal paths between them in sequence (notice that this guarantees a non-cyclical path). Next, generate a fresh path on an empty grid and overlay it on the previous path. Keep repeating for some M paths. Now fill in the non-path pixels. Pick the best grid among these M steps. This is O(M*N^2).

Last Algo for 1: Throw the grid into an ILP solver and optimize the heuristic (exp time).
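The "optimal path in O(N^2)" preliminary above can be sketched as follows. Note the swap: the comment mentions DP, while this sketch uses breadth-first search, which visits each cell of an N x N grid at most once and so also runs in O(N^2). The grid encoding (`#` for walls), the function name, and the step count as a hardness ingredient are assumptions for illustration; counting paths and turns is not covered here.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Number of steps on a shortest 4-connected path, or None if the
    goal cannot be reached. grid is a list of strings, '#' is a wall."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None
```

A `None` result doubles as the check for requirement 1 (at least one path from source to target), and the returned step count is one of the candidate hardness measures listed in point 2.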

devnull3
Well, I addressed the "neither easy nor hard part" upfront. My take was there has to be a solution and that too 1 solution. In other words, it had to be fair.

Second was the degree of false paths (term I made-up). This essentially governs the branching of each false path. Higher the degree higher the branching and higher the backtracking. This would make the maze harder.

So my algo was

1. To generate a valid path from top-left to bottom-right. (this is simple bfs/dfs walk). This is illustrated as path 1-2-3-4-5-6-7..-10 below. This ensures we have a fair maze.

2. Now from each number below, generate path in an outward manner till it hits walls. These are false paths. The "degree" mentioned above will dictate if there are further branching out of these paths.

1 * * * * *

2 3 * * * *

* 4 5 6 * *

* * * 7 * *

* * * 8 9 10

In step #1 we note the row,col in a dict/hashmap. We use these in step #2 to ensure the dfs walk doesn't step on these row,col.

This is all I could conjure-up in 45 min including a code in python. I was labelled lean-no hire.

Edit: fixed the rendering of the maze

jorangreef
Thanks for sharing the challenge and so cool to see the follow up and your actual answer.

Here's a linear solution I came up with, before I took a look at the rest of the thread:

1. Think of the 100x100 as pixels on a black background, all set to 0, i.e. all open space.

2. Now, draw a white square border all the way round by setting all the outer pixels to 1. No one can get in.

3. Leave a pixel gap and draw another square border within and then another and so on, like Russian dolls, with a pixel passage way between them. No one can get in.

4. Now, for each square border choose 1 random pixel and open it up by setting it back to 0. Now we're guaranteed of a solution, but it's too easy.

5. Let's make it a little harder, so between each square border let's drop a single pixel of "rubble" to block each passage way at one point. Provided we don't drop it directly in front of an opening in the adjacent square borders, I believe (unless I made a mistake somewhere!) we know the maze remains solve-able, and we don't need to do any iteration or "walk through the maze" to check that.

6. So far the runtime is pretty good. Nice and linear in the number of pixels drawn. We know the maze can be solved. And it's not too easy and not too hard.

7. (optional) We can make it harder still by tentatively dropping another pixel of rubble in a random passage way, and then walking through the passage to check that we can still reach the next inner opening. This is still better runtime than solving the whole maze, and can be tuned by the difficulty factor.

The insight is simply not to attempt to explore paths at all, i.e. not to try and "solve the maze" but only to "generate the maze", unless 7 is chosen, but that's probably not essential to the challenge.
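Steps 1 to 5 above can be sketched roughly as follows. This is one possible reading of the description, not the commenter's code, and it rests on several assumptions that are labeled in the comments: the grid size is a multiple of 4 (true for 100), gates are never placed on ring corners (a corner opening would not connect two passages), the goal is the open 2x2 block in the centre, and rubble is kept away from any cell next to a gate. The `reachable` helper is only there to verify the claim; the generator itself never walks the maze.

```python
import random
from collections import deque

WALL, OPEN = 1, 0
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def depth(r, c, n):
    """Distance of a cell from the nearest edge of the n x n grid."""
    return min(r, c, n - 1 - r, n - 1 - c)


def ring(d, n):
    """All cells of the square ring at depth d."""
    return [(r, c) for r in range(d, n - d) for c in range(d, n - d)
            if depth(r, c, n) == d]


def generate(n, rng):
    if n < 4 or n % 4 != 0:
        raise ValueError("this sketch assumes n is a multiple of 4")
    # Steps 1-3: even depths are wall rings, odd depths are passages.
    grid = [[WALL if depth(r, c, n) % 2 == 0 else OPEN for c in range(n)]
            for r in range(n)]
    # Step 4: open one gate per wall ring (assumption: never a corner).
    gates = []
    for d in range(0, n // 2, 2):
        corners = (d, n - 1 - d)
        sides = [(r, c) for r, c in ring(d, n)
                 if not (r in corners and c in corners)]
        r, c = rng.choice(sides)
        grid[r][c] = OPEN
        gates.append((r, c))
    # Step 5: one piece of rubble per passage, never next to a gate.
    # The centre 2x2 block (depth n//2 - 1) is left clear as the goal.
    gate_set = set(gates)
    for d in range(1, n // 2 - 1, 2):
        free = [(r, c) for r, c in ring(d, n)
                if not any((r + dr, c + dc) in gate_set for dr, dc in STEPS)]
        r, c = rng.choice(free)
        grid[r][c] = WALL
    return grid, gates


def reachable(grid, start, goal):
    """Breadth-first check used only to verify solvability."""
    n = len(grid)
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < n and 0 <= nc < n and grid[nr][nc] == OPEN
                    and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False
```

With `grid, gates = generate(100, random.Random())`, the entrance is `gates[0]` on the outer border and the goal is the cell `(50, 50)`. Because each passage is a closed loop, a single blocked cell still leaves the other way around open, which is why no maze walk is needed during generation.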

michaelmcmillan
You would risk ending up with a maze that can be solved by walking in a straight line directly from start to exit.
jorangreef
Ah, good catch, thanks!

I guess this could either be fixed up later with a simple check at the end, also linear, just in case it ever happens, or even just a condition whilst placing gates, that they can't be directly opposite the last gate placed but must be some x/y distance away.

Even without any check though, with a 100x100 grid, this is highly unlikely to happen often, as all 50 or so square borders would need to have their gate on the same side of the square, and at exactly the same position.

That's a few probabilities that would all need to intersect, i.e. I believe somewhere on the order of Math.pow(1/(100*4), 50), unless my probability theory is way off.

jgwil2
So does he have a special reversed image printed on his shirt just for this lecture format?
isaacimagine
I looked into this a while back, yep.
superjan
So this guy has had a t-shirt printed with a mirrored logo just to be able to use it with the transparent whiteboard.
archagon
I really like this lecture format. What is he drawing on?
gary_0
I guess he filmed through a pane of glass, and then flipped the video? It's trippy that he's physically writing backwards, but the text we see isn't reversed.
archagon
We see the whiteboard text scrolling sometimes, though.
_hao
He clicked something below to trigger that. I think the pane he's using actually got moved, so potentially he has a couple lined up?
archagon
Hmm. The movement is definitely mechanical, and it definitely sounds like he's writing on a hard surface.
throwuxiytayq
I'd like to point out that this also means that his t-shirt has the logo pre-flipped. What an absolute madlad.
generichuman
He uses a lightboard [0] and flips the video.

[0] https://www.lightboard.info/

vashishthak
That's exactly how I managed to enhance the performance of my site. https://vashishthakapoor.com/ Keeping the features along with performance is a big challenge. The video is really helpful in explaining how to fix performance issues.
VHRanger
Wow, your site is indeed fast. Good job!
vashishthak
Thank you so much mate. Means a lot. :)
ratww
+1 here. It's very fast, but it also looks quite good.
aetherspawn
"Simple code" -> very difficult to decipher use of SIMD intrinsics. Yikes.

He may have deleted 1000 lines of code, but he'll need a 2 hour inline video (with a 1000 line transcript) to explain how it works.

werner1886
He measures code 'simplicity' by how much work it makes the CPU do, and not some made up metric like 'readability'.
philosopher1234
How exactly is readability made up?
RedShift1
Simple example: I love the ternary operator and use it quite a lot in simple "if/then" scenarios. However some people hate them because they consider them harder to read than the fully written out if/then form. Those people would judge my code less readable.
dsego
Ternary operators are better for expressions, which yield a result, since you don't have to declare or initialize a variable with a dummy value first. If/else is better for general branching. Using ternary without assigning or passing the expression would imho be misleading. I've seen ternaries used for branching and it's a smell imho, they're not really meant for that use case.
flohofwoe
Code readability is at least extremely subjective, one person's highly readable code is another person's incomprehensible mess.
generichuman
It is not made up but different people can have different opinions on what's readable and what's not.

If the metric is supposed to be objective, then number of CPU cycles used is probably the simplest metric there is for computers.

meheleventyone
Right, it’s a communication problem because calling code simple conjures different ideas in people’s heads. Code that is simple for computers (principle of least work) is not necessarily simple (principle of comprehension?) for humans.
flohofwoe
There is a high overlap though, it's often harder to understand how highly abstracted code actually works than 'unrolled' verbose code composed from simple operations (which is closer to machine code - thus the 'overlap').

It might be easier to understand the 'intent' of highly abstracted code, but this doesn't mean the code behaves as intended, and IMHO 'readability' is about understanding what the code actually does, not what it is supposed to do.

meheleventyone
I think all these things are aspects that are interrelated. A principle of abstraction is another good one. Others I can think of are principle of least surprise and principle of least work done by the compiler (lol “zero cost” abstractions).
aetherspawn
It's a measurable metric in the sense of: give a programmer a ticket that says "make the trees spawn in veins not clusters", and however long it takes the programmer to understand what they need to do, is the readability of the code.

Now if it takes them 3 days to understand what all this SIMD stuff does, but it takes someone an hour to understand the code in its previous form, go figure.

werner1886
We might be thinking about different things here, so let me first ask this: What do you want to measure?
bruce343434
How can you objectively quantify it?
matiasmolinari
This is a good question to ask, if not rhetorical.

When practicing rigorous measuring, quantities need to be quantifiable aspects of the world. For example, you can quantify how much physical space your code or compiled output takes up in memory, and use these quantities as base units to derive others. By branching from a quantifiable root, you can derive metrics such as lines of code or number of CPU instructions, and they’d still retain those quantifiable aspects. Meaning there’s a clear path to the quantifiable root.

Needless to say, readability, as a metric, is not branched from quantifiable aspects of the world. So in a sense, it is still a “made up” metric because (as of today), there’s no way to trace it down to the quantifiable measurements.

meheleventyone
I’d note that because something is hard to measure doesn’t make it “made-up” or unimportant. And that concentrating on things that are easy to measure doesn’t make them more important and in fact can bias things badly.
dsego
I think the poster is using the term “made-up” in a stricter sense to categorize it, not just to be dismissive.
MikeDelta
Readability can be quantified, and it is done by static code analyzers with a metric known as cognitive complexity [0], which measures things like the number of branches in your function. The thing is, you can make code of low complexity that is still hard to read.

[0] https://tomasvotruba.com/blog/2018/05/21/is-your-code-readab...
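
To show what "counting branches" can look like in practice, here is a minimal Python sketch built on the standard `ast` module. It is only a crude proxy and not the cognitive complexity metric itself; the function name `count_branches` and the choice of which node types count as branches are assumptions made for illustration.

```python
import ast

# Node types treated as "branches" in this sketch (an assumption, not a standard).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.BoolOp, ast.ExceptHandler)


def count_branches(source):
    """Count branching constructs in a piece of Python source code."""
    tree = ast.parse(source)
    return sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


SAMPLE = '''
def f(x):
    if x > 0 and x < 10:
        return 1
    for i in range(x):
        pass
    return 0
'''

print(count_branches(SAMPLE))
```

A low count says nothing about naming, data flow or how many files a reader has to open, which is the gap the comment above points at.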

matiasmolinari
Sure. You can trace branches to quantifiable roots, but you can’t trace readability. To me, readability means something different than what it means to the author of that article.
ratww
Cyclomatic complexity is a good metric for local readability, but for considering the readability and ease of modification of whole programs or even for single files, it's far from good enough.

For an extreme example: you can turn all the methods of a complex program into one-liners, and all your classes into one-method classes. But doing that will definitely make your program harder to read.

Zababa
> Needless to say, readability, as a metric, is not branched from quantifiable aspects of the world. So in a sense, it is still a “made up” metric because (as of today), there’s no way to trace it down to the quantifiable measurements.

It is, actually. "Readability" for me means "how long it takes someone who didn't write the code to be able to understand it, make changes, add features". It's a fuzzier metric of course, as anything involving humans is, but that's also usually the kind of metric that matters a lot.

That also means that it's not a binary readable/non-readable thing. A way of measuring "readability" could be: assuming all other variables are equal, what percentage of the new hires are able to add new features after 1 month?

dralley
Even if he had stopped before hand-rolling the SIMD, it'd still be a multiple order of magnitude improvement.

I do think it would have been useful to demonstrate that before going straight to hard mode.

cryo
I love these videos, it's the kind of no-BS programming that squeezes performance out of modern computers. I also recommend his Handmade Hero series.

We have computers which are ridiculously fast, but tend to write code which is freaking slow. It's sad that most programs could be 100x faster.

nuerow
> We have computers which are ridiculously fast, but tend to write code which is freaking slow. It's sad that most programs could be 100x faster.

I feel this sort of comment misses the whole point of going with code which is patently slower than alternatives.

The main reason is that performance is a constraint but not a goal, and once it is good enough then there is absolutely nothing to be gained by wasting time on seeking performance gains.

Meanwhile, the main resource in software development is man*hours. The faster you write software (add features, fix bugs, etc) the cheaper it is. Thus, software projects benefit the most by adopting and using technology which favour turnaround time, which means higher-level languages, generic full-featured frameworks, and code reuse. They are slow and bloated and not optimized, and they are used without optimization in mind. But they work and they work acceptably.

Your client and project manager do not care if you go with an O(n²) implementation that you can whip out right now by reusing a package and that has acceptable performance, even if there is an O(n) alternative that takes you a few weeks to implement and forces you to test, debug, and maintain a lower level implementation.

Performance is meaningless once it's good enough. It matters nothing if your browser wastes 1GB of RAM even though it could just use 100MB because practically everyone already has 8GB to begin with, and it would be foolish to waste resources prioritizing that if no one is willing to pay for that improvement. It matters nothing if your server has a relatively low throughput because it is wasting 10s of milliseconds doing nothing in each response if all your servers barely break 50% utilization. It matters nothing if your frontend is wasting 10s of milliseconds rendering a page if end users don't even notice any delay. If performance is good enough, work is focused on where it matters.

We know it's possible to get formula1-level performance by using formula1-level of engineering and maintenance, but the world runs on Volkswagen hatchbacks.

SkeuomorphicBee
This has been the prevailing mentality in the industry for a long time, being essentially a business dogma in IT since the 90s (for good reasons, as you explained). I think Java is the personification of this concept (more specifically idiomatic corporate Java from the 00s).

But there is a recent small change of winds, where management is realising that being faster than the competition can have some business merit and can be worth spending some man*hours on. The lecturer explains it well in the beginning of the lecture. It is not a 180° turn, performance is not the priority (as it shouldn't be), but a relevant "differentiator". That is one of the drivers of the recent growth in compiled languages (Rust, Go, ...). Not the only one, but it helped.

nuerow
> That is one of the drivers of the recent growth in compiled languages (Rust, Go, ...). Not the only one, but it helped.

No, not really. In either case (Rust, Go) performance is at best a nice-to-have, while their main value proposition is, and has always been, turnaround time and consequently man*hours. Rust is marketed primarily for its first-class support for safe programming constructs, and for providing a far better developer experience than C and C++ at the expense of a small but negligible performance impact. Go is marketed primarily for its first-class support for concurrency, and for providing a far better developer experience than Java, C#, and C++ with little to no performance gains at all.

What matters is how long it takes developers to add value. That's it. Most of the times there is simply no value to be gained by shaving a megabyte or millisecond here or there, but undoubtedly there is value in shipping features, not having downtime, and eliminating bugs.

Chris_Newton
It seems that your argument throughout this discussion is based on two assumptions.

(1) Software already has acceptable performance.

(2) Further work to improve its performance is likely to have large development costs but deliver only small benefits.

I’m not sure either of those assumptions is safe. As the early parts of the lecture we’re discussing today demonstrated, sometimes there are dramatic performance improvements that can be made if you know what you’re doing and they can make a similarly dramatic difference to how beneficial the software is to its users.

robalni
> Performance is meaningless once it's good enough.

There is no level of performance that is "good enough". One person's "good enough" is another person's "pain to use" and it's someone else's "I can't buy that computer that I would like because it will not run this software well". Faster software means more options for the consumer, less energy usage and it will be easier to make computers because they don't have to be crazy fast.

I have a computer that is not one of the fastest in the world. I love this computer. It's tiny, beautiful and silent (no fans). When using this computer I can clearly see how slow a lot of software is. It's not the computer that is slow because it's definitely possible to write software that runs really fast on this computer.

One thing that programmers can do to make the situation better is to underclock their CPU when testing their software. On Linux this is very easy; just write the value "800000" to the files that match "/sys/bus/cpu/devices/cpu*/cpufreq/scaling_max_freq". Now your CPU will run no faster than 800 MHz. This will help you to notice much earlier when your program gets slow. And if you then think the performance is "good enough", it's very likely that your users will think so too because most people have a CPU that runs faster than 800 MHz.
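
The same cap can be scripted. Here is a minimal Python sketch that writes a limit to every file matching the path quoted above; the function name `cap_cpu_frequency` and its parameters are illustrative assumptions. The value is in kHz, writing to these files normally requires root, and the setting does not survive a reboot.

```python
import glob

# Path pattern taken from the comment above.
DEFAULT_PATTERN = "/sys/bus/cpu/devices/cpu*/cpufreq/scaling_max_freq"


def cap_cpu_frequency(max_khz=800_000, pattern=DEFAULT_PATTERN):
    """Write max_khz to every matching cpufreq file and return the paths written."""
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "w") as f:
            f.write(str(max_khz))
    return paths
```

Running `cap_cpu_frequency()` as root before a test session caps every core at 800 MHz. To undo it, write back the value each core reports in the neighbouring `cpuinfo_max_freq` file.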

Zababa
That's only considering part of the equation. We use software because it does things faster or more correctly than us. A slow and bloated spreadsheet software will still be usually way faster than manual calculation, and more correct. That way, you can understand why some people would prefer more features to faster software. The new feature allows people to do some things way faster than before. Because when a software can't do something, people still do it. The classic is exporting data in spreadsheets and doing stuff with it. That happens all the time in big organizations. And it's usually slower and less correct than having a way directly in the software to do that.

So there is, in fact, a performance level that's good enough: faster than doing it manually, or as fast/slower but more correct. That's the point at which software becomes useful.

robalni
> So there is, in fact, a performance level that's good enough: faster than doing it manually, or as fast/slower but more correct.

I would not call a 30-minute website load time good enough, even if it takes 31 minutes to travel to visit the company's office physically.

That's because I know that it could be much faster. Traveling for 31 minutes is acceptable because maybe it could not be done much faster.

Zababa
That's a fair point of view, but I think it's wrong. Travelling might feel way better, because you're doing something, and not waiting, but still, it's slower.
bsenftner
All this sounds good and practical, until an organization operating in your market optimizes their technology requirements. Suddenly that organization has both exponentially faster technology support for everything they do, and their expense in doing so is exponentially less than your organization. Optimized technology is not "good enough", it changes the nature of the conflict to revenues.
andrepd
> once it is good enough then there is absolutely nothing to be gained by wasting time on seeking performance gains.

Energy consumption, loading time, performance for anyone who does not have the latest computer/phone + fast and reliable internet.

In fact, every dev coding a user-facing application on a kitted-out laptop should be forced to test it with the shittiest machine + internet connection their users might have (or, say, the 10th percentile or something). Sometimes it feels like people live in a bubble where they assume the latest processors, 32GB of RAM, and stable 100Mbps+ internet are available everywhere.

> It matters nothing if your browser wastes 1GB of RAM even though it could just use 100MB because practically everyone already has 8GB to begin with

Yep, a bubble indeed x)

anikki
“Good enough” is defined by the inability of stakeholders to conceive of transitive benefits. Compute performance suffers the tragedy of the commons.
nuerow
> “Good enough” is defined by the inability of stakeholders to conceive of transitive benefits.

The whole point is that there are absolutely no benefits, at least relevant ones, once the performance is acceptable. It's a diminishing returns game. There is always a tradeoff between performance and cost, and once performance is acceptable then it's hard to justify wasting more resources to get nothing of value in return.

stagger87
It sounds like your point of view is strictly from a blood thirsty shareholder. Thankfully not everyone developing software thinks like you.
stagger87
> The whole point is that there are absolutely no benefits, at least relevant ones, once the performance is acceptable.

Define acceptable. (Hint: you can't)

ncmncm
"Acceptable: Somebody accepts it."

Once you are talking about tradeoffs, it's game over. The question is, do you want to have spent your programming career writing crap code? Or do you want to make each thing you do be something you can be proud of, that is better than you would have done last week?

There is always something that could be done to make code faster, or shorter, more parallel, or line up columns better. What matters is whether you are pushing yourself to improve every day. If you improve yourself by discovering ways to make code fast, you will always find new ways to improve, and your improvements will also often make life better for other people.

If the quality of your code has no effect on anybody's life, it is time to find something else to code.

franknine
I think it's just a matter of different industries having different situations. In the game industry performance is a real business advantage, thus it’s a goal. On consoles, every game developer works on the same machine, and the one who can get the most computation out of the box can cram more visual effects or more complex AI behaviour into the game and win the race. (I know not all developers are into this photorealistic madness, but some big studios are still fighting over it.) Even if you are working on mobile games, better performance means less battery drain (longer player engagement), less overheating, and being able to deploy onto older or weaker devices. (The performance scaling characteristic of the new Doom is so good that it can run on Switch.) Again, these are real business values.

On the other hand, premium games generally generate little to no revenue after release. The gameplay and hardware would likely be different for the next title. For instance, the rendering architecture changed from forward, to deferred, to clustered in response to new hardware capabilities. This means the maintainability and reusability of the game codebase is less important compared to other software projects. Also, there is a tendency for game programmers to receive less compensation compared to other industries (higher supply of junior devs), so the calculation of man * hours would be different as well.

Chris_Newton
> Meanwhile, the main resource in software development is man*hours. The faster you write software (add features, fix bugs, etc) the cheaper it is.

People have argued that developer time is the most precious resource since forever. In more recent times, people have also argued that pushing new features needs to happen as quickly as possible, particularly in the context of web and mobile apps.

I am rather sceptical about both claims.

Yes, developer time is important. Developer compensation is probably the largest single cost in most software development organisations and you want a good return on that investment.

Yes, the goal is acceptable performance rather than perfect optimality every time. The best is often the enemy of the good here.

And yes, that means it is foolish to invest weeks of developer time to implement an O(n) algorithm when you had an O(n²) algorithm that ran fast enough in practice.

However, what if you have a data processing job running on some cloud infrastructure that is charged according to usage, and carelessly using an O(n²) algorithm when you could have spent an extra day to write an O(n log n) one increased your AWS bill for that system by a factor of 10?

Performance can be important in other areas that sometimes get overlooked, too. In communities like HN, most of us probably enjoy the use of modern devices with relatively high specifications, but not everyone is so lucky. A classic “works for me” problem is developers running on high-spec equipment who don’t experience frustrations that their users with more modest equipment will run into, when maybe that “acceptable performance” shouldn’t be considered so acceptable after all.

And what about other types of software, such as embedded systems, where there are often tighter resource constraints and being able to use less powerful hardware components can have a significant impact on the overall cost to produce a device?

Meanwhile, I see little evidence that the relentless push to push changes around every five minutes is an automatic win. Yes, deploying changes like security updates and critical bug fixes quickly is important, but are users really happier — or, from a business perspective, willing to pay more for our software — because of the modern culture of continuous deployment and frequent updates of everything? That’s less clear.

It is undeniable that a lot of customers will still pay for poor quality software, which means shipping poor quality software can be an attractive and lucrative business model, which means paying lots of money to developers who will only produce poor quality software can work, which reduces the incentives for developers to do better. This is unfortunately the world we live in. But it doesn’t mean some of us running software businesses or working in software development can’t try!

jorangreef
I agree with you in general, but O(n²) is always dangerous. Perhaps this example you give is a little to the extreme.

Writing slow software is also a form of waste. Then it becomes a question of ranking waste and getting rid of the most wasteful, according to a cost/benefit ratio, but waste is never a good thing to tolerate in any cultural sense.

Chris_Newton
O(n²) is only dangerous if you’re scaling past the point where the n² behaviour outweighs any constant factors and lower order terms. When n is small, a fancy algorithm with lower big-O complexity could still be outperformed by a brute force O(n²) one in practical situations.

On the other hand, a poor choice for a critical algorithm working with a large data set could easily increase your costs by orders of magnitude, so if anything I’d say the example I gave (assuming you meant the AWS costs one) was conservative.

I agree that we shouldn’t be careless about being wasteful, but big-O complexity rarely tells the whole story when it comes to performance.
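
To make the trade-off concrete, here is a minimal Python sketch of one task, detecting a duplicate in a list, solved once with a brute force O(n²) scan and once with an O(n log n) sort. The task and the function names are illustrative assumptions; the thread does not say what the data processing job actually does.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: about n*(n-1)/2 comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """Sort a copy (O(n log n)), then compare neighbours (O(n))."""
    ordered = sorted(items)
    for left, right in zip(ordered, ordered[1:]):
        if left == right:
            return True
    return False


def worst_case_pairs(n):
    """Number of pair comparisons the quadratic version can need."""
    return n * (n - 1) // 2
```

For ten items the quadratic version needs at most 45 comparisons and allocates nothing, so constant factors can easily favour it. For a million items the same formula gives roughly half a trillion comparisons, which is where the bill starts to grow.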

jorangreef
Sure, I guess there are cases where O(n²) is okay and sometimes even faster, but in my experience with those runtimes I've usually regretted it. It tends to come back and bite, as your original comment made clear. I prefer O(n) to O(n²)!

Yes, definitely agreed about big-O complexity analysis vs mechanical sympathy.

In fact, I'm currently pair-programming on an Eytzinger layout-based binary search at work for TigerBeetleDB.

Eytzinger layouts are a case in point where big-O fails nicely, since you have exactly the same log2(n) number of comparisons as per binary search, but you're getting better memory locality by grouping the first few nodes of the binary tree together into a single cache line.

At first this looks like fewer cache misses, but then we actually exploit this even further by telling the memory subsystem to prefetch all 16 great-great grandchildren of a node (whether we need them later or not), so now we're also doing more cache misses (!) but to achieve lower memory latency, by trading off against memory bandwidth, and thus something that's way faster than binary search.

The paper on which this is based is "Array Layouts for Comparison-Based Searching".
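
For readers who have not seen the layout, here is a minimal Python sketch of an Eytzinger (breadth-first) array and a lower-bound search over it. It is not the TigerBeetleDB implementation; the function names are illustrative assumptions, and Python cannot express the prefetching or cache-line effects described above, so this only shows the index arithmetic.

```python
def eytzinger_layout(sorted_values):
    """Rearrange a sorted list into 1-indexed breadth-first (Eytzinger) order."""
    n = len(sorted_values)
    layout = [None] * (n + 1)   # slot 0 is unused; children of k are 2k and 2k+1
    source = iter(sorted_values)

    def fill(k):
        # In-order traversal of the implicit tree assigns values in sorted order.
        if k <= n:
            fill(2 * k)
            layout[k] = next(source)
            fill(2 * k + 1)

    fill(1)
    return layout


def lower_bound(layout, target):
    """Return the smallest stored value >= target, or None if there is none."""
    n = len(layout) - 1
    k = 1
    while k <= n:
        # Go left (2k) when layout[k] >= target, right (2k + 1) otherwise.
        k = 2 * k + (layout[k] < target)
    # The bits of k record the path taken. Dropping the trailing 1-bits
    # (right turns) plus one more bit returns to the last node where we
    # went left, which holds the answer; 0 means we never went left.
    k >>= (~k & (k + 1)).bit_length()
    return layout[k] if k else None
```

The loop does the same number of comparisons as a classic binary search; the difference on real hardware comes from the top levels of the tree sitting next to each other in memory.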

yakubin
In my experience, people waste performance and manhours at the same time. Many abstractions create more code instead of reducing it, at the same time making the code slower, harder to read and maintain. Would you rather maintain a couple hundred lines of straightforward code or thousands of lines of class hierarchies with delegates and whatnot?

See John Carmack on Inlined Code: <http://number-none.com/blow/blog/programming/2014/09/26/carm...>

nuerow
> Would you rather maintain a couple hundred lines of straightforward code or thousands of lines of class hierarchies with delegates and whatnot?

I feel this is a gross misrepresentation of the problem.

With higher-level frameworks, you can add complex features with one-liners, which by their very nature (generic, extendable, and general purpose) are bloated and underperform when compared with the code you could roll yourself. However you need to write far more code than one-liners to reimplement those features, not to mention the time it would take your team to test, validate, and maintain it.

Therefore, contrary to your initial assumption, there is indeed a tradeoff between reusing someone else's general-purpose but battle-hardened code with your specialized, lean, but untested code, and the cost to go with rolling our own implementation hardly justifies the potential performance gains.

ratww
I also feel like your post is another gross misrepresentation of the problem.

The issue with slow software is rarely the framework itself. Even frameworks like Rails and Django are more than fast enough for most things. If you need absurd performance (for, I don't know, HFT?), then there are other frameworks in other languages. There is no need for bespoke code! Fast frameworks do exist. Also, when frameworks are slow in some parts, someone can just go there and optimise for everyone!

However the issues we normally encounter regarding speed are often caused by convoluted bespoke architectures.

It's always because Database access has to go through ten, twenty classes, and not only is it slow, it's also hard to maintain, as you lost control over what the SQL looks like. It's always because serialisation requires some crazy Reflection that is several orders of magnitude slower and more complex than a simple "to json" call. It's always because the hot-loops of your sorting algorithms have to go through some unnecessary only-used-once abstraction that makes the whole hot loop slow.

Java is fast as heck but got a reputation of being slow among users. Also Enterprise Java projects had a reputation of being difficult to navigate and therefore more expensive. The issue wasn't Java: it was the convoluted bespoke architectures that plague it.

It is widely acknowledged by its proponents that those difficult architectures take more time to build. However there is zero evidence that such things help with maintainability. In fact I'd argue that those arcane architectures make it worse for the general-case scenarios of: bug fixing (because more classes mean more bugs and more places for bugs to hide), optimisation (because measurement is harder in complex programs, and optimising often requires dismantling and rebuilding things), adding features (because it was hard to build the first features, it's gonna be hard for future brand new features too) and even refactoring (if the problem is the complex architecture itself, refactoring in parts will lead to a messier program).

So there you go: waste of man-hours and of processors.

So no, the parent poster's complaint has nothing to do with the reuse of frameworks or libraries.

snovv_crash
The truth is that the framework is built on another framework, which is built on another, which is built on another, until we get down to individual transistors.

The highest level framework isn't always the one best suited to solving the problem you have. Maybe it's a one-liner, but due to the overhead you have to run a giant distributed system instead of a single machine with shared memory. Then the overhead of orchestrating all these machines might be more than writing 10 lines in a lower level framework.

Oct 19, 2021 · 3 points, 0 comments · submitted by DeathArrow
Oct 17, 2021 · 3 points, 1 comments · submitted by blakehaswell
db48x
For not having the time to polish it until it was only an hour long, he did that lecture extremely well.
Oct 13, 2021 · 14 points, 2 comments · submitted by sidcool
adamrezich
the presentation setup here is fantastic. it took me a second to realize how it works: the video and the logo on his shirt are both horizontally flipped. idk if this is a common thing or not but it makes for a very effective and natural presentation.
ygra
I guess the flipped T-shirt is just for showing off (and nerd-sniping). It would work just as well if the T-shirt didn't have an image on it (negating the need for printing a custom T-shirt).
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.