Hacker News Comments on CppCon 2017: Chandler Carruth "Going Nowhere Faster"
CppCon · YouTube · 3 HN points · 4 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.

I happened to watch a talk by Chandler yesterday that might be the one the author was referring to?
Chandler Carruth has a good talk with an example of why speculative execution is critical for performance. https://www.youtube.com/watch?v=2EWejmkKlxs It starts around 36m13s time frame.
⬐ dnautics: Conceivably this could all be put into the compiler.
⬐ ken: You mean speculative execution? Do you know of a sample implementation?
⬐ twtw: Conceivable, yes. Practical? Not so much. This is an idea that I personally love, but that hasn't fared well so far. Compilers are not as good at assigning instruction schedules statically as hardware is at doing it dynamically.
⬐ dnautics: Curious as to why hardware can do it dynamically while software can't. It's all logic in the end. I can understand "not being able to statically compile it because every architecture is different," but presuming our compiler compiled to a specific platform, why wouldn't it be able to dynamically rearrange in, say, a JITted fashion, using exactly the logic that's available in the hardware?
⬐ twtw: Hopefully I'll put together a more technical answer in a while, but for now I'll just point out that when talking about performance, reducing things to "it's all logic in the end" makes little sense. We could emulate a modern CPU on an 8-bit microcontroller, but the performance would be bad.

That's a fascinating video about how modern processors work, but I don't see here why it's critical for performance. If you built a CPU without speculation, how bad would perf be? What other features could you still use? How much do common algorithms depend on speculation?

⬐ pkaye: Superscalar processors have a deep pipeline with many execution units and keep a lot of instructions in flight, so the penalty of a misprediction or stall is significant. Every time the processor reaches a branch instruction that depends on a result which is not yet available, it must either speculate or stall. Most programs consist of small stretches of compute code followed by a branch that may depend on the results of that code.
> The YouTube interface says "auto-generated"

Ok, I never noticed that. I just read the captions and assumed obvious misspellings were auto-generated.
For example, in this video[1], the caption text is "L1D cache misses" but he's actually saying "L1-dcache misses". (The Linux terminal screen he's showing does display "L1-dcache".) Even though that video is not labeled as "auto-generated", I assumed it was because of the bad caption. Based on your info, I guess CppCon uses humans, Mechanical Turk workers or other non-domain typists, to manually add the captions.
⬐ yorwba: Manual captioning is almost never done by domain experts, but by people who have some training with a captioning system and work as professional captionists. Their main advantage is that they'll caption much faster and much cheaper than having domain experts do it, but the quality tends to suffer.

In college, I met a deaf guy who always had two women accompany him to lectures; one of them would repeat everything into a mouth-covering microphone to generate an automatic transcription, and the other went over it to correct obvious errors. They generated a lot of nonsense, especially when the German professor was using English loanwords for CS concepts. I was always amazed that the deaf guy still somehow managed to learn something from these garbled transcriptions.
⬐ Bromskloss: Why not sign language?
⬐ yorwba: I guess sign language interpreters are more expensive.
Disclaimer: I am not an expert and have not measured; this is armchair theory. But I would argue two things.

First, the former appears to have at least one unaligned arithmetic instruction:
> 400538: mov 0x200b01(%rip),%rdx # 601040 <counter>
...while the latter's equivalent instruction is 4-byte aligned:
> 40057d: mov 0x200abc(%rip),%rdx # 601040 <counter>
So, I would argue that's the biggest source of _speedup_ in the second case. However, I'm really interested in whether that's true, since I don't see a memory fence, so the memory should be in L0 cache for both cases; I have trouble believing that an unaligned access can be so much slower with the data in cache.
As for the `callq` to `repz retq`, I would venture a guess that the CPU's able to identify that there are no data dependencies there and the data's never even stored; I'd argue that it probably never even gets executed because the instruction should fit in instruction cache and branch prediction cache and all. Arguably. Like I said, I'm not an expert.
I'd say run it through Intel's code analyzer tool.
https://software.intel.com/en-us/articles/intel-architecture...
Tangential video worth watching:
https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be...
Edit: actually, thinking about it, it's not unaligned access, it's unaligned math. I don't think that should affect performance at all? Fun.
⬐ nkurz: I'm sorry, but like the other comment at the bottom, your guesses are so far from reality that they are hard to respond to. IACA is great for what it does, but it's a static analyzer and knows nothing about alignment. L0 doesn't even exist on modern Intel processors. Memory fences would change things, but aren't part of the problem as stated. And your guess that "it probably never even gets executed because the instruction should fit in instruction cache and branch prediction cache and all" just doesn't have any bearing on the way processors work.

Your disclaimer does indicate that you have the self-awareness that you are not an expert, but the fact that you are trying to make an argument would normally indicate that you think you understand what's happening to some extent. Rather than just guessing, I think you'd benefit from trying some things out and seeing what the results are. Play with perf, it's fun!