Hacker News Comments on "CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”" CppCon Youtube Video



CppCon · Youtube · 18 HN points · 5 HN comments

HN Theater has aggregated all Hacker News stories and comments that mention CppCon's video "CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”".

Youtube Summary

http://CppCon.org
“Memory Latency Troubles You? Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”
—
Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/CppCon/CppCon2018
—
Are you doing memory lookups in a huge table?
Does your embarrassingly random access to your lookup tables lead to memory stalls?

Fear no more!

We will explore techniques that allow us to do useful work while the prefetcher is busily working on bringing the requested cache lines from main memory, by utilizing nano-coroutines.

And the best part: nano-coroutines can be easily implemented using the Coroutines TS, which is already available in the MSVC and Clang compilers. With a little bit of library support we can utilize the coroutines to extract intra-thread parallelism and quadruple the speed of your lookups.
—
Gor Nishanov
Software Engineer, Microsoft
Gor Nishanov is a Principal Software Design Engineer on the Microsoft C++ team. He works on the design and standardization of C++ Coroutines, and on asynchronous programming models. Prior to joining the C++ team, Gor worked on distributed systems in the Windows Clustering team.
—
Videos Filmed & Edited by Bash Films: http://www.BashFilms.com



Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.

CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue (2018)

Sep 04, 2022 · 3 points, 0 comments · submitted by Tomte

CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue

Feb 10, 2022 · 3 points, 0 comments · submitted by Tomte

CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue

Jun 08, 2021 · 1 point, 0 comments · submitted by Tomte

⬐

Feb 22, 2021 · BenFrantzDale on My tutorial and take on C++20 coroutines

It's a big deal because, while it has some downsides, being stackless means they can have next to no overhead, so it can be performant to use coroutines to write asynchronous code for even very fast operations. The example given at https://www.youtube.com/watch?v=j9tlJAqMV7U&t=13m30s is that you can launch multiple coroutines that each issue a prefetch instruction and then process the fetched data, so you get clean code that issues multiple prefetches and processes the results. Whereas in Python (don't get me wrong, I love Python) you might use a generator to "asynchronize" slow operations like requesting and processing data from remote servers, C++ coroutines can be fast enough to asynchronously process "slow" operations like requesting data from main memory.

⬐ mazieres
Wow, that talk is a fantastic link. He actually gets negative overhead from using coroutines, because the compiler has more freedom to optimize when humans don't prematurely break the logic into multiple functions.

CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue

Mar 08, 2020 · 2 points, 0 comments · submitted by Tomte

CppCon 2018: G. Nishanov “Nano-Coroutines to the Rescue

Sep 09, 2019 · 7 points, 0 comments · submitted by Tomte

⬐

Aug 20, 2019 · ddorian43 on IBM Open-Sources Power Chip Instruction Set

https://m.youtube.com/watch?v=j9tlJAqMV7U
This is an extreme version of yield on memory access

⬐

Aug 20, 2019 · je42 on How Rust optimizes async/await

> In C++, technically every initialization of a sub-coroutine defaults to being a new heap allocation. I don't want to spread FUD: my understanding is that the vast majority of these are optimized out by the compiler. But one downside of this approach is that you could change your code and accidentally disable one of these optimizations.
I don't think this is correct. C++20 leaves a lot of choices for implementing coroutines without forcing a heap allocation.
See https://lewissbaker.github.io/2017/11/17/understanding-opera... and also this video, which goes into depth on how to have billions of coroutines with C++: https://www.youtube.com/watch?v=j9tlJAqMV7U

CppCon 2018: G. Nishanov “Nano-Coroutines to the Rescue

Mar 26, 2019 · 1 point, 0 comments · submitted by Tomte

⬐

Nov 11, 2018 · akubera on Exploiting Coroutines to Attack the “Killer Nanoseconds” [pdf]

Gor Nishanov, one of the authors, talks about this in his CppCon2018 talk https://youtu.be/j9tlJAqMV7U (jump to ~12:33 to skip background).
It amounts to using CPU prefetch instructions with C++ coroutines to simulate hyperthreading in software by scheduling instructions around cache misses (but is potentially better than hyperthreads because it's not limited to 2/core)

⬐ quotemstr
> but is potentially better than hyperthreads because it's not limited
But also potentially worse because hyperthreads schedule only on actual memory waits, whereas this approach puts a suspension point after each prefetch whether or not the target is actually in cache.

⬐ int0x80
I haven't read the article or watched the video. But regarding prefetching triggered by software (the programmer) instead of by the hardware, this is a very informative read, with a lot of numbers and tests/profiles:
  https://lwn.net/Articles/444336/

  https://lwn.net/Articles/444346/
⬐ tuukkah
Your links show with benchmarks that software prefetching is not always useful, the example being a for loop to traverse a linked list in the Linux kernel.
However, the article at hand shows with benchmarks that software prefetching can be very beneficial in common algorithms such as hash probe, binary search, Masstree and Bw-tree, even when concurrency is implemented in a straightforward way using (stackless) coroutines.

⬐ int0x80
The links are not only about a loop. The general conclusion is that making well-informed decisions about whether to prefetch or not is very hard, and that most of the time the CPU has far more information with which to make a good decision. It also says that unless you prove by benchmarks that a prefetch makes sense, it is probably wrong.
Now, this is of course the general case. If you control the whole algorithm and its data structures during execution, a well-crafted prefetch /can/ be beneficial. Again, the general idea of the links I posted is that /generally/ the CPU has more info about the /overall/ system state to make a correct prefetching choice. I think those links are useful and interesting even if they do not apply to the specific case in TFA.
Clarifying a bit more: I didn't post that to contradict the article but just to provide a bit of related info.

CppCon 2018: G. Nishanov “Nano-Coroutines to the Rescue

Nov 05, 2018 · 1 point, 0 comments · submitted by Tomte

⬐

Oct 13, 2018 · petermcneeley on Cimple: Instruction and Memory Level Parallelism

Similar results can be achieved with C++ nano-coroutines. https://www.youtube.com/watch?v=j9tlJAqMV7U&feature=youtu.be...
