HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”

CppCon · Youtube · 18 HN points · 5 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention CppCon's video "CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”".
Youtube Summary
http://CppCon.org
“Memory Latency Troubles You? Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”

Presentation Slides, PDFs, Source Code and other presenter materials are available at: https://github.com/CppCon/CppCon2018

Are you doing memory lookups in a huge table?
Does your embarrassingly random access to your lookup tables lead to memory stalls?

Fear no more!

We will explore techniques that allow us to do useful work while the prefetcher is busily working on bringing the requested cache lines from main memory, by utilizing nano-coroutines.

And the best part: nano-coroutines can be easily implemented using the Coroutines TS that is already available in the MSVC and Clang compilers. With a little bit of library support we can utilize the coroutines to extract intra-thread parallelism and quadruple the speed of your lookups.

Gor Nishanov
Software Engineer, Microsoft
Gor Nishanov is a Principal Software Design Engineer on the Microsoft C++ team. He works on the design and standardization of C++ Coroutines, and on asynchronous programming models. Prior to joining the C++ team, Gor worked on distributed systems in the Windows Clustering team.

Videos Filmed & Edited by Bash Films: http://www.BashFilms.com

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Sep 04, 2022 · 3 points, 0 comments · submitted by Tomte
Feb 10, 2022 · 3 points, 0 comments · submitted by Tomte
Jun 08, 2021 · 1 points, 0 comments · submitted by Tomte
It's a big deal because, while it has some downsides, being stackless means they can have next to no overhead, so it can be performant to use coroutines to write asynchronous code for even very fast operations. The example given at https://www.youtube.com/watch?v=j9tlJAqMV7U&t=13m30s is that you can launch multiple coroutines that each issue a prefetch instruction and then process the fetched data, so you get clean code that issues multiple prefetches and processes the results. Whereas in Python (don't get me wrong, I love Python) you might use a generator to "asynchronize" slow operations like requesting and processing data from remote servers, C++ coroutines can be fast enough to asynchronously process "slow" operations like requesting data from main memory.
mazieres
Wow, that talk is a fantastic link. He actually gets negative overhead from using coroutines, because the compiler has more freedom to optimize when humans don't prematurely break the logic into multiple functions.
Mar 08, 2020 · 2 points, 0 comments · submitted by Tomte
Sep 09, 2019 · 7 points, 0 comments · submitted by Tomte
https://m.youtube.com/watch?v=j9tlJAqMV7U

This is an extreme version of yield on memory access

> In C++, technically every initialization of a sub-coroutine defaults to being a new heap allocation. I don't want to spread FUD: my understanding is that the vast majority of these are optimized out by the compiler. But one downside of this approach is that you could change your code and accidentally disable one of these optimizations.

I don't think this is correct. C++20 leaves a lot of choices for implementing it without forcing a heap allocation.

See https://lewissbaker.github.io/2017/11/17/understanding-opera... Also see this video, which goes into depth on how to have billions of coroutines with C++: https://www.youtube.com/watch?v=j9tlJAqMV7U

Mar 26, 2019 · 1 points, 0 comments · submitted by Tomte
Gor Nishanov, one of the authors, talks about this in his CppCon2018 talk https://youtu.be/j9tlJAqMV7U (jump to ~12:33 to skip background).

It amounts to using CPU prefetch instructions with C++ coroutines to simulate hyperthreading in software by scheduling instructions around cache misses (but is potentially better than hyperthreads because it's not limited to 2/core)

quotemstr
> but is potentially better than hyperthreads because it's not limited

But also potentially worse because hyperthreads schedule only on actual memory waits, whereas this approach puts a suspension point after each prefetch whether or not the target is actually in cache.
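
To make the trade-off in this exchange concrete, here is a hand-rolled state machine sketch (plain C++, no coroutines) that interleaves several binary searches and switches to the next search after every prefetch, unconditionally. All names (`Probe`, `InterleaveStats`, `interleaved_search`, `prefetch_hint`) are invented for the example, and the table is assumed to be sorted. The `switches` counter shows that the number of switches depends only on the number of probes, not on whether the data was already in cache.

```cpp
#include <cstddef>
#include <vector>

// State of one in-flight binary search.
struct Probe {
    int key;
    std::size_t lo, hi, mid;
    bool waiting;  // a prefetch for table[mid] has been issued
    bool done;
};

struct InterleaveStats {
    std::size_t hits;      // keys that were found
    std::size_t switches;  // times we moved on after issuing a prefetch
};

inline void prefetch_hint(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p);  // compiler builtin; a hint only
#else
    (void)p;
#endif
}

InterleaveStats interleaved_search(const std::vector<int>& table,
                                   const std::vector<int>& keys) {
    std::vector<Probe> probes;
    for (int k : keys) probes.push_back(Probe{k, 0, table.size(), 0, false, false});

    InterleaveStats stats = {0, 0};
    std::size_t remaining = probes.size();
    while (remaining > 0) {
        for (Probe& p : probes) {
            if (p.done) continue;
            if (p.waiting) {
                // "Resume": the requested element should be close by now.
                p.waiting = false;
                int v = table[p.mid];
                if (v == p.key) { p.done = true; ++stats.hits; }
                else if (v < p.key) p.lo = p.mid + 1;
                else p.hi = p.mid;
            }
            if (!p.done && p.lo >= p.hi) p.done = true;  // key is absent
            if (p.done) { --remaining; continue; }
            // "Suspend": request the next element and move on, whether or
            // not that element is already sitting in cache.
            p.mid = p.lo + (p.hi - p.lo) / 2;
            prefetch_hint(&table[p.mid]);
            p.waiting = true;
            ++stats.switches;
        }
    }
    return stats;
}
```

Running it twice over the same small table gives the same `switches` count both times, even though the second run finds everything in cache. That fixed cost is the one the reply above is pointing at.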

int0x80
I haven't read the article or watched the video. But regarding prefetching triggered by software (the programmer) instead of by the hardware, this is a very informative read, with a lot of numbers and tests/profiles:

  https://lwn.net/Articles/444336/

  https://lwn.net/Articles/444346/
tuukkah
Your links show with benchmarks that software prefetching is not always useful, the example being a for loop to traverse a linked list in the Linux kernel.

However, the article at hand shows with benchmarks that software prefetching can be very beneficial in common algorithms such as hash probe, binary search, Masstree and Bw-tree, even when concurrency is implemented in a straightforward way using (stackless) coroutines.

int0x80
The links are not only about a loop. The general conclusion is that making well-informed decisions about whether to prefetch or not is very hard, and that most of the time the CPU will have far more information to make a good decision. It also says that unless you prove by benchmarks that it makes sense, it is probably wrong.

Now, this is of course the general case. If you control the whole algorithm and the data structures during execution, a well-crafted prefetch /can/ be beneficial. Again, the general idea of the links I posted is that /generally/ the CPU has more info about the /overall/ system state to make a correct prefetching choice. I think that info and those links are useful and interesting even if they do not apply to the specific case in TFA.

Clarifying a bit more: I didn't post that to contradict the article but just to provide a bit of related info.

Nov 05, 2018 · 1 points, 0 comments · submitted by Tomte
Similar results can be achieved with C++ nano-coroutines. https://www.youtube.com/watch?v=j9tlJAqMV7U&feature=youtu.be...
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.