HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Raymond Hettinger, Keynote on Concurrency, PyBay 2017

SF Python · Youtube · 22 HN points · 8 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention SF Python's video "Raymond Hettinger, Keynote on Concurrency, PyBay 2017".
Youtube Summary
Keynote for https://pybay.com, 2nd annual Regional Python Conference in SF.

Slides: http://pybay.com/site_media/slides/raymond2017-keynote/index.html

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
For anyone looking for something on the same topic but a little bit lighter / more practical, I really enjoyed this talk[1] by Raymond Hettinger at PyBay 2017, where he discussed the GIL as well.

[1] https://www.youtube.com/watch?v=9zinZmE3Ogk

The claim that concurrency is broken in Python because of the GIL seems to be a reiterated adage without much actual wisdom behind it.[1]

Are you writing an IO-bound task? Then asyncio[2] is a great way to achieve performant concurrency (even if used only as an interface: e.g., you can make it more performant still by swapping in uvloop[3]). As Raymond Hettinger likes to point out, even if you throw away the GIL, you still have to implement locks on shared resources (unless you're writing embarrassingly parallel code, in which case you can invoke the tasks separately, use the multiprocessing module to great effect, or see how easily I do it below). By the time you get all of those error-prone locks implemented in your GIL-less world, you've given up the performance you thought you'd gain by throwing away the GIL.
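A minimal sketch of the IO-bound case, with `asyncio.sleep` standing in for real network calls (swapping in uvloop would just be a `uvloop.install()` call before `asyncio.run`, assuming you have it installed):

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a real network call; asyncio.sleep yields control
    # to the event loop, so the other tasks run during the wait.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main() -> list[str]:
    # Three 0.1 s "requests" run concurrently on one thread,
    # finishing in roughly 0.1 s total instead of 0.3 s.
    return await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```

Because the tasks spend their time waiting rather than computing, a single thread can interleave all of them with no locks in sight.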

Are you writing a CPU-bound task? I've only encountered this in scientific computing (I'm a scientist), and for this, as another commenter mentions, I use either numpy[4] (strictly within a numpy call, the GIL is not held) or numba[5] (by default, the entirety of a numba-fied function does not hold the GIL). I can then use Python threads, asyncio, or Numba's built-in parallelism[6,7] to run these numerical routines in parallel. (In summary: some numpy routines for working with arrays of values are auto-parallelized when called within a numba-fied function, and you can explicitly parallelize your own loops via `for i in numba.prange(N):`; in either case, decorate the function with `@numba.njit(parallel=True)`.) I easily get 100% of all my CPU's threads doing work this way.
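The thread-based pattern looks roughly like this. A pure-Python stand-in is used for the kernel so the sketch runs without numba installed; in practice you would decorate `heavy_kernel` (a hypothetical name) with `@numba.njit(nogil=True)` so the compiled function releases the GIL and the threads genuinely run in parallel:

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Pure-Python stand-in for a numerical kernel. With numba you would add
#   @numba.njit(nogil=True)
# so the compiled function drops the GIL while it runs, letting the
# thread pool execute the chunks concurrently on separate cores.
def heavy_kernel(xs):
    return sum(math.sqrt(x) for x in xs)

# Split the work into chunks and farm them out to threads.
chunks = [list(range(i, i + 1000)) for i in range(0, 4000, 1000)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(heavy_kernel, chunks))
total = sum(partials)
```

Without the `nogil` decoration this stand-in gains nothing from threads (the GIL serializes the pure-Python work), which is exactly the distinction the comment above is drawing.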

In the past few weeks I have parallelized numba-fied functions both via numba.prange and via Python threads (yes, threads) to speed up codes that ran for 20-60 minutes. Even though I've done this before, I was still very pleasantly surprised at how easy it was: I got very nearly a 16x speedup on my 16-thread laptop. That made my code practical for interactive data exploration and for building up simulations to help me understand some things I was observing, which is where Python really shines for me.

Also note that writing a routine in numba provides a straightforward path to putting your numerical codes on a GPU[8,9], too, if that model makes sense for your algorithm (though I've primarily used PyCUDA for this, which worked well).

Are you writing something that needs to use distributed (e.g., cluster-based) resources? Then dask (dask.distributed) is a great way to go, with an interface similar to the Python standard-library `futures`. It runs all the same code and gives you the same forms of parallelism, but distributed across machines.

In any case, do you get FORTRAN or C speeds? Probably not, but close enough, and with the benefits (and costs) of using Python code.

[1] talk by Raymond Hettinger: https://www.youtube.com/watch?v=9zinZmE3Ogk
[2] asyncio: https://docs.python.org/3/library/asyncio.html
[3] uvloop: https://github.com/MagicStack/uvloop
[4] numpy: https://scipy-cookbook.readthedocs.io/items/ParallelProgramm...
[5] numba: https://numba.pydata.org/
[6] parallelizing numba, overview blog post: https://www.anaconda.com/blog/parallel-python-with-numba-and...
[7] parallelizing numba, docs: https://numba.pydata.org/numba-doc/dev/user/parallel.html
[8] numba on GPU, example: https://github.com/numba/numba-examples/blob/master/examples...
[9] numba on GPU, docs: http://numba.pydata.org/numba-doc/dev/cuda/index.html

tasubotadas
Is this some kind of Stockholm syndrome?

This has been discussed plenty of times and it basically boils down to "just get f*cked and work around it". In my case, I just use different programming languages for performance sensitive tasks.

But it's amazing how people can come up with all sorts of justifications for a broken system that they happen to love. And then people wonder how there are people who can support Trump :D.

targafarian
To make this less personal and more specific for someone not enlightened on the topic: please point out how I am incorrect, specifically, or at least provide a source or point to a discussion that shows how I am incorrect.

I am not a professional programmer, but I get a lot of work done very productively in Python (or at least I thought I was being productive; given how you speak to me, it sounds like I'm not actually as productive as I thought, and am merely deluding myself that what I'm doing is productive...).

It is because only one thread at a time holds the lock, in order to avoid race conditions. The keynote[1] by Raymond Hettinger from PyBay '17 is a great place to start if you are new to this.

[1] https://youtu.be/9zinZmE3Ogk
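The classic illustration of that race: even with the GIL, a bare `count += 1` is a read-modify-write that can interleave across threads, so shared mutation still needs a lock. A minimal sketch:

```python
import threading

count = 0
lock = threading.Lock()

def add_many(n: int) -> None:
    global count
    for _ in range(n):
        # The GIL serializes bytecodes, not whole statements: without
        # the lock, two threads can both read the old value of `count`
        # and one increment is lost. The lock makes the whole
        # read-modify-write atomic.
        with lock:
            count += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock the final count is exactly 400,000; delete the `with lock:` line and updates can silently disappear.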

Notebooks may be suitable for scratch work, but not much else. The problems the OP notes are very serious, especially in the scientific world, where many lack proper software-engineering skills. The real issue is that newcomers don't know better, so they don't realize the damage they're inflicting on themselves and others until it's too late (irreproducible code, hidden state, dependency management, etc.).

Even the fast.ai library, which is a wonder, has broken notebooks. For those trying to follow the course[1] at home, running the notebooks is frustrating: things are out of order, so errors pop up all the time. Jeremy is a wonderful teacher, but compare following a fast.ai course video, which uses notebooks, to following a Python video from, e.g., Raymond Hettinger[2], which uses Sphinx and a shell. While the documentation style and ugly shell don't look nearly as cool, they are so much clearer and better structured.

Notebooks became popular because they filled a gap that had been left uncovered. As the scientific community moves from Matlab to Python and R, reading code, pushed among other things by the popularity of GitHub, becomes a day-to-day activity. Matlab scripts were easy to explore because users would load them, set breakpoints here and there, and look at results interactively, which is exactly what notebooks aim to provide. The difference is that what used to be breakpoints now become cells, comments turn into Markdown, and figures are inlined for an extra layer of convenience. Yet all the awful problems of sloppy Matlab development are now masked, marketed as something fancy, and start to pollute the Python development environment. Reproducibility and testing are gone, dependency management (which Matlab doesn't require) breaks down completely, documentation is non-existent, and sharing becomes heavily constrained.

Notebooks may be suitable for scratch work, but not much else, and certainly nothing serious. Hopefully the slideshow above gets the attention it deserves. In any case, kudos to the Jupyter dev team for fighting the good fight, even with the drawbacks of their experiment.

[1] http://course.fast.ai/lessons/lesson3.html
[2] https://www.youtube.com/watch?v=9zinZmE3Ogk

This talk (hopefully) gets my point across:

https://www.youtube.com/watch?v=9zinZmE3Ogk

Aug 15, 2018 · 2 points, 0 comments · submitted by dralley
I've started watching Raymond Hettinger's talks lately. He's a Python core developer. I love them; the guy is charismatic, very smart, and his Python knowledge constantly impresses me.

https://www.youtube.com/playlist?list=PLRVdut2KPAguz3xcd22i_...

My favourite: https://www.youtube.com/watch?v=9zinZmE3Ogk

eric24234
This is cool. I really love this type of post, where the person under discussion isn't mainstream but deserves to be better known in programming circles. John Carmack's talks are like that too: very charismatic, delivered standing in one place with minimal movement for over an hour, going from the fundamentals to the extreme details of the topic.

Threads are not hard. In fact, threads are extremely easy to implement.

However, real threading code is incredibly difficult to reason about just by looking at it. That makes it easy to introduce race conditions without even knowing they're there!

There is also the fact that locks don't actually lock anything! A lock is just a flag that any code may choose to ignore.

They are not an enforcement tool, just a cooperative one.

(More here: https://www.youtube.com/watch?v=9zinZmE3Ogk)
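A tiny demonstration of that point: holding a lock does nothing to stop code that never asks for it. The function names below are hypothetical, minimal-sketch choices:

```python
import threading

balance = 100
lock = threading.Lock()

def well_behaved_withdraw(amount: int) -> None:
    global balance
    # Cooperates with the convention: takes the lock before
    # touching the shared `balance`.
    with lock:
        balance -= amount

def rude_withdraw(amount: int) -> None:
    global balance
    # Nothing enforces the convention: this mutates `balance`
    # without acquiring `lock`, and Python won't complain, even
    # if another thread holds the lock at the time.
    balance -= amount

well_behaved_withdraw(30)
rude_withdraw(30)
```

Both calls succeed; the lock only protects data when every piece of code that touches the data agrees to use it.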

P.S. I created a library that makes it easier to write safer multiprocessing code:

https://github.com/pycampers/zproc

May 10, 2018 · 4 points, 0 comments · submitted by dralley
Raymond Hettinger, Keynote on Concurrency, PyBay 2017 https://www.youtube.com/watch?v=9zinZmE3Ogk
Dec 17, 2017 · 16 points, 0 comments · submitted by jstimpfle
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.