HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
How Python was Shaped by leaky Internals, Armin Ronacher, Flask framework

Видео с мероприятий IT-People · YouTube · 7 HN points · 25 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Видео с мероприятий IT-People's video "How Python was Shaped by leaky Internals, Armin Ronacher, Flask framework".

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Definitely that is true, I wrote a comment about that here: https://news.ycombinator.com/item?id=18041047

Another issue with Python is that the language semantics are much more subtle. This is an excellent talk that shows how even operations that appear simple in Python can be much more complex than they appear: https://youtu.be/qCGofLIzX6g

I know very little about Python, but I would suggest these talks about a few ways in which Python is unique. They won't teach you Python, but they might help in making sense of the confusing bits you might encounter:

How Python was Shaped by leaky Internals, Armin Ronacher: https://youtu.be/qCGofLIzX6g

What Does It Take To Be An Expert At Python?: https://youtu.be/7lmCu8wz8ro

and maybe also

When Python Practices Go Wrong: https://youtu.be/ZdVgwhHXMpE

Jan 23, 2022 · fault1 on Faster CPython (2021) [pdf]
I thought this presentation by Armin Ronacher (the creator of Flask and Jinja) was very enlightening:

How Python was Shaped by leaky Internals (2016):

https://www.youtube.com/watch?v=qCGofLIzX6g

See also this HN post (Attempts to make Python fast):

https://news.ycombinator.com/item?id=24848318

Nov 10, 2021 · 1 point, 0 comments · submitted by jonathanbgn
Python is in a situation where it is enormously popular, due to a winning aesthetic in syntax/ergonomics that appeals to both newbies and experienced programmers. And due to this success, there is a lot of demand and interest in speeding the language up and getting rid of the GIL.

But paradoxically, this veneer of simplicity hides an incredible amount of complexity. You have to read hundreds of lines of CPython to understand the full semantics of a statement like "a + b". The semantics are complicated/compromised by many optimizations and implementation details of CPython. This is a fantastic talk that goes into these details: https://youtu.be/qCGofLIzX6g
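For a taste of those special cases, here is a small sketch (the class names are made up for illustration) showing two of the dispatch rules buried in CPython's binary add: the NotImplemented fallback to __radd__, and the rule that a right-hand operand whose type subclasses the left's type gets first crack:

```python
class Meters:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # defer to the other operand
    def __radd__(self, other):
        # reached when the left operand's __add__ gave up
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented

class LoudMeters(Meters):
    def __radd__(self, other):
        return "subclass wins"

# Rule 1: a + b may call b.__radd__ when a.__add__ returns NotImplemented
assert (1 + Meters(2)).value == 3

# Rule 2: if type(b) subclasses type(a) and overrides the reflected
# method, b.__radd__ runs *before* a.__add__ - this never calls
# Meters.__add__ at all
assert Meters(1) + LoudMeters(2) == "subclass wins"
```

And this is only the pure-Python layer; the talk covers the additional C-level slot machinery underneath.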

It is not good for anybody that the semantics are this quirky. But it's especially not good for people trying to optimize the language, who basically have to implement quirk-for-quirk identical semantics to what CPython has ended up with after 30 years of evolution and optimization.

I wish Python 3.0 had included a more formalized and cleaned up set of semantics. These things can't be simplified without a breaking change to the language (even if only a small number of programs truly depend on these quirks). I would say Python should fix this in 4.0, but the 2->3 transition was so painful and long that I'm not sure the ecosystem can take it.

moonchrome
Forget about language quirks - package management and version management in Python is a nightmare. The amount of time I've wasted dealing with Python environment issues alone makes me avoid it if I can.
marmaduke
Have you seen the Python docs which have pages on the execution model and data model, with notes in implementation details?

https://docs.python.org/3/reference/executionmodel.html

https://docs.python.org/3/reference/datamodel.html

ggrrhh_ta
I guess you are trying to imply that those documents give a formal-enough description of the execution and data model. Well, even the "names" section forgets to say whether imported and non-imported names all share the same encoding, and which one it should be - just a nitpick, but one found in less than 5 seconds. Did you know that there are behavioral limitations of dict that arise from a specific optimization in the C implementation of dictionary iterators? If you create a new Python following those documents, it is possible that you will allow a perfectly reasonable behavior that would fail in most other Python interpreters.
joshuamorton
> Well, even for the section "names" it just forgets to say if imported and non-imported names all share the same encoding and which should it be - just a nitpick, but with less than 5 seconds

https://www.python.org/dev/peps/pep-0263/

Names aren't unique.

> Do you know that there are some behavioral limitations of dict that arise from a specific optimization in the implementation in C of dictionary iterators?

I'm rather curious what you're referring to here, do you mean dict-ordering, or something else?

ggrrhh_ta
Sorry, but "names aren't unique" does not even begin to define uniqueness - "from ... import X" imports an X; great; should it replace an "X" in a different encoding in the local namespace, or not?

I can tell you it has to do with iterators, and maybe someone will comment on what that is.

joshuamorton
By names aren't unique, I mean with respect to "encoding", everything is the same. It seems like you're using "encoding" to mean something it doesn't? Do you just mean like "value"?

> I can tell you it has to do with iterators, and maybe someone will comment on what that is.

This sounds like you don't know what it is.

ggrrhh_ta
Well, OK; I guess that, for you, everything is well described in those documents, or, you would argue, no worse than C or any other language has been spec'ed.
webmaven
> By names aren't unique, I mean with respect to "encoding", everything is the same. It seems like you're using "encoding" to mean something it doesn't? Do you just mean like "value"?

I think the GP means character encodings.

So the intent here is to ask whether (for example) a foo defined in an ISO 8859-1 encoded file is overridden by a foo imported from a Windows-1256 encoded file.

joshuamorton
> I think the GP means character encodings.

Then the link I posted answers their question. Names aren't unique in how encodings are handled. Files are decoded and canonicalized as a unicode string. If the identifiers are the same after canonicalization, then yes they are equivalent. If they aren't, then they aren't.
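This canonicalization is observable from Python itself: per PEP 3131, identifiers are NFKC-normalized at parse time, so two visually different spellings can collapse into the same name. A quick sketch:

```python
# 'ſpam' begins with U+017F (LATIN SMALL LETTER LONG S), which
# NFKC-normalizes to a plain 's' - so the parser treats the
# identifier as 'spam'.
ns = {}
exec("ſpam = 1", ns)
assert "spam" in ns  # the canonicalized name is what got bound
```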

webmaven
Ah, I misunderstood what you meant by "aren't unique". I would have phrased that as "names aren't treated differently", or something similar.
ggrrhh_ta
I mean... I had not even yet watched the talk that others have pointed you to (from Armin Ronacher); after having gone several times down the rabbit hole of looking at what the interpreter is doing in specific scenarios (and I like lots of things, at the language level, that Python does - the terseness and expressiveness it brings), I also think that Python is a very complex language under a soft-looking skin.
pjmlp
I have followed Python since 1.6, and occasionally use it for portable scripting when on UNIX platforms (PowerShell on Windows), and many aren't really aware that Python is at Ada/C++ levels of complexity, even if on first contact it seems like the new BASIC.

Besides the nuances of how everything is executed, there are occasionally breaking changes between minor language revisions.

You cannot just pick up a random Python script and be sure it still behaves the same way across all minor revisions.

haberman
In your second link it says:

> For instance, to evaluate the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called.

The talk I linked to (https://youtu.be/qCGofLIzX6g) is a deep dive on how there is much more to the story than the simplified statement above.

What I wish for is not better docs, but rather simpler language in which the statement above would actually be an accurate specification of the behavior implemented by the interpreter.

dec0dedab0de
My gut instinct is that many of Python's quirks are the reason for its popularity. Though I might not be understanding your suggestion. Could you give a specific example of something you would clean up if you were able to? I have the video in a tab but won't be able to watch it until later.
haberman
I think "a + b" is a good example.

I really recommend the talk I linked (https://youtu.be/qCGofLIzX6g), it gets deep into this stuff. I don't think anyone benefits from "a + b" having so many special cases.

chrisseaton
It wouldn't be too bad if it was complicated, but explicitly documented and tested.

For example, people joke about JavaScript comparison semantics, posting things like this https://i.stack.imgur.com/35MpY.png and laughing about it. But I wish Ruby and Python were this explicit and easy to understand!

marmaduke
Python docs have pages on the execution model and data model, with notes in implementation details. What more do you want?

https://docs.python.org/3/reference/executionmodel.html

https://docs.python.org/3/reference/datamodel.html

Reading these documents is not hard, and it’s been an enormous help over the years for writing effective efficient Python.

chrisseaton
> What more do you want?

Something formal, so I can actually reason about it and test it.

These documents are just informal prose. Are they sound? I don't know. Do you? Does anyone? Does my implementation match what they say? Who knows. Does CPython even match it? Does anyone know?

aw1621107
Out of curiosity, how many languages would meet those criteria? Only one I can think of off the top of my head is CompCert's C dialect. Are there others?
c-cube
Standard ML certainly has clear specifications and proofs of soundness of the type system.
chrisseaton
It's a spectrum - some languages do it a lot better than others. Not having anything more than a conversational English description of what it does is definitely the lower end of the spectrum. I'm sure very few are doing it perfectly, but for example Java has a formal semantics.
fcurts
> but for example Java has a formal semantics.

I don't think that's really true. The Java language specification is entirely prose. The book you linked to was written by "outsider" authors and published in 1999 (!).

chrisseaton
That was just one example of many - there's a whole cottage industry of writing formal semantics for Java.

https://fsl.cs.illinois.edu/publications/bogdanas-rosu-2015-...

fcurts
None of them are official, and I bet they make major simplifications (the one you just linked to is for Java 1.4). I doubt actual language/tooling implementors benefit much from them.
nevermore
As an outsider this may be your perception, but this is not how python's documentation works.

> These documents are just informal prose.

Not true. They're the language spec. Every guaranteed behavior of python is described clearly and concretely in these documents.

> Are they sound?

Yes.

> Does my implementation match what they say?

Yes.

> Does CPython even match it?

Yes.

> Does anyone know?

Yes! There's a very rigorous and thorough set of unit tests that specifically test an implementation's ability to match precisely the behavior described in these documents. All implementations (that I'm aware of, e.g. CPython, PyPy, Jython, etc.) state which versions of the spec they are compatible with; in other words, they pass the unit test suite for that version.

Further, the maintainers of python (and by that I mean, regular contributors to the python-dev mailing list, not a cabal of robed individuals in a cave somewhere) are deeply aware of the language of the spec, the way the test suite implements it, and the importance of maintaining this relationship.

chrisseaton
>> Are they sound?

> Yes.

That's great! Can you point me at the formal proof? I haven't seen it myself.

> There's a very rigorous and thorough set of unit tests that specifically test an implementation's ability to match precisely the behavior described in these documents.

How can you test against English prose? You can't. So someone's manually translated the prose into tests elsewhere I guess. Have they done that correctly? How can we verify that? Was there any ambiguity when they were interpreting the English?

It's easy to see where these simple English descriptions aren't covering everything. To give you a practical example - look at https://docs.python.org/3/reference/datamodel.html#object.__... - 'should return False or True' - what if it doesn't? Where's that specified? Is it somewhere else in this document? That's the kind of practical issue we work with when implementing languages.

joshuamorton
> That's great! Can you point me at the formal proof? I haven't seen it myself.

Can you point me at the proof of soundness for the documented behavior of Java or JavaScript or C++? A cute little table isn't a substitute for soundness, and none of the languages you mentioned are more soundly implemented (at least in their popular implementations).

> It's easy to see where these simple English descriptions aren't covering everything. To give you a practical example - look at https://docs.python.org/3/reference/datamodel.html#object.__... - 'should return False or True' - what if it doesn't? Where's that specified? Is it somewhere else in this document? That's the kind of practical issue we work with when implementing languages.

CPython raises an exception; and in general, the way things turn out is that CPython is the spec unless otherwise specified.
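That exception is easy to observe directly; a quick sketch (the Weird class is made up for illustration):

```python
class Weird:
    def __bool__(self):
        return 1  # an int, not a bool

try:
    bool(Weird())
    raised = False
except TypeError:
    # CPython rejects the non-bool return value with a TypeError
    raised = True

assert raised
```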

> How can you test against English prose? You can't. So someone's manually translated the prose into tests elsewhere I guess. Have they done that correctly? How can we verify that? Was there any ambiguity when they were interpreting the English?

This is sort of a silly complaint. Every spec is written in English prose[1]. That's why we end up with arguments about SHALL vs. MUST in specs. Except in the rare cases where the spec is a test suite, which usually reduces to the CPython case: the popular implementation is the spec (or maybe the popular implementation forks its test suite out into a different repo to make it more "independent").

[1]: Please don't make an irrelevant point about an obscure implementation of C that's implemented in Agda where the "spec" is the proof of soundness or whatever; that's fundamentally the same as the implementation being the spec, especially given that said C implementation probably isn't ANSI compliant or whatnot.

chrisseaton
> Can you point me at the proof for the soundness of the documented behavior java or javascript or C++ docs?

https://link.springer.com/book/10.1007/3-540-48737-9

Java does have a formal semantics, with a whole chapter on its soundness.

> in general, cpython is the spec

Well there we go - turns out the written document doesn't cover everything after all. A second ago we were at 'Every guaranteed behavior of python is described clearly and concretely in these documents.' Turns out not.

I'm not criticising Python as being exceptionally bad, but we can certainly do it much better.

joshuamorton
> Java does have a formal semantics, with a whole chapter on its soundness.

No, there's a chapter on the soundness of its type system. The spec being sound and the type system being sound are very different things. If we consider TypeScript to be JS's type system, then JS's type system is unsound. If we consider CPython in isolation, under the definition you're using, CPython cannot be unsound, as it is untyped, QED.

If you're talking about whether the language's type system is sound, asking "These documents are just informal prose. Are they sound?" isn't even a well defined question.

> I'm not criticising Python as being exceptionally bad, but we can certainly do it much better.

You absolutely were when you said "But I wish Ruby and Python were this explicit and easy to understand!"

chucke
You're arguing with the guy who did truffleruby. I'd say he knows a thing or 2 about the soundness of dynamic languages and adhering to a formal spec.
xx_ns
Argumentum ab auctoritate.
joshuamorton
I'm aware of who I was talking to. It's worth noting that I think all three languages I mentioned (JS, python, Java) aren't sound according to their specs, insofar as they exist.
Python is difficult to optimize due to complex language semantics. This excellent talk describes how the semantics of the "add" operation are much more complex than generally described. a + b is not just a.__add__(b); you have to read 200+ lines of CPython interpreter code to fully resolve its semantics: https://youtu.be/qCGofLIzX6g

Not saying it's impossible, but it's more of an uphill battle than optimizing other languages that have more regular semantics.

Python is also constrained by its C API, which is very low-level and exposes a lot of implementation details to C extensions. Changing the API could improve performance, but it would also make Python incompatible with a large ecosystem of C extensions.

I sometimes wish for Python to define a new C API that exposes far less, and to begin a long-term migration towards the new API so that eventually the interpreter would be less constrained. It would be a large migration, but unlike Python 2 -> 3 it wouldn't affect everyday users, only extension authors. I don't know if it will ever happen, but a person can dream.

tekknolagi
Check out HPy.
Jun 20, 2021 · eigenspace on Is Julia Really Fast?
Python's classes can change dynamically at runtime, so each iteration of a loop would need to check whether the shape has changed. I'm no Python expert though, so please correct me if I'm mistaken.

But it's more than that. There's a really nice talk here from the creator of the Flask framework about how a bunch of different semantic choices in the language design leak out into the ecosystem and make many optimizations impossible: https://www.youtube.com/watch?v=qCGofLIzX6g

remexre
Taking a look at that, I think shapes would still work; [1] has some background material on them, but dynamically changing data is what shapes are for: you check whether you should update an object's shape when assigning to it, and can JIT based on the type (you ought to be able to handle getattr and setattr with this too, JITting to a direct call to that method when it's necessary). Where this completely falls down for optimizing Python is that it changes the object layout, so you can't pass these objects to CPython without making them generic and converting them back afterwards. Converting the layout to be generic would be a deep copy (which of course is semantics-breaking), and converting it back wouldn't even be possible if the C code kept a reference to the object.

[1]: https://mathiasbynens.be/notes/shapes-ics

cogman10
Correct.

What Javascript JITs realized is that even though objects are malleable, they aren't often changed in most code. As a result, JITs make optimizations that assume that the object shape does not change (with checks to deoptimize when that assumption is violated).

But like you point out, this is all thrown out the window because the python C interface effectively directly exposes the memory and structure of python objects to C extensions. The nice thing about that is calling C is super fast and C has a lot of power. It can create, destroy, and modify python objects at will with little overhead.

This becomes problematic with garbage collection because you can't move memory around if C contains a direct reference to that memory (which is where most GC performance benefits come from). It effectively forces python to do reference counting.
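One visible consequence: in CPython, id() happens to be the object's memory address, and it never changes over the object's lifetime, precisely because C extensions may hold raw PyObject* pointers that a moving collector would invalidate. A trivial sketch:

```python
x = [1, 2, 3]
before = id(x)          # in CPython this is the object's address
for _ in range(1000):
    x.append(0)         # the list's item buffer may reallocate...
assert id(x) == before  # ...but the PyObject header itself never moves
```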

Java also has C interfaces, but they are MUCH more punishing to use and come with huge performance downsides. Even though they are working to decrease those costs, they won't reach the low cost of Python's C interface. That's because giving out a direct reference to an object managed by the JVM's GC is impossible. There is some level of memory copying that simply has to happen.

PyPy is what you get when you relax some of the compatibility constraints of python. It gets pretty significant performance boosts over the default python, but sacrifices the C interface.

Armin Ronacher, author of Flask, actually has a good talk about this. The gist of it is that the way Python's internals leak into the language makes it very difficult to build a performant JIT that wouldn't break a large amount of userspace code.

Python lets you do _far_ more shenanigans than JavaScript does; and a lot of large libraries depend on some of that behavior. Breaking it would probably cause a new 2 -> 3 situation.

https://www.youtube.com/watch?v=qCGofLIzX6g&feature=emb_titl...

Sohcahtoa82
> Python lets you do _far_ more shenanigans than JavaScript does

Does JavaScript let you do this monstrosity of terrible code?

    Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class Dog(object):
    ...     def speak(self):
    ...             print("bark!")
    ... 
    >>> class Cat(object):
    ...     def speak(self):
    ...             print("meow!")
    ...
    >>> animal = Dog()
    >>> animal.speak()
    bark!
    >>> animal.__class__ = Cat
    >>> animal.speak()
    meow!
    >>>
tracker1
Yes, you absolutely can. In fact, this was part of my ES5 Date constructor polyfill.
detaro
You can replace an object's prototype in JS, wouldn't that do the same?
Zarel
A lot of people have responded "yes", but no one's taught you how yet, so here's some example code that you can paste into your browser's Devtools or Node's REPL:

    class Dog {
      speak() {
        console.log("bark!");
      }
    }
    class Cat {
      speak() {
        console.log("meow!");
      }
    }
    animal = new Dog();
    animal.speak();
    Object.setPrototypeOf(animal, Cat.prototype);
    animal.speak();
This will produce:

    bark!
    meow!
ak217
> a lot of large libraries depend on some of that behavior.

Armin makes great points about path dependence of API design and how the CPython API leaks into the Python language spec. But the features being discussed are actually obscure (example: slots) or intended for debugging (example: frame introspection), and most libraries don't have a good reason to use them. We're stuck in a loop: people talk about how Python is special and can't use a JIT because its internals are not JIT-friendly, so we don't have a JIT, so implementers continue to make choices that are not JIT-friendly - not because they want to, but because they have no guidance.

The JIT doesn't have to be amazing on day 1. What it does have to do is show a commitment and a path to performant code, and illuminate situations where optimizations turn off. There's nothing fundamental in Python's design that prevents a JIT from working; a small number of rarely used dynamic features (that most people don't know about and don't know that they can negatively affect performance) should not be used to hold up interpreter design.

thu2111
GraalPython is on its way to solving this, by co-JIT-compiling both Python and the code of the native extensions simultaneously. However Python is a large language and ecosystem, so it'll take a while for the implementation to mature.
There's a great talk on this by the same author: https://www.youtube.com/watch?v=qCGofLIzX6g
You're arguing with mitsuhiko, he's given entire talks on this subject.

https://www.youtube.com/watch?v=qCGofLIzX6g&t=31m44s

PyPy is faster for pure Python code, but that comes at the expense of having a far slower interface with C code. There's an entire ecosystem built around the fact that while Python itself is slow, it can very easily interface with native code (Numpy, Scipy, OpenCV) with very little overhead.

So sure, you can make Python much faster, if you're willing to piss off the very Python users who care the most about performance in the first place (the data science / ML people and anyone else using native extensions).

As the OP states:

> mutable interpreter frames, global interpreter locks, shared global state, type slots

On top of this, Python is extremely dynamic and nothing can be assured without running the code. This leads to needing JITs to improve performance, which in turn bring slow startup times and increased complexity. Even with a JIT, Python is just not fast, thanks to the above issues and its overall dynamism.

It can be optimised and for sure there's some impressive attempts at doing so. However I don't think pure Python will ever be considered "fast" as these things necessarily get in the way.
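"Mutable interpreter frames" means, among other things, that any function can reach into its caller's live locals, so an optimizer can't simply keep locals in registers and discard them. A minimal sketch (the function names here are made up):

```python
import sys

def peek_caller_locals():
    # Walk one frame up the stack and snapshot the caller's locals.
    # Because this is always possible, CPython must keep frame state
    # observable - one of the "leaky internals" the talks discuss.
    return dict(sys._getframe(1).f_locals)

def f():
    secret = 42
    return peek_caller_locals()

assert f()["secret"] == 42
```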

I highly recommend the two videos posted here that go into more detail as to why there are limits to how far optimisation can go: https://youtu.be/qCGofLIzX6g https://youtu.be/IeSu_odkI5I

MR4D
C has a slow startup time as well. We just call that compiling.

Having the option to be slow startup/fast execution is a good option to have. Maybe not for some, but definitely needed by others.

jashmatthews
Chris already proved we can do exactly that for Ruby with TruffleRuby. I don’t think there’s any reason GraalPython couldn’t do the same given more work?
arc776
I'm not familiar with Ruby, but it doesn't seem like TruffleRuby is really competitive with languages known for performance.

These are only simple benchmarks, but do indicate a rough ballpark for TruffleRuby: https://github.com/kostya/benchmarks

As I understand it, Crystal would be a good Ruby alternative if you want performance. This is of course a whole new language designed with performance in mind from the beginning and here is a repeating theme: you need to consider performance at the start, not 20 years later.

jashmatthews
Without running again using GraalVM EE which TruffleRuby needs for things like Partial Escape Analysis these benchmarks aren’t very useful. I’ve made the same mistake myself before when benchmarking TR.

Crystal has wildly different semantics to Ruby, so it’s not a good alternative at all.

arc776
Ah, that's a good point. It would be interesting to see fairer benchmarks for TruffleRuby after reading about some of the optimisations they're able to make.
kzrdude
Could we seriously start looking at deprecating some of the features that make Python slow? Who needs mutable interpreter frames?
chrisseaton
Why not make mutable interpreter frames fast instead?
chrisseaton
> why there are limits to how far optimisation can go

I'd challenge the idea that there really are known 'limits'. As I say there's research towards this, these videos are old, and Armin and Seth may not be up to date with all of the literature (in fact I'm sure Seth is not, as he's missing at least one major current Python implementation research project from his blog post.)

arc776
> I'd challenge the idea that there really are known 'limits'.

There are good reasons why these limits cannot be overcome in that the complexity and dynamism of the language precludes it.

Being interpreted is one cost that sets a significant barrier to performance, and the dynamic complexity further compounds it. For example whereas JS is basically only functions, in Python you have a huge range of ways you can do incredibly complex things with slot wrappers, descriptors, and metaprogramming.

Ultimately, Python will get faster, but diminishing returns are inevitable. Python can never be as fast as the equivalent code in a compiled language. It simply has too much extra work to do.

chrisseaton
> There are good reasons why these limits cannot be overcome in that the complexity and dynamism of the language precludes it.

Can you give specific examples and prove that they cannot be overcome?

How much of the literature have you read?

I'll give you a concrete example of how I see these claims - people said monkey-patching in Python and Ruby was a hard overhead to peak temporal performance and fundamentally added a cost that could not be removed... turns out no that cost can be completely eliminated. I could give you a list of similar examples as long as you want.

arc776
> Can you give specific examples and prove that they cannot be overcome?

It's hard to prove a theoretical negative, but perhaps by comparison with the run time performance of static AOT (SAOT) compiled languages I can show what I mean.

Dynamic typing:

- Python requires dynamic type checking before any program work is done. SAOT doesn't need to do any run time work.

- Adding two numbers in Python requires handling run time overloads and a host of other complexities. SAOT is a single machine code instruction that requires no extra work.

Boxing values:

- Python value boxing requires jumping around in heap memory. An SAOT language can remove this cost entirely, keeping values in raw registers loaded from the stack and prefetched in chunks. This massively improves cache performance, by orders of magnitude.

Determinism:

- In Python program operation can only be determined by running the code. In SAOT, since all the information is known at compile time programs can be further folded down, loops unrolled, and/or SIMD applied.

Run time overheads

- Python requires an interpretive run time. SAOT does not.

In summary: Python necessarily requires extra work at run time due to dynamic behaviour. SAOT languages can eliminate this extra work.
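The dynamic-dispatch point is visible in the bytecode: CPython compiles every + to the same generic add instruction regardless of operand types, so the type check and dispatch happen on every execution. A quick look (the exact opcode name varies by version: BINARY_ADD up to 3.10, BINARY_OP from 3.11):

```python
import dis

def add(a, b):
    return a + b

# The same generic instruction handles ints, floats, strings, lists...
# all type dispatch is deferred to run time.
ops = [i.opname for i in dis.get_instructions(add)]
assert any(op.startswith("BINARY") for op in ops)
```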

I do understand though that with JIT a lot of these costs can be reduced massively if not eliminated once the JIT has run through the code once. For example here they go through the painful process of optimising Python code to find what is actually slowing things down, to the point of rewriting in C: http://blog.kevmod.com/2020/05/python-performance-its-not-ju...

At the end they point out that PyPy gives a very impressive result that is actually faster than their C code. Of course, this benchmark is largely testing unicode string libraries rather than the language itself and I'd argue this is an outlier.

> How much of the literature have you read?

Literature on speeding up Python or high performance computing? The former, very little, the latter, quite a lot. My background is in performance computing and embedded software.

I'm definitely interested in the subject though if you've got some good reading material?

> people said monkey-patching in Python and Ruby was a hard overhead to peak temporal performance and fundamentally added a cost that could not be removed... turns out no that cost can be completely eliminated.

This really surprised me. Completely eliminated? I'm really curious how this is possible. Do you have any links explaining this?

chrisseaton
> I'm definitely interested in the subject though if you've got some good reading material?

My PhD's a good starting point on this subject https://chrisseaton.com/phd/, or I maintain https://rubybib.org/.

> This really surprised me. Completely eliminated? I'm really curious how this is possible. Do you have any links explaining this?

Through dynamic deoptimisation. Instead of checking if a method has been redefined... turn it on its head. Assume it has not (so machine code is literally exactly the same as if monkey patching was not possible), and get threads that want to monkey patch to stop other threads and tell them to start checking for redefined methods.

This is a great example because people said 'surely... surely... there will always be some overhead to check for monkey patching - no possible way to solve this can't be done' until people found the result already in the literature that solves it.

As long as you are not redefining methods in your fast path... it's literally exactly the same machine code as if monkey patching was not possible.

https://chrisseaton.com/truffleruby/deoptimizing/
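The bookkeeping behind this can be caricatured in pure Python: cache the method lookup keyed on a class version number, and have monkey-patching bump the version so cached callers fall back and relookup. Everything below (VersionedMeta, call_hello) is a toy sketch of my own, not how a real JIT is built - a real deoptimizing JIT goes further and removes even the version check from the fast path by rewriting the generated machine code, which pure Python cannot emulate:

```python
# Toy inline cache keyed on a class "version": callers reuse a cached
# method until someone monkey-patches the class, which bumps the
# version and invalidates the cache.
class VersionedMeta(type):
    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        if name != "_version":
            # Any monkey-patch invalidates cached lookups for this class.
            super().__setattr__("_version", cls.__dict__.get("_version", 0) + 1)

class Greeter(metaclass=VersionedMeta):
    _version = 0
    def hello(self):
        return "hi"

_cache = {}

def call_hello(obj):
    cls = type(obj)
    cached = _cache.get(cls)
    if cached is None or cached[0] != cls._version:
        cached = (cls._version, cls.hello)  # redo the "slow" lookup
        _cache[cls] = cached
    return cached[1](obj)

g = Greeter()
assert call_hello(g) == "hi"
Greeter.hello = lambda self: "patched"  # bumps _version -> cache miss
assert call_hello(g) == "patched"
```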

arc776
Thanks for the links. Very interesting to read how you get around these tricky dynamic operations!

> get threads that want to monkey patch to stop other threads and tell them to start checking for redefined methods

As an aside, this sort of reminds me of branch prediction at a higher level. A very neat way to speed up for the general case of no patching.

> This is a great example because people said 'surely... surely... there will always be some overhead to check for monkey patching - no possible way to solve this can't be done' until people found the result already in the literature that solves it.

There is still overhead when patching is used though. If you don't use the feature, you don't pay the cost, however when monkey patching is used there is a very definite cost to rewriting the JIT code and thread synchronisation that compiled languages would simply not have.

I can see where you're coming from here. If we can reduce all dynamic costs that aren't used to nothing then we will have the same performance as, say, C. At least, in theory.

It would certainly be interesting to see a dynamic language that can deoptimise all its functionality to a static version to match a statically compiled language. Still, any dynamic features would nevertheless incur an increased cost over static languages.

It's the dynamism itself of Python that incurs the performance ceiling over static compilation, plus the issues I mentioned in my previous reply about boxing and cache pressures. However you've definitely given me some food for thought over how close the gap could potentially be.

oblio
> ... turns out no that cost can be completely eliminated.

Is it eliminated in any production interpreter/VM used by Python, Ruby or any other mainstream language?

I mean, it's nice if it's research, but if I'm a boring programmer churning out Enterprise Middleware using these languages, do I get to use it?

Or is it just a pre-alpha branch of PyPy that might be out in 2025, if we're lucky? :-)

chrisseaton
Yeah it's not all production-ready yet.

But come on... we were arguing 'impossible' a second ago and now we've watered that down to 'not production ready'. We're making progress.

hitekker
My team spent a quarter trying to write a high-performance python web service. We investigated academic papers, evaluated open source solutions and then conducted bare metal tests to verify our findings.

Turns out a dictionary lookup in raw Python is two orders of magnitude slower than an equivalent hashmap lookup in Java.

Once the numbers came in, we realized we had to choose the right language for the job.

Python is great but it's not the end-all, be-all.
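As a rough illustration of the kind of measurement involved (this is a hedged sketch, not the team's actual benchmark; absolute numbers vary wildly by machine and interpreter version):

```python
import timeit

# Time a plain dict lookup in CPython. The comment's finding was that
# the equivalent Java HashMap lookup (after JIT warm-up) is far faster.
d = {i: str(i) for i in range(1000)}
total = timeit.timeit("d[500]", globals={"d": d}, number=1_000_000)
print(f"~{total / 1_000_000 * 1e9:.0f} ns per lookup")
```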

chrisseaton
> Python is great but it's not the end-all, be-all.

Yes I agree often it's better to rewrite in a different language if you can.

But if people tell me they want to program in Python or Ruby and they tell me it's worth it for them... then let's make it as fast as we can for them.

Oct 21, 2020 · 1 points, 0 comments · submitted by xrisk
I found the following talks by Armin Ronacher very informative on these topics (the C API, why Python is more difficult to speed up than JS).

https://youtu.be/qCGofLIzX6g https://youtu.be/IeSu_odkI5I

I wish these were the things Python 3 addressed, rather than Unicode. I guess it's much more obvious in hindsight than back when Python 3 was designed.

arc776
Just want to say thanks for these links, very interesting so far.

A point made in the video that seems to highlight the issue:

> Just adding two numbers requires 400 lines of code.

In compiled languages, this is one instruction! Think about the cache thrashing and memory loading involved in this one operation too. How can this possibly be fixed?

Python is a great language, but I don't know if it can ever be high performance on its own.

eesmith
Which compiled language adds 680564733841876926926749214863536422912 and 35370553733215749514562618584237555997034634776827523327290883 in one instruction?
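The point being that Python's `+` is arbitrary-precision by default, so "adding two numbers" can mean multi-word bignum arithmetic rather than a single machine instruction:

```python
# Python ints never overflow; these operands are far beyond 64 bits.
a = 680564733841876926926749214863536422912
b = 35370553733215749514562618584237555997034634776827523327290883
total = a + b
assert total - b == a           # exact, no wraparound
assert total.bit_length() > 64  # would not fit in one machine register
```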

FWIW, here's the relevant dispatch code in Python's ceval.c, where you can see it uses very generic dispatching at that level, which eventually, deeper down, gets to the "oh, it's an integer!" case.

        case TARGET(BINARY_ADD): {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            /* NOTE(haypo): Please don't try to micro-optimize int+int on
               CPython using bytecode, it is simply worthless.
               See http://bugs.python.org/issue21955 and
               http://bugs.python.org/issue10044 for the discussion. In short,
               no patch shown any impact on a realistic benchmark, only a minor
               speedup on microbenchmarks. */
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(tstate, left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH();
        }
Python code can be made more high performance if there's some way to tell the implementation the types, either explicitly or by inference or tracing. That's how several of those listed projects get their performance.
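You can see the generic dispatch from Python itself: every `+` compiles to a single type-agnostic bytecode (`BINARY_ADD`, or `BINARY_OP` on Python 3.11+) that funnels into the C code above:

```python
import dis

def add(a, b):
    return a + b

# Disassemble: the addition is one generic opcode carrying no type
# information; all the specialization happens at run time in ceval.c.
dis.dis(add)
assert any("BINARY" in ins.opname for ins in dis.Bytecode(add))
```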
the_mitsuhiko
Note that the version of BINARY_ADD you're looking at is newer than what the talk referenced. The "fast path" for integer addition was removed which the talk still talked about. You can see a discussion about that linked in the comment of the code you pasted.
arc776
Of course bigints require more than one instruction to add them, but even then you can reduce the work at compile time down to a series of integer operations, whereas the above code requires interpreting the program before it even gets to the add.

In your example, text processing in `unicode_concatenate` is going to be very, very much slower than a bulk load of the native numerical data directly from memory and processing it. For each character, Python needs to check that a number is still a number at run time and then convert the result to a native numeric. I can only assume this string processing is at worst performed once and cached(?), because otherwise it doesn't seem like it would run well at all, and surely Python's bigint performance is pretty important.

> Python code can be made more high performance if there's some way to tell the implementation the types, either explicitly or by inference or tracing.

At that stage, I would just use Nim and get better performance and a decent static type system included and either call it from Python, or call Python from Nim.

eesmith
You did write "two numbers" ;)

Guess I could also have used 5j + 3 as a counter-example.

If this is an issue then at this stage, many Python people switch to use one of the alternatives mentioned here, like Cython, which is a Python-like language which includes a static type system (including support for C++ templates) and can easily generate C extensions that can call and be called from Python.

baq
unicode absolutely had to be done. it'd be even more insane to leave strings as they were. maybe if you never venture outside of 7 bits it's only pain with negative ROI, but trust me the world has more languages than english and first-class support for unicode strings as just strings is a must. it was a painful transition but a necessary one. all other modern languages simply started there (and they're old enough to have a beer, too).
carabiner
(OP is Armin)
mumblemumble
I would guess that, if Python 3 hadn't addressed Unicode, Python would never have come to a place where so many people are worried about its performance.

Python's still a great language for the things it was being designed for back in the 2000s. But adding decent Unicode support is a big part of what helped it become an attractive language for use cases where I wish it performed better or had better support for parallelism. Natural language processing, for example.

kec
How do you square that assertion with the fact that people clung so hard to python 2 that it took the PSF 12 years to finally kill it?
mumblemumble
Some people clung hard to 2. Others flocked to 3.

In my direct experience, the only people who waited until the bitter end (and beyond) were ops folks who never had to stray much outside of 7-bit ASCII, and companies with large existing codebases that didn't want to allocate the resources to migrating. Neither of those really have much to do with my assertion that Python 3 attracted new people doing new things.

kgwgk
Any other example? Because there are lots of high-performance computing tasks where unicode couldn't matter less.
I highly recommend this talk by Armin Ronacher as an accompaniment (or prologue, or epilogue) to the article: "How Python was shaped by leaky internals", https://www.youtube.com/watch?v=qCGofLIzX6g
As others have mentioned, Python does have JIT compilers. The problem is that having a JIT doesn't solve the problem.

PyPy is often a factor of 10 behind julia performance and projects like Numba, PyTorch (the PyTorch people had to build their own Python JIT compiler yikes!), etc. will always have a more restricted scope than a project like Julia because Python's very semantics make many optimizations impossible.

Here's a great talk from the author of the famous Flask library in Python: https://www.youtube.com/watch?v=qCGofLIzX6g where he discusses the fundamental problems with Python's semantics.

If you fix these problems, you will end up changing Python so fundamentally that you'll really have a new language. Generic CPython 3.x code will certainly not be compatible with it.

dzonga
so in this case, once say the Julia ecosystem grows then migrate to Julia. or wait for optimizations to be done, e.g have pandas, numpy etc handle multi-core processors etc ?
simonbyrne
> or wait for optimizations to be done, e.g have pandas, numpy etc handle multi-core processors etc ?

All those exist already. Indeed, other than DataFrames.jl (the pandas equivalent) they are part of the language itself.

ChrisRackauckas
There's quite fundamental optimizations that will be missing if separately compiled pieces cannot optimize together. These barriers disallow many things. Additionally, anything that relies on higher order functions provided by the user, such as optimization or differential equations, will run into issues because those functions cannot be optimized by some super good package developer, but instead have to come from the user.

This blog post highlights the massive performance advantages Julia can have when you start integrating optimized packages and allow for optimizing across these barriers in Julia, and also the pros and cons of allowing these kinds of cross-function optimizations.

https://www.stochasticlifestyle.com/why-numba-and-cython-are...

No offense (I use Python all the time), but Python seems almost like the worst of all worlds, except for ease of use in the first month of using it -- with its only advantage being library inertia (which is a massive advantage in practice though).

As eg: here's an excellent talk on the cruft in Python internals by Armin Ronacher: https://www.youtube.com/watch?v=qCGofLIzX6g

For more discussion on why Python is slow: https://news.ycombinator.com/item?id=12025309

There is, after all, a graveyard littered with big-money attempts to speed up (mutually incompatible) subsets of Python.

But lack of inertia not something that should deter someone seriously invested (after all, Google/FB created Tensorflow/Pytorch respectively, and now Google might be getting behind Swift). It's a judgement call on whether one feels that most of the work/innovation still needs to be done, or is already done. If it is yet to be done, building on top of a better platform is almost a no-brainer.

jefft255
When I say Python is good for numerical computing and AI, what I really mean is that the libraries are there. I share the same reservation as you about the language itself. I wasn't so clear in my comment above
j88439h84
Armin's points are true, but PyPy makes the best of the situation, and hopefully the C-API will be deprecated at some point.
isatty
I disagree. I think that the use of Python in ROS is the best of both worlds. ROS makes it easy to do sane-ish IPC even for folks new to programming and enables them to offload heavy numerical processing or real time tasks out to C++ or even other boards on your robot.

Python sits by handling state machines (decision making) and similar logic. It’s also great in experimental code that folks want to change often (opencv prototyping).

I don’t like the lack of a type system and have run into easily preventable bugs with python but in my case the trade off with developer productivity was worth it.

Aug 12, 2019 · ssivark on Python Is Eating the World
Python's success is richly deserved. The fact that it has been a long time coming (Python is almost 30 years old!) is an indication of the amount of sheer effort involved, and should give heart for all those working on building up the next generation of languages.

Since this forum is likely to have some folks looking towards the future, I found interesting a couple of talks by Python veteran Armin Ronacher [1, 2]. Watching that presentation makes me wonder whether I should think of this in the context of "Worse is better". That said, Python's runtime introspection is such a killer feature that it's difficult to explain how empowering it feels for someone who is relatively new to programming.

[1]: How Python was Shaped by leaky Internals -- https://www.youtube.com/watch?v=qCGofLIzX6g

[2]: A Python for Future Generations -- https://www.youtube.com/watch?v=IeSu_odkI5I

Oct 21, 2018 · 1 points, 0 comments · submitted by jxub
There's a reason most high-performance Python libraries are not written that way, and core routines are just written in C instead. Proof of the pudding is in the eating!

See this talk by Armin Ronacher (creator of Flask) on why the design of python makes it fundamentally unfriendly to performance optimizations: https://youtu.be/qCGofLIzX6g?t=171

Julia has be designed ground up to avoid several such problems. See this discussion: https://discourse.julialang.org/t/julia-motivation-why-weren...

If your domain falls under the umbrella of numerical and scientific computing, writing Julia is as painless as writing python, with code that automatically runs roughly as fast as C. If you're used to writing numpy, you can hit the ground running in Julia, with maybe a few hours to become comfortable with the slightly different syntax and the names of useful libraries.

auntienomen
The point is that Cython provides a nice intermediate stage between C and CPython. Most optimizations need the first factor of 100, not the last factor of 2. You can usually achieve that in Cython with an effort measured in characters changed rather than lines of code changed.

I've played with Julia. It's nice enough, but it doesn't offer me anything I don't already get through the C/Cython/CPython hierarchy.

ChrisRackauckas
It offers a ton that you don't get from Cython: http://www.stochasticlifestyle.com/why-numba-and-cython-are-...
jzwinck
> There's a reason most high-performance Python libraries are not written [with Cython], and core routines are just written in C instead.

Pandas, Scipy and lxml are large, very popular Python libraries that use Cython. The article even mentions them at the end.

Jun 05, 2018 · 2 points, 0 comments · submitted by dralley
Perhaps the best way to understand what makes Julia fast is to watch these two videos about Python and R and what makes them so hard to optimize:

https://www.youtube.com/watch?v=qCGofLIzX6g

https://www.youtube.com/watch?v=HStF1RJOyxI

Take everything mentioned in these videos that make Python and R really hard to optimize and don't do those things :D

Things like the Carlo Verre hack (also a thing you can't change any more in Python: builtins), editing objects during their construction (via e.g. gc)... generally, the gc module allows other ways as well to crash your interpreter.

    >>> import gc
    >>> 'foo'.lower()
    >>> gc.get_referents(str.__dict__)[0]['lower'] = str.upper
    >>> 'foo'.lower()
    segmentation fault (core dumped)  python
(That's the method lookup cache)

A talk in this direction is https://www.youtube.com/watch?v=qCGofLIzX6g

Aug 06, 2017 · dom0 on Python Oddities
Some of these are pretty simple and not odd, e.g. immutability of tuples, iterator behaviour or modifying collections during iteration.

—————————————————————————————————————

Somewhat related video: https://www.youtube.com/watch?v=qCGofLIzX6g

Instead of typing a long list, I'll just refer you to this talk: https://www.youtube.com/watch?v=qCGofLIzX6g

Some of that has been fixed / had makeup applied to play pretend.

orf
Thank you!
> In general, Python is slow (compared to C or whatever) because of excessive memory allocation and overuse of hash maps.

Python is a highly dynamic language with an API (towards both Python and C) that is very invasive. These two things, taken together, make optimizing the interpreter extremely difficult, because practically all of it can be modified or introspected. CPython being implemented largely as a hashtable-interpreter is only one facet to its performance.
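A small example of the "overuse of hash maps" and the invasive introspection being described: ordinary attribute access is backed by a dict that any code can observe and rewrite, which is a big part of why the interpreter can't simply compile it down to a fixed memory offset. (A sketch for default classes; `__slots__` and newer specializing interpreters complicate the picture.)

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# Instance attributes live in a plain dict...
assert p.__dict__ == {"x": 1, "y": 2}
# ...and mutating that dict changes what attribute lookup returns.
p.__dict__["x"] = 10
assert p.x == 10
```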

Perhaps a talk recommendation: https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...

Jan 05, 2017 · 1 points, 0 comments · submitted by WoodenChair
Jan 05, 2017 · 1 points, 0 comments · submitted by tosh
Python needs a new runtime. This talk shows how bad of shape it's really in.

https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...

Basically, the language doesn't have a "spec" per se. The language is whatever the de facto CPython implementation happens to do within its giant eval loop.

Another great talk about CPython internals:

http://pgbovine.net/cpython-internals.htm

lobster_johnson
The reference [1] is a decent spec. It might not be as formally rigorous as an ISO standard, but it's probably as good as Go's [2], which is also a "reference".

[1] https://docs.python.org/3/reference/index.html

[2] https://golang.org/ref/spec

jonathanyc
In my experience with both languages, while the Go spec is incredibly readable, navigable, and succinct, the Python reference is a sprawling mess that is difficult to navigate or even to Ctrl-F in.
vertex-four
To be fair, Go is also a much smaller language, which hasn't gone through the process of collecting and shedding multiple layers of legacy, and exposes far fewer implementation details.
dijit
CPython is the reference implementation, so it makes sense that it's:

A) Not well optimised.

B) Touting features before the spec/standard.

EDIT: people really dislike that I said this, and I'm having trouble finding my original citation. It was in one of the many Python books I own, most likely "Learn Python The Hard Way", but I'll dig out the exact chapter where they compare PyPy to CPython and mention that because CPython is the reference implementation, it values code clarity over performance optimisation.

the_duke
It makes sense that the official, primary and by far most popular implementation of one of the most used languages in existence is not well optimized?

(edit: I'm just being polemic about your statement here. CPython is reasonably optimized within the constraints it currently has).

nodja
It only makes sense because it's Python: a language where style is part of the syntax, and readability is something many libraries focus on, the so-called "pythonic" way.

It makes sense that the reference implementation mirrors the same patterns as the language itself.

the_duke
Someone else already posted this link: https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...

It explains why CPython can't improve on many things.

ericfrederich
Please watch that first video, it's a good one. It explains how CPython essentially _is_ the spec because its internals leak into the spec when they have no business being there.
dom0
Pretty much the same is true about most other languages that have a single main implementation. This is even true, to some degree, for Java, which had competing implementations relatively early on.
viraptor
At some point it leaked pretty hard as well. The package scope was an unspecced implementation behaviour that became a standard later. (If I recall the story correctly)
paulsmith
Who says CPython is not well-optimized?

CPython is 25 years old -- people have been making it faster for a long time. Python 3.6, the latest release, has many performance improvements, cf. http://www.infoworld.com/article/3120952/application-develop...

lima
Notably, some of the 3.6 performance improvements were merged in from PyPy :-)
None
None
trotterdylan
Interestingly, achieving performance parity with CPython is one of the biggest challenges of this project. There are certain things CPython does very fast, like allocating and freeing many small objects.
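One CPython implementation detail behind this (a sketch; the exact bounds are an internal detail and other implementations differ): small ints in roughly the range -5..256 are preallocated singletons, so "allocating" one is just taking a reference.

```python
# Building the ints from strings defeats constant folding, so these
# are genuine run-time results, not compile-time shared constants.
small_a = int("256")
small_b = int("256")
assert small_a is small_b       # same cached singleton object

big_a = int("257")
big_b = int("257")
assert big_a is not big_b       # fresh heap allocations each time
```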
webmaven
So, Grumpy currently isn't faster than CPython?
devy
> Basically, the language doesn't have a "spec" per-se.

It does[1]. And process of improving it is called PEP[2].

[1]: https://docs.python.org/

[2]: https://en.wikipedia.org/wiki/Python_(programming_language)#...

lomnakkus
Uh, what? Claiming that your [1] is in any way a specification for the language is utterly absurd. It's far too vague. (Compare to even an IETF RFC, and you'll see what I mean. If you want to compare to a real language spec, compare to ISO C++.)
orf
CPython is the spec (or really more the CPython test suite). Just like the Ruby MRI. It's a simple, plain interpreter without many frills, and to add or remove a feature you have to submit a PEP which goes through a specification process.

Python started as a one-man-band project and of course didn't have a specification.

madgar
> Python started as a one-man-band project and of course didn't have a specification.

C started as a one-man-band project and of course does have a specification.

JavaScript started as a one-man-band project and of course does have a specification.

orf
C wasn't a one-man-band project (it was at least a two-man-band at the start!) and neither was JavaScript. C also didn't have a formal specification for over 20 years (only an informal one), and JavaScript had a strong selection bias for interpreters that roughly conformed to the specification. But that didn't exactly help it: JS is/was notorious for differing implementations of browser APIs.

Each of them also has a strong need for a specification, as there are many differing compilers and interpreters. There are a few for Python but are specialized, the CPython interpreter is good enough for 90% of cases.

lomnakkus
Yes, but that's what the PP was arguing wasn't the case (by linking to a "spec"). I'm not sure why you're restating the obvious (OPs position).
devy
For your narrow-minded understanding of what a programming language specification is: https://en.wikipedia.org/wiki/Programming_language_specifica...

You can say it's imprecise or lack of ratification from one or many international organization(s), but you cannot say it doesn't exist. End of story.

guitarbill
> to a real language spec, compare to ISO C++.

No thanks. Written specs can always have interesting implications or undefined behaviour. Just because it's written in a more verbose language (English) doesn't mean it's less vague.

E.g. GCC is the de facto C spec for many. Code/platforms as spec makes more sense and is easier to maintain/update, with quicker iterations of language features (cf. Ruby/Python to C++).

lomnakkus
My issue isn't necessarily with the fact that it's in English. It's that it's hopelessly imprecise English. Maybe you'd have less of an issue with the Java Language Spec? (Which, IIRC, even left out some memory model problems until recently.)
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.