Hacker News Comments on
Four Solutions to a Trivial Problem - Guy Steele Jr.
Google TechTalks · YouTube · 57 HN points · 8 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.

This type of interviewing style is bullshit. It means the interviewer knows a better solution that is "clever" and expects you to either have the same cleverness epiphany on the spot or to have studied this question. Neither is actually very useful as a hiring criterion.

Guy Steele gave an incredibly interesting guest talk[1] at Google about four different ways to solve this exact problem, and the fact that it's an interesting enough topic for an hour-long Google talk should probably be a clue that you shouldn't be expected to invent the best solution on the whiteboard in 40 minutes.
⬐ jll29
I have interviewed hundreds of scientists, engineers, and developers, and I never use esoteric quiz questions or logic riddles. I want to screen people for the skills that they will really need in their daily lives, so that's where I draw my pool of questions from. For example, processing data in the most simple ways (sorting, searching, extracting) quickly reveals if the candidates have actually _done_ what they claim they did. You'd be surprised how many Oxford PhDs struggle to write down a pipeline for extracting a simple word frequency list.
Now you might say "Maybe they're Windows people, or prefer Python," and my response is "I don't mind what tools you use - but you need to demonstrate that you can solve easy/common tasks on the spot, without wasting half a calendar day re-inventing the wheel."
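The kind of task being described - a word-frequency list - does fit in a few lines. Here is one sketch in Python (the tool choice is mine, in the spirit of the commenter's "use whatever tools" point):

```python
from collections import Counter
import re

def word_frequencies(text, top=3):
    # lowercase, split on runs of letters/apostrophes, count,
    # and return the most frequent words first
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)

print(word_frequencies("the cat and the dog and the bird"))
# → [('the', 3), ('and', 2), ('cat', 1)]
```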
⬐ black_puppydog
Thanks for the link, that made for entertaining watching! Curious that he gave that talk about parallelism at the end of 2015, and talked about how we'll engineer more systems to enable parallelism.
First question at the end was actually whether we can get this into existing languages because new languages are "notoriously hard to get accepted."
I'm currently learning Rust and now I'm wondering how the iterator map() and other "accumulation style" functions are implemented and whether there's a way to make these parallel, since the map() call treats things independently and a sum() could be done in the proposed tree style way.
Guess I have a piece of code to look up in the standard library :)
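The tree-style sum the commenter mentions can be sketched quickly (in Python rather than Rust, purely as an illustration - a real implementation would hand the two halves to separate threads or a work-stealing scheduler):

```python
def tree_sum(xs):
    """Sum by splitting in half: because + is associative, the two
    recursive halves are independent and could run in parallel."""
    if not xs:
        return 0  # identity of +
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return tree_sum(xs[:mid]) + tree_sum(xs[mid:])

print(tree_sum([1, 2, 3, 4, 5]))  # → 15
```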
⬐ ehsanu1
Not sure if there's anything in the standard library, but I recall this as the definitive library for data parallelism in Rust: https://github.com/rayon-rs/rayon
⬐ google234123
This could be a good question. There's nothing wrong with a candidate giving some less effluent answer and then progressing up to a better solution with some small hints or discussions.
⬐ weard_beard
⬐ jodrellblank
Effluent is poop
⬐ vertis
⬐ heleninboodler
It should be a "more effluent" answer and then working up to less effluent :)
⬐ weard_beard
I think they were mathematically correct. Affluent + Eloquent = Bullshit
If I'm giving this sort of question, I expect you to solve it in an obvious way, write the code for it, then talk about potential improvements. Writing it twice while having an epiphany between the first and second is simply not reasonable. And selecting for candidates who have studied these problems well enough to already know the optimal solutions is not going to get you good developers; it's going to get you leetcode champs. If you're building a competitive leetcode team, then great. If not, you're just focusing on the wrong hiring bar.

Guy Steele's talk came up here: https://news.ycombinator.com/item?id=30462835 with an APL solution of:

+/((⌽⌈\⌽a)⌊⌈\a)-a

with the comment that it could "be fully parallelizable and run fast on the GPU, as it's based on a couple scan (generalized prefix sum) operations". Some explanation in my reply there.
I think APL has a lot to teach us about parallel (specifically GPU) programming. I've rewatched a few talks with Aaron Hsu and listened to his arraycast podcast[1].In researching an upcoming blog post on monoids, I tried implementing the problem in Guy Steele's "Four Solutions to a Trivial Problem" talk[2] in APL (actually I tried GNU APL for this, because I didn't need anything fancy). This is what I came up with:
+/((⌽⌈\⌽a)⌊⌈\a)-a

That in turn should be fully parallelizable and run fast on the GPU, as it's based on a couple scan (generalized prefix sum) operations.
There's a considerable amount of DNA from APL in modern compiler-based machine learning frameworks, mostly by way of numpy. Dr. Hsu makes a good case that APL has quite a bit of power which is considerably diluted in later spins such as numpy. I'm especially fond of the power (⍣) operator[3], which represents parallel iteration and I think can often be translated to a per-thread while or for loop in a compute shader. I don't believe numpy has anything quite like that (but feel free to correct me!).
[1]: https://www.arraycast.com/episodes/episode19-aaron-hsu
[2]: https://www.youtube.com/watch?v=ftcIcn8AmSY
[3]: https://help.dyalog.com/15.0/Content/Language/Primitive%20Op...
⬐ adityaathalye
> I'm especially fond of the power (⍣) operator

Oh boy, I had a mind-melting moment with _inverse_ (⍣¯1).
Aaron helped me understand what was going on. He wrote about "Decoding Inverses" here: https://www.sacrideo.us/decoding-inverses/
And that took me from a clunky solution to Dismal Arithmetic (addition):

{10(⊤⍣¯1)⍵}∘{⌈/⍵}∘{10(⊥⍣¯1)⍵}⊢ 100000 10000 1000 100 10 1
111111

To this little gem:

da ← 10⊥(⌈/10⊥⍣¯1⊢)
da 169 248 269

(Edit: add code example, fix language)
⬐ nyc111
Thanks for the reference to Dismal Arithmetic, I found it very interesting. By the way, according to Wikipedia, Dismal Arithmetic is now called Lunar Arithmetic. https://en.m.wikipedia.org/wiki/Lunar_arithmetic
⬐ moonchild
> power (⍣) operator[3], which represents parallel iteration and I think can often be translated to a per-thread while or for loop in a compute shader.

Power is for sequential iteration. You may be thinking of rank (⍤).
⬐ raphlinus
⬐ jodrellblank
I spoke imprecisely. It is sequential iteration applied to each of the elements of an array, in parallel. That maps very well to compute shaders, but is hard to express in numpy because it's a higher-order operator; the right hand side is a function.
⬐ moonchild
That is incorrect. The function is applied to its argument in its entirety. We can see this by using a function such as ⊂:

]boxing on
Was OFF
(⊂⍣5) 2 3 4
┌─────────────┐
│┌───────────┐│
││┌─────────┐││
│││┌───────┐│││
││││┌─────┐││││
│││││2 3 4│││││
││││└─────┘││││
│││└───────┘│││
││└─────────┘││
│└───────────┘│
└─────────────┘

(On GNU APL, it looks like the relevant formulation is ']boxing 3' rather than ']boxing on'; this simply affects the display of nested arrays.)
⬐ raphlinus
Ok, thanks for the correction. I think I understand where I went wrong - Dr. Hsu's algorithm for traversing up a tree to the root uses this, using = as the right argument so it's a fixpoint, but it only happens that in this case the left operand to the power operator is piecewise. So sometimes this can (at least in theory) be efficiently compiled to a compute shader, in other cases not.
⬐ moonchild
Even in the case where f has low rank, I do not think it is given that you would want to run multiple invocations of f in parallel in e.g. f⍣≡. Even though f has low rank, ≡ does not, and so gets applied to the entirety of the result. Meaning that you need a synchronization after every iteration; if f is cheap enough, that may not be worthwhile. (And low-rank g in f⍣g is meaningless, as it needs to produce a single boolean.)

(You can do f⍣g⍤n, but then, again, that is rank doing the work of parallelization. And, performance considerations aside, I am no longer so convinced of the aesthetic utility of ⍤ as I once was. Wrt performance, when targeting CPUs, where branches are not so cheap, it might be worthwhile to design an f which can tolerate being called overly many times. Then give y an extraneous trailing axis the same size as a simd lane and do (f⍣g⍤1) y. That permits the whole computation to be vectorised, though it is also somewhat brittle.)
(Note ≡ is not the same as =. 1 1 1 ←→ 1 2 3 = 1 2 3, but 1 ←→ 1 2 3 ≡ 1 2 3. f⍣≡ is an idiom. f⍣= is meaningless except at rank 0, where it behaves the same as f⍣≡.)
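For readers without APL, the f⍣≡ fixpoint idiom can be sketched in Python (a hypothetical helper of my own, not anything from Dyalog or GNU APL):

```python
def power_limit(f, x):
    """Apply f repeatedly until a fixed point is reached (APL's f⍣≡)."""
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# repeated halving (floored at 1) reaches the fixed point 1
print(power_limit(lambda n: max(1, n // 2), 64))  # → 1
```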
⬐ raphlinusThanks for this explanation, it helps. I got into APL about 40 years ago (using an IBM 2741 terminal connected to a 360 if reconstruction of memory serves), but have only recently picked it up again, and not very deeply. From the kinds of things you're saying, the semantics line up somewhat with GPU-friendly parallelism, but not entirely.If my (newly refreshed) understanding is correct, you can implement f⍣≡ as a per-thread fixpoint if you know that f is elementwise independent, which it seems like you'd be able to determine at least some of the time by static analysis. But I can also see that would be brittle, and if you fell back to do a dispatch per iteration, performance would tank.
⬐ moonchild
> semantics line up somewhat with GPU-friendly parallelism, but not entirely

An APL machine was proposed[0]. I say it is the GPUs that are wrong! :)
https://www.slac.stanford.edu/pubs/slacreports/reports07/sla...
⬐ moonchild
> If my (newly refreshed) understanding is correct, you can implement f⍣≡ as a per-thread fixpoint if you know that f is elementwise independent, which it seems like you'd be able to determine at least some of the time by static analysis

Hmm, yeah, that would work. Though you would also need to account for side effects in f.
Following is an attempt to explain +/((⌽⌈\⌽a)⌊⌈\a)-a

The functions ⌊ ⌈ find min and max values, and \ is the functional programming scan operator. Combining them, ⌈\ gives max-scan, which does this: moving to the right, the largest value (max) seen so far is kept and carried on.

twentyRandoms
71 61 79 72 84 76 76 88 16 53 51 56 64 50 100 33 60 82 96 53
⌈\twentyRandoms
71 71 79 79 84 84 84 88 88 88 88 88 88 88 100 100 100 100 100 100

Then ⌽ reverses the values in an array, so (⌽⌈\⌽a) does the same thing going from right to left, by reversing, max-scanning, then reversing the result:

(⌽⌈\⌽twentyRandoms)
100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 96 96 96 96 53

Then the ⌊ in the middle compares the two results an item at a time, taking the minimum of each pair:

(⌽⌈\⌽twentyRandoms)⌊⌈\twentyRandoms
71 71 79 79 84 84 84 88 88 88 88 88 88 88 100 96 96 96 96 53

The (...a) - a part does item-by-item subtraction, each number in this result array minus the number in that position from the original array:

((⌽⌈\⌽twentyRandoms)⌊⌈\twentyRandoms)-twentyRandoms
0 10 0 7 0 8 8 0 72 35 37 32 24 38 0 63 36 14 0 0

Then +/ sums them up like a sum() or foldr or reduce:

+/((⌽⌈\⌽twentyRandoms)⌊⌈\twentyRandoms)-twentyRandoms
384

Which, if I remember the talk and the code works, means that if you drew this array as a bar graph and poured water into the top middle of it, how much water would it hold before it all overflowed? The highest block on the left and the highest block on the right are the bounds of the highest points where the water can't get past. In between those, the bar heights are the depth at each position, and the water depth is from there up to the lower of the highest bar.
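For readers more comfortable outside APL, the same scan-based computation can be sketched in Python (the function and variable names are mine, not from the talk):

```python
from itertools import accumulate

def trapped_water(heights):
    # ⌈\a : left-to-right running maximum
    left = list(accumulate(heights, max))
    # ⌽⌈\⌽a : right-to-left running maximum (reverse, scan, reverse)
    right = list(accumulate(reversed(heights), max))[::-1]
    # ⌊ takes the pairwise minimum, - subtracts the original, +/ sums
    return sum(min(l, r) - h for l, r, h in zip(left, right, heights))

twentyRandoms = [71, 61, 79, 72, 84, 76, 76, 88, 16, 53,
                 51, 56, 64, 50, 100, 33, 60, 82, 96, 53]
print(trapped_water(twentyRandoms))  # → 384
```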
⬐ jazzyjackson
It's interesting that ⌽ is used for reverse - it's phi, right? As the lesser is to the greater, the greater is to the whole? There's a vague notion of invertibility to the golden ratio but I can't find the words for it. Maybe, a ratio that works in both directions.

Am I way off base? Maybe there are other meanings to the character I'm unaware of.
⬐ moonchild
⬐ raphlinus
The line is an axis of vertical symmetry in a 2-d context (circle); compare with ⊖, which shows an axis of horizontal symmetry. When applied to a matrix, ⌽ reverses rows and ⊖ reverses columns; the elements are mirrored around the pictured axis of symmetry. (In general, when applied to a high-rank array, ⊖ reverses the first axis of its argument, and ⌽ reverses the last.)
⬐ moonchild
(Axes of horizontal and vertical symmetry, respectively, of course; I don't know why I wrote that.)
⬐ jazzyjackson
o that's much less esoteric, thanks for the explanation
⬐ beagle3
There's also a circle with a diagonal line through it (on phone, no Unicode keyboard) which, consistently with the above axes of symmetry, transposes a matrix; I don't remember if it transposes the first two or last two axes on higher-rank arrays.
⬐ moonchild
It reverses the order of the axes. There is also a dyadic variant, which provides a new order for the axes. Duplicating an axis is no problem, but there is a danger of leaving out an axis. So in J, instead, the specified axes are moved to the rear.

Yes! I believe being able to figure this out qualifies you as ⊢>+⌿÷≢
⬐ jodrellblank
Above average, haha. Permanent beginner, I think :)
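For the non-APLers, those three mirrorings can be sketched on a nested-list matrix in Python (my own illustration, not a full model of APL's rank semantics):

```python
m = [[1, 2, 3],
     [4, 5, 6]]

# ⌽ : reverse along the last axis (mirror each row left-right)
rev_last = [row[::-1] for row in m]          # [[3, 2, 1], [6, 5, 4]]

# ⊖ : reverse along the first axis (mirror the order of the rows)
rev_first = m[::-1]                          # [[4, 5, 6], [1, 2, 3]]

# ⍉ : transpose, i.e. reverse the order of the axes
transposed = [list(col) for col in zip(*m)]  # [[1, 4], [2, 5], [3, 6]]
```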
Hi, the author here.

One of the hidden messages of the introduction is: watch Guy Steele's talks [1][2][3] if you are interested in data parallelism! These talks are not Julia-specific, and the idea is applicable and very useful in any language. My libraries are heavily inspired by these talks.
Of course, if you haven't used Julia yet, it'd be great if (data) parallelism gives you an excuse to try it out! It has a fantastic foundation for composable multi-threaded parallelism [4].
[1] How to Think about Parallel Programming: Not! https://www.infoq.com/presentations/Thinking-Parallel-Progra...
[2] Four Solutions to a Trivial Problem https://www.youtube.com/watch?v=ftcIcn8AmSY
[3] Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful https://vimeo.com/6624203
[4] Announcing composable multi-threaded parallelism in Julia https://julialang.org/blog/2019/07/multithreading/
⬐ sroussey
Given your experience, what primitives would you advocate for JS for data parallelism?
⬐ tkf
For composable nested parallelism, I think spawn and sync like Julia has would be great. If there were a variant of spawn that asserts no concurrency is required for the child task, I'd imagine the runtime could do more optimizations (Julia doesn't have it, but there was an experimental PR [1]). But concurrency sounds like a very basic thing in JavaScript, so I'm not sure if it fits well in the language. I don't have experience in serious JavaScript programming, though, so I don't know what fits best there. I'd imagine you can build some data-parallel libraries already with the web workers API.
⬐ nextos
I love Julia, and in many areas both the language and standard libraries are quite polished.

However, simple parallelism based on threads (e.g. a parallel map) surprisingly requires a third-party library. ThreadsX, the one you posted about, is also my favorite option.
Besides, the standard pipe operator (|>) doesn't support partial application. This has led to several third-party reimplementations via macros, transducers, etc. Since this is a basic feature, fragmentation is a bit worrying.
⬐ eigenspaceJust to echo Tkf here, I think it'd be great to have something like ThreadsX.map in base someday, but when the multi-threading features in Julia first came out, it was unclear what the best interface to provide would be.Since Julia is committed to semantic versioning, this is a kinda scary prospect because it means that any high level, exported interface we decide on, we'll be stuck with until at least 2.0 and probably forever.
So with things like this, it's important to be conservative about the interfaces we expose; instead, we've had people explore different approaches out in the package ecosystem. One day, if someone can make a strong enough argument, I'm sure we'll see one of these package solutions end up in Base.
⬐ nextos
⬐ tkf
Yes, I agree getting design choices right is important in order not to end up with a lot of legacy code in future versions of Julia.

I think the pipe operator is probably the area that needs the most urgent attention. Lots of libraries have their own slightly different macros for improved pipes with partial application, and that introduces a bit of fragmentation and hurts composability.

There's an open issue for this with some background: https://github.com/JuliaLang/julia/issues/5571

Yeah, I get the point that it's nice to have basic things in the core language and stdlib. But I'm rather on the "minimal language core" side in this discussion, and I think it'd be better not to start adding more things to Julia Base and the stdlib - at least not right now, to avoid accumulating "dead batteries." There are some non-trivial things to figure out for a composable data-parallel interface. For example, what should the data collection interface be to support folds over it? I'm using a minimalistic interface defined in SplittablesBase.jl for this, but it's not widely used outside my packages, so I'm not super confident that it's enough yet.
Reminds me of Guy Steele tackling this trivial problem on stage: https://www.youtube.com/watch?v=ftcIcn8AmSY
Associativity is the key to parallelism.

Four Solutions to a Trivial Problem - Guy Steele Jr. https://www.youtube.com/watch?v=ftcIcn8AmSY (using a monoid-cached tree)
I enjoyed this talk along the same lines by Guy Steele, regarding not necessarily forcing a sequential solution to a non-sequential problem:

Four Solutions to a Trivial Problem - Guy Steele Jr.: https://www.youtube.com/watch?v=ftcIcn8AmSY
Look for steps that are commutative and/or associative in the problem to exploit parallelism.
⬐ hood_syntax
Monoids will save the world
⬐ AboutTheWhisles
They've been around for 30 years.
⬐ hood_syntax
Don't worry, it'll be any day now.
⬐ bordercases
They've been hard to communicate by amateurs
Guy Steele [1] sometimes mentions it. He gave an interesting Google Tech Talk called Four Solutions to a Trivial Problem [2], and at 1:59 [3] he said:

> Also, algebraic properties are important. I've got a background in applied algebra, and I think that has informed my programming. And I think that making programmers aware of algebraic properties of their code, and communicating some of those properties to the compiler, may be worthwhile.
The language Fortress [4], which he helped design, allowed one to explicitly provide the compiler with such information: you could say that a certain operation was distributive or associative, for instance, and the compiler could then do some refactorings and optimizations taking this knowledge into account.
[1] https://en.wikipedia.org/wiki/Guy_L._Steele_Jr.
[2] https://youtu.be/ftcIcn8AmSY
[3] https://youtu.be/ftcIcn8AmSY?t=1m59s
[4] https://en.wikipedia.org/wiki/Fortress_(programming_language...
⬐ FullyFunctionalGHC Haskell (not standard Haskell) supports this and it's heavily used. Look for "Rewrite rules".
⬐ stevebmark
I'm trying very hard to follow along but this quickly devolves into language that isn't easy to follow. When he starts going into combining "bitonic globs" is when I start having trouble focusing.

For someone that's familiar with parallel programming topics and language, is this talk easy to follow? I'm trying to figure out if the fault lies with myself or the presenter. The vibe I'm getting is that showing slides full of code in an obscure language isn't a good teaching method, but I also don't have the prerequisite background here, so my opinion isn't useful. I also don't do work that involves these classes of problems.
⬐ bjornsing
⬐ jtolmar
> For someone that's familiar with parallel programming topics and language, is this talk easy to follow? I'm trying to figure out if the fault lies with myself or the presenter.

I think it was easy to follow if you focused on the aspects of general interest (= what he was doing in principle, e.g. divide-and-conquer). But sure, if you wanted to understand exactly how the merging of "bitonic globs" worked, then no, that's not easy to follow.
I remember thinking when he started talking about merging the "bitonic blobs" that "Yea, yea, sure, you can divide-and-conquer, but it's just about as much work combining two of those partial solutions as solving the original problem! There's got to be a better way...", and I basically stopped listening to the details.
⬐ diskcat
I could see the gist of the procedure, i.e. representing it as globs which allow you to break up the data in two so you can run it in binary-tree-like steps.

But the code was pretty useless. I didn't know that language, and within the short time span that it was presented I couldn't understand the code.
Clearly this would be better presented as a paper so you could have time to go through it, but if you ignore the actual code completely I think I still understood (or at least I think I did) the talk.
My key takeaway was that mathematical properties like associativity, commutativity, idempotency, and having an identity value are good tools for communicating with compilers. An associative binary operator plus an identity is enough to automatically turn your + into a Σ, with the compiler being able to choose a parallel or sequential implementation.

Guy's example code uses a language that looks like mathematical notation, but I think that if you could tell the compiler what's associative and what the identity value is, you could get there with procedural code provided you use for-each instead of classic for. There's also probably a better way to phrase the first line than "var x = operator.identity;" so that more natural code ends up parallelizable, but I don't know what it is.
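The "associative operator plus identity" idea can be sketched in Python (the names here are mine, just to illustrate why those two properties are all a runtime needs to split the work):

```python
from functools import reduce

def chunked_reduce(op, identity, xs, n_chunks=4):
    """Reduce xs with an associative op. Because op is associative and
    identity is a unit, each chunk can be reduced independently (in
    parallel, in principle) and the partial results combined after."""
    size = max(1, len(xs) // n_chunks)
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    partials = [reduce(op, chunk, identity) for chunk in chunks]
    return reduce(op, partials, identity)

print(chunked_reduce(lambda a, b: a + b, 0, list(range(10))))  # → 45
```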
⬐ bjornsing
I watched the whole thing and was a bit disappointed, I must say. The TLDR (or W as in watch, perhaps) is essentially: if we write our programs in a map/reduce (functional) style, then the compiler/runtime can sometimes[1] choose between a parallel and a sequential implementation at runtime, depending on what computational resources are available. Seems fairly obvious to me... I would have much preferred the 10-minute version of this talk.

1. Given that the underlying operations are order- and grouping-independent (associative and commutative), of course.
⬐ _u3yg
The part where the guy chuckles and says that some of us could recognize this problem as part of their employment application at Google was a bit disturbing. It sounded innocent, but ominous, really.

Just assume for a second that I'm not a complete idiot. Assume that I've been around for a while, that I know how to perform summations, that I can do running averages. Assume that I've seen the complications that come with solving some of these basic problems. Assume that I've solved problems before only to be confronted by subsequent problems in the pipeline, and assume that I've come to expect that kind of development.
What in the world could my first impression mean to anyone?
⬐ deckar01
I think calling it a "map/reduce (functional) style" is an understatement. The presenter is suggesting that when a problem must be solved in parallel, don't just code the low-level parallel implementation. Analyze the problem, divide-and-conquer, then use high-level abstractions that make parallelism a responsibility of the underlying implementation.

This concept may be familiar to you, but talks like this will be necessary until parallelization is as seamless as garbage collection.
⬐ bjornsing
> Analyze the problem, divide-and-conquer, then use high level abstractions that make parallelism a responsibility of the underlying implementation.

That's a better summary than mine, but I still think it's too little substance for a 60-minute talk. If you want a parallel(izable) implementation, then divide-and-conquer is a well-known schoolbook approach. The problem as I see it is that it's not easy to build (or understand) algorithms like that. I saw very little in this talk about how to address/solve that, except the obvious: better education.
I think it was also a bit irritating that he pretended there was nothing a compiler could do about the accumulation loop he presented as the first example. Isn't it obvious that a smart compiler could (in principle) optimize that loop so that it could run in parallel on multiple cores...?