HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Four Solutions to a Trivial Problem - Guy Steele Jr.

Google TechTalks · YouTube · 57 HN points · 8 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Google TechTalks's video "Four Solutions to a Trivial Problem - Guy Steele Jr.".
YouTube Summary
Google Tech Talk, 12/1/2015, Presented by Guy L. Steele Jr.
ABSTRACT: We present a small but interesting geometrical problem and then examine four different computational approaches to solving it: a "classic sequential solution" and three different approaches that are amenable to parallel implementation, comparing them to highlight various advantages and disadvantages, including total work required and minimum time to solution. All four solutions are illustrated both pictorially and with working code. We argue that certain approaches work better than others if exploitation of parallelism is to be automated. There will also be at least one joke.

About the Speaker

https://en.wikipedia.org/wiki/Guy_L._Steele,_Jr.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
This type of interviewing style is bullshit. It means the interviewer knows a better solution that is "clever" and expects you to either have the same cleverness epiphany on the spot or to have studied this question. Neither is actually very useful as a hiring criterion.

Guy Steele gave an incredibly interesting guest talk[1] at Google about four different ways to solve this exact problem, and the fact that it's an interesting enough topic for an hour-long Google talk should probably be a clue that you shouldn't be expected to invent the best solution on the whiteboard in 40 minutes.

[1] https://www.youtube.com/watch?v=ftcIcn8AmSY

jll29
I have interviewed hundreds of scientists, engineers, and developers, and I never use esoteric quiz questions or logic riddles.

I want to screen people for the skills that they will really need in their daily lives, so that's where I draw my pool of questions from. For example, processing data in the simplest ways (sorting, searching, extracting) quickly reveals whether candidates have actually _done_ what they claim they did. You'd be surprised how many Oxford PhDs struggle to write down a pipeline for extracting a simple word frequency list.

Now you might say "Maybe they're Windows people, or prefer Python." and my response is "I don't mind what tools you use - but you need to demonstrate that you can solve easy/common tasks on the spot, without wasting half a calendar day to re-invent the wheel."
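
For concreteness, here is a minimal sketch of the kind of word-frequency pipeline being described, written in Rust purely as an illustration (any tool would do, per the comment above): read stdin, count whitespace-separated words, and print them by descending count.

    use std::collections::HashMap;
    use std::io::{self, Read};

    fn main() {
        // Read all input, split on whitespace, and tally word occurrences.
        let mut text = String::new();
        io::stdin().read_to_string(&mut text).expect("failed to read stdin");

        let mut counts: HashMap<&str, usize> = HashMap::new();
        for word in text.split_whitespace() {
            *counts.entry(word).or_insert(0) += 1;
        }

        // Sort by descending count and print a simple frequency list.
        let mut pairs: Vec<_> = counts.into_iter().collect();
        pairs.sort_by(|a, b| b.1.cmp(&a.1));
        for (word, n) in pairs {
            println!("{n}\t{word}");
        }
    }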

black_puppydog
Thanks for the link, that made for entertaining watching!

Curious that he gave that talk about parallelism at the end of 2015, and talked about how we'll engineer more systems to enable parallelism.

First question at the end was actually whether we can get this into existing languages because new languages are "notoriously hard to get accepted."

I'm currently learning Rust and now I'm wondering how the iterator map() and other "accumulation style" functions are implemented and whether there's a way to make them parallel, since the map() call treats elements independently and a sum() could be done in the proposed tree-style way.

Guess I have a piece of code to look up in the standard library :)

ehsanu1
Not sure if there's anything in the standard library, but I recall this as the definitive library for data parallelism in Rust: https://github.com/rayon-rs/rayon
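
For what it's worth, a minimal sketch of what the parallel tree-style map-and-sum discussed above might look like with rayon (this assumes the rayon crate is added as a dependency; the standard library's iterators themselves are sequential):

    use rayon::prelude::*;

    fn main() {
        let xs: Vec<u64> = (1..=1_000_000).collect();

        // par_iter() splits the slice across a thread pool, map() runs
        // independently per element, and sum() combines partial results in a
        // tree-like reduction - valid because addition is associative.
        let total: u64 = xs.par_iter().map(|x| x * x).sum();

        println!("{total}");
    }
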
google234123
This could be a good question. There's nothing wrong with a candidate giving some less effluent answer and then progressing up to a better solution with some small hints or discussion.
weard_beard
Effluent is poop
vertis
It should be a "more effluent" answer and then working up to less effluent :)
weard_beard
I think they were mathematically correct.

Affluent + Eloquent = Bullshit

heleninboodler
If I'm giving this sort of question, I expect you to solve it in an obvious way, write the code for it, then talk about potential improvements. Writing it twice while having an epiphany between the first and second is simply not reasonable. And selecting for candidates who have studied these problems well enough to already know the optimal solutions is not going to get you good developers, it's going to get you leetcode champs. If you're building a competitive leetcode team, then great. If not, you're just focusing on the wrong hiring bar.
jodrellblank
Guy Steele's talk came up here: https://news.ycombinator.com/item?id=30462835 with an APL solution of:

     +/((⌽⌈\⌽a)⌊⌈\a)-a
with comment that it could "be fully parallelizable and run fast on the GPU, as it's based on a couple scan (generalized prefix sum) operations". Some explanation in my reply there.
Jul 12, 2022 · 1 point, 0 comments · submitted by tosh
I think APL has a lot to teach us about parallel (specifically GPU) programming. I've rewatched a few talks by Aaron Hsu and listened to his Arraycast episode[1].

In researching an upcoming blog post on monoids, I tried implementing the problem in Guy Steele's "Four Solutions to a Trivial Problem" talk[2] in APL (actually I tried GNU APL for this, because I didn't need anything fancy). This is what I came up with:

    +/((⌽⌈\⌽a)⌊⌈\a)-a
That in turn should be fully parallelizable and run fast on the GPU, as it's based on a couple scan (generalized prefix sum) operations.

There's a considerable amount of DNA from APL in modern compiler-based machine learning frameworks, mostly by way of numpy. Dr. Hsu makes a good case that APL has quite a bit of power which is considerably diluted in later spins such as numpy. I'm especially fond of the power (⍣) operator[3], which represents parallel iteration and I think can often be translated to a per-thread while or for loop in a compute shader. I don't believe numpy has anything quite like that (but feel free to correct me!).

[1]: https://www.arraycast.com/episodes/episode19-aaron-hsu

[2]: https://www.youtube.com/watch?v=ftcIcn8AmSY

[3]: https://help.dyalog.com/15.0/Content/Language/Primitive%20Op...

adityaathalye
> I'm especially fond of the power (⍣) operator

Oh boy, I had a mind-melting moment with _inverse_ (⍣¯1).

Aaron helped me understand what was going on. He wrote about "Decoding Inverses" here: https://www.sacrideo.us/decoding-inverses/

And that took me from a clunky solution to Dismal Arithmetic (addition):

       {10(⊤⍣¯1)⍵}∘{⌈/⍵}∘{10(⊥⍣¯1)⍵}⊢ 100000 10000 1000 100 10 1
  111111
To this little gem:

       da ← 10⊥(⌈/10⊥⍣¯1⊢)
       da 169 248
  269
(Edit: add code example, fix language)
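
A rough sketch of the same dismal (lunar) addition in Rust, for readers who don't speak APL: decode each number into base-10 digits, take the digit-wise maximum, and re-encode. The function names here are made up for illustration.

    // Dismal (lunar) addition, mirroring da ← 10⊥(⌈/10⊥⍣¯1⊢).
    fn digits(mut n: u64) -> Vec<u64> {
        // Least significant digit first; 0 becomes [0].
        let mut ds = Vec::new();
        loop {
            ds.push(n % 10);
            n /= 10;
            if n == 0 {
                break;
            }
        }
        ds
    }

    fn dismal_add(a: u64, b: u64) -> u64 {
        let (da, db) = (digits(a), digits(b));
        let len = da.len().max(db.len());
        // Missing digits count as 0; fold the per-position maxima back into
        // a number, most significant digit first.
        (0..len)
            .rev()
            .map(|i| da.get(i).copied().unwrap_or(0).max(db.get(i).copied().unwrap_or(0)))
            .fold(0, |acc, d| acc * 10 + d)
    }

    fn main() {
        println!("{}", dismal_add(169, 248)); // 269
    }
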
nyc111
Thanks for the reference to Dismal Arithmetic, I found it very interesting. By the way, according to Wikipedia, Dismal Arithmetic is now called Lunar Arithmetic. https://en.m.wikipedia.org/wiki/Lunar_arithmetic
moonchild
> power (⍣) operator[3], which represents parallel iteration and I think can often be translated to a per-thread while or for loop in a compute shader.

Power is for sequential iteration. You may be thinking of rank (⍤).

raphlinus
I spoke imprecisely. It is sequential iteration applied to each of the elements of an array, in parallel. That maps very well to compute shaders, but is hard to express in numpy because it's a higher-order operator; the right-hand side is a function.
moonchild
That is incorrect. The function is applied to its argument in its entirety. We can see this by using a function such as ⊂:

         ]boxing on
   Was OFF
         (⊂⍣5) 2 3 4
   ┌─────────────┐
   │┌───────────┐│
   ││┌─────────┐││
   │││┌───────┐│││
   ││││┌─────┐││││
   │││││2 3 4│││││
   ││││└─────┘││││
   │││└───────┘│││
   ││└─────────┘││
   │└───────────┘│
   └─────────────┘
(On GNU APL, it looks like the relevant formulation is ']boxing 3' rather than ']boxing on'; this simply affects the display of nested arrays.)
raphlinus
Ok, thanks for the correction. I think I understood where I went wrong - Dr. Hsu's algorithm for traversing up a tree to the root uses this, using = as the right argument so it's a fixpoint, but it just happens that in this case the left operand to the power operator is piecewise. So sometimes this can (at least in theory) be efficiently compiled to a compute shader, in other cases not.
moonchild
Even in the case where f has low rank, I do not think it is given that you would want to run multiple invocations of f in parallel in e.g. f⍣≡. Even though f has low rank, ≡ does not, and so gets applied to the entirety of the result. Meaning that you need a synchronization after every iteration; if f is cheap enough, that may not be worthwhile. (And low-rank g in f⍣g is meaningless, as it needs to produce a single boolean.)

(You can do f⍣g⍤n, but then, again, that is rank doing the work of parallelization. And, performance considerations aside, I am no longer so convinced of the aesthetic utility of ⍤ as I once was. Wrt performance, when targeting CPUs, where branches are not so cheap, it might be worthwhile to design an f which can tolerate being called overly many times. Then give y an extraneous trailing axis the same size as a simd lane and do (f⍣g⍤1) y. That permits the whole computation to be vectorised, though it is also somewhat brittle.)

(Note ≡ is not the same as =. 1 1 1 ←→ 1 2 3 = 1 2 3, but 1 ←→ 1 2 3 ≡ 1 2 3. f⍣≡ is an idiom. f⍣= is meaningless except at rank 0, where it behaves the same as f⍣≡.)

raphlinus
Thanks for this explanation, it helps. I got into APL about 40 years ago (using an IBM 2741 terminal connected to a 360, if my reconstruction of memory serves), but have only recently picked it up again, and not very deeply. From the kinds of things you're saying, the semantics line up somewhat with GPU-friendly parallelism, but not entirely.

If my (newly refreshed) understanding is correct, you can implement f⍣≡ as a per-thread fixpoint if you know that f is elementwise independent, which it seems like you'd be able to determine at least some of the time by static analysis. But I can also see that would be brittle, and if you fell back to do a dispatch per iteration, performance would tank.

moonchild
> semantics line up somewhat with GPU-friendly parallelism, but not entirely

An APL machine was proposed[0]. I say it is the GPUs that are wrong! :)

https://www.slac.stanford.edu/pubs/slacreports/reports07/sla...

moonchild
> If my (newly refreshed) understanding is correct, you can implement f⍣≡ as a per-thread fixpoint if you know that f is elementwise independent, which it seems like you'd be able to determine at least some of the time by static analysis

Hmm, yeah, that would work. Though you would also need to account for side effects in f.

jodrellblank
Following is an attempt to explain +/((⌽⌈\⌽a)⌊⌈\a)-a

The functions ⌊ ⌈ find min and max values, and \ is the functional-programming scan operator. Combining them, ⌈\ gives max-scan, which does this:

          twentyRandoms
    71 61 79 72 84 76 76 88 16 53 51 56 64 50 100 33 60 82 96 53

          ⌈\twentyRandoms
    71 71 79 79 84 84 84 88 88 88 88 88 88 88 100 100 100 100 100 100

moving to the right, the largest value (max) seen so far is kept and carried on.

Then ⌽ reverses the values in an array so this (⌽⌈\⌽a) does the same thing going from right to left, by reversing, max-scan, then reversing the result:

          twentyRandoms
    71 61 79 72 84 76 76 88 16 53 51 56 64 50 100 33 60 82 96 53

          (⌽⌈\⌽twentyRandoms)
    100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 96 96 96 96 53

Then the ⌊ in the middle compares the two results one item at a time, taking the minimum of each pair:

          (⌽⌈\⌽twentyRandoms)⌊⌈\twentyRandoms
    71 71 79 79 84 84 84 88 88 88 88 88 88 88 100 96 96 96 96 53
The (...a) - a part does item-by-item subtraction: each number in this result array minus the number in that position in the original array.

          ((⌽⌈\⌽twentyRandoms)⌊⌈\twentyRandoms)-twentyRandoms
    0 10 0 7 0 8 8 0 72 35 37 32 24 38 0 63 36 14 0 0
Then +/ sums them up like a sum() or foldr or reduce:

          +/((⌽⌈\⌽twentyRandoms)⌊⌈\twentyRandoms)-twentyRandoms
    384
Which, if I remember the talk correctly and the code works, answers this question: if you drew this array as a bar graph and poured water over the top of it, how much water would it hold before the rest overflowed? The highest block on the left and the highest block on the right are the bounds, the highest points the water can't get past. In between those, the bar heights are the floor at each position, and the water depth runs from there up to the lower of the two highest bars.
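
For comparison, a rough sequential Rust translation of the same computation (two max-scans, a pointwise minimum, a pointwise subtraction, and a sum). The scans are written as plain loops here, whereas in the APL version they are exactly the prefix-scan steps that parallelize.

    // Mirrors +/((⌽⌈\⌽a)⌊⌈\a)-a
    fn water_held(a: &[i64]) -> i64 {
        let n = a.len();
        let mut left = vec![i64::MIN; n];  // ⌈\a   : running max from the left
        let mut right = vec![i64::MIN; n]; // ⌽⌈\⌽a : running max from the right
        let mut m = i64::MIN;
        for i in 0..n {
            m = m.max(a[i]);
            left[i] = m;
        }
        m = i64::MIN;
        for i in (0..n).rev() {
            m = m.max(a[i]);
            right[i] = m;
        }
        // ⌊ takes the pointwise min, - subtracts the original heights, +/ sums.
        (0..n).map(|i| left[i].min(right[i]) - a[i]).sum()
    }

    fn main() {
        let twenty_randoms = [71, 61, 79, 72, 84, 76, 76, 88, 16, 53, 51, 56, 64, 50, 100, 33, 60, 82, 96, 53];
        println!("{}", water_held(&twenty_randoms)); // 384
    }
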
jazzyjackson
It's interesting that ⌽ is used for reverse; it's phi, right? As the lesser is to the greater, the greater is to the whole? There's a vague notion of invertibility to the golden ratio but I can't find the words for it. Maybe, a ratio that works in both directions.

Am I way off base? Maybe there are other meanings to the character I'm unaware of.

moonchild
The line is an axis of vertical symmetry in a 2-d context (circle); compare with ⊖, which shows an axis of horizontal symmetry. When applied to a matrix, ⌽ reverses rows and ⊖ reverses columns; the elements are mirrored around the pictured axis of symmetry. (In general, when applied to a high-rank array, ⊖ reverses the first axis of its argument, and ⌽ reverses the last.)
moonchild
(Axes of horizontal and vertical symmetry, respectively, of course; I don't know why I wrote that.)
jazzyjackson
o that's much less esoteric, thanks for the explanation
beagle3
There's also a circle with a diagonal line through it (on phone, no Unicode keyboard) which, consistently with the above axes of symmetry, transposes a matrix; I don't remember if it transposes the first two or last two axes on higher-rank arrays.
moonchild
It reverses the order of the axes. There is also a dyadic variant, which provides a new order for the axes. Duplicating an axis is no problem; but there is a danger of leaving out an axis. So in J, instead, the specified axes are moved to the rear.
raphlinus
Yes! I believe being able to figure this out qualifies you as ⊢>+⌿÷≢
jodrellblank
Above average, haha. Permanent beginner, I think :)
Aug 17, 2021 · 1 point, 0 comments · submitted by tosh
Hi, the author here.

One of the hidden messages of the introduction is: watch Guy Steele's talks [1][2][3] if you are interested in data parallelism! These talks are not Julia-specific and the idea is applicable and very useful in any language. My libraries are heavily inspired by these talks.

Of course, if you haven't used Julia yet, it'd be great if (data) parallelism gives you an excuse to try it out! It has a fantastic foundation for composable multi-threaded parallelism [4].

[1] How to Think about Parallel Programming: Not! https://www.infoq.com/presentations/Thinking-Parallel-Progra...

[2] Four Solutions to a Trivial Problem https://www.youtube.com/watch?v=ftcIcn8AmSY

[3] Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful https://vimeo.com/6624203

[4] Announcing composable multi-threaded parallelism in Julia https://julialang.org/blog/2019/07/multithreading/

sroussey
Given your experience, what primitives would you advocate for JS for data parallelism?
tkf
For composable nested parallelism, I think spawn and sync like Julia has would be great. If there were a variant of spawn that asserts that no concurrency is required for the child task, I'd imagine the runtime could do more optimizations (Julia doesn't have it but there was an experimental PR [1]). But concurrency sounds like a very basic thing in JavaScript so I'm not sure if it fits well in the language. I don't have experience in serious JavaScript programming, though, so I don't know what fits best there. I'd imagine you can already build some data-parallel libraries with the web workers API.

[1] https://github.com/JuliaLang/julia/pull/31086
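
As a point of comparison only (not the Julia or JavaScript API), the fork/join shape being described - spawn two child tasks, then sync on both - is roughly what rayon::join expresses in Rust; a hedged sketch:

    // Fork/join ("spawn"/"sync") style nested parallelism with rayon::join:
    // the two closures may run on different worker threads, and join waits
    // for both before returning their results.
    fn parallel_sum(xs: &[u64]) -> u64 {
        if xs.len() <= 1024 {
            return xs.iter().sum(); // small chunks: just sum sequentially
        }
        let (left, right) = xs.split_at(xs.len() / 2);
        let (a, b) = rayon::join(|| parallel_sum(left), || parallel_sum(right));
        a + b
    }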

nextos
I love Julia, and in many areas both the language and standard libraries are quite polished.

However, simple parallelism based on threads (e.g. a parallel map) surprisingly requires a third party library. ThreadsX, the one you posted about, is also my favorite option.

Besides, the standard pipe operator (|>) doesn't support partial application. This has led to several third-party reimplementations via macros, transducers, etc. Since this is a basic feature, the fragmentation is a bit worrying.

eigenspace
Just to echo Tkf here, I think it'd be great to have something like ThreadsX.map in base someday, but when the multi-threading features in Julia first came out, it was unclear what the best interface to provide would be.

Since Julia is committed to semantic versioning, this is a kinda scary prospect because it means that any high level, exported interface we decide on, we'll be stuck with until at least 2.0 and probably forever.

So with things like this, it's important to be conservative about the interfaces we expose and instead have people explore different approaches out in the package ecosystem. One day, if someone can make a strong enough argument, I'm sure we'll see one of these package solutions end up in Base.

nextos
Yes, I agree getting design choices right is important in order not to end up with a lot of legacy code in future versions of Julia.

I think the pipe operator is probably the area that needs the most urgent attention. Lots of libraries have their own slightly different macros for improved pipes with partial application, and that introduces a bit of fragmentation and hurts composability.

There's an open issue for this with some background: https://github.com/JuliaLang/julia/issues/5571

tkf
Yeah, I get the point that it's nice to have basic things in the core language and stdlib. I'm rather on the "minimal language core" side in this discussion and I think it'd be better not to start adding more things to Julia Base and the stdlib, at least not right now, to avoid accumulating "dead batteries." There are some non-trivial things to figure out for a composable data-parallel interface. For example, what should the data collection interface be to support folds over it? I'm using a minimalistic interface defined in SplittablesBase.jl [1] for this but it's not widely used outside my packages. So I'm not super confident that it's enough yet.

[1] https://github.com/JuliaFolds/SplittablesBase.jl

Reminds me of Guy Steele tackling this trivial problem on stage: https://www.youtube.com/watch?v=ftcIcn8AmSY
associativity is the key to parallelism

Four Solutions to a Trivial Problem - Guy Steele Jr. https://www.youtube.com/watch?v=ftcIcn8AmSY (using a monoid-cached tree)

I enjoyed this talk along the same lines by Guy Steele, regarding not necessarily forcing a sequential solution to a non-sequential problem:

Four Solutions to a Trivial Problem - Guy Steele Jr.: https://www.youtube.com/watch?v=ftcIcn8AmSY

Look for steps that are commutative and/or associative in the problem to exploit parallelism.

hood_syntax
Monoids will save the world
AboutTheWhisles
They've been around for 30 years.
hood_syntax
Don't worry, it'll be any day now.
bordercases
They've been hard for amateurs to communicate
May 17, 2018 · 3 points, 0 comments · submitted by tosh
Oct 19, 2017 · 2 points, 0 comments · submitted by xfer
Aug 07, 2017 · 2 points, 0 comments · submitted by tosh
Guy Steele [1] sometimes mentions it. He gave an interesting Google Tech Talk called Four Solutions to a Trivial Problem [2], and at 1:59 [3] he said:

Also, algebraic properties are important. I've got a background in applied algebra, and I think that has informed my programming. And I think that making programmers aware of algebraic properties of their code, and communicating some of those properties to the compiler, may be worthwhile.

The language Fortress [4], of which he was one of the designers, allowed one to explicitly provide the compiler with such information - you could say that a certain operation was distributive or associative, for instance, and the compiler could then do some refactorings and optimizations taking this knowledge into account.

[1] https://en.wikipedia.org/wiki/Guy_L._Steele_Jr.

[2] https://youtu.be/ftcIcn8AmSY

[3] https://youtu.be/ftcIcn8AmSY?t=1m59s

[4] https://en.wikipedia.org/wiki/Fortress_(programming_language...

FullyFunctional
GHC Haskell (not standard Haskell) supports this and it's heavily used. Look for "Rewrite rules".
Feb 28, 2016 · 48 points, 8 comments · submitted by dang
stevebmark
I'm trying very hard to follow along but this quickly devolves into language that isn't easy to follow. When he starts going into combining "bitonic globs" is when I start having trouble focusing.

For someone that's familiar with parallel programming topics and language, is this talk easy to follow? I'm trying to figure out if the fault lies with myself or the presenter. The vibe I'm getting is that showing slides full of code in an obscure language isn't a good teaching method, but I also don't have the prerequisite background here so my opinion isn't useful. I also don't do work that involves these classes of problems.

bjornsing
> For someone that's familiar with parallel programming topics and language, is this talk easy to follow? I'm trying to figure out if the fault lies with myself or the presenter.

I think it was easy to follow if you focused on the aspects of general interest (= what he was doing in principle, e.g. divide-and-conquer). But sure, if you wanted to understand exactly how the merging of "bitonic globs" worked, then no, that's not easy to follow.

I remember thinking, when he started talking about merging the "bitonic globs", that "Yea, yea, sure, you can divide-and-conquer, but it's just about as much work combining two of those partial solutions as solving the original problem! There's got to be a better way...", and I basically stopped listening to the details.

diskcat
I could see the gist of the procedure, i.e. representing it as globs, which lets you break the data up in two so you can run it in binary-tree-like steps.

But the code was pretty useless. I didn't know that language, and in the short time span in which it was presented I couldn't understand the code.

Clearly this would be better presented as a paper so you could have time to go through it, but if you ignore the actual code completely I think I still understood the talk (or at least I think I did).

jtolmar
My key takeaway was that mathematical properties like associativity, commutativity, idempotency, and having an identity value are good tools for communicating with compilers. An associative binary operator plus an identity is enough to automatically turn your + into a Σ, with the compiler being able to choose a parallel or sequential implementation.

Guy's example code uses a language that looks like mathematical notation, but I think that if you could tell the compiler what's associative and what the identity value is, you could get there with procedural code provided you use for-each instead of classic for. There's also probably a better way to phrase the first line than "var x = operator.identity;" so that more natural code ends up parallelizable, but I don't know what it is.
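
To make that point concrete, here is a small sketch (in Rust, with hypothetical names) of the shape the comment above describes: the caller supplies only an identity value and an associative binary operator, and the implementation is then free to evaluate the reduction sequentially or as a balanced tree, since associativity makes every grouping give the same answer.

    // Tree-shaped reduction over a slice. Correct for any associative `op`
    // with identity `id`; a runtime could evaluate the two halves in
    // parallel, but this sketch is sequential.
    fn tree_reduce<T: Copy>(xs: &[T], id: T, op: &impl Fn(T, T) -> T) -> T {
        match xs.len() {
            0 => id,
            1 => xs[0],
            n => {
                let (left, right) = xs.split_at(n / 2);
                op(tree_reduce(left, id, op), tree_reduce(right, id, op))
            }
        }
    }

    fn main() {
        let xs: Vec<u64> = (1..=100).collect();
        // Turning + into Σ: identity 0, associative operator +.
        println!("{}", tree_reduce(&xs, 0, &|a, b| a + b)); // 5050
    }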

bjornsing
I watched the whole thing and was a bit disappointed, I must say. The TLDR (or W as in watch, perhaps) is essentially: if we write our programs in a map/reduce (functional) style then the compiler/runtime can sometimes[1] choose between a parallel and a sequential implementation at runtime, depending on what computational resources are available. Seems fairly obvious to me... I would have much preferred the 10-minute version of this talk.

1. Given that the underlying operations are grouping and order independent (associative and commutative), of course.

_u3yg
The part where the guy chuckles and says that some of us could recognize this problem as part of their employment application at Google was a bit disturbing. It sounded innocent, but ominous, really.

Just assume for a second that I'm not a complete idiot. Assume that I've been around for a while, that I know how to perform summations, that I can do running averages. Assume that I've seen the complications that come with solving some of these basic problems. Assume that I've solved problems before only to be confronted by subsequent problems in the pipeline, and assume that I've come to expect that kind of development.

What in the world could my first impression mean to anyone?

deckar01
I think calling it a "map/reduce (functional) style" is an understatement. The presenter is suggesting that when a problem must be solved in parallel, don't just code the low-level parallel implementation. Analyze the problem, divide and conquer, then use high-level abstractions that make parallelism a responsibility of the underlying implementation.

This concept may be familiar to you, but talks like this will be necessary until parallelization is as seamless as garbage collection.

bjornsing
> Analyze the problem, divide-and-conquer, then use high level abstractions that make parallelism a responsibility of the underlying implementation.

That's a better summary than mine, but I still think it's too little substance for a 60 min talk. If you want a parallel(izable) implementation then divide-and-conquer is a well-known schoolbook approach. The problem as I see it is that it's not easy to build (or understand) algorithms like that. I saw very little in this talk about how to address/solve that, except the obvious: better education.

I think it was also a bit irritating that he pretended there was nothing a compiler could do about the accumulation loop he presented as the first example. Isn't it obvious that a smart compiler could (in principle) optimize that loop so that it could run in parallel on multiple cores...?

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.