HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
What is backpropagation really doing? | Chapter 3, Deep learning

3Blue1Brown · Youtube · 645 HN points · 4 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention 3Blue1Brown's video "What is backpropagation really doing? | Chapter 3, Deep learning".
Youtube Summary
What's actually happening to a neural network as it learns?
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: http://3b1b.co/nn3-thanks
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

And by CrowdFlower: http://3b1b.co/crowdflower
Home page: https://www.3blue1brown.com/

The following video is sort of an appendix to this one. The main goal of the follow-on video is to show the connection between the visual walkthrough here and the representation of these "nudges" in terms of partial derivatives that you will find when reading about backpropagation in other resources, like Michael Nielsen's book or Chris Olah's blog.

Video timeline:
0:00 - Introduction
0:23 - Recap
3:07 - Intuitive walkthrough example
9:33 - Stochastic gradient descent
12:28 - Final words
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Apr 02, 2021 · sxp on Backpropagation 101 (2020)
This has a lot of words, but not enough content. I recommend 3Blue1Brown's neural net videos instead: https://www.youtube.com/watch?v=Ilg3gGewQ5U
adamfaliq
This. And https://ml-cheatsheet.readthedocs.io/en/latest/backpropagati... . I am currently doing MIT Micromasters in Data Science, specifically the Machine Learning & Deep learning course. Last week's assignment was to build a neural network from scratch and these two resources have been a lifesaver.
While we're at this, I found the explanations by 3Blue1Brown to be very intuitive when it comes to neural networks, especially for folks who're new and don't necessarily grasp concepts when explained primarily through math.

What is backpropagation really doing? https://youtu.be/Ilg3gGewQ5U

His other videos on this topic are just as good.

I'm in the same boat. For a long time I was interested in AI but at the same time intimidated by the math. I'm relatively comfortable with discrete mathematics and classical algorithms, but calculus and linear algebra were completely foreign to me. I also don't accept any way of learning ML without a good understanding of the core principles behind it, so the math is a must.

A few months ago, I stumbled upon the amazing YouTube channel 3Blue1Brown, which explains math in a very accessible way, and I got the feeling that I was finally starting to understand the core ideas behind linear algebra and calculus.

Just recently he published 4 videos about deep neural networks:

https://www.youtube.com/watch?v=aircAruvnKk

https://www.youtube.com/watch?v=IHZwWFHWa-w

https://www.youtube.com/watch?v=Ilg3gGewQ5U

https://www.youtube.com/watch?v=tIeHLnjs5U8

So my fear of ML has gone away, and I'm very excited to explore the whole new world of neural networks and other things like support vector machines, etc.

bootcat
I have also used mathematicalmonk, which is simple and good at introducing basic concepts and tools related to ML. https://www.youtube.com/user/mathematicalmonk
darethas
I also recommend taking up computer graphics for honing your skills in linear algebra. Graphics are essentially applied linear algebra.
markatkinson
I came here to write a similar comment. Really make sure to watch the playlists in the correct order on the above YouTube channel.
kregasaurusrex
Having watched the third one out of sequence, seeing the first two and then watching the third again helped me get a good understanding of the fundamentals. As a narrator, 3blue1brown does an excellent job of making a rather tricky subject more approachable, and inspired me to buy a course for a deeper dive into the math behind ML and NNs.
lerax
Nice to see someone pointing to the fundamentals. A good understanding of probabilistic models is good too. After getting the basic math knowledge down, I suggest this: https://classroom.udacity.com/courses/ud730
sn9
Regarding linear algebra, I highly recommend Klein's Coding the Matrix which uses Python to teach linear algebra.

I believe it was developed for Brown's linear algebra course for CS undergrads.

sova
Hi! That's wonderful. What's a support vector machine used for?
pletnes
Classification and regression. Given examples, predict labels or values for new data. SVMs used to be more «hot» than neural nets and are still very useful.
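A minimal scikit-learn sketch (added for illustration; not part of the comment) of that "given examples, predict labels for new data" workflow, with made-up data:

```python
from sklearn import svm

X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]  # example feature vectors
y_train = [0, 0, 1, 1]                                       # their labels

clf = svm.SVC(kernel="rbf")   # support vector classifier (svm.SVR does regression)
clf.fit(X_train, y_train)     # learn a decision boundary from the examples

print(clf.predict([[0.1, 0.0], [1.0, 0.9]]))  # predict labels for new points
```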
hwu2whag
Wow thanks for this resource!
skytreader
Worth noting that 3Blue1Brown also did a series on linear algebra which is eye-opening to say the least. Playlist at:

https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...

Even if you think you grok matrices, have a go at the first few videos of that playlist, if just for the visualization. It really helped me see what matrices (and operations on matrices) represent!

tmaly
I just watched the first video. Thanks for sharing.
_xhok
3Blue1Brown is a treasure. The production value is excellent, and he's great at taking seemingly uninteresting ideas and painting a beautiful picture to connect them in twenty minutes. I used to go through a video before falling asleep each night.
Nov 04, 2017 · 645 points, 99 comments · submitted by adamnemecek
adamnemecek
There needs to be some sort of organized push for visualization tools. I know I might be bringing the proverbial owls to Athens by saying that here, but I really do feel that, done right, this could impact the course of the world like nothing else. It could be as important as, I don't know, the invention of the printing press. Make the computer "the visualization machine".

I think that one of the fundamental problems is that to be a visualization machine you need easy access to the GPU, and OpenGL provides anything but. I think that shadertoy (shadertoy.com) is the thing that comes closest, but the learning curve is kind of steep.

I know that people like Alan Kay, Bret Victor or Michael Nielsen (his post was on the front page the other day: https://news.ycombinator.com/item?id=15616637) share these sentiments, but this is a task bigger than a single person.

I don't know what I really mean by "organized push", and I'm not sure the problem is even well defined.

seanmcdirmid
There was a big organized push for visualization and more precisely augmented visualized thinking at HARC. It’s really too bad HARC didn’t work out, but many of us are still very interested in this problem.
pas
Uh, details/links please on how/why HARC failed.
seanmcdirmid
It was just an unexpected funding problem, it’s not my place to say more than that. Also, some of the groups are still going, but not through HARC.
alfla
I agree. Visualization is often key to understanding and identifying non-trivial issues.

Here's a tool a colleague of mine made for inline "visual debugging" for e.g. computer vision, written in c++: https://github.com/lightbits/vdb. I haven't used it myself, but when he presented it I think it made a lot of sense to have these sorts of tools for processing data in real time.

posterboy
processing(.js, etc)
minimaxir
In deep learning, TensorBoard (https://www.tensorflow.org/get_started/summaries_and_tensorb...) works with TensorFlow and Keras to show what the model is doing. However, it ends up being more complicated/unintuitive than a YouTube video, so it's not as useful.
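For concreteness, here is a rough sketch (my addition, assuming TensorFlow 2.x with tf.keras; the model, data, and log directory are placeholders) of hooking TensorBoard into a Keras training run:

```python
import numpy as np
import tensorflow as tf

# Toy model and data, just to have something to log.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 4).astype("float32")
y = np.random.rand(256, 1).astype("float32")

# Write event files that `tensorboard --logdir logs` can then visualize.
tb = tf.keras.callbacks.TensorBoard(log_dir="logs")
model.fit(x, y, epochs=5, callbacks=[tb])
```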
adamnemecek
The problem is that this is an ad hoc solution. What I'm talking about would be some formalization of visualization (I guess kinda like grammar of graphics without the statistical aspect) so you can visualize just about anything.
dlwdlw
I feel like visualizations rely too much on the existence of a meaningful isomorphism. That is, once a problem is visualized effectively it becomes trivial and though applicable to future similar problems the isomorphism itself is too domain specific to be generalized. It feels like trying to find an analogy that will help you find all future analogies.
adamnemecek
I do agree with this sentiment very much but at the same time I do feel like no one has really given it a good shake.
posterboy
Isn't that what category theory is about, on the meta level, and in the result in case of specific isomorphisms, too?

edit: at that I still have John C. Baez, Mike Stay - "Physics, Topology, Logic and Computation: A Rosetta Stone" on my reading list https://arxiv.org/abs/0903.0340

minimaxir
“Visualizing just about anything” isn’t helpful if you want to learn from the visualization, though. (c.f the /r/dataisbeautiful subreddit nowadays: https://www.reddit.com/r/dataisbeautiful/top/?sort=top&t=mon...)

That’s not to say that a purely artistic data visualization has no value, but it’s not academic. (I admit I am guilty of that at times)

adamnemecek
Data visualization is only a part of it. I'm talking about visualizing concepts.
posterboy
Structure is data
kharms
In my opinion this author produces the best math videos on youtube.

If you can afford it and enjoyed this video, consider supporting him on Patreon. https://www.patreon.com/3blue1brown

SonOfLilit
If you liked him you will love acko.net. Try https://acko.net/blog/how-to-fold-a-julia-fractal/
smortaz
If you enjoy his videos (and other creators'), please consider signing up on Patreon and supporting them.
aidos
Someone on here (I think) recommended his videos on linear algebra a while back and I've since watched them all, several times.

A couple of hours of watching time built an intuition and understanding of linear algebra and the broader maths around it that 4 years of university training didn't give me. That's a little unfair, because I obviously learnt a lot on the courses that make these videos easier to understand, but man, they're so well done.

NuclearFishin
Totally agree. Had exactly the same experience. Became a patron as a way to offer my thanks!
nouveaux
This was very timely for me and for anyone else learning, here are the first few videos of the series:

https://www.youtube.com/watch?v=aircAruvnKk

https://www.youtube.com/watch?v=IHZwWFHWa-w

https://www.youtube.com/watch?v=Ilg3gGewQ5U (Original video)

https://www.youtube.com/watch?v=tIeHLnjs5U8

quotemstr
The entire YouTube channel is fantastic. 3Blue1Brown's series on linear algebra is the best I've seen anywhere.
wybiral
Agreed. Pretty much every video on that channel is just as good as this one.
afarrell
A bit of a side-note, but I think it is an interesting piece of marketing that Amplify Partners decided to sponsor[1] the previous video in this series. I wonder (and hope) we'll see more VCs sponsoring open educational content relevant to their focus.

[1] https://www.youtube.com/watch?v=IHZwWFHWa-w&t=1205

timonoko
In 1988 Teuvo Kohonen had an "animation" with rotating disks showing how the perceptron learns: https://youtu.be/Qy3h7kT3P5I?t=42m24s. It did not help comprehension much.
phkahler
What tools does a person use to make a video like this? I've been wanting to do the same on my topic of expertise for a while now.
henrikeh
He uses custom, self-developed tools

http://www.3blue1brown.com/about/

https://github.com/3b1b/manim

lelandbatey
He creates each animation using a set of Python tools and libraries he wrote. You can find them published here: https://github.com/3b1b/manim
edanm
He uses custom tools.

However, he actually recommends against using his tools. He suggests a better option is to use traditional animation tools.

I'm actually not sure what one would use for more traditional animations of his style though. I mean, theoretically you can use blender/etc for most 3d things, but how easy would it be to make something math-based there? Hopefully someone with some real animation experience can chime in.

mcintyre1994
On the Manim Github he has some suggestions: "For 9/10 math animation needs, you'd probably be better off using a more well-maintained tool, like matplotlib, mathematica or even going a non-programatic route with something like After Effects. I also happen to think the program "Grapher" built into osx is really great, and surprisingly versatile for many needs."
physicsyogi
I didn't know that Grapher was still around.
raverbashing
> I also happen to think the program "Grapher" built into osx is really great

I didn't know about this, and it's a nice find

pls2halp
There's a really cool story about how the software wasn't supposed to ship with macOS, but the devs got it on anyway: http://www.pacifict.com/Story/
phkahler
Is there any way to verify that story?
Micoloth
Great story, thank you for sharing! :D
perfmode
Each one of these videos consists of thousands of lines of code. The attention to detail is impressive.
posterboy
have you looked at visualizations done in mathematica? LoC is not a good measure here. http://community.wolfram.com/content?curTag=graphics%20and%2...
samueloph
Oh my god, as soon as I saw this video was from 3Blue1Brown I immediately thought "this is gonna be good!" I didn't realize he was posting a deep learning series.
crusso
I played around with ML a few years back. I took the Andrew Ng course on Coursera and spent some time with some python notebooks - but I never did anything with it beyond just proving that I could follow the examples and implement my own ML solutions for some simple training sets.

Now I have some problems I'd like to solve with ML. So assuming that I understand the basic concepts, what's the HN recommendation for a good library/system to get started with on doing some practical ML with neural nets?

Would TensorFlow be the best way to get into it?

ranman
Checkout MXNet as well
ranman
https://mxnet.incubator.apache.org/tutorials/
andreyk
Keras is probably the best library to get started with. TensorFlow is mainly needed if you have to mess with lower-level details, which for many things is not really necessary.
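As a hedged illustration of that suggestion (my own sketch using tf.keras; the dataset and layer sizes are placeholders, not a recommended architecture), a complete "getting started" model fits in a few lines:

```python
import numpy as np
import tensorflow as tf

# Fake dataset: 1000 samples, 20 features, 3 classes.
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 3, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Backpropagation and mini-batch gradient descent happen inside fit().
model.fit(x, y, epochs=3, batch_size=32)
```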
crusso
Would you still recommend Keras for non-image work?
oliv__
I dove into this not knowing anything about neural networks, but the feeling I came out of it with was incredible: I love it when something blurry and obscure slowly morphs into a sharper picture in your mind, it's so empowering.
mlamat
Thank you very much. I must code a neural network with backpropagation for my AI class. Can anyone recommend a book?
fnbr
If you're looking to understand the underlying theory behind deep learning, the Deep Learning book by Goodfellow et al. is awesome.

If you're interested in general machine learning, The Elements of Statistical Learning by Tibshirani et al. is great; a more applied book is Applied Statistical Learning by the same authors. For a more applied view, I'd check out TensorFlow or PyTorch tutorials; there's no good book, as far as I'm aware, because the tech changes so quickly.

I've done a series of videos on how to do deep learning that might be useful; if you're interested, there's a link in my profile.

bmc7505
COMP 551?
Yreval
This book is a good practical introduction that walks you through the basic ideas as you develop some basic functionality. http://neuralnetworksanddeeplearning.com/

I'm often pretty skeptical of e-books and self publications, but the above link is pretty good (and the video series linked here references it as well.) The Goodfellow book that another commenter mentioned is a high-quality survey of the field and a nice, high-level overview of different research directions in deep learning, but isn't as pragmatic as an introduction.

alexasmyths
This video series is amazing and I wish it existed long ago.
cmatt01
There needs to be some sort of intense advocacy for visualization tools.
i25959341
adsf
SonOfLilit
We need more people teaching math through visual intuition. Like a friend of mine said, "if you want to do computation fast, phrase it as a problem for your GPU, er, visual cortex".

Here is a tool you can play with to visualize this: http://playground.tensorflow.org

If you liked this video, try this different visual intuition of what a neural network does that I find even better:

http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

Also, remember that backpropagation is a very general algorithm: it works not only on linear transformation weights but on any directed acyclic computation graph that is differentiable in its weights.
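A tiny reverse-mode autodiff sketch (my own illustration, not code from the video or the comment) shows that generality: each node only needs a local derivative rule, and gradients flow backward through any differentiable acyclic graph, not just layers of weights.

```python
import math

class Node:
    def __init__(self, value, parents=(), backward=lambda grad: ()):
        self.value = value
        self.parents = parents    # upstream nodes in the computation graph
        self.backward = backward  # maps d(output) to gradients for the parents
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, (a, b), lambda g: (g, g))

def mul(a, b):
    return Node(a.value * b.value, (a, b), lambda g: (g * b.value, g * a.value))

def sigmoid(a):
    s = 1.0 / (1.0 + math.exp(-a.value))
    return Node(s, (a,), lambda g: (g * s * (1.0 - s),))

def backprop(out):
    # Topologically order the graph, then push gradients from the output backward.
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                visit(p)
            order.append(n)
    visit(out)
    out.grad = 1.0
    for n in reversed(order):
        for parent, g in zip(n.parents, n.backward(n.grad)):
            parent.grad += g

# Example graph: y = sigmoid(w * x + b)
w, x, b = Node(0.5), Node(2.0), Node(-1.0)
y = sigmoid(add(mul(w, x), b))
backprop(y)
print(w.grad, x.grad, b.grad)   # dy/dw, dy/dx, dy/db
```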

jfaucett
> We need more people teaching math through visual intuition.

I would modify that slightly and say rather just through "intuition". Visualization helps a lot, but you can also have great intuition from situations, stories, feelings, etc (anything that hits the non-reasoning part of the brain i.e. your "gut feeling"). IMHO one of the biggest problems in mathematics and science education is that we spend too much time working on things which humans are bad at (precise calculations) and far too little doing the 'rough estimation' and 'intuition' work which we have been evolutionarily optimized for and which is essential to us for actually remembering and understanding how things work.

tomjakubowski
I'm learning linear algebra and found that watching a Strang OCW lecture, internalizing it in "his voice", and then doing a few problems from his textbook while "listening" to him (in my head) has helped my intuition more than anything else. Reading the book "in his voice" makes it easier to understand, too.
llamaz
I think that might be a placebo. Maybe you mean his attitude
posterboy
It stands to reason that language has a strong affinity to sound, because, well, there's been magnitudes more time in history to develop that compared to text-reading skill.
lottin
Visual intuition fails when you have more than 3 dimensions or 4 in some cases. I can visualise a right angle in 2 and 3 dimensions. In 4 dimensions I can't visualise it and the whole idea of an angle no longer seems to make sense. And yet I still need to use this notion of a right angle in more than 3 dimensions in order to reason about certain problems.
surrey-fringe
What percentage of teachers use visual intuition, and what percentage should?
phreeza
This can also go wrong: for example, visualising probability distributions in low dimensions leads to very wrong intuitions about the behavior of high-dimensional distributions.
tankenmate
Here is a good short document[0] that can help unlock the missing visual intuition for high-dimensional data. It does require a lot more thinking than 3D space, but eventually, alongside the maths, it helps you get a "feel" for it.

[0] https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/ch...

zardo
That means you need to keep track of what properties actually hold under projection to lower dimensions.
smallnamespace
Keeping track of properties is not really something 'visualization' necessarily helps with, though; that's more symbolic reasoning through proofs.
zardo
I don't mean that spatial reasoning helps with that, I mean that if you do it, you can still apply your spatial reasoning where it's appropriate.
dmritard96
I agree it isn't 100% perfect for every situation. But I can think of plenty of instances where ditching colors, lowering resolution, etc. have been totally fine (and often essential) for gaining a level of intuition. As some other comments noted, this intuition may be flawed, mostly because it's hard to know what you don't know. As a result, you are still more informed than you might have been before, but you also might not know much more.

However, being able to mix and match properties is really what most good plotting and visualization is all about. Having done a bunch of ray tracing, my intuition around lighting and light is much better. I'm not even good 'anecdata', so take that for what it's worth, but I found visualization to be much more intuitive than reading E&M textbooks/lectures. I'm not sure whether knowing a phenomenon bottom-up or top-down is really a guarantee (nor am I suggesting that symbolic reasoning is bottom-up or -down), but seeing something is just so efficient for some people. Like anything powerful, it just needs to be used judiciously and with asterisks.

adamnemecek
I'm not sure I know what you are talking about, but let's not throw the baby out with the bathwater.
decisiveness
I don't think knowing exactly what parent comment is talking about is required to see that they weren't suggesting we should do away with all visualizations just because there are some cases where they might not be the best tool for teaching.
kobeya
What he means is that we only really have intuition for 1, 2, and 2.5D visuals, but many areas of mathematics don't map into low dimensions very well, or do but lose essential properties in the process. Building a low-dimensional projection of the problem might prime intuition, but it will also introduce fundamental biases.

For example, learning geography by flat map projections only. No matter what projection you use there is a trade off, and you end up instilling both the pro and the con of that trade off as intuition.

darkmighty
I would reiterate that simply because there are problems with naive visualization, we shouldn't discredit visual thinking.

There are several key elements to effective visual thinking. The most important is to keep it grounded in proofs and theorems, so you know exactly what your limitations are. Often you can use a geometric argument on top of a few theorems to get a very strong result intuitively, and then use this intuition with a tiny amount of algebra to prove it (which might have taken you forever to arrive at from a purely algebraic perspective). Another key is that there are several ways of visualizing things. You can almost always transform a problem into an equivalent one that is easy to visualize (you just need a little bit of care with the transformation, etc.).

---

For example, you can show functions form a vector space, visualizing some interesting algebraic properties about them, even if it constitutes an infinite-dimensional space.

You can show that several operators (such as d/dx) are linear; you can give the space a norm, an inner product, etc. This trick lets you use visual tools (and linear algebra tools) with arbitrary functions. You can visualize the projection of a function onto a subspace, or onto some non-canonical basis, yielding useful applications such as Fourier analysis.

Fourier analysis itself is fertile ground for visual thinking. You'll find trivial arguments for seemingly difficult questions such as "Does this linear system have a bounded output for any bounded input?". There isn't one right way of thinking about anything.
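To make the "projection onto a basis" picture concrete, here is a small numerical sketch (my own addition, not from the comment): treat a sampled function as a long vector and compute a Fourier coefficient as a projection.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 10000, endpoint=False)
f = np.sign(np.sin(x))        # example function: a square wave

def inner(u, v):
    # <u, v> = integral of u*v over [-pi, pi], approximated numerically
    return np.trapz(u * v, x)

# Projection of f onto the basis function sin(n*x):
# b_n = <f, sin(nx)> / <sin(nx), sin(nx)>
for n in range(1, 6):
    basis = np.sin(n * x)
    print(n, round(inner(f, basis) / inner(basis, basis), 4))
    # odd n give roughly 4/(n*pi), even n give roughly 0
```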

---

On the other hand, it can't be stressed enough the importance of keeping track of formal assumptions, axioms, definitions, theorems to construct valid, correct proofs. That way you minimize the risk of fooling yourself, and can safely use your intuition.

This 3B1B video exemplifies many of those elements:

https://www.youtube.com/watch?v=zwAD6dRSVyI&t=633s

phreeza
Yes, exactly, thanks.
adamnemecek
I didn't understand the distributions part.
chestervonwinch
It's related to the geometric problems the parent described because probability distributions roughly describe geometric regions (of high probability density) where observations are likely.
nitrogen
Part of it may be that in higher dimensions the bulk of a volume is concentrated near the surface.

I found https://blogs.msdn.microsoft.com/ericlippert/2005/05/13/high... with a quick search.

nkurz
The article here (and the links in the comments) might clarify the connection to machine learning:

http://www.penzba.co.uk/cgi-bin/PvsNP.py?SpikeySpheres

https://news.ycombinator.com/item?id=3995615

posterboy
interesting

the volume of a unit n-ball peaks at around n=5 dimensions and quickly drops toward zero past n=20, cf. https://news.ycombinator.com/item?id=3995930

The comment refers to Lebesgue measure (which I don't really understand), but I'd intuitively and ignorantly assume we count all faces of all (n-1)-balls (recursively), whereas the volumes overlap, so the total (in Lebesgue ...space?) is less than the sum of its parts (in Euclidean space) - how far off am I? (will delete if too far)

posterboy
> learning geography by flat map projections only

well, the map is flat more or less at closer zoom levels, so the general problem seems to be purely about lossy compression.

kobeya
...what? I don’t understand. It’s accurate at “close zoom” because the limit as you scale in to the surface of a sphere is a flat surface. I’m not sure what compression of any sort has to do with this.
Retric
Flat map projections work fine if you provide enough of them.

A video from LEO or a rotating map projection provides very different intuition than a single static map. https://www.youtube.com/watch?v=EPyl1LgNtoQ

Very high zoom levels also work out nicely.

posterboy
yeah, as I ad-hoc-ly commented. It's all about compression.
vbuwivbiu
please elaborate - are you thinking of the curse of dimensionality ?
SonOfLilit
Related: Hamming's "The Art of Doing Science and Engineering" chapter 9, N-Dimensional Space

http://worrydream.com/refs/Hamming-TheArtOfDoingScienceAndEn...

(I assume Bret Victor has permission to host the PDF on his website, he is far from an anonymous pirate)

phreeza
There are many examples. One I came across recently is that the large majority of the probability mass of a high-dimensional Gaussian distribution lies in a shell at some distance from the mean; the mass near the center is actually quite low.

Also anything related to topology, which is important when you are looking at decision boundaries, becomes counterintuitive in high dimensions, because so many things can be adjacent at the same time.
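A quick numerical check of the Gaussian "shell" effect described above (my own sketch, not part of the comment):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 100, 1000):
    samples = rng.standard_normal((10000, n))   # standard Gaussian in n dimensions
    radii = np.linalg.norm(samples, axis=1)     # distance of each sample from the mean
    # The radii concentrate around sqrt(n) with roughly constant spread,
    # so almost no samples land near the origin in high dimensions.
    print(n, round(radii.mean(), 2), round(radii.std(), 2))
```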

alan-crowe
This is the problem that makes my brain melt when I try to think about genetics and mutational load. The naive idea is that a species, S, has a correct genome, G, but mutations build up, increasing with each generation. Presumably mutations build up until they are common enough that back-mutations are a thing. Then there is an equilibrium. In a fecund species, each individual has many children, but most have a higher mutational load, many have the same mutational load, and a lucky few have a smaller mutational load, closer to G, the correct genome. Differential reproductive success then maintains the equilibrium.

I don't see how the numbers are supposed to work out for large mammals, with each female having under a dozen offspring. To have a decent chance of a back-mutation, the typical member of the species would need one twelfth of their genome to be deleterious mutations.

Meanwhile, people are thinking about using CRISPR to correct the human genome, creating unusually happy, healthy people. The underlying thought is that the correct genome is best. But why do we think that the correct genome works at all?

Most of the population is in a shell at a distance from the correct genome, the number at the center is actually quite low. Given the combinatorics, with two to the millions of possible genomes, but populations in the millions, the number at the center, or even close, is actually zero. Maybe the correct genome codes for a sickly, miserable individual?

My current guess is that the evolution of large mammals with few offspring is constrained by genetic load considerations. It is not sufficient, (or even necessary) for the correct genome to be any good. There needs to be a big blob of mediocrity in genome space. The species exists as a shell of individuals on the edge of the blob of mediocrity. The blob needs to be huge, so that individuals whose genome is one twelfth mutations are still in the blob. Then there can be an equilibrium between back-mutations, taking offspring towards the interior of the blob and other mutations, taking offspring out of the blob and out of the gene pool.

This potentially solves the Fermi paradox. Can creatures such as humans actually exist in this universe? It is not enough for natural selection to discover a good genome. Natural selection has to discover a huge blob of mediocrity. Such blobs might be vastly rarer than we realize.

This potentially shits on the CRISPR master race. There might be nothing special about the interior of the huge blob of mediocrity.

canjobear
What is the "correct genome"? Seems like you could only define it as a local minimum in fitness space, or some kind of attractor.
alan-crowe
I don't know. There is a medical perspective which focuses on deleterious mutations causing disease. This is a black-and-white perspective which sees mutations as either wholly bad or entirely unimportant. A genome without any deleterious mutations is a correct genome.

But what happens if you step back from black-and-white thinking and ask about mutations with ambiguous effects? Which is the mutation and which is the correct genome? It becomes unclear.

An alternative perspective asks: how well separated are the local minima in fitness space? Perhaps the typical separation is as large as the gaps between species. Then each species has only its own local minimum, which defines its correct genome. Or perhaps fitness space is littered with local minima, such that a single species has genetically healthy individuals in several different minima plus other individuals, perhaps not quite so healthy, nearby.

shas3
Theoretically, yes. Can you give a more concrete example? Many hard high dimensional and general topology problems can be visualized through their 2D special cases.
sevenfive
Yeah, but the distance from the center is itself just a one-dimensional gaussian...
jstanley
Can you please try and explain why that is?

If true, you're very correct that lower-dimensional intuition does not transfer into higher-dimensional spaces: my intuition tells me that a Gaussian distribution drops off as you fall away from the mean, and it's quite easy for me to imagine that in 2 dimensions, 3 dimensions (e.g. by imagining a mound on a plane) and 4 dimensions (e.g. a cloud in 3-space with increased density around the mean).

Is my intuition wrong in any of those cases? If so, why? If not, how many dimensions do we need before it becomes wrong?

smallnamespace
The center of the distribution always has the highest density, but the ratio of 'probability mass close to the centroid' to 'total probability mass' drops off as the number of dimensions grows.

This is somewhat related to another 'curse of dimensionality' observation, which is that the ratio of the volume of a hyperball to the volume of the hypercube enclosing it tends towards zero as dimensions grow -- there's just a lot more volume that's in some sense 'far' from the center.

eli_gottlieb
>If true, you're very correct that lower-dimensional intuition does not transfer into higher-dimensional spaces: my intuition tells me that a Gaussian distribution drops off as you fall away from the mean, and it's quite easy for me to imagine that in 2 dimensions, 3 dimensions (e.g. by imagining a mound on a plane) and 4 dimensions (e.g. a cloud in 3-space with increased density around the mean).

Density is different from mass. Namely, mass is the integral of density. So your intuition is roughly correct for density, but you need to make it accord with a good intuition for mass.

Since getting the mass requires an integral, getting the mass over N-dimensional distributions requires integrating an N-dimensional region, which means N integrations for N dimensions. Each integration is, intuitively, a kind of sum. Integrating out many dimensions happens recursively; looped or recursive addition is multiplication. So on some level, to take the probability mass of a region in N-dimensional space, you need to "multiply" a density.

Since the total probability mass is fixed (1.0), adding more dimensions means you need to "multiply" the density by a larger number to get the mass, which means you need to divide the mass by a larger number to get the density, which means that despite the density peaking at the mean, the available density at any given point gets smaller as the dimensionality rises.
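A compact way to state this density-versus-mass point (a standard result, added here for illustration rather than taken from the comment): for a standard Gaussian in N dimensions, the probability mass at distance r from the mean picks up a surface-area factor of r^(N-1),

```latex
p\big(\lVert x \rVert = r\big) \;\propto\; r^{\,N-1}\, e^{-r^{2}/2},
```

which peaks at r = sqrt(N-1) rather than at the origin, even though the density itself is always largest at the mean.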

nabla9
> it's quite easy for me to imagine that in 2 dimensions

It starts to fail really badly when dimension grows.

Two simple examples:

1) Consider the 3-dimensional unit sphere centered at the origin and the unit cube centered at the origin. The cube is clearly completely inside the sphere. Now generalize to n dimensions: the volume of the hypercube with side length 1 moves almost completely outside the n-sphere with radius 1 as n grows.

2) Alternatively almost all volume of n-sphere is close to the surface.

These are all very counterintuitive, yet simple-to-check toy examples. When you start to integrate over more complex multidimensional functions, things get weird really fast.
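A quick Monte Carlo check of example 1 (my own sketch, not part of the comment): sample points uniformly from the unit cube centered at the origin and see what fraction falls inside the unit ball.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (3, 10, 20, 50):
    pts = rng.uniform(-0.5, 0.5, size=(100_000, n))        # unit cube centered at the origin
    inside = (np.linalg.norm(pts, axis=1) <= 1.0).mean()   # fraction inside the unit ball
    print(n, inside)
# In 3 dimensions every point is inside; the fraction collapses toward zero as n grows.
```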

vbuwivbiu
Isn't this just because we're comparing n-dimensional objects by a 2-norm? i.e. the dimension of the space grows but we're keeping the dimension of the norm fixed; if we used the p-norm of the same dimension as the space, then maybe that would return intuitive results?
tzahola
>Alternatively almost all volume of n-sphere is close to the surface.

How does this go against intuition?

Intuition from 1/2/3d tells me that the volume of an N-ball is O(r^N), and indeed that is the case in higher dimensions. Therefore it's easy to see that the ratio between the volume of an N-ball of radius (r + epsilon) and one of radius r grows exponentially with N.

orangecat
Because an outlier in any single dimension will put the point outside the "center" of the distribution, and as the number of dimensions increases there's more of a chance of that happening.

Say you have an N-dimensional gaussian where each dimension has mean 0 and standard deviation 1. Define the center as the N-dimensional cube whose edges go from -3 to +3 in each dimension. A normally distributed value is within 3 standard deviations of the mean with probability 0.9973, so the probability that an N-dimensional point is in the center is 0.9973^N. With N=4 that's 0.989, which matches your intuition, but at N=1000 it's 0.067 and at N=10000 it's 1.81e-12.
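(A one-liner, added here purely to reproduce those numbers:)

```python
for n in (4, 1000, 10000):
    print(n, 0.9973 ** n)   # roughly 0.989, 0.067, and 1.8e-12
```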

kahoon
I think this is a very good point. Years ago people were worried about gradient descent getting stuck in local minima, plausibly because this problem is very obvious in a 3-dimensional space. In higher dimensions, however, the problem seems to more or less go away, and a lot of the worrying about the issue seems to be the result of lower-dimensional intuitions wrongly extrapolated to higher dimensions.
bobby_the_whale
Can you write down your statement in a formal language such that we can prove the negative to your statement such that you might get the idea to stop talking about things you do not understand?
bovine3dom
Say you have N random walks.

The probability that the second derivative at any point is of the same sign for all walks decreases with N.

Right?

bobby_the_whale
You know the expression "not even wrong"? That's exactly what this is.

If you take the lim_{N->\inf} it's true, sure. Except that's a trivial result. We already have a lower existing upper bound available for solving neural network optimization problems.

david-gpu
Not the person you replied to, but your comment was both rude and incorrect enough that I feel the need to reply. See for example http://www.offconvex.org/2016/03/22/saddlepoints/ for some discussion on this.
bobby_the_whale
Funny you say that. I have the impression you can't even write down what correctness means.

If you have to refer to an external party that is also lacking in rigor, please don't.

EDIT: Have you even read that article yourself and in particular the NP-Hard bit? That directly contradicts the idea that you can escape from local minima. The only thing you can hope for is escaping from some local minima or that your problem actually was easy to begin with.

Computers have never in their history solved hard problems for non-trivial problem sizes, they have merely approximated them.

Neural networks have been used to solve problems of practical interest, but any extension of that claim just makes you look like a clown.

Eliezer
Stuff got stuck in local minima for years before we learned about stuff like momentum and dropout and dropped a ton of GPU power on it.
alexcnwy
I think you're misinterpreting the parent who is saying that local minima are not a problem in high dimensions because there is always a dimension to move in that reduces the loss (unlike in lower dimensions where you can get stuck in a point across all dimensions that cannot be locally improved upon)
llamaz
I still don't understand what the parent is talking about then. Could you please restate the explanation using math notation/terminology?
llamaz
When I was implementing a neural network for a university assignment (2 years ago, so my memory might fail me), we had to run our algorithm multiple times with different starting positions, then take the minimum of those local minima.

I'm not sure what momentum and dropout are, but I agree with Eliezer: without these things (which I didn't use) local minima are a problem.

eat_veggies
Dropout is where you randomly remove neurons from your network during training, which prevents them from depending too much on specific neurons (making the output more generalizable). It was developed in 2014 so it would have been brand new tech back when you were in your class.
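A rough numpy sketch of that idea ("inverted" dropout; my own illustration, not code from the comment):

```python
import numpy as np

rng = np.random.default_rng(2)

def dropout(activations, keep_prob=0.8):
    # Zero each activation with probability 1 - keep_prob, then rescale the
    # survivors so the expected value of the layer's output stays the same.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.array([0.5, 1.2, -0.3, 0.9])   # activations of one hidden layer during training
print(dropout(h))                      # at test time, dropout is simply turned off
```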
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.