HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Connections between physics and deep learning

MITCBMM · Youtube · 130 HN points · 6 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention MITCBMM's video "Connections between physics and deep learning".
Youtube Summary
Max Tegmark - MIT
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
An excellent video lecture on this by Max himself, brilliant and very intuitive: https://youtu.be/5MdSE-N0bxs
Here's a related 2016 talk by Max Tegmark (second author) on connections between deep learning and physics:

https://www.youtube.com/watch?v=5MdSE-N0bxs

The gist of it is that physical data tends to have symmetries, and these symmetries make descriptions of the data very compressible into relatively small neural circuits. Random data does not have this property, and cannot be learned easily. Super fascinating.
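
A minimal numpy sketch of that idea (my own illustration, not code from the talk; the network size, targets, and hyperparameters are arbitrary assumptions): a tiny one-hidden-layer network trained with plain gradient descent fits a symmetric, low-order "physical" target easily, but cannot compress random labels on the same inputs into its small set of weights.

    import numpy as np

    rng = np.random.default_rng(0)

    def fit_tiny_net(X, y, hidden=16, steps=3000, lr=0.1):
        """Train a one-hidden-layer tanh network with full-batch gradient
        descent and return its final mean-squared error on the training set."""
        n, d = X.shape
        W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
        b1 = np.zeros(hidden)
        w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
        b2 = 0.0
        for _ in range(steps):
            h = np.tanh(X @ W1 + b1)                  # continuous hidden layer
            err = (h @ w2 + b2) - y                   # prediction error
            dh = np.outer(err, w2) * (1.0 - h ** 2)   # backprop through tanh
            W1 -= lr * (X.T @ dh) / n
            b1 -= lr * dh.mean(axis=0)
            w2 -= lr * (h.T @ err) / n
            b2 -= lr * err.mean()
        return np.mean(((np.tanh(X @ W1 + b1) @ w2 + b2) - y) ** 2)

    X = rng.normal(size=(512, 4))
    y_phys = (X ** 2).sum(axis=1)                      # rotation-symmetric, low-order target
    y_phys = (y_phys - y_phys.mean()) / y_phys.std()   # standardise: both targets have variance 1
    y_rand = rng.normal(size=512)                      # random labels, no structure to exploit

    print("MSE on symmetric target:", fit_tiny_net(X, y_phys))  # typically far below 1
    print("MSE on random labels:   ", fit_tiny_net(X, y_rand))  # stays near the label variance of 1

Exact numbers vary from run to run, but the structured target compresses into the small network while the random labels do not.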

3pt14159
In that case, the fact that our minds run on similar substrates is probably non-coincidental.
akvadrako
Indeed, the connections seem profound. It seems to be a general-purpose optimal algorithm for, well, optimisation. And that would explain why the universe, our brains and AIs all trend toward it.

It could also be just that intelligence tends to mirror the outside world, but that seems a bit arbitrary.

vanderZwan
> It could also be just that intelligence tends to mirror the outside world, but that seems a bit arbitrary.

We are part of the universe, so why would it be arbitrary if our brains were structured in ways that match typical structures found in this universe?

sitkack
https://en.wikipedia.org/wiki/Panpsychism
vanderZwan
I was thinking more along the lines that if our minds process outside information in a way that makes sense of it, partly by simulating it, then it does not seem so strange that the internal structures end up matching the outside structures through some form of convergent evolution.

From a cursory reading of that article I do not see it argue the same thing.

sitkack
That the universe is the best simulator of itself? That, say, when simulating water flowing through a pipe, the system doing the simulation reformulates itself into something that resembles the pipe and the water?
vanderZwan
> That the universe is the best simulator of itself?

What is this even supposed to mean? Also, "pipe" and "water" are ridiculously high level constructs, categorisations made by humans. Neither says anything about structures inherent to the universe.

I mean that when working with symmetries, information flow, and fundamental building blocks, certain structures just tend to pop up naturally. Hence fractals and geometric shapes in places where you might not expect them. Or how the laws of thermodynamics seem to be everywhere in biology now that we have started looking[0][1].

[0] http://ecologicalsociology.blogspot.de/2010/11/geoffrey-west...

[1] https://www.quantamagazine.org/a-thermodynamic-answer-to-why...

yodon
Thank you!!! As a Physics PhD, that was one of the first videos I found on deep learning, and having no idea what a big deal his insights were, I promptly forgot who the speaker was (remembering only that it was a name I knew intimately from my time in Physics). I have frequently gone back to try to find it, unsuccessfully.
My random stream-of-consciousness reference reactions ...

Tegmark & Lin's discussion of how deep learning maps onto the physical world: Why does deep and cheap learning work so well? [1][2]

The work of Aerts et al. more than 10 years ago, including how vector space models of human categorization show QM structure [3].

Lucien Hardy's exquisite teasing out of the difference between classical and quantum probability: Quantum Theory From Five Reasonable Axioms [4]

It turns out that the innovation, power and strangeness of QM are related to separating physical processes into:

1. Linear continuous unitary reversible evolution relying on complex amplitudes (wavefunction propagation).

2. Non-linear discontinuous irreversible 'collapse' or 'measurement' or 'entanglement' or 'correlated branching', with probabilities based on the Born Rule (real values from the square of wave function amplitudes).

Neural nets also decompose a problem into successive alternating layers of reversible continuous functions and discrete irreversible categorical decisions (e.g. softmax/sigmoid logistic classifiers).
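
A toy numpy sketch of the alternation described above (my own rendering of the analogy, not code from the cited papers, and not how standard networks actually compute between hidden layers): each step applies a continuous, generically invertible linear map, turns the result into a probability distribution, and then makes an irreversible discrete choice that feeds the next step.

    import numpy as np

    rng = np.random.default_rng(1)

    def softmax(z):
        z = z - z.max()            # stabilise the exponentials
        e = np.exp(z)
        return e / e.sum()

    x = rng.normal(size=4)
    for layer in range(3):
        W = rng.normal(size=(4, 4))
        x = W @ x                        # continuous, (generically) invertible evolution
        p = softmax(x)                   # squash to probabilities: the Born-rule analogue
        k = rng.choice(len(p), p=p)      # irreversible 'collapse' to a single outcome
        x = np.eye(len(p))[k]            # the discrete choice propagates as a one-hot vector
        print(f"layer {layer}: p = {np.round(p, 3)}, chose {k}")
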

An even more obscure tangent is that most of the financial market is now constructed from 'options', which track continuous values with a continuous payoff, but offer a choice of execution to collapse the option and create a real financial result, e.g. see N.N. Taleb's discussion of optionality [5].

[1] http://arxiv.org/pdf/1608.08225.pdf

[2] http://www.youtube.com/watch?v=5MdSE-N0bxs

[3] http://en.wikipedia.org/wiki/Diederik_Aerts#Quantum_structur...

[4] http://arxiv.org/abs/quant-ph/0101012

[5] http://fooledbyrandomness.com/ConvexityScience.pdf

cs702
Thank you for posting this.

Similar thoughts have been percolating in my stream-of-consciousness for a while, especially after coming across an earlier version of Tegmark & Lin's paper some months ago.

My take is that deep neural nets work so well in practice because not all distributions of natural data are equally likely (and therefore the no-free-lunch theorems, which assume all distributions are equally likely, don't hold in the real world!); and that, in turn, is because the distribution of natural data is a consequence of the laws of Physics / symmetries of the universe in which we happen to live.

PS. You will enjoy the following papers too: "An exact mapping between the Variational Renormalization Group and Deep Learning" - https://arxiv.org/abs/1410.3831 ; and "Mutual Information, Neural Networks and the Renormalization Group" - https://arxiv.org/abs/1704.06279 .

I think the argument for using physics models as constraints in a DL system was made more clearly here: https://www.youtube.com/watch?v=5MdSE-N0bxs
Nov 29, 2016 · 128 points, 4 comments · submitted by espeed
intjk
Max Tegmark! I love his book "Our Mathematical Universe". This video was a lot of fun to watch--I'll have to watch it a few more times before I understand it though :P
nickeleres
SO GOOD. really rare insight into the problem solving processes of top-level research physicists.
oneman
ahh, the metasystem reimplements itself
deepnotderp
Oh yeah, this paper was super fun :)

Refreshing departure from the total reliance upon the spin-glass model.

Oct 11, 2016 · 2 points, 1 comment · submitted by sdebrule
throwaway000002
Thanks for sharing this lecture by Max Tegmark.

He comments on the video, linking to two papers on arXiv which relate to the material in the lecture: https://arxiv.org/abs/1608.08225 and https://arxiv.org/abs/1606.06737

According to a talk by Max Tegmark[0] (and its associated paper[1]), neural nets (particularly LSTMs) might be inherently better at this sort of thing due to the way they model mutual information.

Markov models are best suited to situations where an observation k steps in the past gives exponentially less information about the present[2] (decaying according to something like λ^k for 0 <= λ < 1). Intuitively, the amount of context imparted by a word or phrase decays somewhat more slowly. That is, if I know the previous five words, I can make a good prediction about the next one, likely the one after that, and slightly less likely the one after that, whereas in a Markovian setting my confidence in my predictions should decay much more quickly.

So in answer to the grandparent, such a thing should be reasonably straightforward to build if it doesn't exist already, and it may offer improvements over a similar model based on Markov chains.

---

0. https://www.youtube.com/watch?v=5MdSE-N0bxs

1. https://arxiv.org/abs/1606.06737

2. Why is this? Lin & Tegmark offer details in the paper, but it comes from the fact that the eigenvalues of the transition matrix all have modulus less than or equal to one (an aperiodic & ergodic transition matrix has only one eigenvalue equal to one), and so the contributions of the other eigenvectors fall away exponentially quickly, with the exponent's base being their corresponding eigenvalue.
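
A small numpy sketch of the exponential decay described above (my own illustration, not code from the paper; the 3-state chain is an arbitrary example): for a stationary Markov chain, the mutual information between the current state and the state k steps earlier falls off exponentially in k, the λ^k behaviour the comment refers to.

    import numpy as np

    def mutual_information(T, k):
        """I(X_0; X_k) in bits for a stationary Markov chain with
        row-stochastic transition matrix T."""
        vals, vecs = np.linalg.eig(T.T)
        pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])   # stationary distribution
        pi = pi / pi.sum()
        Tk = np.linalg.matrix_power(T, k)                     # k-step transition probabilities
        joint = pi[:, None] * Tk                              # P(X_0 = i, X_k = j)
        indep = pi[:, None] * pi[None, :]                     # product of the marginals
        mask = joint > 0
        return np.sum(joint[mask] * np.log2(joint[mask] / indep[mask]))

    # A 3-state chain that mostly stays put but slowly leaks to its neighbours.
    T = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])

    for k in (1, 2, 4, 8, 16):
        print(f"k = {k:2d}   I(X_0; X_k) = {mutual_information(T, k):.5f} bits")
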

tfgg
It sounds like Tegmark is pointing out a pretty obvious and deliberately designed property of LSTMs... the entire point of them is to avoid exponentially decaying / exploding gradients and allow propagation of information over longer time-scales.
You should watch Max Tegmark's talk "Connections between Physics and Deep Learning" if this interests you [1].

Additionally, a paper that has everyone excited about deep connections between the mathematical analysis of physical systems and the hierarchical feature learning paradigm speaks of the connection in terms of the Renormalization Group [2].

Regardless of its practical utility, the philosophical implications do tickle the intellect. On a dreamy note, I wonder if it would be possible to draw category-theoretic parallels between some physical theories and statistical learning theory. There is so much to learn, and I am trying my best to teach myself (on the side) the mathematics that they don't teach in my CS grad school. So much to learn, so little time. :)

[1] https://youtu.be/5MdSE-N0bxs

[2] http://arxiv.org/abs/1410.3831

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.