HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
The Convex Geometry of Inverse Problems

Microsoft Research · Youtube · 87 HN points · 0 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Microsoft Research's video "The Convex Geometry of Inverse Problems".
Youtube Summary
Deducing the state or structure of a system from partial, noisy measurements is a fundamental task throughout the sciences and engineering. The resulting inverse problems are often ill-posed because there are fewer measurements available than the ambient dimension of the model to be estimated. In practice, however, many interesting signals or models contain few degrees of freedom relative to their ambient dimension: a small number of genes may constitute the signature of a disease, very few parameters may specify the correlation structure of a time series, or a sparse collection of geometric constraints may determine a sensor network configuration. Discovering, leveraging, or recognizing such low-dimensional structure plays an important role in making inverse problems well-posed. In this talk, I will propose a unified approach to transform notions of simplicity and latent low-dimensionality into convex penalty functions. This approach builds on the success of generalizing compressed sensing to matrix completion, and greatly extends the catalog of objects and structures that can be recovered from partial information. I will focus on a suite of data analysis algorithms designed to decompose general signals into sums of atoms from a simple---but not necessarily discrete---set. These algorithms are derived in an optimization framework that encompasses previous methods based on l1-norm minimization and nuclear norm minimization for recovering sparse vectors and low-rank matrices. I will provide sharp estimates of the number of generic measurements required for exact and robust estimation of a variety of structured models. I will then detail several example applications and describe how to scale the corresponding algorithms to massive data sets.
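As a rough, self-contained illustration of the l1-norm side of the framework described above (not code from the talk; it assumes the open-source numpy and cvxpy packages and uses synthetic data), here is a minimal basis-pursuit sketch that recovers a sparse vector from fewer measurements than its ambient dimension:

    # Minimal basis-pursuit sketch: recover a sparse vector from
    # underdetermined linear measurements via l1-norm minimization.
    # Sizes and data are illustrative assumptions, not from the talk.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, m, k = 200, 80, 10                          # ambient dim, measurements, sparsity
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # generic measurement matrix
    y = A @ x_true                                 # partial, noiseless measurements

    x = cp.Variable(n)
    problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y])
    problem.solve()
    print("recovery error:", np.linalg.norm(x.value - x_true))

The nuclear norm plays the analogous role for low-rank matrices, as in the matrix-completion sketch further down the thread.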

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Feb 01, 2017 · 87 points, 13 comments · submitted by espeed
digitalemble72
I think it's important to note that all of these methods have been largely superseded by deep learning techniques. For example, we can now directly learn algorithms such as gradient descent [1], and classical inverse problems like superresolution are solvable with deep networks [2]. While there may still be a role for tools like CVX, I anticipate all future progress will come from end-to-end differentiable systems.

[1] https://arxiv.org/abs/1606.04474 [2] https://arxiv.org/abs/1501.00092

TTPrograms
The attitude that "deep learning solves everything and we shouldn't bother with other techniques" is primarily one of laziness. There are many types of problems out there that call for many different types of approaches, but it's easier to just declare one's favorite the best than it is to continue one's education and development.

I can think of reams of problems that convex or heavily-priored approaches are typically used for and that cannot yet even be connected to the machine-learning framework, yet you would claim that somehow deep learning has superseded fields it's not even connected to? This is unbridled arrogance.

jjgreen
Agreed, though sometimes "the man with a hammer" has just run out of ideas and then DL gives you an inefficient and expensive half-solution (which is better than nothing). Same thing happened with genetic algorithms etc., nothing new under the sun.
mturmon
Perhaps the above comment can provide an occasion for a useful interchange on the relation of DL to other approaches. I'll try:

Last year a student of one of the authors credited in the video performed an interesting data modeling/structure discovery analysis for me. The problem had ~100 variables and ~300 observations. We used a latent-variable + sparse-interaction model and solved the resulting minimization problem with a convex optimizer, as described in the video.

This approach was preferable to a DL technique for several reasons: we wanted to preserve some interpretability of the discovered latent variables (just 2 or 3 for our problem); we had reason to believe the problem had a sparse-linear structure because of conservation of mass; we didn't have much data relative to the number of variables.
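For concreteness, here is a rough sketch of one convex program in this general family: a sparse-plus-low-rank decomposition solved with cvxpy on synthetic data. The sizes, weighting, and data are assumptions made for illustration; it shows the shape of such a formulation, not the actual analysis described above.

    # Rough sketch of a sparse-plus-low-rank convex decomposition, in the
    # spirit of the latent-variable + sparse-interaction modeling above.
    # Sizes, weights, and data are illustrative assumptions.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(1)
    p = 30
    L_true = rng.standard_normal((p, 2)) @ rng.standard_normal((2, p))  # rank-2 "latent" part
    S_true = np.zeros((p, p))
    mask = rng.random((p, p)) < 0.05
    S_true[mask] = rng.standard_normal(mask.sum())                      # sparse "interaction" part
    M = L_true + S_true

    L = cp.Variable((p, p))
    S = cp.Variable((p, p))
    lam = 1.0 / np.sqrt(p)   # a common weighting in the sparse-plus-low-rank literature
    problem = cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S))),
                         [L + S == M])
    problem.solve()
    print("low-rank part error:", np.linalg.norm(L.value - L_true) / np.linalg.norm(L_true))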

I don't see DL approaches (despite their many successes) as applicable to this kind of problem.

The OP motivates his approach with recommender systems, where there are millions of outcome variables (customers), and thousands-to-millions of stimulus variables (products). That would result in a "lot" (considerable understatement) of connections for a DL model to learn.

Unless I'm missing something, the hyper-scale recommender setting is also not the best place for a conventional multi-layer DL model -- too many connections. On the other hand, the explicit sparsity control built into the OP's optimization is really helpful.

kxyvr
Absolutely not. Certainly, if you believe this to be the case, you can go make $1 billion selling this software to the oil industry, since full wave inversion for seismic imaging, one example of an inverse problem, is still an open and very difficult problem.

ML and inverse problems solve two completely different problems. Really, ML would more accurately be called empirical modeling, since we're creating a generic model from empirical data. In inverse problems, we choose an underlying model and then fit this model to our data. The difference between the two is that empirical models use generic models whereas inverse problems typically use models based on physical laws like continuum mechanics. Because of this, we often don't care about the end model in an inverse problem, but about the variables that parameterize it, because they have physical meaning. Generally, in ML, the parameters don't have physical meaning. The similarity between the two is that we're using an optimization engine to match some kind of model to our data.

To reiterate, in an inverse problem, the variables we solve for typically have a physical meaning. For example, in full wave inversion, we often model the problem using the elastic equations and we solve for the tensor that relates stress and strain. This is a 6x6 symmetric matrix (21 variables) at every point in a 3-D mesh. Side note: these meshes are large, so the resulting optimization problem has millions if not billions of variables. This matrix represents the material at that location. In this context, we don't need the end model with all these parameters plugged in. We're just going to look at the material directly because it tells us things like where the oil is. In the context of the optimization, we will run the simulation, but, really, we don't need a simulator when we're done.

Now, in ML, imagine we did something simple like a multilayer perceptron. Yes, there are more complicated models, but it doesn't matter in this context. What is the physical meaning of the weight matrix and offsets? Saying it's the neurons in the brain is a lie. What if we're modeling acoustic data? Now, if we're just interested in creating a box that maps inputs to outputs, it doesn't matter. However, going back to the seismic world, mapping inputs to outputs just means mapping acoustic sources to travel times. No one cares about this. We want to know the rocks in the ground.

As such, ML and inverse problems use some similar machinery. Specifically, they both use optimization tools. However, they're used in very different places. Presentations like the one in the article are important because solving inverse problems at scale is really, really hard.
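To make the "fit a physical model, care about the parameters" point concrete, here is a deliberately simplified toy (nothing like real full wave inversion; a straight-ray travel-time problem with synthetic data and only numpy assumed). The quantity reported at the end is the recovered physical parameter itself, not a trained input-to-output map.

    # Toy inverse problem: recover per-layer slownesses (1/velocity) from
    # straight-ray travel times by regularized least squares. The recovered
    # parameters are the physically meaningful output; the forward model is
    # discarded once they are estimated. All numbers here are made up.
    import numpy as np

    rng = np.random.default_rng(2)
    n_layers, n_rays = 20, 60
    slowness_true = 0.5 + 0.1 * rng.random(n_layers)   # s/km, the "material"
    path_km = rng.random((n_rays, n_layers))           # km each ray spends in each layer
    travel_s = path_km @ slowness_true
    travel_s += 0.01 * rng.standard_normal(n_rays)     # measurement noise

    lam = 1e-2                                         # Tikhonov regularization weight
    A, y = path_km, travel_s
    slowness_est = np.linalg.solve(A.T @ A + lam * np.eye(n_layers), A.T @ y)
    print("max relative error:",
          np.max(np.abs(slowness_est - slowness_true) / slowness_true))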

vladislav
As an expert in both the umbrella of methods known as compressed sensing and, more recently, deep learning, my opinion is that these techniques are largely complementary. Deep learning is fantastic at, for instance, building appearance models and exploiting the hierarchical nature of natural images, and at hierarchical feature building in general where appropriate, while methods from compressed sensing remain quite useful general methods by which one can efficiently extract low-dimensional structure from noisy and at times even highly corrupted datasets.

It is true that deep learning is now state of the art in certain tasks such as super-resolution for natural images, which were previously the domain of linear inverse problems; this is due to its ability to learn useful natural-image priors, something that isn't possible with simpler linear or slightly non-linear models. Meanwhile, compressed-sensing-style methods still excel in situations where the data does not benefit from hierarchical compression, labeled data is not available, and/or the data is highly corrupted. Take for example the Netflix challenge problem, discussed in the video, for which deep learning is unlikely to offer substantial benefits, at least for the problem as stated (we just observe partial information about movie ratings). Where deep learning could potentially help in that situation is, for instance, in grouping movies according to high-level semantic information derived from text descriptions, other metadata, and even the video content of the films themselves; those are still somewhat open problems, and they would not necessarily add value, depending on the validity of the low-rank assumption on movie preferences.
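A minimal matrix-completion sketch of the kind alluded to here, under assumed sizes and a synthetic low-rank "ratings" matrix (numpy and cvxpy assumed; nothing tied to the actual Netflix data): observe a fraction of the entries and fill in the rest by nuclear-norm minimization.

    # Minimal matrix-completion sketch: fill in missing entries of a synthetic
    # low-rank ratings matrix by nuclear-norm minimization. Sizes, rank, and
    # sampling rate are illustrative assumptions.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(3)
    n_users, n_movies, r = 40, 30, 2
    R_true = rng.standard_normal((n_users, r)) @ rng.standard_normal((r, n_movies))
    mask = (rng.random((n_users, n_movies)) < 0.4).astype(float)  # ~40% of entries observed

    X = cp.Variable((n_users, n_movies))
    problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                         [cp.multiply(mask, X) == mask * R_true])
    problem.solve()
    rel_err = np.linalg.norm(X.value - R_true) / np.linalg.norm(R_true)
    print("relative completion error:", rel_err)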

More recently studied problems such as phase retrieval, which are in a sense the most elementary non-linear inverse problems, have now been understood, and in fact have informed understanding of how information propagates in deep neural networks (http://yann.lecun.com/exdb/publis/pdf/bruna-icml-14.pdf). More generally, the study of favorable outcomes in non-convex optimization, which is informed by recent developments in the umbrella field of compressed sensing, will help drive understanding of what makes training deep neural networks possible and thus to improve it, with the current empirical performance of deep learning being far ahead of any theoretical understanding.

Broadly speaking, as opposed to fighting about the relevance of one field or the other, we should strive to achieve better overall results by using both sets of techniques complementarily.

mturmon
"Deep learning is fantastic at for instance building appearance models and exploiting hierarchical nature of natural images..."

Agreed. And contrast this success with the lack of success of first-principles latent-variable modeling for natural images. A lot of very good researchers spent decades building multi-layer probabilistic models for natural image structures - I'm thinking about the Grenander school, for example. The jury is still out (I think) on the ultimate value of that approach.

But for classification, it turns out to be much more tractable to use DL. You don't need all the semantic information the multi-layer model contains to tell a car from a truck.

As you say, it's better to view these approaches as complementary.

physPop
Largely superseded? This is a sarcastic comment, right?

In case it isn't: the assertion that these are superseded is categorically false. For any reasonably computationally difficult problem, being able to capture the structure of the problem is hugely powerful, as opposed to blindly throwing deep learning algorithms at it. Just because you can doesn't mean you should.

For example, algorithms for convex problems in particular can be orders of magnitude more efficient than naive nonlinear approaches. Also consider the case where the problem has some (possibly sparse) structure, where custom solvers can render otherwise computationally intractable problems trivial.

scotty79
I read about an application of neural networks to fluid dynamics. It ended up being faster than the usual approaches. At least some mathematical solutions might eventually be superseded by pretrained NNs, at least in some contexts.
SmooL
I think the key was that the neural network didn't compute the fluid model accurately so much as compute a simulation that looked accurate to humans.
srean
Indeed. Few people talk about where the deep (or, for that matter, shallow) dirty laundry is. NNs require a fantastic amount of babysitting and trying out of different configurations the first time around for a specific dataset/task. Once that's done, you do get great results.
Shalhoub
Microsoft Research: 'Each year Microsoft Research hosts hundreds of influential speakers from around the world including leading scientists, renowned experts in technology, book authors, and leading academics and makes videos of these lectures freely available.'

It's a pity Microsoft doesn't expend energy in figuring out how to make opening an email attachment or clicking on a URL safe.

whorleater
Microsoft has over 100,000 employees. They're not pulling resources from other teams for Microsoft Research.
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.