Hacker News Comments on "BayLearn15-Keynote3" Bay Learn Youtube Video

Rankings: this week · month (mar/apr) · year (2024) · all time

digests · search

Hacker News Comments on
BayLearn15-Keynote3

Bay Learn · Youtube · 385 HN points · 3 HN comments

HN Theater has aggregated all Hacker News stories and comments that mention Bay Learn's video "BayLearn15-Keynote3".

Youtube Summary

Jeff Dean, Google
Large-Scale Deep Learning for Intelligent Computer Systems

HN Theater Rankings

This course is unranked · view top recommended courses

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.

⬐

Nov 13, 2015 · amaks on DMTK: Distributed Machine Learning Toolkit from Microsoft

Here Jeff Dean explains why there is no central parameter server in Tensorflow: https://youtu.be/90-S1M7Ny_o?t=28m59s

⬐

Nov 11, 2015 · donthateyo on TensorFlow Benchmarks

Until now, I've seen two responses to Google's TensorFlow from Facebook employees. Yann Le Cunn seemed to really challenge Jeff Dean about TensorFlow's scalability [1] and this benchmark puts TensorFlow down there in all the measures it tested for. I can't ignore the possibility that this criticism of TensorFlow from Facebook employees (while factually correct and constructive) might be driven by some competition and jealousy.
[1] https://www.youtube.com/watch?v=90-S1M7Ny_o&t=39m

⬐ kastnerkyle
This benchmark basically shows that releasing TensorFlow with cudnn v2 backend support hurts - v2 is quite a bit slower than v3 (current) and v4 (upcoming). TF has announced that they will update to v4 support, which should help quite a bit - but when many hobbyists and researchers are developing on one or two GPUs performance on that scale is more important (for them) than infinite scalability.
It is not surprising that a tool developed and focused on `Google scale` work has some imperfections in a wildly different setting. The question is - will they (or some dedicated contributor) speed up a use case the business itself may not have a use for? My gut feeling is that they will, but these things usually don`t happen overnight. Torch, Theano, and Caffe have years of work put into them and have largely been focused on the one to two GPU case.

⬐ varelse
You might be seeing the final gasps of Google's longstanding and now-reversed anti-GPU stance in TF's initial GPU performance.
NVIDIA support alone will make sure TF knocks it out of the park down the road. IMO it's crazy to consider the first release the final say on TF's GPU performance. Caffe, Torch, and Theano have a huge head start (and lots of pre-existing technical support from NVIDIA).
The biggest limitation I see right now is that their multi-GPU algorithms are really simple and inefficient. That will change I'm sure now that they're getting benchmarked against everyone else.

⬐ kastnerkyle
I don't think anyone is claiming TF won't get much, much faster (probably very soon). But claiming that right now - today - TF kills existing toolkits like Torch, Caffe, and Theano (which I have seen here and elsewhere - though not claiming you are in this camp) is a bit premature.
Even with v4 support which is coming soon, the general setup that users have is one to two GPUs in one machine. This means adding the right inplace operations can have a huge impact on performance, and is probably a usecase that Google has not focused on given their internal infrastructure. I am sure they will normalize to "at or slightly above" the performance of other toolkits, but the question is when?
This benchmark doesn't have anything to do with multi-GPU as far as I am aware - these are single machine, single GPU results. I would wager 90% of the deep learning hobbyist and research communities run in this setting, so benchmarking this is really important.
For people with huge amounts of networked and distributed resources - distributed support will be very amazing!

⬐ varelse
Inplace is a nice late phase optimization, but the root problem here appears to be that Google's convolution kernels are crap compared to those in cuDNN3 and Neon (I asked Scott Gray about this directly and I trust his wisdom).
No surprises there whatsoever. The tensorflow engine is the most gold-plated POS I've seen in a long time. if it were running on 1000+ servers, I'd get the level of overengineering they've applied here. But single server pthreads? WTF?
Also parameter server is dumb unless you sweat the implementation of the gather/reduction ops, but I digress.

⬐ blazespin
Do we know if this is a just a reference implementation or what Google uses in production? My guess the numbers will come down pretty quick. From what I can see from mxnet versus theano versus torch is that GPUs are really the final determinant of speed and not the framework.

⬐ smhx
I've done an apples to apples, TensorFlow + CuDNN R2 vs Torch + CuDNN R2

⬐ kastnerkyle
My point is more that even if they were on the same footing from benchmark timings, v2 is still far behind what is supported in Torch, Caffe, and Theano right now (v3 in all IIRC). Your comparison is very fair, and it is good insight!

⬐ argonaut
I think I'll give the benefit of the doubt to, you know, the pioneer of deep learning, inventor of convolutional neural networks, and (co)-inventor of the backpropagation algorithm.

⬐ discardorama
Just because you invent an algorithm, it doesn't mean you know how to implement it in the most efficient way possible.
I think the key to TensorFlow is not how fast it runs on 1 machine; but how fast it runs on 10,000.
Consider map-reduce (Hadoop). Sure, you can sort 1GB data on a single machine 10x faster (using /usr/bin/sort) than using Hadoop on that machine; but make the data 1TB and add 1000 machines, now lets see how fast you can sort with /usr/bin/sort!

⬐ argonaut
I'm just pointing out how ridiculous it is to cast aspersions on the judgment of one of the world's leading experts on deep learning.
It's also getting out of hand because everyone has already decided TensorFlow must be amazing, so everyone is extolling the virtues of what right now we only know to be speculation and PR.

⬐ hueving
>I'm just pointing out how ridiculous it is to cast aspersions on the judgment of one of the world's leading experts on deep learning.
A position of authority doesn't mean the person is immune to jealousy. Someone in a high status position is going to be much more likely to have a knee-jerk defensive reaction to news that threatens their image.

⬐ kastnerkyle
Yann LeCun has been working on hardware implementations of convnets almost since the beginning. LeNet 5 (the check reader of ATT, circa early 90s) needed a dedicated and specialized hardware implementation IIRC. It could be kneejerk, but he definitely has the background to make a fair assessment - and this benchmark seems to back his claims.

⬐ rryan
> and this benchmark seems to back his claims.
These benchmarks evaluate single-node performance. LeCun's remarks were concerning distributed training (specifically that bandwidth between machines is a limiting factor to scalability) -- which we can't test yet since the current version of TF is single-node only.
Dean's response in the video "it depends on your [computer] network" is an interesting response :).

⬐ kastnerkyle
Both Google and FB (from what I understand) have a ton of tricks to help this, but I expect Yann was speaking generally even with all these tricks, and he is right from what I have seen. The old paper on DistBelief talks about topping out at ~80 machines due to network overhead - it would be great if they talk more about distributed TF in an upcoming paper.
If TF really has a way to make general, networked, distributed training efficient (more than 1 bit weight updates, low precision weights and all the other crazy tricks which already exist) - that is truly remarkable and they rightly deserve huge kudos.
If they are faster in distributed training only on Google machines or with Google's network architecture, that isn't really a useful datapoint for the general public.
We could wire everything with 10GB/s NiCs, change to jumbo frames, pull all these other bandwidth reducing tricks, trick out the Linux kernel, etc. and then your network probably won't matter - but that isn't really general or cheap. The key will be what is the minimum effort necessary to avoid network bottlenecking, and does TF improve that minimum level over existing solutions?

⬐ mtw
Scenarios of machine learning on 1 machine is way more common than machine learning on 10,000 machines! I doubt even Google would have 10,000 for google translate
It's more realistic to discuss librairies for 1 machine / GPU

⬐ rspeer
> make the data 1TB and add 1000 machines, now lets see how fast you can sort with /usr/bin/sort!
Still faster than Hadoop, because sorting the data will be faster than sending it over a network. What should I do with the other 999 machines?
(I know what you actually mean, by the way. Make it 100 TB and say that the data is already spread out across 1000 machines.)

⬐ smhx
I'm the one who runs the benchmarks. It's sad that you put such a twist to the whole thing.
I've been running convnet-benchmarks forever now, and I've been running them independently on separate personal hardware. I do this as a hobby.
I've done an apples-to-apples comparison, and my benchmark review only puts the facts forward, I dont attack them.
If you read my other social media comments, I've been pretty positive about TensorFlow, and I've even put in some groundwork to write Torch bindings for it.
Please stop spewing nonsense interpolated from like 2 super-weak data points.

Jeff Dean explains TensorFlow

Nov 10, 2015 · 2 points, 0 comments · submitted by dennybritz

Jeff Dean explains TensorFlow [video]

⬐

Nov 09, 2015 · 383 points, 62 comments · submitted by quantisan

⬐ narrator
FYI, Jeff Dean is the inventor of most of Google's distributed computing infrastructure including MapReduce. Definitely up there with the likes of John Carmack and Fabrice Bellard as one of the great software engineers of all time.

⬐ ycosynot
This is the MapReduce white paper, from Jeffrey Dean and Sanjay Ghemawat, 2004, if people are interested. http://research.google.com/archive/mapreduce.html

⬐ fmela
The Jeff Dean Facts are worth reading: https://www.quora.com/What-are-all-the-Jeff-Dean-facts
Personal favorite: "Jeff Dean once shifted a bit so hard, it ended up on another computer."

⬐ None
None

⬐ geoffroy
thanks for the link, it's really funny

⬐ vijayr
"Jeff Dean's resume lists the things he hasn't done; it's shorter that way."

⬐ lgas
I'd wager that way more people know who Jeff Dean is than know who Fabrice Bellard is.

⬐ kybernetikos
Maybe some people do, personally I'm more familiar with the works of Fabrice Bellard, and that's probably fair enough since I expect I have more pieces of software derived from Bellard projects installed on my machine than I do software derived from Jeff Dean projects (not to criticise Jeff Dean, who I'm sure is very inspirational).

⬐ harperlee
I just wanted to do a +1 here, but as it is not very HN-worthy, I'll provide a little bit more of work, and just say that a search in HNSearch for "Fabrice Bellard" produced 3 pages of results for tha last year, whereas "Jeff Dean" only produced 2.
Personally, I feel I've been more exposed to Fabrice Bellard's work, which might be not true, but I first learned of Jeff Dean's existence yesterday.

⬐ shostack
Is there anything an "early" programmer like myself can do to play around with this stuff without a background in the related math?
I'm dying for this stuff to be dumbed down enough where Joe WebUser can feed in arbitrary data in a csv or point an app at a data source and get some sort of meaningful results.
It truly seems like an area where once the barrier to entry is greatly reduced, the creativity of laymen will lead to some truly amazing executions.
⬐ nl
You can get away without Math now, but it's now it still a pretty steep learning curve.
I'd suggest http://karpathy.github.io/2015/05/21/rnn-effectiveness/ is a good place to start.
The other option is using nVidia Digits toolkit.

⬐ T-A
Maybe try some of cloud-based machine learning offerings like
https://studio.azureml.net/
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl...
https://cloud.google.com/prediction/docs

⬐ chm
    I'm dying for this stuff to be dumbed down enough[...]
It kind of already is. Have you read the docs/examples? I don't think your mentality is fruitful. Having argued with people who shared your point of view, it seems there will always be something too difficult that prevents them from being good at X.
There's no substitute for sweat. Have fun with the code they gave you and see where you end up!
⬐ shostack
I think we have different definitions of what "dumbed down" means.
I never said I could/would never put in the work to learn it. I'm saying that the place it is in right now is still too advanced for someone with my background to pick up and play around with without sitting down to seriously study the underlying concepts that are objectively fairly dense subject matter that can require advanced math and CS backgrounds.
To be clear, I'm not advocating that everything should be dumbed down for the sake of it. My point was largely that when the barrier to creation gets low enough, more creative types that don't have the heavy technical backgrounds can pick it up and create things that more technical users may never have imagined.
Not everything is best served as remaining elusively complex for the layman.
Also, for the record, I will probably read up on some of this stuff because I find it interesting and enjoy learning. I just wish it was a step more accessible than it is today, even with this development.

⬐ chm
Hi Shostack,
I wasn't trying to bring you down. Maybe what you're looking for is a visual programming environment, where you can drag and drop functions, data, etc?

⬐ shostack
Not sure I even need a visual programming environment as I'm comfortable using a command line and hacking stuff together in Sublime.
However I am highly visual and visualizing the impact on the results would be really helpful. I deal with a lot of analytics and data as part of my day-to-day managing digital media. I often find that I can easily spot trends just by glancing at data visualizations, and infer insights from them.
Further, being able to visualize the nature of the functions/data/etc. would also be very helpful. I tend to need to visualize something to fully grok it.
If you have any suggestions for a more visual take on machine learning that is beginner friendly, I'd love a link.

⬐ codyguy
I agree with Shostack. He is talking about removing friction, easing out the learning curve.
The command line make command on GNU/Linux is an example of something that "dumbs down"/makes easy a quick start , as opposed to editing and configuring the Makefile yourself. Similarly, yum/apt-get take this "dumb down" one step further.
Nothing wrong with removing friction. Infact this is an idea for a startup right here, remove friction from machine learning/NLP/API.
That is why I responded to shostack in the first place. The response was specific to his question and I got plenty of downvotes on my karma. No worries there :)

⬐ cafebeen
It's probably best to understand what you're doing if you use something like this. I'd recommend at least going through an intro ML book/course to learn the basics first.

⬐ shammo
any suggestions on good intro ML books?

⬐ cafebeen
These are both good options:
http://www.amazon.com/Pattern-Recognition-Learning-Informati... http://www.amazon.com/The-Elements-Statistical-Learning-Pred...

⬐ codyguy
Hi shostack, You can try out ThatNeedle API. http://www.thatneedle.com/nlp-api.html . While it is only catering to retail for now. The underlying capability is broader and generic. If you have specific needs, let's get in touch and discuss. You should at least take a look at the gif demo on the site.

⬐ shostack
That's pretty cool, I'll play around with it. Thanks for sharing!

⬐ codyguy
You are welcome!

⬐ scoot
I recently asked the same question [1], and got several helpful replies.
I found this [2] very approachable, and you can find the material [3] and code [4] from the talks on github.
[1] https://news.ycombinator.com/item?id=10457439 [2] https://www.youtube.com/watch?v=r4bRUvvlaBw [3] https://github.com/scipy-conference/scipy2013_talks [4] https://github.com/jakevdp/sklearn_scipy2013
⬐ vonnik
TensorFlow looks amazing, and the Google team deserves huge kudos for open-sourcing it. It may well become one of the best supported OS DL frameworks out there.
People have been asking about its fundamental differentiators. I'm not sure there are any. Theano and Torch already set a pretty high standard.
We know what good tools look like, and those tools exist even if they're getting incremental improvements.
Now it's just a matter of building really cool things with them.

⬐ oiduts
This is from ~22nd of October, and he is being asked whether it is an internal thing or going to be released - his answer is that it is internal and there are no plans to release it. Did they change their mind in a couple of days (not likely)? Was he not in the loop (also unlikely)? What else?

⬐ ozgung
Maybe touching to lips gesture is a body language sign of not telling the exact truth:
https://youtu.be/90-S1M7Ny_o?t=2043
He says "I don't have anything to announce" so technically not lying.

⬐ amelius
It would be great if one could automatically dispatch this to a commercial cluster. So you could say: I want this network to be trained in 1 day, and the system would say: that would cost $X, and it would instantiate some AWS/Azure/Google instances, and run the task.

⬐ thearn4
> TensorFlow™ is an open source software library for numerical computation using data flow graphs.
> ...
> Gradient based machine learning algorithms will benefit from TensorFlow's automatic differentiation capabilities. As a TensorFlow user, you define the computational architecture of your predictive model, combine that with your objective function, and just add data -- TensorFlow handles computing the derivatives for you.
Interesting, it kind of look like the machine learning focused version of NASA's OpenMDAO (also a graph-based analysis and optimization framework with derivatives, but for engineering design).

⬐ Kiro
Jeff Dean has an amazing resume. He designed and implemented MapReduce, BigTable and much more.
OT but how much does a super engineer like him get paid at Google?

⬐ EwanToo
Largely it'll come down to how much money he wants.
His salarly will probably be in the 6-figures, but he'll be a millionaire many times over. He joined Google in 1999 (IPO was in 2004), so his stock will have made him a very rich man.

⬐ argonaut
Way off. He's a Senior Google Fellow and his bio used to be on the executive leadership page of Google (I can't find that page any more). He is being paid on the order of 10MM per year, base, easily.

⬐ EwanToo
Perhaps, I doubt his salary is that high, but he'll have significant amounts of of stock grants, options, etc.
Executive compensation is a very odd area, as I said, it pretty much depends how much money he wants.

⬐ argonaut
Yeah I should have been more clear. But I'm pretty sure his equity alone is more than $30M / year.

⬐ alttab
This was a great watch. It made me want to apply to the residency program. Edit: wasn't sarcasm.

⬐ None
None

⬐ codyguy
Yay!! An open sourced voice engine in the future? That would really shake things up.

⬐ IshKebab
No.
a) Stuff similar to this has been available for ages and there are no (good) open source voice recognition packages.
b) It requires absolute mountains of training data which we don't have.
c) It requires designing a suitable network, which I'm not sure if we have, but I would doubt it.
d) It requires training a network on the mountains of training data using an immense computing cluster, which we requires money that we don't have.
Don't hold your breath.

⬐ swah
Can anyone explain why was this downvoted?

⬐ codyguy
I agree there. Plenty to be done still.

⬐ ambiate
Sometimes creativity is the only thing holding people back from exploiting the natural insects of the web.
Case in point, ever wonder why those captchas include street addresses or 'pick the shape with a hole in it?' Spoiler: you're building training data and validating training data.
How else can we silently retrieve training data?

⬐ None
None

⬐ houselync
holle!!!!!!!

⬐ theCricketer
love u guys

⬐ mmytest
I like this link

⬐ hooloovoo_zoo
Who cares? Gradient descent is bandwidth limited not software limited. (edit: for ANNs)

⬐ Houshalter
Only for some applications, I think.

⬐ argonaut
Just wanted to repost this from the other thread on TensorFlow, since I joined the party a bit late:
I think some of the raving that's going on is unwarranted. This is a very nice, very well put together library with a great landing page. It might eventually displace Torch and Theano as the standard toolkits for deep learning. It looks like it might offer performance / portability improvements. But it does not do anything fundamentally different from what has already been done for many years with Theano and Torch (which are standard toolkits for expressing computations, usually for building neural nets) and other libraries. It is not a game-changer or a spectacular moment in history as some people seem to believe.

⬐ mailshanx
I think the fundamental differentiator might be how "production ready" TensorFlow is - the promise of simply declaring an ML pipeline and have that run in very heterogeneous compute environments from mobile phones to GPU blades to plain-old clusters, if fulfilled, can indeed be a huge game changer. The promise is that you literally do not have to write any new code when you are done with a research project / a series of experiments and are ready to deploy your pipeline at a large scale. None of Theano / Torch etc make that promise.

⬐ mtw
Torch is used in production by Facebook , deepmind and possibly baidu, amongst others. Facebook especially have released code to make torch run much faster on AWS with GPU cards. Also no startup time. The design is done with a high level language (lua) while computation done mostly in C. I'd be very surprised if tensorflow is actually faster than torch on a single machine

⬐ AMEDICALRe
Torch has extremely difficult learning curve due to Lua. With Tensor Flow underlying engine in C++ it is likely as efficient as Torch. Special extension such as Nvidia Cudnn could also be used with tensorflow.

⬐ argonaut
If somebody finds learning Lua to be difficult, then learning C++, learning basic statistics, or learning machine learning will be impossible for them.

⬐ pritambaral
What do you mean by "diffcult learning curve due to Lua"? Lua isn't difficult. Lua is easier than JavaScript!
Or perhaps you mean to say that Torch itself is difficult to learn because of design choices that were made in order to use Lua?

⬐ olooney
Learning an entirely new language for a project, no matter how simple that language is, is certainly a barrier to entry.

⬐ catwell
If you have already used another dynamic, imperative language, you can probably learn enough Lua to use Torch effectively in 30 minutes.
Seriously, there are three features in the Lua language which are not trivial: metatables, coroutines and _ENV. None of those are needed to use Torch.
It will take more time to learn Torch-specific APIs, but the same problem exists with the other ML frameworks.

⬐ argonaut
That's not really a fundamental differentiator. Torch/Theano are definitely production ready. I think the portability is definitely an advantage, though.

⬐ albertzeyer
I think you would not use Theano in an end-user product. It's made for developers to run on developer machines. It's very fragile. It has a very long start-up time, might be in the order of several minutes at the first start.
Maybe it would work good enough in a service backend. But even there it would not scale that well. For example, it doesn't support multi-threading (running a theano.function from multiple threads at the same time).

⬐ argonaut
Good point. Torch, then.

⬐ tachyonbeam
I'm curious how the performance and scalability compares with Theano and Torch. I'm thinking the reason Google built this is that they wanted to scale computations to really large clusters (thousands, tens of thousands of machines) and the other options didn't really cut it.

⬐ miket
Here's a page with various benchmarks: https://github.com/soumith/convnet-benchmarks
An issue has been created to add TensorFlow to this shortly.

⬐ lacksconfidence
This looks to be a single machine test, where this video and the poster above specifically talked about running against compute clusters. I don't think a single machine benchmark is going to be nearly as interesting.

⬐ smhx
They didn't release the multi-machine version of TensorFlow. They said they're still working on it and will release it when it's ready.

⬐ AMEDICALRe
As someone who had to go through pain of using caffe, struggled with steep learning curve of Lua/Torch and frustrated by lack of simple features (train on gpu/test on cpu) of Theano. You could not be more wrong. Having a well written, consistent and portable system is a huge plus. Tensor Flow is likely to do to deep learning what Hadoop did to Big Data.

⬐ None
None

⬐ kadder
its probably another efficient library, but its good to have another baseline to compare things

⬐ ossreality
>I think some of the raving that's going on is unwarranted. This is a very nice, very well put together library with a great landing page. It might eventually displace Torch and Theano as the standard toolkits for deep learning. It looks like it might offer performance / portability improvements. But it does not do anything fundamentally different from what has already been done for many years with Theano and Torch (which are standard toolkits for expressing computations, usually for building neural nets) and other libraries.
I don't know the first thing about TensorFlow, Torch or Theano.... but that was an awful, awful way tot convince me not to be excited about TensorFlow. Like, the worst possible way.

⬐ adrianbg
Well, it looks way more scalable than Theano or Torch while being as easy to use as Theano. I'd say that's pretty exciting considering the number of groups working on way lower-level scalable neural nets.
This is "not a game-changer" in the same way map-reduce isn't a game-changer wrt for loops.
Also check out TensorBoard, their visualization tool (animation halfway down the page):
http://googleresearch.blogspot.com/2015/11/tensorflow-google...

⬐ eva1984
Only the single machine version is open sourced.

⬐ albertzeyer
At the moment. They are working on making the distributed version available too.
https://github.com/tensorflow/tensorflow/issues/23

⬐ jamesblonde
Hope to make the distributed version available. That's not a promise. Google outsourced Blaze, their distributed build system, as Bazel earlier this year. However, no sign of a distributed version there yet. As a result, Bazel has had almost no adoption.

Hacker News Comments on BayLearn15-Keynote3

Hacker News Stories and Comments

Hacker News Comments on
BayLearn15-Keynote3