HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Google I/O 2014 - Keynote

Google Developers · YouTube · 14 HN points · 5 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Google Developers's video "Google I/O 2014 - Keynote".
YouTube Summary
This morning we welcomed 6,000 developers to our 7th annual Google I/O developer conference. The crowd in San Francisco was joined by millions more watching on the livestream and 597 I/O Extended events, in 90+ countries on six continents.

We're meeting at an exciting time for Google, and for our developer community. There are now one billion of you around the world who use an Android device. One billion. We estimate that's more than 20 billion text messages sent every day. 1.5 trillion steps taken with an Android. And more importantly, a roughly estimated 93M selfies.

Today, developers got a preview of our most ambitious Android release yet. With more than 5,000 new APIs (for non-techies, that stands for application programming interfaces) and a new, consistent design approach called material design, we're continuing to evolve the Android platform so developers can bring to life even more beautiful, engaging mobile experiences.

But, beyond the mobile phone, many of us are increasingly surrounded by a range of screens throughout the day--at home, at work, in the car, or even on our wrist. So, we got to thinking: how do we invest more in our two popular, open platforms—Android and Chrome—to make it easier for you to move easily and intuitively from your phone, tablet, or laptop to your TV, car, or even your watch?

For more information visit http://goo.gl/p5rMZv

Watch all Google I/O 2014 videos at: g.co/io14videos

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Sundar Pichai gave some numbers during the I/O keynote, saying that 67 of the top 100 startups, 58% of the Fortune 500 and 72 of the top 100 universities have "gone Google": https://www.youtube.com/watch?feature=player_detailpage&v=wt...

I'm not sure what that means exactly, and the numbers certainly encompass more than companies using Drive as a replacement for Office, but I found them surprisingly high. He also mentions a bit earlier that Drive now has 190 million 30-day active users.

jamesjguthrie
Probably includes Google Apps, i.e. e-mail, etc.
SoapSeller
30-day active users is a somewhat problematic metric for a service like Drive. I access (review) docs that other people have uploaded to Drive and shared with me on a weekly basis, so I'm counted as an "active" user even though I'm not actively using it.
timothya
To be fair, that sounds pretty active to me. Maybe not actively using it to create documents, but still using it often.
Yes, it's like Spark (http://spark.apache.org/) and Spark Streaming (http://spark.apache.org/streaming/) combined; a rough sketch of that two-system split follows the links below.

Here are the relevant papers...

* FlumeJava (iterative, data-parallel pipelines like Spark): http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/F...

* MillWheel (fault-tolerant stream processing like SparkStreaming): http://research.google.com/pubs/pub41378.html

Pointers to the IO blog posts...

* "Reimagining developer productivity and data analytics in the cloud" http://googlecloudplatform.blogspot.com/2014/06/reimagining-...

* "Sneak peek: Google Cloud Dataflow, a Cloud-native data processing service" http://googlecloudplatform.blogspot.com/2014/06/sneak-peek-g...

The Dataflow-specific talks at Google IO 2014...

* Big data, the Cloud way: Accelerated and simplified https://www.youtube.com/watch?v=Y0Z58YQSXv0

* The dawn of "Fast Data" https://www.youtube.com/watch?v=TnLiEWglqHk

* Predicting the future with the Google Cloud Platform https://www.youtube.com/watch?v=YyvvxFeADh8

* Keynote (starts at Urs Hölzle's segment on Google Cloud) https://www.youtube.com/watch?v=wtLJPvx7-ys#t=6932
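
To make the comparison concrete, here is a rough sketch of the two-system world being described, in PySpark and Spark Streaming (the input path, host, and port are illustrative; this assumes a local Spark install):

    # Batch side (Spark): word count over static input.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "batch-wordcount")
    counts = (sc.textFile("hdfs:///logs/day1")            # illustrative path
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile("hdfs:///out/wordcounts")

    # Streaming side (Spark Streaming): same logic, separate API and runtime.
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext(sc, batchDuration=10)          # 10-second micro-batches
    (ssc.socketTextStream("localhost", 9999)              # illustrative source
        .flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
        .pprint())
    ssc.start()
    ssc.awaitTermination()

Dataflow's pitch in the talks above is that one pipeline definition covers both cases, instead of two parallel stacks.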

jey
Cool. Does this mean Google is moving toward languages that allow for easier use and serialization of closures than C++ and Java do? (For example, Spark natively uses Scala.)
espeed
Dataflow is language agnostic. The Java API is being released first, and more languages will follow.
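
For flavor, here is a minimal pipeline in that model, sketched with the Apache Beam Python SDK (Beam is the later, open-source descendant of the Dataflow SDK; the in-memory input is illustrative):

    import apache_beam as beam

    with beam.Pipeline() as p:
        (p
         | beam.Create(["the cat sat", "the dog sat"])   # swap in a real source
         | beam.FlatMap(lambda line: line.split())
         | beam.Map(lambda word: (word, 1))
         | beam.CombinePerKey(sum)                       # grouped aggregation
         | beam.Map(print))

The same pipeline shape runs against bounded (batch) or unbounded (streaming) sources, which is the point of the model the Java API exposed first.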
You may want to correct Urs Hölzle, Senior VP of Technical Infrastructure at Google, then, or at least tell him to choose his words better.

From today's I/O keynote video https://www.youtube.com/watch?v=wtLJPvx7-ys#t=9454

This is the exact quote:

    "... and today even when you use map-reduce, which we invented over a decade ago, it's still cumbersome to write and maintain analytics pipelines, and if you want streaming analytics you are out of luck. And in most systems once you have more than a few petabytes they kind of break down. So we've done analytics at scale for awhile and we've learned a few things. FOR ONE, WE DON'T REALLY USE MAP-REDUCE ANYMORE. It's great for simple jobs but it gets too cumbersome as you build pipelines, and everything is an analytics pipeline."
emphasis mine

Of course the word "really" in the middle of the sentence gives semantic wiggle room, but it's still a pretty big statement.

gaius
I am pretty sure that Google didn't invent map-reduce, which has been around since the 1970s at least.

This guy may work for Google, but he's a clown.

seanmcdirmid
How many big data jobs were being processed by MapReduce in the 70s, 80s, early 90s? Ya, that's right: none. Sanjay and Jeff were the first to apply the combination of map-shuffle-and-reduce as we know it today to big data processing.

Also, Urs Hölzle is not a clown.
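
To make the shape concrete, here is a toy, single-process word count walking through map, shuffle, and reduce in plain Python (a real MapReduce runs the shuffle across machines, routing keys to reducers by hash):

    from itertools import groupby
    from operator import itemgetter

    docs = ["the cat sat", "the dog sat"]

    # Map: each record is turned into (key, value) pairs.
    mapped = [(word, 1) for doc in docs for word in doc.split()]

    # Shuffle: bring all values for the same key together. This is the step
    # a cluster performs over the network.
    mapped.sort(key=itemgetter(0))
    shuffled = {key: [v for _, v in group]
                for key, group in groupby(mapped, key=itemgetter(0))}

    # Reduce: fold each key's values into a result.
    counts = {word: sum(ones) for word, ones in shuffled.items()}
    print(counts)  # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}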

dbc1012
I don't know about Mr. Hölzle, but you're wrong about map/reduce. I'm aware of two significant counterexamples, and I'm sure there are others.

Teradata has been doing map/reduce in its proprietary DBC 1012 AMP clusters since the '80s, providing analytical data warehousing for some of the world's largest companies [1]. Walmart used them to globally optimize its inventory.

MPI systems have supported distributed map/reduce operations since the early '90s (see MPI_REDUCE [2]); a minimal sketch follows the references below.

1- http://www.cs.rutgers.edu/~rmartin/teaching/fall99/papers/te...

2- http://www.mpi-forum.org/docs/mpi-1.0/mpi-10.ps
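
As promised, a minimal sketch of that MPI-style reduce, using the mpi4py bindings rather than the C API (assumes an MPI runtime is installed; the script name is hypothetical):

    # Run under an MPI launcher, e.g.: mpiexec -n 4 python reduce_demo.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    local = comm.Get_rank() + 1                     # each rank's own contribution
    total = comm.reduce(local, op=MPI.SUM, root=0)  # the MPI_REDUCE collective
    if comm.Get_rank() == 0:
        print(total)                                # sum across all ranks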

walshemj
What does falsely claiming that Google invented MR make him, then?
gaius
I see the Google fanboys and wannabes are out in force on this thread.
seanmcdirmid
I see the crazies are out trying to redefine MapReduce as just being map and reduce and completely missing the point. But whatever, they've probably never seen big data loads and are definitely not involved in the industry.
gaius
Ooh, scary big data.

I could run your workloads in Excel without breaking a sweat. But go on kidding yourself.

seanmcdirmid
I don't think Excel scales to 10 or 100 TB of data.
gaius
In all seriousness tho', I was running data sets that big in Oracle, in 2006. You can see why I don't take "big data" seriously.
ithkuil
There's certainly hype around big data nowadays, often to the point of being ridiculous.

The point is that people are starting to use the term to describe something that isn't even technical anymore, let alone a measure of the actual amount of data: merely using data to drive decision making.

That is not a new thing [0], yet there is a clear trend of this kind of dependency shifting from auxiliary to generative; some of the reasons are:

1. cheaper computing power and storage

2. increased computing literacy, among scientists and non-scientists alike

3. increased availability of digitised content in many areas that capture human behaviour

Where there's demand, there's opportunity for business. One thing that is new and big about Big Data is the market. It should be called "Big Market (of data)".

It's an overloaded term. IMHO it's counterproductive to let the hype around Big Data as a business term pollute the discussion of the contributions Google and others have made to the field of computer science and data processing.

So what did Google really invent? Obviously the name and the concept behind MapReduce weren't new. Nor were they the first to process large amounts of data.

Size and growth are the two key factors here. Although it's possible that NIH syndrome affected Google, it's also possible that existing products simply couldn't meet those two requirements. It's difficult to tell exactly how large, given that Google is not very keen on releasing numbers, though you can find announcements like [1]: "Google processed about 24 petabytes of data per day in 2009".

20 PB is 10,000 times more than 2 TB. Stop and think for a moment about what 10,000 means. It's enough to completely change the problem, almost any problem. A room full of people becomes a metropolis; a US annual low-wage salary becomes 100 million dollars, more than the annual spending of Palau [2]. It's silly to make these comparisons, but it's hard to think of anything that, scaled by 10,000, doesn't change profoundly. Hell, this absurdly long post is well under 10k!

To stay in the realm of computer science, processor performance didn't increase by a factor of 10,000 from a 1978 PDP-11 to a 2005 Xeon [3].

Working at that scale poses unique problems, and that's where the real contributions to the advancement of the field by Google's engineers and engineering culture lie. If anything, just knowing it's possible, and having some account of what they focused on, is inspiring.

This is the Big Data I care about. It's not about fanboyism. It's cool, it's real, it's rare. Arguing about who invented the map/reduce mechanics is like arguing that hierarchical filesystems were already there, hence any progress made in that area by countless engineers is trivial.

[0] Historical perspective: James Gleick, http://en.wikipedia.org/wiki/The_Information:_A_History,_a_T...

[1] http://dl.acm.org/citation.cfm?doid=1327452.1327492

[2] https://www.cia.gov/library/publications/the-world-factbook/...

[3] http://www.cs.columbia.edu/~sedwards/classes/2012/3827-sprin...

rbanffy
What was big data in the 70s, 80s and 90s? We just didn't call it map-reduce at the time.
gaius
"Big data" is not a thing, and neither is "the cloud", while I'm here.
seanmcdirmid
Well, then, you really don't understand the value of their contribution, which in your mind is just "map" and "reduce."
walshemj
British Telecom used map/reduce in billing systems for the Dialcom (Telecom Gold) platform in the '80s; that was the largest (non-black) Prime minicomputer site in the UK.

Back then, 17 Prime 750s would be roughly the equivalent of one of the 5k-plus-node clusters that Yahoo et al. use.

We even sold the system to NZ Telecom.

seanmcdirmid
Interesting. What kind of distributed file system were they using?
supermatt
A dfs isn't a requirement for map/reduce.
seanmcdirmid
From http://en.wikipedia.org/wiki/MapReduce:

> MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

...

> The "MapReduce System" (also called "infrastructure" or "framework") orchestrates by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.

...

> The name MapReduce originally referred to the proprietary Google technology but has since been genericized.

So it would be quite impossible to have a MapReduce system without distributed computing infrastructure; even if you were doing mapping and reducing, it wouldn't be MapReduce.

supermatt
I see no mention of a distributed file system there. Local storage is not a requirement of distributed processing.
1stop
How do you do distributed processing without a distributed filesystem? Do you mean you'd load the filesystem into memory and send it to the "processors"?
walshemj
In our case, the first stage synced up all the required file systems and applied all the required updates before kicking off the mapper stage.
walshemj
Effectively, yes: each worker machine had an identical copy of the required ISAM files, which were kept in sync by our system.

We had to build a lot of the functionality that comes out of the box in more modern systems like Hadoop.

supermatt
The data could be stored on a network device, such as a file server or database, for example. It could indeed be local, but it needn't be distributed.

In the example the GP gave, the data could have been stored in a database and queried using segmentation via consistent hashing (a basic way to distribute jobs across a known number of workers); a minimal sketch follows below.
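
As mentioned, here is a minimal sketch of that consistent-hashing segmentation in plain Python (worker names and the replica count are illustrative):

    import bisect
    import hashlib

    def _point(key):
        # A stable hash, so every client maps keys the same way.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, workers, replicas=100):
            # Several virtual points per worker smooth out the distribution.
            self._ring = sorted((_point("%s#%d" % (w, i)), w)
                                for w in workers for i in range(replicas))
            self._points = [p for p, _ in self._ring]

        def worker_for(self, key):
            i = bisect.bisect(self._points, _point(key)) % len(self._points)
            return self._ring[i][1]

    ring = HashRing(["worker-1", "worker-2", "worker-3"])
    print(ring.worker_for("job-42"))  # a given key always routes the same way

Adding or removing a worker only remaps the keys nearest its points on the ring, which is what makes the scheme "consistent".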

walshemj
Databases? We should be so lucky :-) This was old-school ISAM files updated with Fortran 77, plus four different log files, all with multiple record types.

Our "mappers" did quite a lot of work compared to most modern map functions.

srean
...defeating the entire purpose of large-scale parallelism on commodity machines. OTOH, if you have a way of achieving order-500x parallelism with a centralized commodity server or database, I would love to hear it.

EDIT @supermatt: Ah, I see, we differ on the definition then. To me it isn't big data / large scale unless it churns through big amounts of stored data. Bitcoin mining is nowhere in the ballpark of this; it's an append-only log of solutions computed in parallel.

nl
I've seen MapReduce done against fairly significant amounts of data (10s of TBs per run) stored on a SAN running over fibre. The compute nodes weren't particularly cheap either; I guess they were commodity machines, but quite a long way from the "cheapest possible" things Google uses.

But it was still useful: it was a good computing model for letting as many compute nodes as possible process data.

That might not be what Google was trying to achieve, but it's difficult to argue that it isn't MapReduce.

supermatt
How on earth do you think Bitcoin mining pools work (as an extremely trivial example)? They coordinate ranges between a number of workers. The stored size of those ranges is minuscule in comparison to the hashes each "miner" computes over them. These "coordinators" absolutely work as a centralised "commodity" storage server (or database) resource for 500x+ parallelism.

"Big Data" means "Big Data", not "Big Storage". They are completely different things.
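
As a toy illustration of that coordination in plain Python: the coordinator's stored state is just a few nonce ranges, while the data each worker hashes through dwarfs it (the header and target are made up; the double SHA-256 mimics Bitcoin's proof of work):

    import hashlib

    def mine(header, nonces, target):
        # Each worker grinds through its assigned nonce range.
        for n in nonces:
            digest = hashlib.sha256(hashlib.sha256(
                header + n.to_bytes(4, "little")).digest()).digest()
            if int.from_bytes(digest, "big") < target:
                return n
        return None

    header = b"made-up block header"
    target = 2 ** 238                  # illustrative, very low difficulty
    # Coordinator: all it stores is which tiny range went to which worker.
    for worker_id in range(4):
        assigned = range(worker_id * 1_000_000, (worker_id + 1) * 1_000_000)
        nonce = mine(header, assigned, target)
        if nonce is not None:
            print("worker", worker_id, "found nonce", nonce)
            break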

seanmcdirmid
Big data doesn't mean big computation, it actually means big data on lots of disks across many nodes. They are completely different things.

You might be into HPC, but that's not what Sanjay and Jeff did. HPC and big data loads are quite different.

supermatt
The bitcoin example may be a bit oversimplified, and may indeed lean more towards HPC. The example was intended to illustrate data locality (as per the parent question), not the actual computation.

Big Data may incorporate data from various third-party, remote, local, or even random sources. For example, testing whether the URLs in a search engine's index are currently reachable may be a map/reduce job: it may use a local source of URLs, but it will also incorporate a remote check of each URL.

As I said a few links up: DFS is not a requirement for map/reduce.

seanmcdirmid
All the MapReduce frameworks I know of today are built on DFSs. There are definitely plenty of frameworks that support map and reduce which don't (e.g. MPI), but those aren't systems based on what was described in the OSDI 2004 paper where the word MapReduce was introduced.

I guess people just fixate on the words map and reduce, when the focus of MapReduce really was... shuffle.
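
To spell out what the shuffle routes, here is a minimal sketch of the partitioning step in plain Python (four reducers, chosen arbitrarily): each mapper sends a key to the same reducer with no coordination.

    NUM_REDUCERS = 4  # arbitrary for the sketch

    def partition(key):
        # Production systems use a stable hash; Python's built-in hash is
        # deterministic within one run, which is enough here.
        return hash(key) % NUM_REDUCERS

    buckets = [[] for _ in range(NUM_REDUCERS)]
    for key, value in [("the", 1), ("cat", 1), ("the", 1)]:
        buckets[partition(key)].append((key, value))

    # Both ("the", 1) pairs land in the same bucket, so a single reducer
    # sees every value for "the"; that routing is the shuffle.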

supermatt
I think the problem is that we are talking about two different things.

The very start of the paper describes the term and its methodology (which is what we are discussing), and then it goes on to explain Google's own implementation using GFS (which is what you seem to be getting hung up on).

seanmcdirmid
Keep in mind that this whole thread is about "MapReduce", which is what Hölzle was talking about, not the more generic map and reduce that has been around since the 1800s (and they will continue mapping and reducing in their new dataflow framework; they just won't be using MapReduce). Now for the paper:

> Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages.

Inspired doesn't mean equivalent.

> Our use of a functional model with user specified map and reduce operations allows us to parallelize large computations easily and to use re-execution as the primary mechanism for fault tolerance.

They are using map and reduce as a tool to get something else.

> The major contributions of this work are a simple and powerful interface that enables automatic parallelization and distribution of large-scale computations, combined with an implementation of this interface that achieves high performance on large clusters of commodity PCs.

They are very specific about what the contribution is. All work that has claimed to be an implementation of MapReduce has followed their core tenets. Even though MPI has a reduce function, it is not MapReduce, because it is based on other techniques.

I'm really tired of people claiming there is nothing new or even significant here when there clearly was. Ya, everything is built on something these days, but so what? In the systems community MapReduce has been a huge advance, and now we are moving on (at least for streaming).

supermatt
I'm still in the camp of there being nothing new here. Now, GFS may be a different matter, but that was part of a different paper and not a requirement of this one, which is why I have kept stating that a DFS is not a requirement.
seanmcdirmid
If that's what you believe, then you are going to miss out on the last 10 or so years of systems research and improvements. And when Google stops using MapReduce but the new thing still uses map and reduce, you are going to be kind of confused.
walshemj
We used the normal file system (Prime's, probably descended from ITS) and had a load of JCL written in CPL (Prime's JCL language) to sync everything up over our Cambridge Ring to two sites.

(We had Oxford Street dug up for our 10 Mb/s link.)

gdy
>"FOR ONE, WE DON'T REALLY USE MAP-REDUCE ANYMORE" And this is said in the context of talking about streaming analytics.
jbigelow76
But Urs also said (paraphrasing this time) that once you get into petabytes of information, everything pretty much becomes streaming analytics.

Since I would assume that any non-trivial service Google provides is in that petabyte neighborhood, that explains why he would say that Google isn't using MR anymore.

Jun 25, 2014 · kyrra on Google I/O 2014
Direct link to the Keynote youtube stream: https://www.youtube.com/watch?v=wtLJPvx7-ys
dublinben
Oddly, their live streaming does not work without Adobe Flash. It absolutely refuses to load anything in Chromium, even though Chromium supports every single HTML5 format that YouTube offers. I would have expected better from Google in 2014.
micampe
I’m watching it in Safari without Flash installed.
rockdoe
I'm guessing the problem is that it's encoded in H264, which isn't available in Chromium due to patents.
keeperofdakeys
Chromium uses ffmpeg to decode H.264. In fact, they even include ffmpeg in the Chromium source tarball.
dublinben
Chromium can definitely play H.264 with no problem. All three videos on this page [0] play perfectly, as do all other HTML5 YouTube videos [1].

[0] http://ie.microsoft.com/testdrive/graphics/videoformatsuppor...

[1] https://www.youtube.com/html5 - this shows 6/6

rockdoe
I'm a bit shocked that this works even on Debian. I don't understand how it is possible without Debian violating the patents.
toomuchtodo
HTML5 doesn't support live streaming [1]. You need either Flash or HLS support (i.e. iOS/Safari).

Disclaimer: I work in the video space.

[1] http://stackoverflow.com/questions/21921790/best-approach-to...

x0054
Why hasn't HLS caught on more broadly? From what I know, it's not licensed or patented by Apple. Many Flash players implement it, and overall it's a great way of delivering video over HTTP.
makomk
It appears to have interoperability issues. The Apple livestreams certainly never seem to play reliably in non-Apple HLS clients.
toomuchtodo
It's encumbered by being an Apple standard (NIH). Adobe is pushing HDS, and Google is pushing an open standard called MPEG-DASH.

They all do the same thing fundamentally; the only differences are slight technical details and who is in control of the standard.

dublinben
Live streaming is possible with WebRTC, as sites like Appear.in demonstrate. It might not literally be "HTML5", but it isn't Flash.
chris_mahan
We need HTTP streaming in HTML5. VLC supports it; why not the browsers?
joezydeco
Aereo (RIP) streamed live television over HTML5.
toomuchtodo
Let's not write them off yet. As I've posted in the other HN thread about them, they just need to pivot.
0x006A
You can do WebM live streaming.
rockdoe
> HTML5 doesn't support live streaming

That's a very bad way of stating it. It's perfectly possible, and "supported", even though there is no ready-made "use this thing" solution for it. The problem is that you can't rely on a full solution being usable by all the browsers you want to target.

Jun 25, 2014 · 6 points, 2 comments · submitted by devgutt
natch
Live with no audio.
devgutt
audio restored
Jun 25, 2014 · 8 points, 0 comments · submitted by turing
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.