HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
NYC Tech Talk Series: How Google Backs Up the Internet

Google TechTalks · Youtube · 64 HN points · 11 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Google TechTalks's video "NYC Tech Talk Series: How Google Backs Up the Internet".
Youtube Summary
Google Tech Talk
October 22, 2013
(more info below)
Presented by Raymond Blum

ABSTRACT

Systems like GMail and Picasa keep massive amounts of data in the cloud, all of which has to be constantly backed up to prepare for the inevitable. Typical backup and recovery techniques don't scale, so Google has devised new methods for securing unprecedented volumes of data against every type of failure.

There are many unique challenges, both obvious and subtle, in delivering storage systems at this scale; we'll discuss these and their solutions as well as some alternatives that didn't make the grade.

About the speaker: Raymond Blum leads a team of Site Reliability Engineers charged with keeping Google's and its users' data safe and durable. Prior to coming to Google he was the IT director for a hedge fund after spending a few lifetimes developing systems at HBO and on Wall Street. In his meager spare time he indulges his interests in robotics and home automation and reads too much science fiction.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
It would appear that Google backs up the internet on tape: https://www.youtube.com/watch?v=eNliOm9NtCM

Or at least did at one time.

fishnchips
It probably still does. I was on the gTape SRE team until 2014 and we had lots and lots of tapes and tape libraries back then, most of them giant beasts with 8 robots each. With the capacity of new LTO generations constantly growing and the existing investment in hardware and software it would be unusual to discard that.
wglb
One thing that I found very useful (not to mention the use of RAID on tape!) was that the way to delete an entire user's data was to encrypt everything with a per-user key; upon a deletion request, the key would be securely destroyed.
fishnchips
Ah yes, it's a standard practice https://en.wikipedia.org/wiki/Crypto-shredding
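A minimal sketch of the per-user-key deletion scheme described above (crypto-shredding). It assumes a hypothetical `KeyStore` holding one random key per user; the SHA-256 counter-mode keystream is a toy stand-in for a real authenticated cipher such as AES-GCM and should not be used for actual data. The point it illustrates: backups on tape never need to be rewritten, because destroying the small key is enough to make them unreadable.

```python
import hashlib
import secrets
from typing import Dict

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode. Illustration only;
    # a real system would use a vetted cipher such as AES-GCM.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KeyStore:
    """Hypothetical per-user key table. Only this small table has to be
    reliably erasable; the encrypted backups can stay on tape untouched."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def create(self, user: str) -> bytes:
        return self._keys.setdefault(user, secrets.token_bytes(32))

    def get(self, user: str) -> bytes:
        return self._keys[user]  # raises KeyError once the key is shredded

    def shred(self, user: str) -> None:
        # Destroying the key renders every backup copy of this user's data unreadable.
        self._keys.pop(user, None)
```

Usage: encrypt with `store.create("alice")`, write the blob to tape, and on a deletion request call `store.shred("alice")`; any later `store.get("alice")` fails and the blob is just noise.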
Jun 16, 2021 · twotwotwo on Unreliability at Scale
The "petabyte for a century" problem statement he mentions near the top is fun: can you preserve 1 PB with better-than-even odds of all the bits coming back right in a century? He wrote something about how he saw the durability situation in 2010: https://queue.acm.org/detail.cfm?id=1866298 . He seems to define the problem to allow for maintenance, e.g. check-and-repair operations are discussed.
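A back-of-the-envelope sketch of why the target is so demanding. It assumes something harsher than the framing in the talk: no check-and-repair at all, 1 PB taken as 10^15 bytes, and every bit failing independently at a constant rate λ per year, so the chance that all N bits survive T years is exp(-λNT). These are illustrative assumptions, not figures from the talk or the Queue article.

```python
import math


def max_bit_failure_rate(total_bits: float, years: float, target: float = 0.5) -> float:
    """Largest per-bit, per-year failure rate that still gives P(no bit lost) >= target.

    Assumes independent bit failures at a constant rate and no repair, so
    P(all intact) = exp(-rate * total_bits * years).
    """
    return -math.log(target) / (total_bits * years)


PETABYTE_BITS = 8 * 10**15  # 1 PB taken as 10**15 bytes
YEARS = 100

rate = max_bit_failure_rate(PETABYTE_BITS, YEARS)
bit_half_life_years = math.log(2) / rate  # half-life implied by that rate
```

Setting exp(-λNT) = 1/2 makes the required bit half-life (ln 2 / λ) exactly N·T, about 8×10^17 years for 1 PB over a century, which is why periodic verification and repair has to be part of any realistic answer.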

Broadening a little, I read it as "if you're trying to be more cautious than you usually want for commercial/academic online storage or even backup, what do you do? And would it work?"

A lot (but not all) of the author's Queue piece talks about stats from online storage, which misses some wins you can get if you care only about long-term durability.

In online systems, heavy ECC seems out of favor compared to replication for performance reasons, but additional LDPC or Reed-Solomon (RS) coding at the app layer can absorb problems on a substantial percentage of your volumes, or a ton of random bad blocks. (In a more near-line context, Backblaze uses RS: https://www.backblaze.com/blog/vault-cloud-storage-architect... ) Same for tape: offline LTO is slow and drives are costly, but the cost/TB and the rated lifespan seem like advantages over HDDs for this specific goal, at least in a narrow engineering sense.
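LDPC and RS are too involved for a short snippet, so the sketch below swaps in a much simpler erasure code, single XOR parity (the RAID-5 idea), to show the principle the comment relies on: a little extra redundancy lets you rebuild a lost shard without keeping full copies. Single parity survives the loss of one shard; an RS code with m parity shards survives any m. The shard count and data are illustrative.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: List[Optional[bytes]], lost: int) -> bytes:
    """Rebuild the single missing shard (data or parity) by XOR-ing all survivors."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    return reduce(xor_bytes, survivors)
```

With k = 4 data shards and one parity shard the storage overhead is 25%, versus 200% for keeping three full replicas, which is the cost/durability trade the comment is pointing at.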

A pile of cryptographic hashes that fits on one device can help you check for and localize errors without sending everything over a network/doing the full ECC dance. If the hash being broken and data tampered with is in your threat model (hey, weird things can happen in a century), you can also hash with a secret nonce you keep separate.
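A minimal sketch of the per-block hash manifest idea above. The "secret nonce" is modeled here with HMAC-SHA256, the standard way to key a hash, and the secret is assumed to be stored separately from the data; block sizes and names are illustrative. Comparing a fresh manifest against the stored one tells you which blocks went bad without shipping the data anywhere.

```python
import hashlib
import hmac
from typing import List, Optional


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_digest(block: bytes, secret: Optional[bytes] = None) -> bytes:
    # Plain SHA-256 catches accidental corruption; keying it with a secret
    # (HMAC-SHA256) also resists someone who can rewrite both data and hashes.
    if secret is None:
        return hashlib.sha256(block).digest()
    return hmac.new(secret, block, hashlib.sha256).digest()


def build_manifest(data: bytes, block_size: int,
                   secret: Optional[bytes] = None) -> List[bytes]:
    return [block_digest(b, secret) for b in split_blocks(data, block_size)]


def bad_blocks(data: bytes, manifest: List[bytes], block_size: int,
               secret: Optional[bytes] = None) -> List[int]:
    """Indices of blocks whose current digest no longer matches the manifest.

    Assumes the data has the same number of blocks as when the manifest was built.
    """
    current = build_manifest(data, block_size, secret)
    return [i for i, (old, new) in enumerate(zip(manifest, current))
            if not hmac.compare_digest(old, new)]
```

Only the failing block indices then need the expensive ECC repair or a fetch from another copy.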

Getting a PB loaded in the first place with a good chance of no mistakes is its own problem, and durability measures don't fully address that. Maybe your multi-site strategy loads up the original on different hardware at different locations with independent software implementations, and you compare those hashes after you do it.
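A sketch of the cross-site comparison suggested above, assuming at least three independently loaded copies of equal length so that a simple per-block majority can say which copy is probably wrong (with only two copies you can detect a disagreement but not pick a side). Site names and block size are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digests(data: bytes, block_size: int) -> List[str]:
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def compare_sites(site_digests: Dict[str, List[str]]) -> Dict[int, List[str]]:
    """For each block where sites disagree, list the sites outside the majority."""
    suspects: Dict[int, List[str]] = {}
    for idx, column in enumerate(zip(*site_digests.values())):
        majority, _ = Counter(column).most_common(1)[0]
        # Dict iteration order matches .values() order, so names line up with digests.
        losers = [site for site, d in zip(site_digests, column) if d != majority]
        if losers:
            suspects[idx] = losers
    return suspects
```

A flagged block at one site can then be reloaded from the agreeing sites before the initial load is declared done.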

Even with all that, the hundred-year part is still deeply tricky in a couple of ways.

"Lasting a hundred years" is just technologically different from "very low error rate at 5 years." Widely used media like LTO tape seem to max out at a 30y rating, and exotic archive media like the "hardened film" at GitHub's code vault has the big disadvantage of no ecosystem to read it. So seems like you really want refreshes of some sort at intervals, and being sure a task will be done decades from now is hard (assuming high-tech civilization is around and all that--some things just have to be outside the scope of the problem for it to be meaningful).

From that angle, maybe having an online copy of the data is a better investment than the pure engineering perspective would suggest: if other folks can grab a copy of the archive it has a better chance of outliving your organization.

Second, a lot of unknown unknowns crop up at that timescale. A couple of decades ago we didn't have the experience at scale we do now, and CEEs (corrupt execution errors) and other causes of silent data corruption (SDC) were less on anyone's radar. We could discover something else significant after it's too late to fix. The world can also change in ways that substantially disrupt the durability picture (changing laws or disaster risks, say), short of the kinds of change that make the whole problem meaningless.

Anyway, fun question, and if you find it fun too, you might like https://www.youtube.com/watch?v=eNliOm9NtCM , a talk on backups from someone at Google that talks about the tape restore after the big GMail glitch and various dimensions of resilience. And I'm sure there are storage papers and Long Now-ish stuff I'm not plugged into about things like this, wouldn't mind hearing about it.

FWIW...

"Backups are a tax you pay for the luxury of restore"

"How Google Backs Up the Internet" http://www.youtube.com/watch?v=eNliOm9NtCM Detailed notes: http://highscalability.com/blog/2014/2/3/how-google-backs-up...

Sep 22, 2017 · 1 points, 0 comments · submitted by wglb
No, it isn't; Google copies your data to tape as well as to offline long-term storage.

[1]https://www.youtube.com/watch?v=eNliOm9NtCM

mkj
They could be deleting the encryption keys for the tapes? All speculation.
puzzle
There was a talk about the backup infrastructure. The speaker talked about the issue of keys, but didn't provide specific details:

http://highscalability.com/blog/2014/2/3/how-google-backs-up...

I also remember that talk. I haven't verified by watching again, but if anyone's interested I suspect this is the one: https://youtu.be/eNliOm9NtCM
This should serve as a warning to anyone relying blindly on cheap cloud services to keep themselves up and running.

This also reminds me of the importance of offline tape backups. Google still uses tapes[1], yet some companies have started thinking they can eliminate tapes and move their backups to the cloud.

[1]https://www.youtube.com/watch?v=eNliOm9NtCM

https://www.youtube.com/watch?v=eNliOm9NtCM

How Google backs up the internet.

At the time it changed how I thought about backups/reliability.

Sep 08, 2015 · 1 points, 0 comments · submitted by sargun
"How Google Backs Up the Internet"

https://www.youtube.com/watch?v=eNliOm9NtCM

throwawayaway
The talk claims Google Play Music doesn't use deduplication, for legal reasons. Holy shit.
There may be occasional bugs, but I don't know of any cases where Gmail actually lost emails. According to this talk [0], the bug referred to by that article caused some users' emails to be temporarily inaccessible, but all the emails were eventually recovered.

[0]: https://www.youtube.com/watch?v=eNliOm9NtCM#t=1734

Propagation of a software mistake appears to be what caused the Gmail outage of 2011: http://youtu.be/eNliOm9NtCM?t=28m49s
Feb 02, 2014 · 62 points, 3 comments · submitted by wmf
ozh
Meh, video. Any TL;DW link?
thrownaway2424
This would be a lot better if the slides were given separately from the video. Anyone have a link to them?
jlgaddis
I'd be interested also. I did some searches but came up empty.
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ yaj@
yahnd.com ~ Privacy Policy ~