Hacker News Comments on "Our data is GONE... Again" Linus Tech Tips Youtube Video

Linus Tech Tips · Youtube · 15 HN points · 2 HN comments

HN Theater has aggregated all Hacker News stories and comments that mention Linus Tech Tips's video "Our data is GONE... Again".

Youtube Summary

Configure your own workstation at https://lambdalabs.com/linus

Check out Hetzner Cloud and use code LTT22 for $20 off at http://linustechtips.hetzner.com/en/cloud-usa/

It's been a long time since we've had any serious data loss, but on this episode, we're discussing a software misconfiguration that has resulted in us losing an unknown amount of data on our petabyte project storage clusters.

Discuss on the forum: https://linustechtips.com/topic/1407799-our-data-is-gone-again/

Check out 45Drives at the links below
Website: https://lmg.gg/eGo2K
YouTube: https://lmg.gg/6ModQ

Buy Seagate 20TB Exos Drives
On Amazon: https://geni.us/WFfs
On Newegg: https://geni.us/XhkNI

Purchases made through some store links may provide some compensation to Linus Media Group.

► GET MERCH: https://lttstore.com
► AFFILIATES, SPONSORS & REFERRALS: https://lmg.gg/sponsors
► PODCAST GEAR: https://lmg.gg/podcastgear
► SUPPORT US ON FLOATPLANE: https://www.floatplane.com/

FOLLOW US ELSEWHERE
---------------------------------------------------
Twitter: https://twitter.com/linustech
Facebook: http://www.facebook.com/LinusTech
Instagram: https://www.instagram.com/linustech
TikTok: https://www.tiktok.com/@linustech
Twitch: https://www.twitch.tv/linustech

MUSIC CREDIT
---------------------------------------------------
Intro: Laszlo - Supernova
Video Link: https://www.youtube.com/watch?v=PKfxmFU3lWY
iTunes Download Link: https://itunes.apple.com/us/album/supernova/id936805712
Artist Link: https://soundcloud.com/laszlomusic

Outro: Approaching Nirvana - Sugar High
Video Link: https://www.youtube.com/watch?v=ngsGBSCDwcI
Listen on Spotify: http://spoti.fi/UxWkUw
Artist Link: http://www.youtube.com/approachingnirvana

Intro animation by MBarek Abdelwassaa https://www.instagram.com/mbarek_abdel/
Monitor And Keyboard by vadimmihalkevich / CC BY 4.0 https://geni.us/PgGWp
Mechanical RGB Keyboard by BigBrotherECE / CC BY 4.0 https://geni.us/mj6pHk4
Mouse Gamer free Model By Oscar Creativo / CC BY 4.0 https://geni.us/Ps3XfE

CHAPTERS
---------------------------------------------------
0:00 Intro

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.

Feb 04, 2022 · francis-io on Why Not ZFS (2021)

I know that TrueNAS/FreeNAS add a scrub by default. Possibly ZFS on Linux does now too.
LinusTechTips recently did a video about how they installed ZFS on Linux and didn't have a scrub cron. They started to lose data before noticing.
https://youtu.be/Npu7jkJk5nM

⬐ ap-andersson
I believe I got default cron files when I installed ZFS on Ubuntu 21.04. It could be that they were created and I had to uncomment one line in a file. Then a scrub on the pools would run once a month. Then I set up email on the server, and every time a scrub finishes with any errors, I get an email.
Quite easy for me to set up, even though it's the first NAS I built myself and my first time using ZFS. Very surprised that LTT effed that up, to be honest.
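
A minimal sketch of the kind of script a scheduled scrub job could call, assuming a host with the `zpool` CLI installed. The function names and the injectable `runner` parameter are hypothetical and exist only so the logic can be exercised without a real pool; `zpool list -H -o name` and `zpool scrub <pool>` are the only commands used.

```python
import subprocess
from typing import Callable, List

# A runner takes a command as a list of arguments and returns its stdout.
Runner = Callable[[List[str]], str]


def run(cmd: List[str]) -> str:
    """Default runner: execute the command and return its standard output."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def list_pools(runner: Runner = run) -> List[str]:
    """Return the names of all imported pools, one per output line."""
    out = runner(["zpool", "list", "-H", "-o", "name"])
    return [line.strip() for line in out.splitlines() if line.strip()]


def start_scrubs(runner: Runner = run) -> List[str]:
    """Start a scrub on every pool and return the pool names that were scrubbed."""
    pools = list_pools(runner)
    for pool in pools:
        runner(["zpool", "scrub", pool])
    return pools
```

Calling `start_scrubs()` from a monthly cron entry or systemd timer would approximate the setup described above. How often to run it is a judgment call that depends on pool size and workload.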

⬐ Annatar
None

⬐ dathinab
It's less about Linux and more about whether or not you have a database/NAS-centric distribution, I think.
The problem with installing a cron job by default when installing ZFS is that, for a general-purpose OS, there isn't a good default for when and how often to run it. And running it at the wrong time might even be a major problem.
Though tbh, having a bad default is probably still better than no default in this case.
> lose data before noticing
That's a bit of an overstatement, as they didn't look for quite a while. They not only failed to do scrubbing, they also failed to set up automated health checks and reporting.
Turning a non-NAS-focused Linux distribution into a well-working and tuned NAS isn't easy (compared to using a good NAS OS/distribution), but making it somewhat work is easy. Which makes this a pretty common mistake for non-specialized people (i.e. like in their case).
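
The "automated health checks and reporting" mentioned here could be sketched roughly as follows. This assumes the output of `zpool status -x` is captured elsewhere and passed in as text, and that a healthy system reports exactly `all pools are healthy`. The function name, subject line, and addresses are hypothetical.

```python
from email.message import EmailMessage
from typing import Optional

# Assumption: this is the text `zpool status -x` prints when nothing is wrong.
HEALTHY_MARKER = "all pools are healthy"


def build_alert(status_output: str, sender: str, recipient: str) -> Optional[EmailMessage]:
    """Return an alert email if the status text is not the healthy marker, else None."""
    if status_output.strip() == HEALTHY_MARKER:
        return None
    msg = EmailMessage()
    msg["Subject"] = "ZFS pool needs attention"
    msg["From"] = sender
    msg["To"] = recipient
    # Include the raw status text so the reader sees which pool and device failed.
    msg.set_content(status_output)
    return msg
```

A returned message could then be handed to something like `smtplib.SMTP("localhost").send_message(msg)`, assuming a local mail relay is configured. Treating any output other than the healthy marker as a problem is deliberately conservative: a false alarm costs less than a missed failure.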

Feb 04, 2022 · ziml77 on Why Not ZFS (2021)

By default, ZFS does need scrubs to be performed manually. Ask Linus Sebastian about that one https://www.youtube.com/watch?v=Npu7jkJk5nM
TL;DW: Data in their massive 1PB server was suffering from bitrot because there was no scheduled scrub to repair the bad data. And they couldn't tell how bad the situation was because without any scrubs happening, the stats on data integrity were inaccurate.
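
To illustrate why skipped scrubs matter, here is a toy model of a scrub over mirrored copies. It is not how ZFS is implemented: the use of SHA-256, the list-of-blocks layout, and all names are assumptions made for the sketch. Each block has a known-good checksum; a scrub reads every copy, repairs bad copies from a good one, and counts blocks where no good copy is left.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Checksum used by this toy model (SHA-256 is an assumption of the sketch)."""
    return hashlib.sha256(data).hexdigest()


def scrub(copies: List[List[bytes]], expected: List[str]) -> Dict[str, int]:
    """Verify block j on every copy against expected[j] and repair where possible.

    copies[i][j] is block j as stored on mirror i. Bad copies are overwritten
    in place with a copy that still matches its checksum.
    """
    stats = {"repaired": 0, "unrecoverable": 0}
    for j, want in enumerate(expected):
        # Find any copy of this block that still matches its checksum.
        good = next((c[j] for c in copies if checksum(c[j]) == want), None)
        if good is None:
            # Every copy is corrupt: nothing to repair from.
            stats["unrecoverable"] += 1
            continue
        for c in copies:
            if checksum(c[j]) != want:
                c[j] = good
                stats["repaired"] += 1
    return stats
```

In this model, a single corrupted copy is repaired as long as a scrub runs before the other copy of the same block also goes bad. Without scrubs, silent corruption accumulates until some block has no good copy left, and nobody knows the counts until a scrub finally runs.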

⬐ cosmin800
I agree, scrubs have to be run manually, but on Debian/Ubuntu they are enabled by default via crontab every two weeks (the mdadm consistency check is also triggered from cron).
About the data loss in the video, the mistakes are easy to spot. 1st: using Seagate. 2nd: installed by us and never updated. 3rd: insufficient reading of docs before going all in on ZFS. 4th: buying more Seagate drives ;))
I think they would have had a much higher chance of losing their data with the usual mdadm/LVM/ext4/LUKS/btrfs stack; I think mastering those is harder than mastering ZFS.

⬐ gjvc
I bet you they also bought all the same make, model, and batch/vintage drives.
If you are building a storage array, do not do this. Ensure that you are using a variety of drive types (obviously same size and interface technology). Doing so guards against the danger of too many drives going wrong at the same time (within the same time window) causing a failure from which it is impossible to recover.

⬐ fredoralive
I think for Linus Media Group the main "meta" issue is that they don't have a dedicated member of staff to handle the boring day-to-day IT / sysadmin tasks that you don't make videos about. Building a crazy storage server is content for a video, so it gets done, but somebody needs to make sure it's still working and updated, and you don't make videos about routine maintenance, so it's forgotten.
Everyone else can still learn the important lesson that RAID / ZFS isn't magic and that you need to have stuff set up correctly and monitored. Also, that RAID isn't a backup[1].
[1] Although if the LMG servers affected are just for data hoarding raw footage that is unlikely to be needed again, it's possible the risk / cost balance pushes away from backups and just relying on RAID, but that's a niche case (and they lost the gamble...).

⬐ Dylan16807
What also makes it niche is getting the drives for free. When you're paying upwards of $30k for a petabyte of cold storage, tape is pretty tempting.

⬐ ziml77
Yes, their setup likely would have been configured, monitored, and maintained properly if they had an IT guy. But they made the (easy to make) mistake of thinking that having enough tech knowledge means you don't need a proper IT/systems department.
I'm certain at least that Linus knows that RAID isn't backup. And I'm sure Linus is going to try hard to get the data back, but it seems to me that this isn't some devastating failure for him.

Linus Tech Tips [video] - topic is lost data

Jan 31, 2022 · 2 points, 1 comments · submitted by taubek

⬐ taubek
Linus Tech Tips explains what went wrong and how they came to lose the data. What caught my ear is that somewhere around the 2-minute mark of the video he says that they have never hired a full-time IT person.
This video is actually a nice post-mortem of what went wrong.

Linus Tech Tips – unmonitored, unscrubbed, too wide zpools die in mess

Jan 30, 2022 · 5 points, 0 comments · submitted by icybox

LTT – Our Data is Gone again

Jan 29, 2022 · 8 points, 6 comments · submitted by InTheArena

⬐ ecf
Seems like they intentionally caused this for additional YouTube content?

⬐ michidk
That's why you use managed solutions.

⬐ Avamander
TL;DW?

⬐ detaro
Put 780 TB of data in one big cobbled-together server, didn't have backups, too many of the disks died/got corrupted before they noticed because they never ran integrity checks on the file system.

⬐ InTheArena
Two servers, actually. They have damage to both servers, as I read it, due to never scrubbing the disks, physical hardware failures, and poor power quality.

⬐ InTheArena
I always like videos that remind people to have reasonable backup and maintenance schedules.
