Hacker News Comments on
Our data is GONE... Again
Linus Tech Tips · YouTube · 15 HN points · 2 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.
I know that TrueNAS/FreeNAS schedule a scrub by default. Possibly ZFS on Linux does now too. Linus Tech Tips recently did a video about how they installed ZFS on Linux and didn't have a scrub cron job. They started to lose data before noticing.
⬐ ap-andersson
I believe I got default cron files when I installed ZFS on Ubuntu 21.04. It could be that they were created and I had to uncomment one line in a file. Then a scrub on the pools would run once a month. Then I set up email on the server, and every time a scrub completes with any errors, I get an email. Quite easy for me to set up, even though it's my first NAS that I built myself and my first time using ZFS. Very surprised that LTT effed that up, to be honest.
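A related check, sketched minimally in Python: run a health check from cron and shell out to `zpool status -x`, which prints `all pools are healthy` when no pool needs attention. The function names are hypothetical, and mail delivery is left to cron itself (most cron implementations mail a job's output to the `MAILTO` address if a mail transfer agent is configured), so the sketch only prints a report when something looks wrong.

```python
import subprocess

# `zpool status -x` prints exactly this when no pool needs attention.
HEALTHY = "all pools are healthy"


def needs_attention(status_x_output: str) -> bool:
    """True unless `zpool status -x` reported every pool as healthy."""
    return status_x_output.strip() != HEALTHY


def main() -> int:
    """Hypothetical cron entry point: silent when healthy, prints a report otherwise."""
    try:
        result = subprocess.run(
            ["zpool", "status", "-x"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        print("zpool not found; is ZFS installed?")
        return 2
    output = (result.stdout + result.stderr).strip()
    if needs_attention(output):
        # Cron mails any job output to MAILTO, so printing is the alert.
        print(output)
        return 1
    return 0
```

A script would end with `raise SystemExit(main())`; the exit code is only informational, since it is the printed output that triggers the cron mail.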
⬐ Annatar
⬐ dathinab
It's less about Linux and more about whether or not you have a database/NAS-centric distribution, I think. The problem with installing a cron job by default when installing ZFS is that for a general-purpose OS there isn't a good default for when and how often to run it. And running it at the wrong time might even be a major problem.
Though then, tbh, having a bad default is probably still better than no default in this case.
> lose data before noticing
That is a bit of an overstatement, as they didn't look for quite a while. They didn't only fail to do scrubbing; they also failed to set up automated health checks and reporting.
Turning a non-NAS-focused Linux distribution into a well-working, well-tuned NAS isn't easy (compared to using a good NAS OS/distribution), but making it somewhat work is easy, which makes this a pretty common mistake for non-specialists (as in their case).
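The point that there is no universally good time to scrub can be made concrete: a default schedule is just a policy with tunable knobs. A minimal Python sketch, assuming a hypothetical "n-th weekday of each month at a quiet hour" policy (the 2nd-Sunday, 00:24 defaults are illustrative, not any distribution's documented setting), computes when the next scrub would fire so an admin can move it away from busy periods.

```python
from datetime import datetime, timedelta


def nth_weekday(year: int, month: int, weekday: int, n: int) -> datetime:
    """Midnight on the n-th `weekday` (Monday=0 ... Sunday=6) of the month."""
    if not 1 <= n <= 4:
        raise ValueError("n must be 1-4 so the date always exists")
    first = datetime(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def next_scrub(after: datetime, weekday: int = 6, n: int = 2,
               hour: int = 0, minute: int = 24) -> datetime:
    """First slot strictly after `after` under a hypothetical
    'n-th weekday of every month at hour:minute' policy."""
    year, month = after.year, after.month
    while True:
        slot = nth_weekday(year, month, weekday, n).replace(hour=hour, minute=minute)
        if slot > after:
            return slot
        # This month's slot has passed; try the next month, wrapping the year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```

For example, `next_scrub(datetime(2022, 1, 1))` gives `datetime(2022, 1, 9, 0, 24)`, the second Sunday of January 2022. Changing `weekday`, `n`, or `hour` is the "tuning" a general-purpose distribution cannot do for you.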
By default, ZFS does need scrubs to be performed manually. Ask Linus Sebastian about that one: https://www.youtube.com/watch?v=Npu7jkJk5nM
TL;DW: Data in their massive 1 PB server was suffering from bitrot because there was no scheduled scrub to repair the bad data. And they couldn't tell how bad the situation was, because without any scrubs happening, the stats on data integrity were inaccurate.
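The failure described in the TL;DW (no scrubs, and therefore no trustworthy integrity stats) can be caught by alerting when the last completed scrub is too old or missing. A minimal Python sketch, assuming a `zpool status` `scan:` line like the one in the code comment (the exact wording varies between OpenZFS versions) and a hypothetical 35-day limit suited to a monthly schedule:

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed shape of the completed-scrub line in `zpool status`, e.g.
#   scan: scrub repaired 0B in 00:05:12 with 0 errors on Sun Jan  9 00:24:02 2022
SCRUB_LINE = re.compile(
    r"scan:\s+scrub repaired .* on (\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)


def last_scrub(status_text: str) -> Optional[datetime]:
    """Finish time of the last completed scrub, or None if none is reported."""
    match = SCRUB_LINE.search(status_text)
    if match is None:
        return None  # e.g. "scan: none requested": the pool was never scrubbed
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def scrub_is_stale(status_text: str, now: datetime,
                   max_age: timedelta = timedelta(days=35)) -> bool:
    """True if no scrub has completed, or the last one is older than max_age."""
    finished = last_scrub(status_text)
    return finished is None or now - finished > max_age
```

Treating "never scrubbed" as stale is the important design choice: a pool that has never been scrubbed reports no errors simply because nobody has looked.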
⬐ cosmin800
I agree, the scrubs are performed manually, but they are enabled by default in Debian/Ubuntu via crontab every two weeks (the mdadm consistency check is also triggered from cron).
About the data loss in the video, the mistakes are easy to spot: 1st: using Seagate. 2nd: installed by us and never updated. 3rd: insufficient reading of the docs before going all in on ZFS. 4th: buying more Seagate drives ;))
I think they would have had a much higher chance of losing their data going with the usual mdadm/LVM/ext4/LUKS/btrfs stack; I think mastering those is harder than mastering ZFS.
⬐ gjvc
I bet you they also bought all the same make, model, and batch/vintage drives. If you are building a storage array, do not do this. Ensure that you are using a variety of drive types (obviously the same size and interface technology). Doing so guards against the danger of too many drives going wrong at the same time (within the same time window), causing a failure from which it is impossible to recover.
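The advice to mix drives can be turned into a simple inventory check. A minimal Python sketch, assuming a hypothetical inventory that records each drive's model and manufacturing batch: it flags any (model, batch) group in a vdev with more members than the vdev can lose (for example 2 for RAIDZ2), since a batch-wide defect could then take out more drives than the redundancy covers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Drive:
    serial: str
    model: str
    batch: str  # hypothetical field, e.g. a manufacturing date code or lot


def risky_groups(vdev: List[Drive],
                 tolerated_failures: int) -> Dict[Tuple[str, str], int]:
    """(model, batch) groups larger than the number of failures the vdev survives."""
    counts = Counter((d.model, d.batch) for d in vdev)
    return {group: n for group, n in counts.items() if n > tolerated_failures}
```

This is deliberately strict; in practice you would weigh it against availability and price, but it makes the "same batch everywhere" risk visible before the array is built.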
⬐ fredoralive
I think for Linus Media Group the main "meta" issue is that they don't have a dedicated member of staff to handle the boring day-to-day IT/sysadmin tasks that you don't make videos about. Building a crazy storage server is content for a video, so it gets done, but somebody needs to make sure it's still working and updated, and you don't make videos about routine maintenance, so it gets forgotten.
Still, everyone else can learn the important lesson that RAID/ZFS isn't magic and that you need to have things set up correctly and monitored, as well as the fact that RAID isn't a backup[1].
[1] Although if the affected LMG servers are just for hoarding raw footage that is unlikely to be needed again, it's possible the risk/cost balance pushes away from backups toward just relying on RAID, but that's a niche case (and they lost the gamble...).
⬐ Dylan16807
What also makes it niche is getting the drives for free. When you're paying upwards of $30k for a petabyte of cold storage, tape is pretty tempting.
⬐ ziml77
Yes, their setup likely would have been configured, monitored, and maintained properly if they had an IT guy. But they made the (easy to make) mistake of thinking that having enough tech knowledge means you don't need a proper IT/systems department.
I'm certain at least that Linus knows that RAID isn't a backup. And I'm sure Linus is going to try hard to get the data back, but it seems to me that this isn't some devastating failure for him.
⬐ taubek
Linus Tech Tips explains what went wrong and how they came to lose the data. What caught my ear is that somewhere around the 2-minute mark of the video he says that they have never hired a full-time IT person. This video is actually a nice post-mortem of what went wrong.
⬐ ecf
Seems like they intentionally caused this for additional YouTube content?
⬐ michidk
That's why you use managed solutions.
⬐ Avamander
TL;DW?
⬐ detaro
Put 780 TB of data in one big cobbled-together server, didn't have backups, and too many of the disks died or got corrupted before they noticed, because they never ran integrity checks on the file system.
⬐ InTheArena
Two servers, actually. They have damage on both servers, as I read it, due to never scrubbing the disks, plus physical hardware failures and poor power quality.
I always like videos that remind people to have reasonable backup and maintenance schedules.