32TB hard drives are incoming according to Toshiba

alessandro@lemmy.ca · 7 months ago

Dr. Wesker@lemmy.sdf.org · edit-2 7 months ago

Finally, a hard drive with the capacity to install more than 2 AAA game titles at once.

LoamImprovement@beehaw.org · 7 months ago

But can it handle AAAA games?

WanderingCat@lemm.ee · 7 months ago

Yes cause they’re smaller

Techognito@lemmy.world · 7 months ago

This is obviously just in preparation for future AAA game sizes

Ulrich_the_Old@lemmy.ca · 7 months ago

When I bought my first PC around 1982, the seller told me that I would never live long enough to fill up the 10MB drive. I still bought the 40MB drive and it was still too small.

Couldbealeotard@lemmy.world · edit-2 7 months ago

I remember getting a 2 GB hard drive and thinking I’ll never be able to fill it up. Now I have video files more than 10 times that size.

emptiestplace@lemmy.ml · 7 months ago

I think you might be off by a few years at least, a 40MB drive in 1982 would’ve been incredibly uncommon.

Dasus@lemmy.world · 7 months ago

Idk man.

In the 1980s 8-inch drives used with some mid-range systems increased from a low of about 30 MB in 1980, to a top-of-the-line 3 GB in 1989.

https://en.m.wikipedia.org/wiki/History_of_hard_disk_drives

Seems like 30MB wasn’t horribly uncommon in “mid-range systems” in 1980, so I doubt that 40MB in 1982 would’ve been “incredibly uncommon.”

But I’ve no personal experience from the time.

emptiestplace@lemmy.ml · 7 months ago

“Mid-range systems” is not referring to personal computers. “8-inch drives” is another clue.

Dasus@lemmy.world · 7 months ago

True, he did say PC, fair enough.

Ulrich_the_Old@lemmy.ca · 7 months ago

It is possible that I might be slightly mistaken over something unimportant that happened 40 years ago. Yes it is possible.

Delusional@lemmy.world · 7 months ago

Awesome can’t wait til they’re cheap so I can replace my many hard drives with just one much larger one.

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 7 months ago

Make sure you buy two of them so you’ve got a backup. I’m uncomfortable storing 16TB worth of data on one drive, no way am I putting 32TB of anything I give a shit about onto one drive.

meteokr@community.adiquaints.moe · 7 months ago

If you have 20TB of data to store, a single drive is safer than splitting it across multiple drives. Fewer points of failure in total.

prettybunnys@sh.itjust.works · edit-2 7 months ago

If you are storing your own data a single drive is asking to lose all your data.

3 2 1 for all your important data.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 7 months ago

RAID6, my person. RAID6.

Sorse@discuss.tchncs.de · 7 months ago

RAID is not a backup.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 7 months ago

It is not. But backups are also not RAID.

lud@lemm.ee · edit-2 7 months ago

Yes, obviously.

You need backups. RAID or something similar is only necessary if you need redundancy, which most often matters less than avoiding the loss of all your data.

jws_shadotak@sh.itjust.works · 7 months ago

RAID6 only works if the machine is working fine. If something happens that toasts the whole thing then you’re fucked unless you have a backup offsite.

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 7 months ago

Backups are important, but we were talking about drive failures. Backups help when you screw up the data; RAID6 helps when drives go bad. If you don’t trust the hardware, RAID.

Having only backups means you’re down until you restore; RAID5/6 means you stay up.

jws_shadotak@sh.itjust.works · 7 months ago

Right, but he was talking about the 3 2 1 rule and you recommended RAID6.

three@lemm.ee · edit-2 7 months ago

RAID is not a backup.

meteokr@community.adiquaints.moe · 7 months ago

Reducing the number of drives you are running reduces the risk of losing data. Do you disagree?

shalafi@lemmy.world · 7 months ago

Depends entirely on the config. RAID 0? Higher risk. RAID 1? Lower risk.

I run RAID 0 on a couple of external USB drives with a full backup on Google and locally. No worries.

meteokr@community.adiquaints.moe · 7 months ago

The risk of a drive failing does not depend on your RAID config at all, ignoring excessive duty cycling. I believe you are misunderstanding the point I was making in my original reply. I’m claiming that these 32TB drives will reduce your risk of losing data compared to RAIDing 2 16TB drives, given the same failure rate.

I’m uncomfortable storing 16TB worth of data on one drive

For example, say you have 20TB of data. Which is safer?

2 16TB drives in raid0
1 32TB drive

This is completely irrelevant to your backup solution. You should have backups, of course, but I don’t see how that factors into my point? You have to put the data somewhere, and then back it up, where do you put it? I will always put it on as few physical drives as possible, to minimize the risk of drive failure over time so I don’t have to restore/re-stripe as often.

RecluseRamble@lemmy.dbzer0.com · 7 months ago

I’m claiming that these 32TB drives will reduce your risk of losing data compared to RAIDing 2 16TB drives, given the same failure rate.

Assuming the probability of failure is the same, you’re right, running two drives doubles the risk of a drive failing.

However, if your single 32 TB drive fails, all data is gone and you have to rely on backup. If one of the 16 TB drives fails, you replace it and the RAID restores the data with much less hassle.

The chance of both 16 TB drives failing at once is negligible (the RAID controller failing, however, might not be).

RecluseRamble@lemmy.dbzer0.com · 7 months ago

It seems you never had a HDD die on you.

meteokr@community.adiquaints.moe · 7 months ago

You misunderstand my claim.

prettybunnys@sh.itjust.works · 7 months ago

You misunderstand the intent then.

Why would anyone back up data in the manner you’re saying? That’s dumb.

Don’t split the data across multiple logical locations, keep it logically contained. A raid designed for availability is better than a single external hard drive but that isn’t what is being talked about.

3 2 1 means keeping multiple copies of the SAME data on multiple media types in multiple locations so you remove a single point of failure.

emptiestplace@lemmy.ml · 7 months ago

You are not ready to be lecturing on this topic.

XEAL@lemm.ee · 7 months ago

This single point of failure amounts to putting all of your eggs in the same basket.

meteokr@community.adiquaints.moe · 7 months ago

Which is why you have backups. Doesn’t matter if you have 1 32TB drive or 32 1TB drives, backups are how you recover from failure. Running 1 drive is less risk than running 2 drives for the same storage capacity.

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 7 months ago

If it’s split up sure, but I’m talking about a raid > 0 setup and/or having backup copies of your data onto drive #2

meteokr@community.adiquaints.moe · 7 months ago

Raid0? You mean having data striped across two devices rather than just one device with no striping? Raid0 is a risk you take when you care more about performance than downtime to restore a backup.

If I have 20TB of data, it cannot fit on a single 16TB drive. So my options are Raid, or this single drive option. I would always pick the single drive if I could afford it.

fuckwit_mcbumcrumble@lemmy.dbzer0.com · 7 months ago

Double check that symbol there.

Raid 5 is a great balance of redundancy and usable storage with 3 drives. You get 1 drive worth of fault tolerance and 2 drives worth of capacity. I personally have mismatched drives, so I run raid 1 between the matching sizes, and jbod between the raid 1 mirrors (well, the ZFS equivalent). And my really important data is backed up onto two more drives in raid 10.
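
To put numbers on the capacity side of that trade-off, here is a minimal sketch, assuming equal-size drives and the conventional minimum drive counts. The helper name `usable_drives` and the `MIN_DRIVES` table are made up for illustration, and real implementations (ZFS included) may allow layouts this does not model.

```python
# Conventional minimum drive counts; some software allows other layouts.
MIN_DRIVES = {"raid0": 2, "raid1": 2, "raid5": 3, "raid6": 4, "raid10": 4}


def usable_drives(level: str, n: int) -> int:
    """Drives' worth of usable space for n equal-size drives (hypothetical helper)."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unknown level: {level}")
    if n < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "raid0":
        return n        # striping only, no redundancy
    if level == "raid1":
        return 1        # every drive holds the same copy
    if level == "raid5":
        return n - 1    # one drive's worth of parity
    if level == "raid6":
        return n - 2    # two drives' worth of parity
    # Only raid10 is left at this point.
    if n % 2:
        raise ValueError("raid10 is modelled here with an even number of drives")
    return n // 2       # striped two-way mirrors


# Three drives in RAID 5: how many drives' worth of space is usable?
print(usable_drives("raid5", 3))
```

Multiply the result by the size of one drive to get usable capacity.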

meteokr@community.adiquaints.moe · 7 months ago

The person I replied to said

I’m uncomfortable storing 16TB worth of data on one drive

as a criticism of using a single 32TB drive.

I argue that a single 32TB drive is less risk than using 2 16TB drives. Am I wrong?

prettybunnys@sh.itjust.works · edit-2 7 months ago

Christ alive.

No. Actually. The 32TB drive is a single point of failure for all your data.

Splitting it means you have 2 points of failure but for only half your data.

From an integrity and availability standpoint the two disk solution, while wildly ridiculous and dumb as fuck, is actually better.

Both solutions are ridiculous and dumb and are not sufficient backup.

AnotherDirtyAnglo@lemmy.ca · 7 months ago

First, if you have more than one disk, you should be either getting redundancy through mirroring, or building arrays of several disks with redundant methods like RAID5 / RAID6 / ZFS raidz2.

Second, no single copy of data is safe, you must always have recent, tested backups.

Nomecks@lemmy.ca · 7 months ago

You can buy large second hand enterprise hard disks for relatively cheap. 20TB disks are like 250 bucks.

hperrin@lemmy.world · 7 months ago

Still not enough to hold all my porn.

FMEEE@lemmy.dbzer0.com · 7 months ago

Holy shit how addicted are you?

Shadowedcross@lemmy.world · 7 months ago

Least addicted porn downloader.

Echo Dot@feddit.uk · 7 months ago

Perhaps they’re a content provider. Shooting in RAW takes up a lot of space.

ckai@lemmy.world · 7 months ago

deleted by creator

Echo Dot@feddit.uk · 7 months ago

Oh god, so how big is the new CoD going to be?

AwkwardLookMonkeyPuppet@lemmy.world · 7 months ago

One petabyte.

Car@lemmy.dbzer0.com · 7 months ago

I look forward to a Backblaze analysis in a few years.

henfredemars@infosec.pub · 7 months ago

Wonder what the bit error rates are like at that density in practice.

AnUnusualRelic@lemmy.world · 7 months ago

They’re actually 128TB drives, but everything has to be written four times.

esc27@lemmy.world · 7 months ago

Will raid 6 still be viable at this size, or will this require something like raid 10 or even moving beyond raid?

ky56@aussie.zone · 7 months ago

My solution is RAIDZ5 and storing the backup on LTO6 tape with parity/erasure code. I think the fact that scrub times take 24 hours even on 16TB drives is already over the safety margin. If a drive failure happens, the first thing I’ll do is run a manual diff backup, which should take a fraction of the time, and then run the ZFS resilver.

I’m beginning to see why SSD RAID is being considered now. My guess for HDDs in enterprise is that a RAID 15 (I made this up) would be considered. What I mean is data is stored on two identical servers each running RAID5 or 6. Off the shelf solutions like Gluster exist and that seems to be gaining traction at least according to Linus Tech Tips.

emptiestplace@lemmy.ml · 7 months ago

SSD RAID is actually very common outside of home use! And yeah, clustered filesystems help overcome many of these limitations, but tend to be extremely demanding (expensive hardware for comparable performance). Network almost immediately becomes the bottleneck. Even forgetting about latency and other network efficiency concerns, 100 Gbps isn’t that fast when you have individual devices approaching 16 Gbps.
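
As a rough sanity check on that last point, a back-of-the-envelope sketch that ignores latency and protocol overhead. The function name is made up for illustration, and the 100 Gbps and 16 Gbps figures are the ones quoted above.

```python
def devices_to_saturate(link_gbps: float, device_gbps: float) -> float:
    """How many devices running flat out it takes to fill the network link."""
    return link_gbps / device_gbps


# Figures from the comment above: a 100 Gbps link, ~16 Gbps per device.
print(devices_to_saturate(100, 16))
```

In other words, a handful of fast devices is already enough to fill the link.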

Resol van Lemmy@lemmy.world · 7 months ago

That’s enough for the entire filmography of William Hanna and Joseph Barbera in beautiful 1080p (upscaled using world class software), and it would probably still be enough for some of the early shows of Cartoon Network, at least in 480p.

But then it would take ages to load anyway since it’s a hard drive and therefore has moving parts, leading to a significantly higher failure rate.

Mike@lemmy.ml · 7 months ago

Ages is an understatement. This drive uses two new technologies that essentially expand the track momentarily, plus SMR.

Resol van Lemmy@lemmy.world · 7 months ago

Eons perhaps?

DebatableRaccoon@lemmy.ca · 7 months ago

Found the anti-HDD drama queen

Resol van Lemmy@lemmy.world · 7 months ago

I’m not a girl? “Anti-HDD drama king” makes more sense.

DebatableRaccoon@lemmy.ca · edit-2 7 months ago

Not everything has to be so literal… as evidenced by your claims of “ages”.

Resol van Lemmy@lemmy.world · 7 months ago

Ok…

SplashJackson@lemmy.ca · 7 months ago

Slaps roof, you can fit so much Christina Model porn in here

meteokr@community.adiquaints.moe · edit-2 7 months ago

TL;DR: Bigger drives reduce the risk of data loss over time. Please back up your data. RAID is not a backup.

As drives get bigger and bigger, the emotional risk you feel when you fill them up is real. However, that is not the best way to think about it. Drives will inevitably fail, and because they are easily replaced commodities, their failure should be expected and handled appropriately. RAID is not a backup, and does not reduce your risk of drive failure. RAID creates a safer environment for your data when a drive fails. Think of RAID as replacing a failed drive in advance, not as reducing the risk of the drive failing.

To illustrate my point, say we have Y amount of data to store. I can either split the data across X drives, or store it all on a single drive. Which is safer? A single drive is objectively safer, given the same failure rate. So we have two cases for this situation. In both cases, this imaginary drive fails 10% of the time. The exact rate doesn’t matter so long as the drives’ rates are reasonably close.

Case A: You have 1 drive holding all your data. There is a 1/10 chance it fails. Your risk is 10%.

Case B: You have X drives holding all your data. Each drive has a 1/10 chance of failing, so there is a 1−(9/10)^X chance that at least one of the drives fails. For every X greater than 1, your rate of failure is higher than 1/10. For two drives you have a 19% chance of failure; for three drives, about 27%.
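
A quick way to check the Case B numbers, as a minimal sketch that assumes drive failures are independent and every drive has the same failure probability. The function name is made up for illustration.

```python
def any_drive_fails(p: float, n: int) -> float:
    """Probability that at least one of n independent drives fails."""
    return 1 - (1 - p) ** n


# The 10% per-drive figure from the example, for 1, 2 and 3 drives.
for n in (1, 2, 3):
    print(n, round(any_drive_fails(0.10, n), 3))
```

With unstriped drives a failure costs only that drive's share of the data, so this measures how often you end up restoring something, not how much you lose each time.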

In all cases your rate of failure increases the more drives you add to hold your data. Please do not become confused by what RAID does for this illustration. RAID will not prevent drive failures. RAID allows you to, in essence, “pre-fail” a drive in advance. A drive will fail, and some RAID configurations (1, 5, 6) will replace the functionality of the failed drive until you can replace the “real” failed drive. RAID did not prevent your drive failure; it only let you deal with it at a time convenient for you. A RAID1 array with a failed drive still has a failed drive that needs to be replaced, and the array still needs to be restored from backup/re-striped.

Let’s take the cases of no RAID vs RAID1.

Case A: You have 1 drive holding all your data. When the drive fails, you stop your work, and replace the drive immediately.

Case RAID1: You have 1 drive holding all your data. When the drive fails, you continue working because you’ve been very busy. You replace the drive when you have some downtime a week later.

In Case A, you lost productivity because the drive failed at an inconvenient time; in the RAID1 case you could schedule the drive replacement for a later date when you had some spare time, a huge improvement in the user experience. But wait! I said in the case of RAID1 only one of the drives was holding my data; should I have said 2 drives were? Yes, in a literal sense the RAID1 holds a copy of the data on the second drive. However, RAID is not a backup; it is a system to schedule the time of drive failures. Your backup of the RAID array is what holds a real second copy of your data, not your mirrored drive, because RAID is not a backup. Your second drive was still present in Case A; it was just added after the failure occurred, rather than before the first one failed.

Be safe with your data. Please make backups, and verify you can restore from them regularly. RAID is not a backup.

emptiestplace@lemmy.ml · 7 months ago

Bits of what you wrote are reasonable, but your premise is incorrect.

Consider a scenario with a degraded RAID 1 array comprised of two 1.6 TB disks capable of transferring data at a sustained rate of 6 Gbps: you should be able to recover from a single disk failure in just over half an hour.

Repeat the same scenario with 32 TB members, now we’re looking at a twelve hour recovery - twelve hours of intensive activity that could push either of your drives over the edge. Increasing data density actually increases the risk of data loss.
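
The arithmetic behind those two figures, as a rough sketch that assumes decimal terabytes and a rebuild that copies the whole disk at the full sustained rate. The function name is made up for illustration.

```python
def rebuild_hours(capacity_tb: float, rate_gbps: float) -> float:
    """Hours needed to copy a full disk at a sustained rate."""
    bits = capacity_tb * 1e12 * 8          # decimal TB -> bits
    seconds = bits / (rate_gbps * 1e9)     # Gbps -> bits per second
    return seconds / 3600


# The two disk sizes from the scenario, both at 6 Gbps.
for tb in (1.6, 32):
    print(f"{tb} TB: {rebuild_hours(tb, 6):.2f} h")
```

The 6 Gbps rate is the one from the scenario; a slower sustained rate stretches both times proportionally.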

Finally, we say you shouldn’t think of RAID as a backup because the entire array could fail, not for the excruciatingly literal reasons you are attempting to convey. If you lose half of a two disk mirror set, you haven’t lost any data.

meteokr@community.adiquaints.moe · edit-2 7 months ago

Consider a scenario with a degraded RAID 1 array comprised of two 1.6 TB disks capable of transferring data at a sustained rate of 6 Gbps: you should be able to recover from a single disk failure in just over half an hour.

Repeat the same scenario with 32 TB members, now we’re looking at a twelve hour recovery - twelve hours of intensive activity that could push either of your drives over the edge. Increasing data density actually increases the risk of data loss.

The speed and method you use to recover from data loss are not relevant to the discussion of how to handle drive failure. That varies wildly depending on your specific setup.

Finally, we say you shouldn’t think of RAID as a backup because the entire array could fail, not for the excruciatingly literal reasons you are attempting to convey. If you lose half of a two disk mirror set, you haven’t lost any data.

My premise is that reducing the number of drives reduces the risk of drive failure which could lead to data loss. RAID is not a backup, because it literally isn’t. If you have two drives in RAID1 you have 1 set of your data. If you have 4 drives in RAID6 you have 1 set of your data. In both examples you have a single very durable drive, but you do not have a backup. A backup prevents data loss, RAID does not.

Think of it this way. You have a single very large drive, and you explicitly only use 1/2 of it. The other 1/2 of the drive becomes broken and you cannot read or write to it. The first 1/2 works perfectly fine, and fits all your data. Would you consider this drive functional, or failed? A RAID degradation is a warning to the user that a portion of the single drive is broken, and needs to be repaired. A RAID block device should always be treated as a single physical drive, with varying levels of durability and warning signs depending upon its configuration. It can’t be a backup, because all it’s doing is delaying the eventual failure. Delaying a failure does not prevent the failure from happening, and does not help you when a failure occurs.

emptiestplace@lemmy.ml · 7 months ago

You can’t have a three drive RAID 6 array.

Please just stop.

meteokr@community.adiquaints.moe · 7 months ago

Oh thanks for the tip! I’ve edited my comment to reflect the minimum of 4 drives for a RAID6 array.

I’ve not used RAID6 for a small array like that before so I didn’t know it had a conventional lower limit. From the technical sense it doesn’t have to have 4 drives, it just wouldn’t make any sense to use it that way so I see why software wouldn’t support such a use case.

emptiestplace@lemmy.ml · 7 months ago

From the technical sense it doesn’t have to have 4 drives

Please explain how you think you can distribute two sets of parity data across a three drive array?

meteokr@community.adiquaints.moe · edit-2 7 months ago

Drive 1: A, Drive 2: 1/2 A, Drive 3: 2/2 A. Drive 2 + Drive 3 = Drive 1. Hmm, that would only be one set of the parity though. So you could also add 1/2 of A to Drive 1, and 2/2 to Drive 2 so that the parity on Drive 1 + Drive 2 = Drive 3. Which is extremely silly, and doesn’t make a lot of sense to use in the real world.

BCsven@lemmy.ca · 7 months ago

Your RAID may lose a disk, but you still have your data on the other disk(s). It is not a backup, since data is replicated and deletion means deletion… but RAID gives you breathing room to recover from disaster.

meteokr@community.adiquaints.moe · 7 months ago

Exactly! RAID gives you the breathing room to react to the partial failure of the full RAID array disk. I appreciate your understanding.

Toribor@corndog.social · edit-2 7 months ago

You’re assuming that the failure rates for drives are all the same though. Aren’t the failure rates for new high capacity drives typically higher?

meteokr@community.adiquaints.moe · 7 months ago

Yes, their failure rates are usually a bit higher, but the difference is usually smaller than the increase in risk from using more than one disk instead. A bit of math can be done using Backblaze’s disk failure rate data to get a reasonable approximation of the overall risk of failure.
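
That comparison can be sketched like this, assuming independent failures. The function names are made up for illustration and the two rates below are placeholder numbers, not real Backblaze figures; plug in the published rates for the models you are actually comparing.

```python
def any_of_n_fails(rate: float, n: int) -> float:
    """Probability that at least one of n independent drives fails."""
    return 1 - (1 - rate) ** n


def single_drive_is_safer(big_rate: float, small_rate: float, n_small: int) -> bool:
    """True if one big drive fails less often than any one of n smaller drives."""
    return big_rate < any_of_n_fails(small_rate, n_small)


# Placeholder annual failure rates, NOT real Backblaze data.
small_rate = 0.015   # each of the two smaller drives
big_rate = 0.020     # the single larger drive

print(any_of_n_fails(small_rate, 2))
print(single_drive_is_safer(big_rate, small_rate, 2))
```

The single drive stops winning once its own failure rate exceeds the combined rate of the smaller drives.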

falkerie71@sh.itjust.works · 7 months ago

How should I go about verifying or rehearsing data restoration when my main computer is fine and I don’t have a spare to test with?

Nomecks@lemmy.ca · 7 months ago

Restore file -> md5sum against original -> delete -> repeat with another file.

Script it up!
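
A minimal sketch of the compare-and-delete part in Python, assuming you have already restored a file to some scratch location with whatever backup tool you use (that step is tool-specific and not shown). The function names are made up for illustration, and MD5 is used only because it is what `md5sum` computes.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original, restored, delete_after=True):
    """Compare a restored file against the original; delete the copy if it matches."""
    ok = file_md5(original) == file_md5(restored)
    if ok and delete_after:
        Path(restored).unlink()   # mismatches are kept for inspection
    return ok
```

Loop `verify_restore` over a random sample of files and log anything that returns `False`.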

meteokr@community.adiquaints.moe · 7 months ago

A simple way of doing it is to just move some of the data somewhere else, and then restore that backup. If the contents are fine, then all is well, and if they aren’t, then you can delete the broken restore, and move the files back where they were. Depending on how you are doing backups, some systems have built-in “dry-run” style tests where they can test themselves, but you should still verify the contents every so often.

Echo Dot@feddit.uk · 7 months ago

RAID is a backup; obviously it doesn’t work if you store the backup on the computer that has the primary on it as well, regardless of what solution you choose.

lud@lemm.ee · 7 months ago

No, RAID can be used for backup, like having two or three different RAID arrays at different locations. But RAID itself isn’t a backup. As the name implies, it’s redundancy rather than backup.