How to save data for archive purposes?

Hi,

I’m not sure if this is the right community for my question, but as my daily driver is Linux, it feels somewhat relevant.

I have a lot of data on my backup drives, and recently added 50GB to my already 300GB of storage (I can already hear the comments about how low/high/boring that is). It’s mostly family pictures, videos, and documents since 2004, much of which has already been compressed using self-made bash scripts (so it’s Linux-related ^^).

I have a lot of data that I don’t need regular access to and won’t be changing anymore. I’m looking for a way to archive it securely, separate from my backup but still safe.

My initial thought was to burn it onto DVDs, but that’s quite outdated and DVDs don’t hold much data. Blu-ray discs can store more, but I’m unsure about their longevity. Is there a better option? I’m looking for something immutable, safe, easy to use, and that will stand the test of time.

I read about data crystals, but they seem to be still in the research phase and not available for consumers. What about using old hard drives? Don’t they need to be powered on every few months/years to maintain the magnetic charges?

What do you think? How do you archive data that won’t change and doesn’t need to be very accessible?

Cheers

Barx ,

As a start, follow the 3-2-1 rule:

  • At least 3 copies of the data.
  • On at least 2 different devices / media.
  • At least 1 offsite backup.
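The three conditions above can be written down as a small check. This is only a sketch: the `BackupCopy` record and its fields are made up for illustration, and you would fill in your own list of copies by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data (hypothetical record, not part of any tool)."""
    name: str      # free-form label, e.g. "home server"
    medium: str    # e.g. "hdd", "ssd", "bluray", "cloud"
    offsite: bool  # True if the copy lives outside your home


def check_321(copies):
    """Return the list of unmet 3-2-1 conditions (empty if all are met)."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({copy.medium for copy in copies}) < 2:
        problems.append("fewer than 2 different media")
    if not any(copy.offsite for copy in copies):
        problems.append("no offsite copy")
    return problems
```

An empty result means the setup satisfies the rule; anything else tells you which part is still missing.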

I would add one more thing: invest in a process for verifying that your backups are working, such as a test system that is occasionally restored from backups.

Let’s say what you care about most is photos. You will want to store them locally on a computer somewhere (one copy) and offsite somewhere (second copy). So all you need to do is figure out one more local or offsite location for your third copy. Offsite is probably best but is more expensive. I would encrypt the data and then store it in the cloud as my main offsite backup. This way your data stays private, so it doesn’t matter that it is stored on someone else’s server.

I am personally a fan of Borg backup because you can do incremental backups with a retention policy (like Macs’ Time Machine), the archive is deduped, and the archive can be encrypted.

Consider this option:

  1. Your data raw on a server/computer in your home.
  2. An encrypted, deduped archive on that same computer.
  3. That archive regularly copied to a second device (ideally another medium) and synchronized to a cloud file storage system.
  4. A backup restoration test process that takes the backups and shows that they restore the important files, in the right number, size, etc.

If disaster strikes and all your local copies are toast, this strategy ensures you don’t lose important data. Regular restore testing ensures the remote copy is valid. If you have two cloud copies, you are protected against one of the providers screwing up and removing data without you noticing and fixing it.
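The restore test in step 4 could look roughly like the sketch below. It assumes you have already restored the archive into a scratch directory with whatever backup tool you use; the function names are made up for illustration and nothing here is specific to Borg.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file's relative path to a (size, sha256) pair."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): (path.stat().st_size, sha256_of(path))
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_restore(original_root, restored_root):
    """Report files that are missing, unexpected, or different after a restore."""
    original = snapshot(original_root)
    restored = snapshot(restored_root)
    return {
        "missing": sorted(set(original) - set(restored)),
        "extra": sorted(set(restored) - set(original)),
        "changed": sorted(
            key for key in original.keys() & restored.keys()
            if original[key] != restored[key]
        ),
    }
```

If all three lists come back empty, the restored tree has the same files, sizes and contents as the original. For data that never changes, the same comparison can be rerun on a schedule.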

NaibofTabr ,

Someone else has mentioned M-Disc and I want to second that. The benefit of using a storage format like this is that the actual storage media is designed to last a long time, and it is separate from the drive mechanism. This is a very important feature - the data is safe from mechanical, electrical and electronic failure because the storage is independent of the drive. If your drive dies, you can replace it with no risk to the data. Every serious form of archival data storage is the same - the storage media is separate from the reading device.

An M-Disc drive is required to write data, but any DVD or BD drive can read the data. It should be possible to acquire a replacement DVD drive to recover the data from secondary markets (eBay) for a very long time if necessary, even after they’re no longer manufactured.

Max_P ,

I would use maybe a Raspberry Pi or old laptop with two drives (preferably different brands/age, HDD or SSD doesn’t really matter) in it using a checksumming filesystem like btrfs or ZFS so that you can do regular scrubs to verify data integrity.

Then, from that device, pull the data from your main system as needed (that way, the main system has no way of breaking into the backup device so won’t be affected by ransomware), and once it’s done, shut it off or even unplug it completely and store it securely, preferably in a metal box to avoid any magnetic fields from interfering with the drives. Plug it in and boot it up every now and then to perform a scrub to validate that the data is all still intact and repair the data as necessary and resilver a drive if one of them fails.

The unfortunate reality is most storage mediums will eventually fade out, so the best way to deal with that is an active system that can check data integrity and correct the files, and rewrite all the data once in a while to make sure the data is fresh and strong.

If you’re really serious about that data, I would opt for both an HDD and an SSD, and have two of those systems at different locations. That way, if something shakes up the HDD and damages the platter, the SSD is probably fine, and if it’s forgotten for a while maybe the SSD’s memory cells will have faded but not the HDD. The strength is in the diversity of the mediums. Maybe burn a Blu-Ray as well just in case, it’ll fade too but hopefully differently than an SSD or an HDD. The more copies, even partial copies, the more likely you can recover the entirety of the data, and you have the checksums to validate which blocks from which medium are correct. (Fun fact: people have been archiving LaserDiscs and repairing them by ripping the same movie from multiple identical discs, as they’re unlikely to fade at exactly the same spots at the same time, so you can merge them all together, cross-reference them, and usually get a near-perfect rip.)
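To illustrate the idea of using checksums to pick the good blocks from several partly faded copies, here is a minimal sketch. It assumes you recorded per-block checksums when the archive was written; the tiny block size and the function names are made up for the example, and btrfs or ZFS do this kind of repair internally in their own way.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny for the demo; real blocks would be far larger


def block_checksums(data, block_size=BLOCK_SIZE):
    """Checksum every block of the pristine data at archive time."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def merge_copies(copies, checksums, block_size=BLOCK_SIZE):
    """Rebuild the data from several damaged copies.

    Each block is taken from the first copy whose block matches the recorded
    checksum. Returns (data, indices of blocks no copy could supply).
    """
    merged = bytearray()
    unrecoverable = []
    for index, expected in enumerate(checksums):
        start = index * block_size
        for copy in copies:
            block = copy[start:start + block_size]
            if hashlib.sha256(block).hexdigest() == expected:
                merged += block
                break
        else:
            # No copy had an intact block: keep the length, flag the index.
            unrecoverable.append(index)
            merged += b"\x00" * len(copies[0][start:start + block_size])
    return bytes(merged), unrecoverable
```

As long as the copies are damaged in different places, the merged result is complete even though no single copy is.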

DasFaultier ,

This is my day job, so I’d like to weigh in.

First of all, there’s a whole community of GLAM institutions involved in what is called Digital Preservation (try googling that specifically). Here in Germany, a lot of them have founded the Nestor Group (www.langzeitarchivierung.de) to further the case and share knowledge. Recently, Nestor had a discussion group on Personal Digital Archiving, addressing just your use case. They have set up a website at meindigitalesarchiv.de with the results. Nestor publishes mostly in German, but online translators are a thing, so I think you will be fine.

Some things that I want to address from your original post:

  • Keep in mind that file formats, just like hardware and software, become obsolete over time. Think about a strategy for migrating your files to a more recent format if your current format falls out of style and isn’t as widely supported anymore. I assume your photos are JPGs, which are widely considered unsafe for preservation, as they use lossy compression and degrade with each re-encoding run. A suitable replacement might be PNG, though I wouldn’t go ahead and convert my JPGs right away. For born-digital photo material, uncompressed TIFF is the preferred format.
  • Compression in general is considered a risk, because a damaged bit will potentially impact a larger block of compressed data. Saving a few bytes on your storage isn’t worth losing your precious memories.
  • Storage media have different retention times. It’s true that magnetic tape storage has the best chances for survival, and it’s what we use for long-term cold storage, but it’s prohibitively expensive for home use. Also, it’s VERY slow on random access, because the tape has to be wound to the specific location of your file before reading. If you insist on using it, format your tapes using LTFS to eliminate the need for a storage management system like IBM Spectrum Protect. The next best choice of storage media is NAS-grade HDDs, which will last you upwards of five years. Using redundancy and a self-correcting file system like ZFS (compression & dedup OFF!) will increase your chances of survival. Keep your hands off optical storage media; they tend to decay after as little as a year, according to studies on the subject. Flash storage isn’t much better either; avoid thumb drives at all costs. Quality SSD storage might last you a little longer. If you use ZFS or a comparable file system that provides snapshots, you can use that to implement immutability.
  • Kudos for using Linux standard tooling; it will help other people understand your stack if anything happens to you. Digital Preservation is all about removing dependencies on specific formats, technologies and (importantly) people.
  • Backup is not Digital Preservation, though I will admit that these two tend to get mixed up in personal contexts. Backups save the state of a system at a specific point in time; DigiPres tries to preserve only data that isn’t specific to a system and tends to change very little. Also, and this is important, DigiPres tries to save context along with the actual payload, so you might want to at least save some metadata along with your photos and store them all in a structure that is made for preservation. I recommend BagIt; there’s a lot of existing tooling for creating it, it’s self-contained, secured by strong checksums and it’s an RFC.
  • Keep complexity as low as possible!
  • Last of all, good on you for doing SOMETHING. You don’t have to be perfect to improve your posture, and you’re on the right track, asking the right questions. Keep on going, you’re doing great.
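For a feel of what the BagIt recommendation above means in practice, here is a minimal sketch of the layout described in RFC 8493: a `bagit.txt` declaration, a `data/` payload directory and a `manifest-sha256.txt`. It is a simplification, not a full implementation: it skips the optional `bag-info.txt` metadata and tag manifests, does not percent-encode unusual file names, and reads each file into memory. The function names are made up; for real archives the existing BagIt tooling is the safer choice.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write the two required files."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # bag_dir/data must not exist yet
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def verify_bag(bag_dir):
    """Return payload paths that are missing or no longer match the manifest."""
    bag_dir = Path(bag_dir)
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    failed = []
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            failed.append(rel_path)
    return failed
```

Because the manifest travels with the payload, anyone who finds the bag later can check it without knowing anything about the system that created it.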

Come back to me if you have any further questions.

Extrasvhx9he , (edited )

Might be a dumb idea but hear me out. How about sealing a reputable enterprise or consumer SSD in one of those anti-static bags with a desiccant, then sealing that inside a PVC pipe, also with desiccant, and then burying it below the frost line? You’ll just have to dig it up and refresh everything every couple of years, think 3 years at most iirc for consumer ones. Obviously this isn’t a replacement for a backup solution, just archival, so no interaction with it. It’ll protect it from the elements, house fires, flooding, temperature fluctuations, pretty much everything, and it’s cost-effective. Hell, you can even surround the hard drive bag in foam and then stuff it in the PVC pipe for added shock absorption. Make a map afterwards like a damn pirate (it’s night time so my bad if I sound deranged)

edit I took a nap: in hindsight I should’ve clarified. I went with an ssd in this idea since it’s more durable than a mechanical, better price for storage capacity, and most likely to be compatible with other computers in the future in case you need it for whatever reason. Of course you can use another storage media, like m-disc, just know the drawbacks. Like needing an m-disc burner (~100$), several discs depending on how big of a capacity you need (price varies), praying that there’s still a reader that can read m-disc in the future, and knowing that it’s gonna be slow when getting your data back regardless. All you would have to do to modify the idea would be getting a disc case that kinda suspends the disc so nothing is touching its surfaces. Then the same idea: antistatic bag with desiccant, foam or even bubble wrap around it, stuffed in a pipe with desiccant buried below your frost line. People usually skip the “in optimal conditions” part when talking about m-disc but this way we get close to those optimal conditions

ReversalHatchery ,

went with an ssd in this idea since its more durable than a mechanical, better price for storage capacity

how? sorry but that does not add up to me. for the price of a 2 TB SSD you could buy a much larger HDD

and most likely to be compatible with other computers in the future in case you need it for whatever reason.

both of these use SATA plugs, it should be the same

DeuxChevaux ,

I use external hard drives. Two of them, and they get rsynced every time something changes, so there’s a copy if one drive should fail. Once a month, I encrypt the whole shebang with gpg and send it off into an AWS bucket.

dgriffith ,

A Blu-Ray USB drive with M-Discs is about the best you can get at present. Keep the drive unplugged when not in use, it’ll probably last 10-20 years in storage.

Seeing as there hasn’t been much advance past Blu-ray, keep an eye out for something useful to replace it in the future, or at least get another drive when you notice them becoming scarce.

astrsk ,

According to this, Blu-ray has some of the worst expected shelf life, with the exception of BD-RE.

Extrasvhx9he , (edited )

Think they meant a blu-ray drive that could burn to a m-disc.

Zachariah ,

Wherever you choose to store it, you should still consider following the 3-2-1 backup rule.

phanto ,

This is actually a real problem… A lot of digital documents from the 90’s and early 2000’s are lost forever. Hard drives die over time, and nobody out there has come up with a good way to permanently archive all that stuff.

I am a crazy person, so I have RAID, Ceph, and JBOD in various and sundry forms. Still, drives die.

Peffse ,

It’s crazy that there isn’t a company out there making viable cold storage for the average consumer. I feel like we’re getting even further away from viability now that we use QLC by default in SSDs. The rot will be so fast.

Goun ,

What about magnetic tape? Isn’t it like super durable?

Sl00k ,

nobody out there has come up with a good way to permanently archive all that stuff

Personally I can’t wait for these glass hard drives being researched to arrive at the consumer or even corporate level. Yes, they’re only writable one time and read-only after that, but I absolutely love the concept of being able to write my entire Plex server to a glass hard drive, plug it in and never have to worry about it again.

sin_free_for_00_days ,
Sl00k ,

This is interesting, haven’t heard of it. I think the problem with the disc format is you aren’t getting 28 TB of content on there unless you span multiple discs which is a pain in the ass

xmanmonk ,

Don’t use DVDs. They suffer bitrot, as do “metal” hard drives.

NegativeLookBehind ,

NAS
