
Stop services during backup when using snapshots?

It’s fairly obvious why stopping a service while backing it up makes sense. Imagine backing up Immich while it’s running. You start the backup, the database is backed up, and now the image assets are being copied. That could take an hour. While the assets are being copied, a new image is uploaded. The live database knows about it but the one you’ve backed up doesn’t. Then your backup process reaches the new image asset and copies it. If you restore this backup, Immich will contain an asset that isn’t known to the database. To avoid scenarios like this, you’d stop Immich while the backup is running.

Now consider a system that can do instant snapshots, like ZFS or LVM. Immich is running; you stop it, take a snapshot, then restart it. Then you back up Immich from the snapshot while Immich is running. This reduces the downtime to roughly the time it takes to create the snapshot. The state of the Immich data in the snapshot should be equivalent to backing up a stopped Immich instance.
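
For concreteness, a minimal sketch of that flow, assuming the Immich data lives on a ZFS dataset named tank/immich mounted at /tank/immich and the stack is managed with Docker Compose (all names hypothetical):

```sh
#!/bin/sh
# Downtime is only the stop/snapshot/start window; the actual copy
# happens later against the immutable snapshot.
cd /opt/immich                 # hypothetical compose project directory
SNAP="tank/immich@backup-$(date +%F)"

docker compose stop            # quiesce Immich
zfs snapshot "$SNAP"           # atomic, effectively instant
docker compose start           # service is back up

# Back up from the read-only snapshot at leisure; live writes can't
# change what we're copying.
rsync -a "/tank/immich/.zfs/snapshot/${SNAP#*@}/" backuphost:/backups/immich/

zfs destroy "$SNAP"            # drop the snapshot when done
```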

Now consider the case above without stopping Immich while taking the snapshot. In theory, the data you’re backing up should represent the complete state of Immich at a single point in time, eliminating the possibility of divergence between database and assets. It would, however, represent the state of a live Immich instance: lock files and the like. Wouldn’t restoring from such a backup be equivalent to kill -9 or pulling the power cable and restarting the service? If a service can recover from a cable pull, is it reasonable to assume it should recover from restoring a snapshot taken while live? If so, is there much point in stopping services during snapshots?

MangoPenguin ,

> You start the backup, the database is backed up, and now the image assets are being copied. That could take an hour.

For the initial backup maybe, but subsequent incrementals should only take a minute or two.

I don’t bother stopping services; it’s too time-intensive to deal with setting that up.

I’ve yet to meet any service that can’t recover smoothly from a kill -9 equivalent; any that couldn’t sure wouldn’t be on my list of stuff I run anymore.

avidamoeba OP ,

It depends on the dataset. If the dataset itself is very large, just walking it to figure out what the incremental part is can take a while on spinning disks. Concrete example: an Immich instance with 600 GB of data and hundreds of thousands of files, sitting on a 5-disk RAIDZ2 of 7200 RPM disks. Just walking the directory structure and reading the ctimes takes over an hour. Suboptimal hardware, suboptimal workload. The only way I can think of to speed it up is to use ZFS itself to do the backups with send/recv, avoiding the per-file operations altogether. But if I do that, I must use ZFS on the backup machine too.
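
For comparison, an incremental send/recv replicates only the blocks ZFS already knows have changed, with no directory walk at all (pool, dataset, and host names here are made up):

```sh
#!/bin/sh
# Block-level incremental: ZFS tracks changed blocks between snapshots,
# so there's no stat() of hundreds of thousands of files.
zfs snapshot tank/immich@today

# Send the delta since the previous snapshot; the receiving end must
# also be a ZFS pool.
zfs send -i tank/immich@yesterday tank/immich@today | \
    ssh backuphost zfs receive backuppool/immich
```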

> I’ve yet to meet any service that can’t recover smoothly from a kill -9 equivalent; any that couldn’t sure wouldn’t be on my list of stuff I run anymore.

My thoughts precisely.

Decronym Bot , (edited )

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

LVM: (Linux) Logical Volume Manager for filesystem mapping
NFS: Network File System, a Unix-based file-sharing protocol known for performance and efficiency
SMTP: Simple Mail Transfer Protocol
ZFS: Solaris/Linux filesystem focusing on data integrity


MaximilianKohler ,

I ran into a similar problem with snapshots of a forum and email server: if there are scheduled emails pending when you take the snapshot, they get sent out again when you create a new test server from the snapshot. The same goes for the forum.

I’m not sure what the solution is either. The emails are sent via an SMTP server, so it’s not as simple as disabling email (ports, firewall, etc.) on the new test server.
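
One partial mitigation, assuming the test copy doesn’t need working mail at all, would be to reject outbound SMTP traffic on the clone before the applications come up. A minimal sketch with iptables (ports and tooling are assumptions about the setup, not a known fix for this case):

```sh
#!/bin/sh
# On the restored *test* server only: reject outbound connections on the
# usual SMTP/submission ports so re-fired scheduled emails never leave
# the box. Add any custom relay port your setup uses.
iptables -A OUTPUT -p tcp -m multiport --dports 25,465,587 -j REJECT
```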

butitsnotme ,

I don’t bother stopping services during backup; each service is confined to a single LVM volume, so snapshotting is exactly the same as yanking the plug. I haven’t had any issues yet, either with actual power failures or data restores.

avidamoeba OP , (edited )

And this implies you have tested such backups, right?

Side Q, how long do those LVM snapshots take? How long does it take to merge them afterwards?

butitsnotme ,

Yes, I have. I should probably test them again though, as it’s been a while, and Immich at least has had many potentially significant changes.

LVM snapshots are virtually instant, and there is no merge operation, so deleting the snapshot is also virtually instant. It works by creating a new space where the differences from the main volume are written: each time the application writes to the main volume, the old block is copied to the snapshot first. This does mean that disk performance will be somewhat lower than without snapshots, though I haven’t really noticed any practical impact. (I believe LVM typically creates my snapshots on a different physical disk from the one where the main volume lives, though.)
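
For illustration, the cycle might look like this (volume group, LV name, size, and mount point are hypothetical):

```sh
#!/bin/sh
# Snapshot -> back up -> delete. --size reserves the COW area; it only
# has to hold the blocks that change while the snapshot exists.
lvcreate --snapshot --size 5G --name immich-snap /dev/vg0/immich

mount -o ro /dev/vg0/immich-snap /mnt/snap
rsync -a /mnt/snap/ backuphost:/backups/immich/
umount /mnt/snap

# No merge on deletion; the COW area is simply discarded.
lvremove -y /dev/vg0/immich-snap
```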

You can find my backup script here.

avidamoeba OP ,

Oh interesting. I was under the impression that deleting an LVM snapshot actually merged it back, which took some time, but I guess not. Thanks for the info!

solrize ,

Stop the whole VM during snapshots.

avidamoeba OP ,

Not a VM. Consider the service as just a program running on the host OS, where either the whole OS or just the service data sits on ZFS or LVM.

null ,

This is one of the reasons Docker exists.

avidamoeba OP ,

And I’m using Docker, but Docker isn’t helping with the stopping/running during backup conundrum.

Hansie211 , (edited )

It should work that way. If you use the recommended Docker Compose scripts for Immich, you’ll notice that only a few volumes are mounted to store your data. These volumes don’t include information about running instances. If you take snapshots of these volumes, back them up, remove the containers and volumes, then restore the data and rerun the Compose scripts, you should be right where you left off, without any remnants from previous processes. That’s a pro of container process isolation.
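
A rough sketch of that restore flow (project path and volume name are illustrative, not Immich’s actual ones):

```sh
#!/bin/sh
cd /opt/immich            # hypothetical compose project directory
docker compose down       # containers and runtime state go away

# Refill the data volume from backup using a throwaway container.
docker run --rm \
    -v immich_upload:/restore \
    -v /backups/immich/upload:/backup:ro \
    alpine sh -c 'rm -rf /restore/* && cp -a /backup/. /restore/'

docker compose up -d      # fresh containers pick up the restored data
```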

null ,

Why not?

avidamoeba OP , (edited )

Docker doesn’t change the relationship between a running process and its data. At the end of the day you have a process in memory that opens, reads, writes, and closes files residing on some filesystem; the process must be presented with a valid POSIX environment (or equivalent). What happens to the files when the process is killed instantly, and what happens when it’s started afterwards and re-reads them, doesn’t change based on where the files reside or where the process runs. You could run it in Docker, in a VM, on Linux, on Unix, even on Windows. You could store the files in a Docker volume, bind-mount them in, or put them on NFS; in the end they’re available to the process via filesystem calls. The effects are limited to the interaction between the process and its data, and Docker can’t remove that interaction. If it did, the software would break.
