
---
date: 2024-01-20
title: Recovering from a Backup
tags:
  - homelab
  - truenas
  - backup
  - restoration
---

As part of moving data and hard drives around between servers, I have reached a situation where I need to move data off of my primary storage server, destroy the array, and restore my data. Incidentally, this basically simulates what I would do if my storage server or ZFS array failed. I'll document the process here as a reference for the next time I upgrade the pool, or in case too many drives ever fail to rebuild the array normally.

## Save any Encryption Keys

This should be done as part of setting up TrueNAS datasets, since without the keys any encrypted datasets are inaccessible. Even if the array is fine, the data on it will be unreachable if TrueNAS fails or gets re-installed without the encryption keys having been backed up first. The export process is documented by TrueNAS, so just be sure to export all of the keys and put them somewhere safe (NOT anywhere on the TrueNAS filesystem). For me, this means a copy saved to my laptop and another copy on an SSD I keep in a fireproof safe.
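If you want to double-check from the shell which datasets are encrypted and whether their keys are loaded, something like this works (a quick sketch; `tank` is a placeholder for your pool name):

```sh
# Show encryption state, key availability, and where the key lives
# for every dataset in the pool. "tank" is a placeholder.
zfs get -r encryption,keystatus,keylocation tank
```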

## Configure Replication

I talked about replication tasks in a previous post, but the important thing here is to make sure the entire array is replicated, or at least the data you want to restore. Since this is a planned migration for me, I created a manual snapshot of my whole dataset and ran the replication manually to be sure I had a completely up-to-date copy of my data before destroying the old array. Make sure that manual snapshot is included in the replication; in my case, I had configured only "auto" snapshots to replicate, so I updated the task to include this manual one. I also checked the Full Filesystem Replication box so that I could mount the replicated dataset on my target device.
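TrueNAS drives all of this from the web UI, but a rough sketch of the underlying ZFS operations looks like this (pool, snapshot, and host names are placeholders, and an established backup would normally use an incremental send rather than a full one):

```sh
# Recursive manual snapshot of the whole pool before tearing it down.
zfs snapshot -r tank@pre-destroy

# Replicate to the backup machine. -R sends the full dataset tree with
# properties, similar to TrueNAS's "Full Filesystem Replication" option.
zfs send -R tank@pre-destroy | ssh backup-host zfs receive -F backuppool/tank
```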

## Mount and Validate the Replica

I don't consider this step optional: go to the machine that received the snapshots and mount every dataset. If you have encrypted datasets, the keys will need to be imported; I did this manually with `zfs load-key` on the backup system. After mounting, I spot-checked that recently modified files were present, and only then was I satisfied that all of my data was safe.
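For reference, loading a key and mounting a replicated dataset looks roughly like this (dataset names and the key path are placeholders for your own setup):

```sh
# Load the key for an encrypted replica from an exported key file, then mount it.
zfs load-key -L file:///root/keys/dataset.key backuppool/tank/dataset
zfs mount backuppool/tank/dataset

# Spot-check that recently modified files made it across.
ls -lt /mnt/backuppool/tank/dataset | head
```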

## Update References to the Backup Shares

This is an optional step, but the restore replication can take some time, so it may be worthwhile to mount the backup shares and update any containers or clients to reference the backup in the meantime. Usually a backup should be read-only, but in this scenario the "live" data is about to be wiped out anyway, so I mounted my backup as read-write.
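If the replicated datasets were received read-only (a common replication setting), making them writable is a single property change (names are placeholders):

```sh
# Allow writes on the replica while it stands in for the live data.
zfs set readonly=off backuppool/tank/dataset
```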

## Perform Restoration

On the system being restored to, disable any scheduled replication tasks and then create the VDEV(s) to restore to. Configure a replication task that is essentially the one used to create the backup, but with source and destination reversed. If a manual snapshot was created, as in my case, make sure it is included in the restored snapshots. Also, if the backup has picked up any changes, a new manual snapshot will need to be taken, or else the changes synced back (e.g. via rsync) after the replication from the backup to the storage server is complete.
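As a sketch, the reversed replication and the follow-up sync could look like this when run from the primary server (all host, pool, and snapshot names are placeholders):

```sh
# Send the latest snapshot from the backup machine back to the rebuilt pool.
ssh backup-host zfs send -R backuppool/tank@pre-destroy | zfs receive -F tank

# If the replica picked up changes after that snapshot, sync them back, e.g.:
rsync -a backup-host:/mnt/backuppool/tank/dataset/ /mnt/tank/dataset/
```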

## Validate Restored Data

After the replication back to the primary storage server is complete, make sure the encryption keys are all present, the data is available and up-to-date, and permissions are correct. It will likely be necessary to restore encryption keys, but the TrueNAS web UI makes it easy to unlock shares with the downloaded backup keys.
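A quick way to catch any dataset whose key did not come along (a sketch; `tank` is a placeholder):

```sh
# List restored datasets whose encryption key is not loaded;
# anything printed here still needs its key imported.
zfs get -r -H -o name,value keystatus tank | awk '$2 == "unavailable"'
```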

If everything looks good, then proceed to re-configuring shares and enabling/re-configuring backups as appropriate.

## Complete Restoration

With all of the data restored, re-enable any disabled services and backup tasks. Note that after re-enabling services, each individual share will have to be re-enabled as well. The replication task used to restore the data can just be disabled, but I prefer to remove it entirely so it isn't run by accident.

I found that permissions were preserved, so there was nothing left to do; at this point, my TrueNAS instance is in the same state it was in before, just with an updated pool configuration.

## Some Random Notes

I have both machines involved here connected through a 10 GbE switch, and they transferred data at up to around 4 Gbps. At those speeds, my local restoration took about 12 hours; routine backups are far less bandwidth-constrained since only changes have to be transferred. I eventually want to have an offsite backup; based on my local backup/restore experience, a full restore would actually be doable within a day or two, assuming my remote backup location also has gigabit or better upload speeds.
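For a rough sense of the data volume those numbers imply, here is the back-of-envelope arithmetic (using my observed throughput, not a guarantee):

```sh
# Sustained 4 Gbps is 0.5 GB/s; over 12 hours that is:
# 0.5 GB/s * 12 h * 3600 s/h = 21,600 GB, i.e. roughly 21.6 TB moved.
echo '0.5 * 12 * 3600' | bc -l
```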