---
date: 2024-01-20
title: Recovering from a Backup
tags:
- homelab
- truenas
- backup
- restoration
---
As part of my moving data and hard drives around between servers, I have reached
a situation where I need to move data off of my primary storage server, destroy
the array, and restore my data. Incidentally, this is basically simulating what
I will do in the event my storage server or ZFS array fails. I'll document the
process here to serve as a reference for what to do when upgrading the pool in
the future or in case too many drives fail to rebuild the array normally.
## Save any Encryption Keys
This should be done as part of setting up TrueNAS datasets, since without the keys
any encrypted datasets are inaccessible. Even if the array is fine, the data on it
will be inaccessible if TrueNAS fails or gets re-installed without first backing up
the encryption keys. This is [documented](https://www.truenas.com/docs/scale/22.12/scaleuireference/storage/datasets/encryptionuiscale/#pool-encryption),
so just be sure to export all of the keys and put them somewhere safe (NOT anywhere
on the TrueNAS filesystem). For me, this means a copy saved to my laptop and another
copy on an SSD I keep in a fireproof safe.
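Before exporting, it can help to confirm which datasets are actually encrypted and where their keys live. A quick check from the TrueNAS shell (the pool name `tank` is a placeholder for your own pool):

```shell
# List encryption state, key status, and key location for every
# dataset in the pool ("tank" is a placeholder pool name)
zfs get -r encryption,keystatus,keylocation tank
```

Any dataset showing `encryption` other than `off` needs its key in your exported backup.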
## Configure Replication
I talked about replication tasks [in a previous post](https://blog.mcknight.tech/2024/01/18/NAS_Setup/#Data-Protection),
but the important thing here is to make sure the entire array is replicated, or at
least the data you care to restore. Since this is a planned migration for me, I
created a manual snapshot of my whole dataset and ran the replication manually to
be sure I have a completely up-to-date copy of my data before destroying the old
array. Make sure that manual backup is included in replication; in my case, I had
configured only "auto" snapshots to replicate, so I updated the task to include this
manual one. I also checked the `Full Filesystem Replication` box so that I can mount the
replicated dataset on my target device.
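TrueNAS drives all of this from the web UI, but the underlying ZFS operations are roughly the following sketch (pool, dataset, snapshot, and host names are placeholders):

```shell
# Recursive manual snapshot of the whole dataset tree before migrating
zfs snapshot -r tank/data@pre-migration

# Rough equivalent of a full-filesystem replication to the backup box;
# -R sends the dataset with all descendants, snapshots, and properties
zfs send -R tank/data@pre-migration | ssh backup-host zfs recv -F backup/data
```

The `Full Filesystem Replication` checkbox corresponds to the property-preserving `-R` style of send, which is what makes the replica mountable as a complete copy on the target.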
## Mount and Validate the Replica
I don't consider this step optional: go to the machine that received the snapshots and
mount every dataset. If you have encrypted datasets, the keys will need to be imported;
I did this manually with [`zfs-load-key`](https://openzfs.github.io/openzfs-docs/man/master/8/zfs-load-key.8.html)
on the backup system. After mounting, I spot-checked that recently modified files were
present and satisfied myself that all of my data was safe.
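On the backup system, the load-and-verify step looks roughly like this (key file path, dataset, and mountpoint are placeholders for your own layout):

```shell
# Load the saved key for an encrypted replica; -L overrides the
# recorded keylocation with the backup copy of the key file
zfs load-key -L file:///root/keys/data.key backup/data
zfs mount backup/data

# Spot-check: list files modified in the last day to confirm the
# latest changes made it into the replica
find /mnt/backup/data -mtime -1 | head
```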
### Update References to the Backup Shares
This is an optional step, but replication can take some time so it may be worthwhile to
mount the backup shares and update any containers or clients to reference the backup.
Usually, a backup should be read-only, but in this scenario the "live" data is about to
be wiped out, so I mounted my backup as read-write.
## Perform Restoration
On the system being restored to, disable any scheduled replication tasks and then create
the VDEV(s) to restore to. Configure a replication task that is essentially the one used
to create the backup, but with source/destination reversed. If a manual backup was created
as in my case, make sure it is included in the restored snapshots. Also, if the backup was
modified after replication, take another manual snapshot or sync the changes back (e.g.
via rsync) after the replication from the backup to the storage server is complete.
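The restore is the same send/receive with the direction reversed; a sketch with placeholder names, assuming the backup was mounted read-write and may have picked up changes:

```shell
# Snapshot the (possibly modified) replica so nothing written while
# it was "live" is lost, then push the full tree back to the
# rebuilt pool on the primary storage server
zfs snapshot -r backup/data@final
zfs send -R backup/data@final | ssh storage-host zfs recv -F tank/data
```

In TrueNAS terms this is the original replication task with source and destination swapped, pointed at the newest snapshot.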
## Validate Restored Data
After the replication back to the primary storage server is complete, make sure encryption
keys are all present, data is available and up-to-date, and permissions are correct.
It will likely be necessary to restore encryption keys, but the TrueNAS web UI makes it
easy to unlock shares with downloaded backup keys.
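A quick shell-level sanity check on the restored pool (dataset names and paths are placeholders):

```shell
# Confirm every restored dataset has its key loaded and is mounted
zfs get -r keystatus,mounted tank/data

# Dry-run comparison of the restored data against the backup copy;
# -n makes no changes, -c compares checksums, so any lines printed
# are files that differ between the two
rsync -avnc /mnt/backup/data/ storage-host:/mnt/tank/data/ | head
```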
If everything looks good, then proceed to re-configuring shares and enabling/re-configuring
backups as appropriate.
## Complete Restoration
With all of the data restored, re-enable any disabled services and backup tasks. Note that after
re-enabling services, each individual share will have to be re-enabled. The replication task used
to restore data can be disabled, but I prefer to remove it entirely so it isn't run accidentally.
I found that permissions were preserved so there was nothing left to do; at this point, my TrueNAS
instance is in the same state it was before, just with an updated pool configuration.
## Some Random Notes
I have both machines involved here connected through a 10Gig switch and they transferred data at
up to around 4Gbps. At those speeds, my local backup restoration took about 12 hours; backups are
less bandwidth-constrained since only changes have to be transferred. I eventually want to have
an offsite backup; based on my local backup/restore experience, this would actually be doable within
a day or two, assuming my remote backup location also has gigabit or better upload speeds.