Merge branch 'BackupAndRestore' into 'main'
Document Backup and Restore

See merge request d_mcknight/blog-content!3
This commit is contained in: commit 8cc63bdd4c
2 changed files with 80 additions and 0 deletions
80
2024-01-19_Backup-and-Restore.md
Normal file
@@ -0,0 +1,80 @@
---
date: 2024-01-20
title: Recovering from a Backup
tags:
- homelab
- truenas
- backup
- restoration
---
As part of moving data and hard drives between servers, I have reached
a situation where I need to move data off of my primary storage server, destroy
the array, and restore my data. Incidentally, this is basically simulating what
I will do in the event my storage server or ZFS array fails. I'll document the
process here to serve as a reference for what to do when upgrading the pool in
the future, or in case too many drives fail to rebuild the array normally.

## Save any Encryption Keys
This should be done as part of setting up TrueNAS datasets, since without the keys
any encrypted datasets are inaccessible. Even if the array is fine, the data on it
will be inaccessible if TrueNAS fails or gets re-installed without first backing up
the encryption keys. This is [documented](https://www.truenas.com/docs/scale/22.12/scaleuireference/storage/datasets/encryptionuiscale/#pool-encryption),
so just be sure to export all of the keys and put them somewhere safe (NOT anywhere
on the TrueNAS filesystem). For me, this means a copy saved to my laptop and another
copy on an SSD I keep in a fireproof safe.

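The web UI export is the documented path; as a sanity check, something like the following
(run from a shell on the TrueNAS host, with `tank` standing in for your pool name) will
show which datasets are encrypted and whether their keys are currently loaded:

```bash
# "tank" is a placeholder pool name.
# List which datasets are encrypted and whether their keys are loaded.
zfs get -r -t filesystem encryption,keyformat,keystatus tank
```

Any dataset where `encryption` is not `off` should have its key exported from the web UI
before the array is touched.
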
## Configure Replication
I talked about replication tasks [in a previous post](https://blog.mcknight.tech/2024/01/18/NAS_Setup/#Data-Protection),
but the important thing here is to make sure the entire array is replicated, or at
least the data you care to restore. Since this is a planned migration for me, I
created a manual snapshot of my whole dataset and ran the replication manually to
be sure I have a completely up-to-date copy of my data before destroying the old
array. Make sure that manual backup is included in replication; in my case, I had
configured only "auto" snapshots to replicate, so I updated the task to include this
manual one. I also checked the `Full Filesystem Replication` box so that I can mount the
replicated dataset on my target device.

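I took the snapshot and ran the replication from the TrueNAS UI; purely for reference, a
rough CLI equivalent of the manual snapshot step would be (pool and snapshot names are
placeholders):

```bash
# Recursive manual snapshot of the whole pool before it gets destroyed.
zfs snapshot -r tank@pre-rebuild-2024-01-19

# Confirm the snapshot exists on every dataset that needs to be restored later.
zfs list -t snapshot -r tank -o name,used,creation | grep pre-rebuild
```
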
## Mount and Validate the Replica
I don't consider this step optional: go to the machine that received the snapshots and
mount every dataset. If you have encrypted datasets, the keys will need to be imported;
I did this manually with [`zfs-load-key`](https://openzfs.github.io/openzfs-docs/man/master/8/zfs-load-key.8.html)
on the backup system. After mounting, I spot-checked that recently modified files were
present and decided I was satisfied that all of my data was safe.

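For reference, the key loading and mounting on the backup machine looked roughly like this
(the `backup/tank` dataset name and mount point are placeholders, and how the key is
supplied depends on the dataset's `keylocation` and `keyformat`):

```bash
# Load keys for the replicated dataset and its children (use -L prompt to
# paste a passphrase instead of reading from the configured keylocation).
zfs load-key -r backup/tank

# Mount everything whose key is now loaded, then spot-check recent files.
zfs mount -a
find /mnt/backup/tank -type f -mtime -7 | head
```
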
### Update References to the Backup Shares
This is an optional step, but replication can take some time, so it may be worthwhile to
mount the backup shares and update any containers or clients to reference the backup.
Usually, a backup should be read-only, but in this scenario the "live" data is about to
be wiped out, so I mounted my backup as read-write.

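Purely as an illustration (this assumes the backup machine exports the dataset over NFS;
the host name and paths are placeholders), pointing a client at the backup copy can be as
simple as:

```bash
# Mount the backup copy read-write in place of the primary share.
sudo mount -t nfs -o rw backup-host:/mnt/backup/tank/media /mnt/media
```

Containers or other clients can then use that mount until the primary pool is rebuilt and
restored.
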
## Perform Restoration
On the system being restored to, disable any scheduled replication tasks and then create
the VDEV(s) to restore to. Configure a replication task that is essentially the one used
to create the backup, but with source/destination reversed. If a manual backup was created,
as in my case, make sure it is included in the restored snapshots. Also, if the backup has
any changes, a manual snapshot will need to be taken or else changes synced back (e.g.
via rsync) after the replication from the backup to the storage server is complete.

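For the rsync option, a minimal sketch (paths and host names are placeholders, and the
trailing slashes matter to rsync) is to dry-run first and then repeat for real:

```bash
# Preview what changed on the backup copy relative to the restored pool.
rsync -aHX --dry-run --itemize-changes /mnt/backup/tank/ root@primary-nas:/mnt/tank/

# Re-run without --dry-run once the itemized list looks right.
```
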
## Validate Restored Data
After the replication back to the primary storage server is complete, make sure encryption
keys are all present, data is available and up-to-date, and permissions are correct.
It will likely be necessary to restore encryption keys, but the TrueNAS web UI makes it
easy to unlock shares with downloaded backup keys.

If everything looks good, then proceed with re-configuring shares and enabling/re-configuring
backups as appropriate.

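A couple of shell checks (the pool name and share path are placeholders) cover most of this
validation:

```bash
# Every encrypted dataset should report its key as available and be mounted.
zfs get -r keystatus,mounted tank

# Spot-check ownership and permissions on a share the clients depend on.
ls -ln /mnt/tank/media | head
```
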
## Complete Restoration
With all of the data restored, re-enable any disabled services and backup tasks. Note that after
re-enabling services, each individual share will have to be re-enabled. The replication task used to
restore data can be disabled, but I prefer to remove it entirely so it isn't accidentally run.

I found that permissions were preserved, so there was nothing left to do; at this point, my TrueNAS
instance is in the same state it was before, just with an updated pool configuration.

## Some Random Notes
I have both machines involved here connected through a 10Gig switch and they transferred data at
up to around 4Gbps. At those speeds, my local backup restoration took about 12 hours; backups are
less bandwidth-constrained since only changes have to be transferred. I eventually want to have
an offsite backup; based on my local backup/restore experience, this would actually be doable within
a day or two, assuming my remote backup location also has gigabit or better upload speeds.