Backups, tools, rsync, and recovery
Backups, tools, rsync, and recovery – Chapter 10.3
This is the 12th of the free articles directly taken from the Manjaro Linux User Guide book, available at https://www.amazon.com/dp/B0C4PSWRQS/. The full list of freely available articles is here: Manjaro Linux User Guide – For newbies, fans, and mid users. More information at the end of the article.
Read time: 4 minutes. Previous article: 9.3 The Linux Directory Structure. Next article: 11.1 Network Basics With Manjaro Linux.
In this section, we will review what a backup is and what to consider when doing one. Having a basic understanding of the terms when choosing a backup tool and strategy is essential.
What is a backup?
A backup is a copy of specific data in a separate location – on another hard disk (HDD), a separate partition, cloud storage, or a remote server. A backup ensures that your files still exist somewhere else if your system suffers a file system (FS) or software (SW) failure. The data can be long-term projects, documents, SW applications, art, and so on. While recovering the OS is easy via reinstallation, lost family photos may be unrecoverable.
We can even back up the whole OS with its configuration, the installed SW, and our personal files. However, backing up an installed OS or making a snapshot of a whole partition makes sense for servers. Considering that in case of a failure, Manjaro can be reinstalled in 15 minutes, we need to back up only our personal data and potentially some custom configurations.
Making a partition snapshot or a full OS backup requires significant space and is often slow. In addition, if we don’t do periodic refreshes of such backups, their state will get older with time, and recovering them will put us in some old state of the system. This doesn’t make sense for a Rolling Release distribution.
The types of backup actions available for us, regular users, are as follows:
- Full backup – this is to make a full copy of a given directory, which is a slower task. It requires as much space as the amount of data to be backed up.
- A backup update – once we have made one backup, we only add the changes to it. The required space corresponds to the size of the changes, and the update takes about a second for a few tens of files. After all, we rarely change thousands of files in a single day.
—– —– —– —– —–
Deciding what to back up
Most tools don’t support compression as they preserve the full directory and file attributes, some including the extended ones, as explained in Chapter 9. Thus, to back up 30 GB of /home data, you will need 30 GB of space.
If you have large files you know don’t need to be backed up, you should explicitly exclude them to save space and time. A good example is two virtual machines I use for experiments – each takes up over 25 GB of space, and I don’t need to back them up. Steam games can also be quite big, and considering that you can download them again (and that they need periodic updates managed by the Steam application itself), backing up 40 GB just for one game can be pointless and slow. The GUI application Filelight is excellent for examining your data sizes, and the recommended tools support manually selecting the directories to back up.
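If you prefer the terminal to Filelight, a rough equivalent can be sketched with du and sort. This is only a minimal sketch: the function name largest_dirs is my own, and it assumes the GNU versions of du and sort that Manjaro ships.

```shell
# List the largest immediate subdirectories of a directory, biggest first.
# largest_dirs is a hypothetical helper name, not an existing command.
# Usage: largest_dirs [DIR] [COUNT]
largest_dirs() {
    local dir=${1:-$HOME}
    local count=${2:-10}
    # du -h: human-readable sizes, -x: stay on one file system,
    #    -d 1: report only one directory level deep.
    # sort -rh: sort human-readable sizes, largest first.
    du -h -x -d 1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}

# Example (not executed here): largest_dirs /home/YourUser 15
```

The first line of the output is the total for the directory itself; the lines after it are the candidates to consider for exclusion.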
I will just add that I store old data, big photos, and archive documents on a Synology NAS with RAID, as they are over 2 TB. I work with a 20 GB directory backed up daily.
—– —– —– —– —–
Backing up SW, rsync, and recovery
There are many backup tools; we will review the best available and list potential alternatives. Most are terminal-based, but several also provide a GUI, so making a backup is a matter of a single terminal command or a few clicks in a GUI. The only thing you need to perform a backup is your additional storage.
The reason to list the tools given here is that these are considered mature and good by many users – for each of them, I have found tens of fresh forum posts of people using them for a long time and across multiple distributions. I will not list the 70+ links here.
—– —– —– —– —–
The best tools
rsync is one of the best terminal-based file-copying tools: fast, reliable, versatile, and usable both locally and remotely. Written in C, it is an efficient single-threaded SW. The project is mature (developed since 1996) and so good that many GUI and other tools are based on it. One of the strengths of rsync is its optimized change detection, so updating existing backups with changes is fast and effective. Transferring cloud/network backups is also fast, as the data can be compressed for the transfer (with the -z option). After making a backup with it, you can browse the files in the backup location.
It is preinstalled on Manjaro and many other distributions. Once you define the correct command-line interface (CLI) arguments (reviewed later in the chapter), you can put the command in a script and trigger it manually or automatically. Scriptwriting basics and automatic triggering are reviewed in Chapter 15. From all my tests, this is the only tool that never failed for a whole system backup.
Back In Time is a GUI and CLI rsync-based tool. Its GUI is one of the richest. You must explicitly start the root GUI version from the main menu to perform a full system backup. It has never failed me when backing up personal files. Unfortunately, it is not preinstalled on Manjaro. Installing SW on a Manjaro live USB is very hard, so you can’t use it for system recovery from there. On the other hand, it is excellent for use on a freshly installed Manjaro to recover personal directories from a backup. It is my only and strongest recommendation for a GUI tool.
—– —– —– —– —–
Other GUI tools
Timeshift is a GUI- and terminal-based tool preinstalled on Manjaro. Apart from rsync backups, it also supports btrfs snapshots for btrfs-formatted partitions. It has a simple GUI, can do periodic automatic jobs, and is famous and widely used. However, its simplistic GUI starts with a setup wizard, which lacks custom directories selection. Even after I found and configured the custom directories settings, Timeshift tried to make a full system backup. This tool failed me more than five times for a whole system backup.
deja-dup is a simple GNOME-based GUI for duplicity and rsync. It failed for me once, and I’m not too fond of its lack of options compared to Back In Time. Despite this, some people use and trust it. grsync is another GUI for rsync, but it is also limited compared to the others.
—– —– —– —– —–
Other CLI tools
rclone is one of the most famous CLI tools optimized especially for synchronization with cloud storage. Its supported features depend widely on the features supported by the remote location. Inspired by rsync, it supports compression and is widely used and recommended.
The next five tools are good and worth checking out for advanced users only.
duplicity is a CLI tool based on the rsync algorithm (through the librsync library). It is widely used and supports multiple cloud storage servers.
FSArchiver supports compression and recovery on a different partition and FS.
restic supports a lot of cloud services and explicitly involves differential backups.
borg, with the full name BorgBackup, is a great terminal tool that optionally supports compression and authenticated encryption. It can back up securely over the internet via SSH, supports resuming backups and deduplication, and has many other features.
Kopia is another excellent tool offering compression, incremental backups, and deduplication like borg, and it can even serve as a frontend for some of the cloud storage options supported by rclone. For this, rclone also needs to be installed.
—– —– —– —– —–
Test results
I tested all the GUI tools and rsync. I also tested a whole system backup, as this is the ultimate test for a backup tool. Of all the tools, rsync was the only one that never failed for me. Considering that it is preinstalled on Manjaro by default and is also the tool Calamares uses for Manjaro installation, I guess you see why it is the best for our purpose.
Of the GUI tools, the only good one was Back In Time. Its only disadvantage is that it is not preinstalled, so you cannot use it from a Manjaro live boot. The preinstalled Timeshift failed many times.
—– —– —– —– —–
Common points
Remember that no matter which tool you choose, you must test it. This means making a backup and restoring it on your machine, ensuring everything works correctly. If we don’t know it works correctly, how can we even consider making backups with it? One of the essential characteristics of rsync is that you can browse the files after the backup. While I can guarantee the quality of rsync based on over 100 tests, one incorrect argument may corrupt a backup.
System backups are complicated as they involve backing up the root directories and special files. As they are complicated to handle (and I have tested this over 50 times for this chapter), doing a system backup for a regular user system is pointless. They are also slow due to the number of files. As a result, if an unrecoverable system failure occurs, it is much easier to reinstall the OS and recover only your personal files. In addition, unless you are experimenting heavily with Manjaro, regular usage is unlikely to corrupt the OS; it has never done so for me. Finally, for common failures, there are thousands of posts in the forum, and they are typically easy to solve, so don’t rush into reinstallation if you have issues with Manjaro.
How can we test a simple rsync directory backup? Easy – open the Filelight GUI application and inspect the backed-up directory and its backup copy. If they are exactly the same size, you copied all the contents. Then, open the backup copy with your file manager and inspect the files, particularly the most recently changed ones. If the backup is successful, the latest modified dates will make sense. Open some of them, and that’s all. Once you know your backup command is working, keep it in a script or a simple text file so you can execute it directly next time.
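The same check can be approximated in the terminal. The following is a minimal sketch, not a replacement for actually opening a few files: verify_backup and newest_files are hypothetical helper names, and the second one relies on the -printf option of GNU find.

```shell
# Compare a source directory with its backup copy.
# Prints files that differ or exist on one side only;
# returns 0 only when both trees have identical contents.
verify_backup() {
    local src=$1 bkp=$2
    # -r: recurse into subdirectories, -q: report only that files differ
    diff -rq "$src" "$bkp"
}

# Print the most recently modified files in a directory, newest first.
# Usage: newest_files DIR [COUNT]
newest_files() {
    local dir=$1
    local count=${2:-5}
    # %T@ is the modification time in seconds since the epoch, %p the path
    find "$dir" -type f -printf '%T@ %p\n' | sort -rn | head -n "$count" | cut -d' ' -f2-
}

# Example (not executed here):
#   verify_backup /home/YourUser/SomeDirectory /home/YourUser/BKP/BKP_home/SomeDirectory
#   newest_files /home/YourUser/BKP/BKP_home
```

Keep in mind that diff reads every file, so it is slower than comparing sizes, and anything you excluded from the backup will be reported as existing only in the source.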
—– —– —– —– —–
rsync
Using rsync via the terminal is simple. Before starting the following commands, ensure you own the already mounted BKP directory. Otherwise, your backup will not work, and you will get no warning. Check this with $ ls -l or $ exa -lah.
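If you want to automate this ownership check before each backup, it could look roughly like the sketch below. The function name check_backup_dir is an assumption of mine, and the path in the example is only a placeholder.

```shell
# Return 0 only if the backup directory exists, is owned by the
# current user, and is writable. check_backup_dir is a hypothetical name.
check_backup_dir() {
    local dir=$1
    [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
    # -O: true if the file is owned by the effective user ID
    [ -O "$dir" ] || { echo "not owned by $(id -un): $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
}

# Example (not executed here):
#   check_backup_dir /home/YourUser/BKP && echo "BKP is ready"
```

If your backup storage is a separate partition or disk, the mountpoint -q command from util-linux can additionally confirm that it is really mounted at that path.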
Reading the full manual is overwhelming but can help a lot in special cases. However, I recommend the many nice short guides on the web, just like mine here. The general form of using rsync is as follows:
$ rsync [-SingleFlags] [--SpecialFlags] [BackupFromDir] [--Options] [DestinationDir]
For a regular backup, use the following command but replace /home/YourUser/SomeDirectory/* and the other paths with yours as necessary, and also read the explanations that follow. Keep in mind that the whole command goes on one line, even if it is long. Do not put spaces inside the curly braces, or the shell will not expand the exclude list:
$ rsync -aAXHv --progress --delete /home/YourUser --exclude={"lost+found","/home/YourUser/SomeDirectory/*"} /home/YourUser/BKP/BKP_home > BKP_18_Oct_2023.log
Let’s split and explain the command.
-aAXHv – are the single flag options –
- a for archive mode: a recursive copy of all files and subdirectories, preserving symlinks, permissions, the modification time, group ownership, the owner, and device files;
- A for preserving Access Control Lists (ACLs);
- X for preserving extended attributes;
- H for preserving hard links; and
- v for verbose, to display the progress details (optional); once you have proven your command works correctly, you don’t need the extra verbose report.
Here are the other options:
- --progress displays, per file, the transfer progress and the time left and, if the process breaks, where this happened. Again, once you have proven your command works correctly, you don’t need the extra report and can skip it.
- --delete tells rsync to delete from the backup location the files that were removed from the backed-up local directory. In other words, your backup will be cleaned of old deleted files.
- --exclude, followed by the list in curly braces {}, names the directories to exclude. This can be skipped if not necessary.
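The curly braces after the exclude option are not an rsync feature; they are expanded by the bash shell before rsync starts. The small sketch below only illustrates that expansion, with show_args being a hypothetical helper that prints each argument it receives on its own line.

```shell
# Print every received argument on a separate line,
# so the result of the shell expansion becomes visible.
show_args() { printf '%s\n' "$@"; }

# Without spaces inside the braces, bash produces one option per item:
show_args --exclude={"lost+found",".cache"}
# --exclude=lost+found
# --exclude=.cache

# The explicit form that rsync receives in the end:
show_args --exclude="lost+found" --exclude=".cache"

# With a space after the comma, no expansion happens, and rsync
# would receive two broken arguments instead:
show_args --exclude={"lost+found", ".cache"}
# --exclude={lost+found,
# .cache}
```

This is why the exclude list must be written without spaces between its items. If you use a shell other than bash, writing one --exclude per item is the safer form.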
/home/YourUser/BKP/BKP_home is the destination, which I have mounted in /home/luke/BKP. As I want each backup to be in a separate directory, I’ve made a subdirectory BKP_home for this one.
> BKP_18_Oct_2023.log is the terminal redirection operator > (explained in Chapter 7) followed by a file name to put the whole log in a text file and not print on the terminal. This is also optional.
v and --progress make the execution a lot slower, as they print an enormous amount of information in the terminal. If your terminal is not set up with an unlimited or at least large scrollback size (100K lines), you will lose part of this output. Due to this, I strongly recommend redirecting to a log file if you use either of the two report options. The log is also good if you need to check results later. We don’t need a log for regular personal data most of the time, as checking the overall result is done quickly with Filelight.
For me, backing up 4.2 GB (over 50,000 files) without the v and --progress flags on one of my machines took less than a minute the first time. Each subsequent update took less than a second for small changes. Of course, this depends on the amount and type of files, storage drive characteristics, currently running processes, and your machine’s characteristics. Thus, please don’t take it as a reference.
—– —– —– —– —–
To back up your whole /home directory, you can use this version (replace the paths with yours and remove the unnecessary parts; again, the whole command goes on one line, with no spaces inside the curly braces):
$ rsync -aAXHv --progress --delete /home/ --exclude={"/lost+found","lost+found",".cache",".VirtualBoxVMs","/home/luke/BKP","/home/luke/BKP5"} home/BKP/BKP_OF_HOME > BKP.log
—– —– —– —– —–
For advanced usage: If you want to back up a whole system, you need sudo, and it becomes a bit more complicated (again, everything goes on one line, with no spaces inside the curly braces):
$ sudo rsync -aAXHv --progress --delete / --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/var/tmp/*","/var/cache/*","/lost+found","swapfile","lost+found",".cache","Downloads",".VirtualBoxVMs",".ecryptfs","/home/YourUser/BKP","/home/YourUser/BKP5/*"} /home/YourUser/BKP > BKP.log
For the last command, having your /home directory on a separate partition is not a problem. As long as it is mounted, it will be copied as well. What is essential is to remember to exclude large files and directories of your choice and the listed system directories (/dev,/proc, etc.).
Once your version of the arguments is ready, save it in a file. It is strongly advised to also keep a copy of it in a backup location (e.g., an email or some form of cloud storage).
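As an illustration of saving the command in a file, here is a minimal script sketch. The names backup_log_name and run_backup are hypothetical, the single exclude is only an example, and you would replace the arguments with your own proven version.

```shell
# Build a dated log file name such as BKP_18_Oct_2023.log.
backup_log_name() {
    # LC_ALL=C keeps the month abbreviation in English
    printf 'BKP_%s.log\n' "$(LC_ALL=C date +%d_%b_%Y)"
}

# Run the saved backup command and write its report to a dated log
# in the current directory. Replace the options with your own version.
# Usage: run_backup SOURCE DESTINATION
run_backup() {
    local src=$1 dest=$2
    local log
    log=$(backup_log_name)
    if rsync -aAXHv --delete --exclude="lost+found" "$src" "$dest" > "$log"; then
        echo "Backup finished, report in $log"
    else
        echo "Backup failed, check $log" >&2
        return 1
    fi
}

# Example (not executed here):
#   run_backup /home/YourUser/SomeDirectory /home/YourUser/BKP/BKP_home
```

Keeping the command in a function like this means the arguments are typed once and reused unchanged, which reduces the chance of a wrong argument on the next run.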
—– —– —– —– —–
Recovering with rsync
To recover with rsync, simply reverse the source and destination and keep only one exclusion:
$ sudo rsync -aAXHv --progress --delete /home/YourUser/BKP/BKP1 --exclude="lost+found" /home/YourUser > BKP.log
Important note – rsync is a highly optimized copying tool; it copies files from one directory to another. You can selectively copy/back up anything you want. This program is also used to synchronize data or selectively copy files from hosts to servers and vice versa (including for Unix, BSD, and other Unix-like OSs).
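Because a recovery writes into your live directories, it can be worth previewing it first. The sketch below uses the rsync dry-run option for that; preview_sync is a hypothetical helper name, and the paths in the example are placeholders.

```shell
# Show what rsync would copy or delete without changing any file.
# -n (--dry-run) performs a trial run only;
# -i (--itemize-changes) prints one line per file that would change.
# Usage: preview_sync SOURCE DESTINATION
preview_sync() {
    local src=$1 dest=$2
    rsync -aAXHni --delete --exclude="lost+found" "$src" "$dest"
}

# Example (not executed here):
#   preview_sync /home/YourUser/BKP/BKP1 /home/YourUser
```

In the output, lines starting with *deleting mark files that would be removed from the destination. The preview also shows where the files would land: a source path that ends with a slash means the contents of that directory, while a path without it means the directory itself is created inside the destination.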
—– —– —– —– —–
All rights reserved. Parts of this free content are allowed to be cited only when the official link to this article is provided as a source of the information, the author’s name is mentioned, as well as the publisher and the book name. Example: “Cited from the article <insert_link> by Atanas Georgiev Rusev, as part of the Manjaro Linux User Guide book, by PACKT publishing. All rights reserved”.