
HOWTO: Automated, encrypted, incremental backups on Linux

I recently decided that I was going to get one small corner of my computing life in order. Yes, my home directory was a mess. Yes, I had way too much stuff sitting around unorganised. Yes, I had about 3 previous generations of hard disk sitting in /usr/disk because I hadn't been bothered to sort through the bits I needed and delete the stuff I didn't. I mean, hey - I might really need my NVidia drivers from 5 years ago one day, you know?

Anyway, I decided that I was going to stop living on the edge and get an automatic nightly backup. My previous backup strategy was "Burn stuff to DVD when I remember". I could just about fit all my documents, mail, source code, and other stuff onto one DVD. But my music wasn't backed up (though not such a big deal, as it was on my iPod and I've got the original CDs) and my photos were not fantastically well backed up. This was the clincher: I'd recently bought two 4-gigabyte CompactFlash cards for use with my camera. The upshot of this was that I often had shoots that were over 4GB - often as much as 8GB. That won't all fit on one DVD, so backing them up was a pain. Finally, backing up to DVD was something I only did once every month or so, and DVDs aren't all that permanent anyway. It would be just my luck to suffer a hard disk crash and then find all my backups are unreadable.

Here's what I needed from the new system:

  • Automatic - I shouldn't have to do anything. If it relies on my involvement it will never get done.
  • Secure - Not a massive concern in the real world, but it adds to the psychological feeling of security. I want to know that if someone nicks my backup it's useless to them - they don't get all my cached passwords, financial records, and naked pictures of my girlfriend.
  • Incremental - More of a nice-to-have than a requirement, an incremental backup lets you go back to earlier snapshots if you want to. Rather than taking full backups every day (which would require huge amounts of space) you just store the bits that have changed, although it looks to the system like each one is a full backup.

Step 1: Assumptions and Requirements

I run Debian Linux on my PC so this is written from that perspective, but this should work fine on just about any Linux machine.

In hardware terms you just need a spare hard disk. I got a 500GB external hard disk which I'm using via USB. The drive needs to be somewhat bigger than the data you're backing up, with room to spare for the files that change between snapshots. I have 400GB of data drives in my machine, so 500GB is ample. Personally I chose to back up the entire system, though some people consider that overkill and just back up /home (where all your documents and personal settings are stored). I also like to have a backup of /etc for my system settings, /root for root's home directory (I have a couple of sysadminny scripts in there, some environment settings, etc), and I've also got a /data partition which is where all my photos end up. So backing up just /home wasn't for me. Rather than figure out which bits I wanted to back up and which bits I could ignore, I just backed everything up.

In software terms you need rsync, plus cryptsetup for setting up the LUKS encryption:

apt-get install rsync cryptsetup

Step 2: Encrypt and mount the disk

I found this article useful when trying to figure this out the first time around.

  1. Your first step is to figure out where your disk is. If it's an IDE drive you'll find the device on /dev/hdb or similar, and the partition on /dev/hdb1. If you need to partition the drive then try using cfdisk (my favourite, although not always available) or fdisk.

    My drive's USB, which means it lives on /dev/sda1, but it's better to use one of the links in /dev/disk/. These are symlinks created automatically by udev, so you don't have to worry about what order you plug stuff in. For the commands in this step it doesn't much matter, since you're typing them by hand and can check first, but it does matter for /etc/crypttab later on, which is read at every boot.

    Be very careful when choosing your device, as we're about to torch all the information on it. Set it up as an encrypted LUKS container like this:

    cryptsetup luksFormat /dev/disk/by-label/My_Book

    It'll ask you to confirm that you really want to destroy the data on this drive (you have to type YES in capitals), then prompt you for a passphrase. Try to pick a good one.

  2. Now we open our newly encrypted device and make a filesystem on it. The first command unlocks the container (asking for your passphrase) and maps it to /dev/mapper/crypto_backup; the second creates a filesystem inside it. I've gone with Ext3, but you can pick your favourite.

    cryptsetup luksOpen /dev/disk/by-label/My_Book crypto_backup
    mkfs.ext3 /dev/mapper/crypto_backup

  3. Finally, create a mount point and mount the new filesystem on it. That's it - you've now got an encrypted drive ready to roll.

    mkdir /mnt/backup
    mount /dev/mapper/crypto_backup /mnt/backup

NB. If you have one of these Western Digital drives, you may find that the /dev/disk/by-label link disappears after you've encrypted it: the label belonged to the filesystem that luksFormat has just overwritten. Use /dev/disk/by-id instead if this is an issue for you.
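Since luksFormat wipes whatever the link points at, it's worth checking which kernel device a stable name really resolves to before you run it. `ls -l /dev/disk/by-id/` shows you this; as a small sketch, here's a hypothetical helper, `show_disk_links` (not part of the original setup), that resolves every link in one of these directories with `readlink -f`. The default directory /dev/disk/by-id is an assumption; pass by-label or by-uuid instead if you prefer.

```shell
#!/bin/bash
# Hypothetical helper: list the stable names in a /dev/disk directory and the
# kernel device each one currently points to.
show_disk_links() {
    local dir=${1:-/dev/disk/by-id}
    local link
    for link in "$dir"/*; do
        # Skip anything that isn't a symlink, including the literal pattern
        # left behind when the directory is empty or missing
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$link" "$(readlink -f "$link")"
    done
}
```

Running `show_disk_links /dev/disk/by-label` should print one line per link, something like `/dev/disk/by-label/My_Book -> /dev/sda1`, which tells you which disk you're about to torch.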

Step 3: Set up the backup

I found this article to be a very thorough explanation of backing stuff up using rsync. It may be useful if you want the mucky details.

  1. Perform the initial backup (this is all on one line):

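    rsync -av --exclude=/proc --exclude=/sys --exclude=/media --exclude=/mnt / /mnt/backup/backup.0

    The -av puts rsync into archive mode (which makes it preserve things like file permissions, ownership and timestamps, and copy symlinks as symlinks), and makes it verbose so it tells you what's going on. The excludes keep it from trying to back up itself (/mnt), my iPod if I leave it plugged in overnight, or any CDs I have in the drive (both of those end up under /media). /proc and /sys are excluded too, because they're virtual filesystems generated on the fly by the kernel rather than real files on disk. A leading / anchors an exclude to the root of the transfer, so --exclude=/mnt skips /mnt itself but not a directory called mnt somewhere under /home. The final two arguments are the source and destination - in this case the filesystem root / and the backup.0 folder in /mnt/backup. That's it - you're all backed up.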

  2. Let's create the backup script that performs the incremental backups. Create this, change the BACKUP_ROOT and BACKUP_SOURCE lines to suit and stick it somewhere safe - /root/bin/backupscript for instance:

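    #!/bin/bash
    BACKUP_ROOT=/mnt/backup
    BACKUP_SOURCE=/

    # Bail out if the encrypted disk isn't mounted (e.g. no passphrase was
    # entered at boot), rather than filling the root filesystem with a backup
    if ! mountpoint -q "$BACKUP_ROOT"; then
        echo "$BACKUP_ROOT is not mounted, skipping backup" >&2
        exit 1
    fi

    # Age the snapshots by one; the [ -d ] tests cope with the first few runs
    rm -rf "$BACKUP_ROOT/backup.3"
    [ -d "$BACKUP_ROOT/backup.2" ] && mv "$BACKUP_ROOT/backup.2" "$BACKUP_ROOT/backup.3"
    [ -d "$BACKUP_ROOT/backup.1" ] && mv "$BACKUP_ROOT/backup.1" "$BACKUP_ROOT/backup.2"
    [ -d "$BACKUP_ROOT/backup.0" ] && mv "$BACKUP_ROOT/backup.0" "$BACKUP_ROOT/backup.1"
    # Unchanged files are hard-linked against yesterday's snapshot (backup.1)
    rsync -ua --delete --exclude=/proc --exclude=/sys --exclude=/media --exclude=/mnt \
        --link-dest="$BACKUP_ROOT/backup.1" "$BACKUP_SOURCE" "$BACKUP_ROOT/backup.0"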

    First of all this removes the oldest backup (from 4 days ago). Then it ages all the other backups by one, before backing up the system into a fresh backup.0. After it's run you'll find you've got 4 days' worth of backups, yet they take up only a bit more space than one backup. How does that work?

    The magic is in the --link-dest parameter. To explain this I'll have to get a bit abstract, so bear with me. When you see a list of files in a directory, you're not actually seeing the files - you're seeing a bunch of links to certain areas of a disk. You can have more than one link (a "hard link") to the same area of disk, and it'll look to your computer like you've got the same file in several places - but really the file's only stored once on the disk itself. When you remove a file you're actually removing a link, and it's only when there are no links left that the computer will reuse the space for something else.

    By way of analogy, consider addresses for houses. You could post stuff to "42 Prudence Avenue, The Village" and it'll arrive at your house - but you could also post stuff to "The Old Windmill, The Village" or "White Cottage, Prudence Avenue, The Village" and it will all arrive. This doesn't mean you've got three houses; it's just three different ways of pointing to the same one. And it's only when all of them are removed from the council's list of ratepayers that they come and repossess your house.

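    Returning to the topic at hand: --link-dest tells rsync to compare each file with its copy in the given directory (here, yesterday's backup.1). If the file hasn't changed, rsync makes a hard link to the existing copy instead of storing it again. So it looks like you've got 4 copies of a file - one in each day's backup - but really it's only stored once, and only files that have changed take up new space.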
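To see the hard-link behaviour for yourself without touching the real backup, here's a small sketch that imitates what --link-dest does for a single unchanged file. It makes the link by hand with `ln` inside a throwaway temporary directory (the file name is made up), then checks the inode numbers, the link count, and the space reported by `du`. It assumes GNU coreutils (for `stat -c` and `du -k`), as on Debian.

```shell
#!/bin/bash
# Illustration only: imitate what rsync --link-dest does for one unchanged
# file by making the hard link by hand, inside a throwaway temporary directory.
# Assumes GNU coreutils (stat -c, du -k).
set -eu
demo=$(mktemp -d)
mkdir "$demo/backup.1" "$demo/backup.0"

# "Yesterday's" snapshot holds a 1 MiB file (hypothetical name)
head -c 1048576 /dev/urandom > "$demo/backup.1/photo.jpg"

# "Today's" snapshot gets a second name for the same data: a hard link
ln "$demo/backup.1/photo.jpg" "$demo/backup.0/photo.jpg"

# Both names share one inode, and that inode now has two links
inode_old=$(stat -c %i "$demo/backup.1/photo.jpg")
inode_new=$(stat -c %i "$demo/backup.0/photo.jpg")
links=$(stat -c %h "$demo/backup.0/photo.jpg")
echo "inodes: $inode_old and $inode_new, link count: $links"

# GNU du counts a hard-linked file only once per invocation, so the
# second snapshot shows up as using (almost) no extra space
sync
sizes=$(du -sk "$demo/backup.1" "$demo/backup.0" | cut -f1)
size_old=$(printf '%s\n' "$sizes" | sed -n 1p)
size_new=$(printf '%s\n' "$sizes" | sed -n 2p)
echo "backup.1: ${size_old}K, backup.0: ${size_new}K"

# Deleting the older snapshot only removes one name; the data survives
rm -rf "$demo/backup.1"
links_after=$(stat -c %h "$demo/backup.0/photo.jpg")
echo "link count after removing backup.1: $links_after"

rm -rf "$demo"   # clean up
```

In the real backup rsync does this for every unchanged file (by default it judges "unchanged" by comparing size and modification time), so each new snapshot costs roughly the space of the files that changed plus its own directory tree. Running `du -sh /mnt/backup/backup.*` on the real disk should show the same pattern: the first snapshot listed looks full-sized and the rest look small.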

Step 4: Automate it

  1. First up you need to edit root's crontab so that the backup runs nightly. su to root then run crontab -e and add the following line:

    # m h dom mon dow command
    0 3 * * * /root/bin/backupscript

    This tells cron to run your backupscript every day at 3AM. Make sure your backup script is executable! chmod +x /root/bin/backupscript

  2. Now we need to make sure that the encrypted container is mapped at boot, and mounted automatically for us. Add a line to /etc/crypttab so it looks something like this:

    # <target name> <source device> <key file> <options>
    crypto_backup /dev/disk/by-id/usb-WD_5000AAJS_Externa_123-part1 none luks

    This tells the system that the /dev/disk/by-id/usb-WDetc device should be mapped to /dev/mapper/crypto_backup on startup as a LUKS container. As no key file is specified it will prompt for a password on bootup.

  3. Now we've told the system how to map the device we can just add it to /etc/fstab like anything else:

    # <file system> <mount point> <type> <options> <dump> <pass>
    /dev/mapper/crypto_backup /mnt/backup ext3 defaults 0 2

Limitations

  • If you're using this to back up a database (eg. MySQL) you are entering a world of pain. MySQL generally stores its databases in a whole bunch of files. To take a consistent snapshot of a database from the filesystem you'd have to stop the database entirely, copy all the files, then start it again, which means taking it offline every night. Otherwise you're just crossing your fingers that nothing changes the database while you're trying to back it up (either a user of the database like a website or a program, or MySQL itself deciding now's a good time to reindex or whatever). Rsync doesn't lock anything: it copies files one at a time, so a file can change while it's being copied, and nothing keeps the files in a directory consistent with each other. You're better off scheduling a regular dump using mysqldump or similar to somewhere on your hard disk, and letting rsync copy those dumps instead.
  • You are not totally safe. If this is your home computer this is a backup regime that you are justified in feeling smug about. If you're an investment bank or a medical facility you should be panicking. What if there's a fire? A flood? An explosion? What happens if your backup is scheduled at 3AM but your hard disk dies at 2:55? A home user can cope with this level of risk, but anyone whose data really truly matters (ie. people will lose lots of money/limbs if data is lost) should have a proper disaster recovery plan, offsite backups, replication, etc.
  • You're trusting people not to destroy your backups. Once the disk is unlocked and mounted, the encryption doesn't protect anything: other users can read any backed-up file whose original permissions would have let them read it, and anyone with root access can change or delete the lot. If you'd rather nobody but root went near the backups, run chmod 700 /mnt/backup while the disk is mounted (so it applies to the top of the backup filesystem rather than the empty mount point underneath) instead of leaving it at the default 755 - but this may not be convenient. There is a more thorough solution but I decided it was overkill for my needs - I'm the only one using my PC and I trust myself not to destroy it all.
  • This is not a versioning system. The snapshots let you go back a short way in time but think of this as a convenience thing ("Oh no! I really didn't want to delete that document two days ago!") rather than as a way of applying versioning. Use Subversion or CVS for this instead.
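For the database case in the first point, a minimal sketch is one extra line in root's crontab that dumps everything half an hour before the 3AM backup, so that rsync picks up a finished dump file rather than live database files. This assumes MySQL's credentials are stored in /root/.my.cnf, that /var/backups/mysql already exists (a path picked for illustration), and that the dump finishes within half an hour. `--single-transaction` gives a consistent snapshot of InnoDB tables without locking them; for MyISAM tables you'd want `--lock-all-tables` instead, which blocks writes while the dump runs.

```crontab
# m h dom mon dow command
30 2 * * * mysqldump --all-databases --single-transaction > /var/backups/mysql/all-databases.sql
```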


browse by date

  • August 2007