News Flash

Great article by @NeutronUK on how to create a print stylesheet using Firebug and the Web Developer Toolbar - http://cot.ag/bOQiVM

Blog:

4 February 2010

Bulletproof backups for MySQL

By Chris Lea

You have all your important data backed up, right? I mean, come on folks, it’s 2010. We may not have flying cars yet like we’re supposed to, but “having backups” is a problem that’s solved. And I’m sure you’ve solved it. I’m sure you sleep well at night knowing that if a comet hit the data center where your website lives, and everything was completely destroyed, you could get back up and running somewhere else quickly enough.

Except that the chance this is actually true for you, statistically, is very small. If we look at the statistics, then what we see is that people are storing increasing amounts of things, important things even, online because disks keep getting bigger and cheaper (though sadly not much faster, but that’s a different article).

But despite this availability of cheap storage, people for the most part still don’t back things up. This isn’t universally true of course, there are some folks who have implemented really brilliant backup systems at their companies.

Of course, it helps when you have an amazing systems guy on your team who is made out of lighting bolts and awesomeness, taking care of the nitty gritty details for you. If, however, you are one of the many many people who doesn’t have a good backup strategy in place, this article is here to help you out.

I’ve written a shell script (Ed: please remove .txt extension before use) that will serve as a good starting point, and we’ll talk through it in a little bit. For now though, let’s discuss backups in a more general sense so we can decide what our end goals are.

Types of things to back up

There are lots of different kinds of data that you may need to back up. Three of the most common are:

  1. Source Code
  2. Static Files (images, PDFs, etc)
  3. Data in a Database

These days, there’s a good chance you have your source code files in an RCS system somewhere, such as GitHub. If you do (and you should), and your RCS system is on a different server or servers than your site, you already have your source code backed up. If your server catches on fire, you can just do a fresh checkout onto a new server. This is one of the many advantages of using an RCS system.

Static files, such as images or music or what have you, are easy to back up. A quick search will reveal dozens of different ways to ship this sort of data to some secondary place in case of emergency.

Database data, however, can be a little more tricky. You can’t just copy your binary data files safely while the database is running because if a write happens during the process, then your “backup” will be in an inconsistent state, which will probably render it useless.

Moreover, it’s not generally the best way to do a backup because restoring data from a binary copy can be tricky. What my shell script does is allow you to automate correctly backing up a standard MySQL database, or databases, with full end-to-end encryption to an offsite location.

Is my data backed Up?

This seems like an easy question to answer. In fact, it is a rather easy question to answer, but I’ve found that many people have never bothered to answer it, and it gets them in trouble. Are you doing backups of some sort currently? If so, does that mean you have backups? Actually, no, it doesn’t. If there is one golden rule of having backups that I hope you remember after reading this, it’s the following:

If your backup procedure doesn’t include instructions for doing a restore of the data, or if your restoration procedure has not been tested, then you don’t have backups.

Simply put, unless you know that you can reliably do a restoration of your data, you don’t have backups. If you haven’t tried, you don’t know. As an example, I once encountered a setup where an administrator was “backing up” several databases by using rsync to copy the binary data store to a remote machine once a day. He thought he had backups, and he was wrong.

As I noted before, if a write happens during the rsync process, those binary files aren’t guaranteed to be in a consistent state. If he’d ever tried to do a restore of those files (he hadn’t), he would have discovered this problem. So I say again, until you have tested a restore and it’s worked, and you’ve documented how to do that restore, you don’t have backups.

Another thing that I always like to stress is that you really, really want backups that are under as much personal control as possible. This typically means using a third party or offsite service distinct from your ISP to store your backups. If you ever need to restore some data, there’s a good chance it’s because something has gone wrong with your systems.

In that case, there’s also a fair chance that something has gone wrong at your ISP itself, so you’d rather not have your backups tied to your ISP. What you want is backups that are stored by some other company, preferably in an entirely different physical data center.

Basic process for setting up database backups

The steps I use to back up the databases for Virb are as follows:

  • Get a dump of the database
  • Compress the data
  • Encrypt the data
  • Send the data over an encrypted connection to an offsite service
  • Automate the entire procedure with a cron job

It’s nothing fancy, and it shouldn’t be. Fancy, intricate things tend to break, and this is the last thing you want to worry about breaking.

Prerequisites

You’ll need to have a few things in place for this to all work. I’m assuming you’re using a reasonably new Debian style Linux distribution (which includes Ubuntu). You need to be able to use the mysqldump program to get a dump of the desired database data. This comes standard with the MySQL installations on every Linux distribution I know of.

I’m going to assume that you have a .my.cnf file set up in your home directory with a username and password that allow you to dump out the data you want. If you’ve never used a .my.cnf file before, it’s very easy. Create a file called .my.cnf in your home directory, and put the following in it:

[client]
user = dbusername
password = dbpassword

of course substituting the correct username and password for your systems. Then run chmod 600 $HOME/.my.cnf on the file so other people can’t read it. You can now use all the mysql* command line utilities without having to pass the -u and -p flags all the time.

Another assumption I’m making here is that you can safely run mysqldump at some point during the day, and this won’t mess anything on your site up, which practically means that you don’t have that much data. When you dump the database out, a write lock is placed on all the tables in that database (assuming you’re dumping out all the tables). So while the data is getting written out, reads can still happen, but no INSERTS, UPDATEs or DELETEs can.

If you have say a few hundred megabytes of binary data, usually located in /var/lib/mysql on Linux systems, you are probably okay just dumping the data out from whatever machine the database is on. On reasonably modern hardware, 300M of binary data takes about 15 seconds to dump out, for example. Having a write lock for 15 seconds in the middle of the night is not a big deal for most people.

If you have enough data that it really does take a long time, or your requirements are such that you can’t have a write lock happen even briefly, then the next option is to set up a mysql slave server that exists just for backups. The idea is that the slave is continuously replicating from the master.

When you do the backup on the slave, the write lock will happen there so that the data can get dumped out, and then once it’s finished the replication will catch back up to the master if it’s missed anything.

Explaining how to set up MySQL replication is beyond the scope of this article, but this “back up a slave” strategy is exactly what we do for Virb.

We’re going to be backing things up using the online back up service provided by IBackup. There are plenty of these sorts of services available, so if you want to use a different one feel free. Not much of what I’m going to describe is specific to IBackup. I have no particular affiliation to them, other than having used them for several projects and being pleased with their products and pricing. I’m going to assume you have an Economy plan with enough storage to hold your backups.

The only other assumptions I’l make are that you have stunnel4, rsync, and gpg installed. These are very standard, freely available programs. On reasonably recent Debian style systems (which include Ubuntu) you can install all of them by running this command as root:

apt-get update && apt-get install stunnel4 rsync gnupg

Setting up encryption

We are going to encrypt our database backups, and we’re going to transmit the encrypted files over a secure connection. This is technically overkill, but encryption is free and easy so we’ll do it anyway. The first thing you’ll need to do is generate GPG keys to use.

I’m not going to explain it here because this tutorial will take you all of five minutes to get through, and covers everything you’ll need. Just be sure to remember your passphrase, and put copies of your key files somewhere safe just in case.

You should now have GPG keys set up, and an account with IBackup, so create a file in your home directory called .ibackup with the following contents:

IBACKUP_USERNAME=username
IBACKUP_PASSWORD=password
GPG_EMAIL=your@email.com

The reason for this file will become apparent shortly. Just substitute the username and password for your IBackup account there, and put in the email address that was used to generate your GPG keys. Be sure to run chmod 600 $HOME/.ibackup so that nobody else can see your password.

Now we need to do a little configuration for stunnel4. We don’t need it to be running all the time, so shut it down and make sure it doesn’t start on boot with the following commands:

/etc/init.d/stunnel4 stop
update-rc.d -f stunnel4 remove

Now let’s configure stunnel4 for IBackup. First get the default configuration file out of the way.

cd /etc/stunnel
mv stunnel.conf stunnel.conf-orig

Next, with your text editor of choice, create a file called ibackup.conf in the /etc/stunnel directory with the following contents:

client = yes
[ibackup]
accept = 8455
connect = rsync4.ibackup.com:5000

These options are fairly specific to IBackup’s service, but other backup services have similar options. The accept option defines a local port for stunnel4 to use, so you don’t have to pick that particular one if, for some reason, it’s already in use.

Performing the backup

Since we have our security issues handled, we can now perform the backup. Open up that shell script (Ed: please remove .txt extension before use) I mentioned at the beginning in your text editor of choice, so we can go through it. The first 14 lines just get the names of databases we’re going to back up and identify the programs we’re going to use.

#!/bin/bash

# ibackup.com destination
IBACKUP_DEST=$1

# database names
DB_NAMES=$2

# binaries we need
MYSQLDUMP=/usr/bin/mysqldump
GZIP=/bin/gzip
GPG=/usr/bin/gpg
RSYNC=/usr/bin/rsync
STUNNEL=/usr/bin/stunnel4

If you’re not familiar with bash shell scripting, it’s not hard. Things like $1 and $2 indicate command line arguments that get passed to the script. So here, $1 is something we’re going to use in the name of the eventual backup file we’ll get, and $2 will hold the name(es) of the database(es) we will back up.

Things get more interesting from lines 23 to 39.

DOTIBACKUP="$HOME/.ibackup"

if [ ! -r $DOTIBACKUP ]; then
    echo "Need $DOTIBACKUP file for username and password!"
    exit 1
else
    PERMS=`/bin/ls -l $DOTIBACKUP | /usr/bin/awk '{print $1}'`
fi

if [ $PERMS != "-rw-------" ]; then
    echo "Permissions on $DOTIBACKUP are wrong!"
    echo "Should be -rw-------, actually is $PERMS"
    exit 1
fi

. $DOTIBACKUP
export RSYNC_PASSWORD=$IBACKUP_PASSWORD

These commands check that the .ibackup file exists and has the correct permissions. If so, they source the file, which is to say they absorb the things that file defines into the running shell script. Then it sets the environment variable RSYNC_PASSWORD to be the value of the IBACKUP_PASSWORD that you use to access the service.

Lines 41 and 42 grab the local port that stunnel4 needs in order to work.


# local stunnel4 port
STUNNEL_PORT=`/bin/grep 'accept' /etc/stunnel/ibackup.conf | awk '{print $3}'`

Next, between lines 44 and 49, we make make sure there’s a directory where we can dump out all of our data.

# place to dump everything
DUMPDIR="$HOME/.ibackup_data"

if [ ! -d $DUMPDIR ]; then
    mkdir $DUMPDIR
fi

We’re eventually going to set this up to run daily, and if we’re doing a once-per-day backup, it’s nice to keep a few days’ worth around just in case. In lines 51 through 56 we use the very convenient *NIX date program to get date stamps for the current day, the previous day, and the day before that.

# today, for tagging the dumpfile
DATE=`/bin/date +%Y-%m-%d`

# yesterday, and the day before that
ONEDAYAGO=`/bin/date -d '1 day ago' +%Y-%m-%d`
TWODAYSAGO=`/bin/date -d '2 days ago' +%Y-%m-%d`

We’re almost ready to actually dump out our data now. Lines 58 to 86 are really boring and just make sure we have a destination name and the programs we need, so I’ll gloss over those. The important part picks between lines 88 and 92, where we go into our backup directory and actually run mysqldump.

cd $DUMPDIR

echo -n "dumping $DB_NAMES to $DUMPDIR/$IBACKUP_DEST-$DATE.mysql... "
$MYSQLDUMP --opt --databases $DB_NAMES > $IBACKUP_DEST-$DATE.mysql
echo "done!"

Let’s take a look at that dump command to make sure we understand it. The $DB_NAMES variable was set earler in the script. It’s the second command line argument passwed to the script, and it tells mysqldump which databases we care about. The $IBACKUP_DEST variable, similarly, was the first command line argument passed to the script, and it just sets the part of the name of our dump file. Finally, the $DATE variable we just saw getting set a moment ago, and we use it as a date stamp for this dump.

Next, between lines 94 and 105, we compress and encrypt our dump file. We also check to make sure the encryption step actually worked and we exit out if not.

echo -n "gzipping $DUMPDIR/$IBACKUP_DEST-$DATE.mysql... "
$GZIP $DUMPDIR/$IBACKUP_DEST-$DATE.mysql
echo "done!"

echo -n "encrypting $DUMPDIR/$IBACKUP_DEST-$DATE.mysql... "
$GPG -e -r $GPG_EMAIL $DUMPDIR/$IBACKUP_DEST-$DATE.mysql.gz
echo "done!"

if [ $? -ne 0 ]; then
    echo "encryption failed!"
    exit 7
fi

We do a little cleanup and start stunnel4 in lines 107 to 112, which are pretty self explanatory.

echo -n "removing $DUMPDIR/$IBACKUP_DEST-$DATE.mysql.gz... "
/bin/rm -fv $DUMPDIR/$IBACKUP_DEST-$DATE.mysql.gz
echo "done!"

echo "Starting stunnel for rsync..."
/etc/init.d/stunnel4 start

And now on line 114, we finally send our freshly made backup off to the remote servers managed by IBackup for safe keeping.

$RSYNC -r -v -z -t --delete-after --exclude $IBACKUP_DEST-$ONEDAYAGO* --exclude $IBACKUP_DEST-$TWODAYSAGO* $DUMPDIR/ $IBACKUP_USERNAME@localhost::ibackup/$IBACKUP_DEST --port $STUNNEL_PORT

This is using several different features of rsync so let’s talk about them so we know what is going on. The command line flags -r -v -z -t are what the folks at IBackup tell us to use when syncing things over to them. The --delete-after flag tells rsync to delete anything in the target directory that’s not in the local directory being sync’d. But we also have the two --exclude options there.

So what we’re really saying is “Keep the file we’re sending over, and the one from yesterday, and the day before that, but then delete everything else”. That way we always have the last three days’ worth of backups. The last point of interest is that because we tell rsync to use $STUNNEL_PORT, all the communication is encapsulated by stunnel4 and is therefore encrypted across the wire as it gets transferred. Whew!

The last few lines from 116 to 121 simply stop stunnel4 and clean up things by removing the directory we made to put the dump file into.

echo "Stopping stunnel for rsync..."
/etc/init.d/stunnel4 stop

echo -n "Cleaning up... "
/bin/rm -rfv $DUMPDIR
echo "done!"

So that should about do it. To find out, put this script somewhere and run chmod 755 db-backup.sh on it so it’s executable. Then test it to make sure it actually works. If you want the target to be called ‘mybackup’, and you have to back up the databases named ‘db1′, ‘db2′, and ‘db3′, you would invoke the script like this.

/path/to/db-backup.sh mybackup "db1 db2 db3"

If that works, then you are almost done! We just need to automate the process, which we can do trivially with a cron job. In your shell where you tested the script from, type the following:

crontab -e

This will put you into edit mode for your crontab. Assuming you want the backup to happen starting at 12:40am each day, put the following into your crontab file:

0 40 * * * /path/to/db-backup.sh mybackup "db1 db2 db3" >/dev/null 2>&1

Save the file, and then type crontab -l to list the crontab and make sure that line is in there. If it is, then there’s just one last thing to do.

Testing a restore

I said at the start of this article that if you haven’t done a restore, you don’t have backups. And seriously, I meant that. So if you’ve made it this far, then you must do one last step. The day after you set this up, you need to go to your IBackup account, download your dump file, decrypt it with the GPG key you made, and do a restore to a running MySQL instance.

Make sure you do this last step, as the consequences of missing it can be dire.

In conculsion

I covered quite a bit in this article, but if you make it through everything, you should be able to rest easier knowing that you have good, safe backups of your database. Although we used the IBackup service in these examples, very little of what I covered is specific to them.

If you wanted to use a different company, or maybe just push the backup files to a different server you have, you’ll probably just need to change some of the commands that you pass to rsync. The basic strategy though should’t have to change.

I hope you’re found this helpful, and don’t hesitate to get in touch if you have any questions.

24
Future of Web Design London May 17-19 2010

24 Comments

Have your say:

Sign Up to our Newsletter

Enter your e-mail address below to receive regular updates on web design, web development and web business. Subscribe today and receive a free 44 page PDF "Designing Web User Interfaces" by Ryan Singer of 37signals.

Subscribe to the Think Vitamin articles RSS feed

Future of Web Apps Dublin May 14 2010

News

Twitter

Follow us on Twitter

Subscribe

Article Subscribers

Feedburner blog subscriber indicator

News Subscribers

Feedburner blog subscriber indicator

Subscribe by Email

You can receive Think Vitamin updates via email. Just pop your email address in the box below and click the arrows.

Subscribe by RSS

You can also receive new Think Vitamin posts via your RSS feed reader

Subscribe RSS Think Vitamin is a proud member of the Smashing Network

Ads Via The Deck