off-site backup for $0.10/GB using dirvish and Amazon EC2 and EBS

I’ve been using dirvish, an rsync-based snapshotting backup system, for years to manage local and off-site backups. It’s simple to set up, runs automatically, creates daily snapshots of entire systems (or just specific directories), and it’s a breeze to browse and restore: all the files are right there in a tree, organized by date. Think of it like Apple’s Time Machine, but better, because you can actually make it do what you want.
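If you haven’t seen dirvish before, it’s driven by a couple of small text config files. Here’s a minimal sketch, assuming a “bank” at /backup and a vault called homedirs (the paths, hostname, vault name, and schedule are all made up for illustration):

# /etc/dirvish/master.conf
bank:
    /backup
expire-default: +30 days
Runall:
    homedirs    22:00

# /backup/homedirs/dirvish/default.conf
client: myserver.example.com
tree: /home
xdev: 1
index: gzip
image-default: %Y%m%d

Each run then drops a complete, browsable snapshot under /backup/homedirs/, named by date.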

I recently needed to set up off-site backup for a few hundred gigabytes of data. My first thought was S3, but the HTTP interface meant that I couldn’t use a simple tool like rsync (or dirvish) to automate the snapshotting, and that browsing and restoring entire filesystems from backup would be cumbersome. Then I remembered that Amazon had recently announced support for booting EC2 instances from persistent EBS volumes. This lets you “save” an instance by shutting it down and starting it up again later, and you only pay for compute hours while the instance is running. Storage on EBS volumes is cheaper than on S3 ($0.10/GB per month instead of $0.15). Also, EBS volumes are just normal block devices that EC2 instances can mount as though they were hard drives.
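As a rough illustration of that last point (this isn’t part of the nightly script below, and the size, availability zone, instance ID, and device name are all placeholders), creating and attaching a dedicated EBS data volume with boto looks something like this:

# Sketch: create a dedicated EBS data volume and attach it to the backup
# instance. Size, zone, instance ID, and device name are placeholders.
import boto

conn = boto.connect_ec2()

# create a 200 GB volume in the same availability zone as the instance
vol = conn.create_volume(200, 'us-east-1a')

# attach it to the backup instance; it shows up there as /dev/sdf
conn.attach_volume(vol.id, 'YOUR_INSTANCE_ID', '/dev/sdf')

# on the instance itself, format and mount it once, e.g.:
#   mkfs.ext3 /dev/sdf && mount /dev/sdf /backup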

So here’s the idea: create an EC2 instance that boots from a big, dedicated EBS volume. Every night (or week, or whatever), start up that instance, run dirvish for the off-site backup, and then shut it down again. I only pay for the instance during the short periods it runs to perform the backup, and my data is stored off-site on the durable EBS volume. I implemented this system and it has been working great for several weeks. I just launch a Python script (as a cron job; see the crontab example after the script) that starts the instance, runs dirvish, and then shuts the instance down when the backup is complete. For those interested, here’s the quick-and-dirty Python source (which uses the excellent boto library for manipulating the EC2 instance):

#!/usr/bin/env python
# encoding: utf-8
"""
run_offsite_backups.py

Wake up the EC2 backup server, run dirvish backup, then shut it down

Created by Bryan Klingner (code.b@overt.org) on 2010-02-02.
Feel free to use this code yourself. Maybe email me if you do :)
"""

import sys
import os
import boto
import time
import subprocess

BACKUP_INSTANCE_ID = 'YOUR_INSTANCE_ID'

def main():

    conn = boto.connect_ec2()

    # get the backup instance object
    instance = conn.get_all_instances(instance_ids=(BACKUP_INSTANCE_ID,))[0].instances[0]

    # if the instance is stopped, start it up
    if instance.state != 'running':
        conn.start_instances(instance_ids=(BACKUP_INSTANCE_ID,))
        waited = 0
        while instance.state != 'running':
            instance.update()
            sys.stdout.write("rInstance starting up (%d sec)..." % (waited))
            sys.stdout.flush()
            time.sleep(1)
            waited += 1

    print "n"
    print "Backup instance running:"
    print "    ID:       ", instance.id
    print "    State:    ", instance.state
    print "    DNS name: ", instance.dns_name

    # chill for a few seconds so the SSH server is listening
    time.sleep(10)

    print ""
    print "Initiating backup..."
    retcode = ssh_cmd('dirvish-expire; dirvish-runall', instance.dns_name, user='username')
    print ""

    # backup is done; shut down the instance
    conn.stop_instances(instance_ids=(BACKUP_INSTANCE_ID,))
    waited = 0
    while instance.state != 'stopped':
        instance.update()
        sys.stdout.write("rInstance shutting down (%d sec)..." % (waited))
        sys.stdout.flush()
        time.sleep(1)
        waited += 1
    print ""

def ssh_cmd(cmd, host, user='root'):
    """ Run a shell command on a remote server via ssh """

    # build the ssh command line (don't shadow the function name)
    cmdline = 'ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no ' + user + '@' + host + " '%s'" % (cmd)
    print "Running SSH command: %s" % cmdline
    returncode = subprocess.call(cmdline, shell=True)

    return returncode

if __name__ == '__main__':
    main()
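
The script assumes your boto credentials are already set up (via the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables or a ~/.boto config file). To run it nightly, a crontab entry along these lines does the job; the time, script path, and log file are just examples:

# run the off-site backup every night at 3:30 AM
30 3 * * *    /usr/local/bin/run_offsite_backups.py >> /var/log/offsite-backup.log 2>&1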