Configuring RSYNC for backups to AWS
Backups are important; we all understand this. Most major hosting services, such as Linode, also offer backups, which is a great failsafe. The problem comes when you need to use one of those backups to restore your system: you are relying on Linode's latest backup, whenever that was taken (usually within the last 24 hours). For some, restoring from a backup that may be ~24 hours old is not an issue, but what if you want something more recent, stored offsite on a server of your choosing? This is where RSYNC and AWS come into play.
If you have spent any time near a server or the Linux command line, you have probably heard of RSYNC. While it may have a reputation as something only uber geeks use, it is very user-friendly and versatile. Let's set up the scenario.
We have a custom build of code living in /srv/portal on Server A. We want to back up this code to our AWS server (Server B) every hour. First, we need to verify that Server A can run RSYNC and connect to Server B without requiring a password. Thankfully, AWS requires you to set up an SSH key (.pem) file in order to connect. For our AWS session we created an Ubuntu server, so our default user is ubuntu. Connecting via SSH looks something like this:
ssh -i ~/.ssh/AWS_key.pem ubuntu@SERVER_B_IP
So here is the breakdown:
ssh -i tells ssh that we want to point to a key file for our credentials. Our file is located in our home directory under .ssh/. Next we give our username (ubuntu) and the IP address of the server (SERVER_B_IP is a placeholder for your instance's address), which is pretty standard. If we connect successfully, then we are good to go. Let's move on to RSYNC!
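Two things commonly trip up this step once it is automated: OpenSSH refuses to use a private key that group or other users can read (AWS's own instructions suggest running chmod 400 on the .pem file), and a password or passphrase prompt will stall an unattended job. The sketch below wraps both checks in two small helper functions. The function names and the SERVER_B_IP placeholder are hypothetical, and stat -c assumes GNU coreutils (Linux).

```bash
#!/bin/bash
# Hypothetical helpers for checking the SSH setup before automating rsync.

# key_perms_ok KEYFILE
# Succeeds if the key file has no group/other permission bits set, mirroring
# OpenSSH's "permissions are too open" check (mode & 077 must be 0).
# Assumes GNU stat (Linux), where -c '%a' prints the octal mode, e.g. "600".
key_perms_ok() {
  local perms
  perms=$(stat -c '%a' "$1") || return 2
  # The leading 0 makes shell arithmetic read the mode as octal
  [ $(( 0$perms & 077 )) -eq 0 ]
}

# check_passwordless_ssh KEYFILE USER@HOST
# Succeeds only if key-based login works with no prompt at all:
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
check_passwordless_ssh() {
  ssh -i "$1" -o BatchMode=yes -o ConnectTimeout=10 "$2" true
}
```

For example, key_perms_ok /root/.ssh/AWS_key.pem || chmod 400 /root/.ssh/AWS_key.pem fixes the permissions if needed, and check_passwordless_ssh /root/.ssh/AWS_key.pem ubuntu@SERVER_B_IP && echo "ready for rsync" confirms that the login needs no human input.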
Basic usage for RSYNC is:
rsync [OPTION...] [SRC] [DEST]
So, in our situation, we want to take everything in /srv/portal on Server A and put it in /home/srv/portal on Server B, over SSH (SERVER_B_IP below stands in for your server's IP address). OK, deep breath, here we go.
rsync -avz -e "ssh -i /root/.ssh/AWS_key.pem" /srv/portal/ ubuntu@SERVER_B_IP:/home/srv/portal
-a = archive mode. This is actually a shortcut for -rlptgoD which means:
- -r: recurse into directories
- -l: copy symlinks as symlinks
- -p: preserve permissions
- -t: preserve modification times
- -g: preserve group
- -o: preserve owner
- -D: preserve device files and special files
I don't know about you, but I would much rather just type -a.
-v = increase verbosity
-z = compress file data during the transfer. This saves bandwidth without having to tarball everything first.
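-e = specifies the remote shell to use; in our case we are telling RSYNC to use SSH with our key file. Put the whole SSH command in quotes, and use absolute paths rather than shortcuts like "~/" for your home directory: this command will end up running from cron (here as root, whose key lives in /root/.ssh), so "~" may not point where you expect.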
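To see what archive mode from the list above actually preserves, the sketch below copies a single file with -a between two throwaway directories and prints the permissions and modification time of both copies; the two lines should be identical. The function and file names are hypothetical, and touch -d and stat -c assume GNU coreutils. Keep in mind that preserving the owner (-o) only takes effect when the receiving rsync runs as root, so files written on Server B by the ubuntu user will be owned by ubuntu.

```bash
#!/bin/bash
# Hypothetical demo: rsync -a preserves permissions (-p) and modification times (-t).
# Assumes rsync and GNU coreutils (touch -d, stat -c) are installed.

# show_archive_mode DIR: builds src/ and dst/ inside DIR
show_archive_mode() {
  local root="$1"
  mkdir -p "$root/src"
  echo "db_host=localhost" > "$root/src/settings.ini"
  chmod 640 "$root/src/settings.ini"
  # Give the source an old, easy-to-spot modification time
  touch -d '2020-01-01 00:00:00' "$root/src/settings.ini"

  rsync -a "$root/src/" "$root/dst"

  # Print "<octal mode> <mtime in epoch seconds>" for the source, then the copy
  stat -c '%a %Y' "$root/src/settings.ini" "$root/dst/settings.ini"
}
```

Running show_archive_mode "$(mktemp -d)" should print two matching lines. Swapping -a for a plain -r inside the function is an easy way to see the difference: the copy then gets a fresh modification time.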
Note the trailing slash on /srv/portal/. It matters: with it, rsync copies the files inside the directory rather than the directory itself. On Server B we place these files under /home/srv/portal, which is fine as long as the ubuntu user can write there, but you could also create a symlink and place the files anywhere.
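The trailing-slash rule is easy to check locally before pointing rsync at Server B. This sketch (hypothetical function and directory names, all created under a temporary directory you pass in) runs the same copy with and without the slash:

```bash
#!/bin/bash
# Hypothetical demo of the trailing-slash rule on the rsync SOURCE path.
# Everything is created inside the directory passed as $1.
trailing_slash_demo() {
  local root="$1"
  mkdir -p "$root/portal"
  echo "placeholder" > "$root/portal/index.php"

  # Trailing slash: copy the CONTENTS of portal/ into with_slash/
  rsync -a "$root/portal/" "$root/with_slash"
  # No trailing slash: copy the portal directory ITSELF into without_slash/
  rsync -a "$root/portal" "$root/without_slash"

  ls "$root/with_slash"      # index.php
  ls "$root/without_slash"   # portal
}
```

Calling trailing_slash_demo "$(mktemp -d)" lists index.php for the first copy and portal for the second. Without the slash in the real command, Server B would end up with an extra level: /home/srv/portal/portal.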
Test this on the command line and verify that everything is copying. Note that the first time you run it, EVERY FILE will be copied over. We use RSYNC because after the first run it only transfers files that have changed, and for those it sends only the changed portions (the DIFFs) rather than the whole file. This is far preferable to setting up an FTP script that copies every file every time.
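Before the first real run, or after changing the command, it can be reassuring to ask rsync what it would do without copying anything. The -n (--dry-run) option skips the transfer and -i (--itemize-changes) prints one line per item that would change. The sketch below wraps this in a hypothetical preview_sync function; any extra arguments, such as the -e option, are passed straight through to rsync.

```bash
#!/bin/bash
# Hypothetical helper: show what rsync WOULD transfer, without transferring anything.
#   -a  archive mode (same as the real backup)
#   -n  dry run: compare source and destination, copy nothing
#   -i  itemize: one line per changed item; lines starting with ">f"
#       are files that would be sent
preview_sync() {
  rsync -ani "$@"
}
```

Against Server B it would be called as preview_sync -e "ssh -i /root/.ssh/AWS_key.pem" /srv/portal/ ubuntu@SERVER_B_IP:/home/srv/portal (SERVER_B_IP being a placeholder). On the second and later runs you should see only the files you actually changed, which is the incremental behavior described above.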
Now that you have a working RSYNC command, it is time to put it into a cron job. On most systems you run crontab -e to edit your cron jobs. A lot of people are wary of setting up a cron job, but as long as you have Google, cron is not a big deal.
Taken straight from debian-administration.org
The format of these files is fairly simple to understand. Each line is a collection of six fields separated by spaces.
The fields are:
- The number of minutes after the hour (0 to 59)
- The hour in military time (24 hour) format (0 to 23)
- The day of the month (1 to 31)
- The month (1 to 12)
- The day of the week (0 or 7 is Sun, or use name)
- The command to run
More graphically they would look like this:
* * * * * Command to be executed
- - - - -
| | | | |
| | | | +----- Day of week (0-7)
| | | +------- Month (1 - 12)
| | +--------- Day of month (1 - 31)
| +----------- Hour (0 - 23)
+------------- Min (0 - 59)
To run a script every hour on the hour, it would look like this:
# Run the `something` command every hour on the hour
0 * * * * /sbin/something
So let's consolidate our command into a script:
#!/bin/bash
# Hourly backup of /srv/portal on Server A to Server B (SERVER_B_IP is a placeholder)
rsync -avz -e "ssh -i /root/.ssh/AWS_key.pem" /srv/portal/ ubuntu@SERVER_B_IP:/home/srv/portal
Place the script in /usr/local/bin, make it executable (chmod +x), and then modify your cron job to point to it. Voila! Now every hour you will have an up-to-date copy of /srv/portal on your AWS server.
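One refinement worth considering: if a transfer ever takes longer than an hour, the next cron run would start a second rsync against the same files. The sketch below is a hypothetical variant of the script body that uses flock -n (from util-linux, assumed to be installed) so an overlapping run exits immediately instead of piling up, and prints a timestamp before each run. The function name, lock path, and log path are placeholders, not part of the setup described above.

```bash
#!/bin/bash
# Hypothetical hardened variant of the backup script body.
# Assumes rsync and flock (util-linux) are installed.

# backup_portal SRC DEST KEYFILE LOCKFILE
backup_portal() {
  local src="$1" dest="$2" key="$3" lock="$4"
  echo "[$(date '+%F %T')] starting backup of $src"
  # flock -n: if a previous run still holds the lock, exit non-zero
  # right away instead of waiting, so runs never overlap
  flock -n "$lock" rsync -az -e "ssh -i $key" "$src" "$dest"
}
```

The script would then end with a call such as backup_portal /srv/portal/ ubuntu@SERVER_B_IP:/home/srv/portal /root/.ssh/AWS_key.pem /var/lock/portal_backup.lock, and a crontab entry like 0 * * * * /usr/local/bin/portal_backup.sh >> /var/log/portal_backup.log 2>&1 (hypothetical script and log names) would keep a timestamped record of each hourly run.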