Amazon S3 from the Command Line

In a previous article we looked at how to upload Final Cut Server content to S3 using Jungle Disk. We also looked at how to upload EC2 images to S3. But now we're going to take a deeper dive into using S3 from the command line.

There are a number of tools that have been developed by the community to leverage Amazon's S3 storage service. This isn't as cut and dried as using EC2, because the tools Amazon makes available are written in Java, C#, Perl, PHP, Ruby and Python. However, I'm a shell kinda' guy to a large degree, and I was able to find a couple of places where people have written shell wrappers for the tools made available by Amazon. The first is called s3cmd, which we will cover in this article, and the second is called Another S3 Bash Interface, which we will cover in a future article. The following assumes that you have already set up an account with Amazon Web Services (AWS). As for terms, bucket is the big one: think of a bucket as a logical partition of your S3 account, similar to a volume (I say as I duck so that when someone throws something at me for oversimplifying they hopefully miss).

s3cmd is pretty straightforward to use. Download it from the developer's site (linked to from above) and then run through the setup wizard using the following:
s3cmd --configure
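
The wizard will prompt you for your AWS access key and secret key (along with a few other options) and save them to a configuration file, typically ~/.s3cfg. The keys shown below are just placeholders, but the relevant portion of that file will look something like this:

[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY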

For most future commands, you'll notice that there will be a path that shows s3:// as part of the URL/URI; this is because s3cmd uses s3:// to reference the root of your S3 account. Now that you've configured s3cmd, let's go ahead and make a new bucket called server01_backup, which can be done using the mb verb:

s3cmd mb s3://server01_backup

Now that we have a bucket, use the ls verb of s3cmd to list your available buckets:

s3cmd ls

Once you’ve verified that you have a bucket, let’s look at the contents of the bucket:

s3cmd ls s3://server01_backup

Now let's copy a file into a bucket, which we'll do using the put verb followed by the source and then the target path:

s3cmd put /Volumes/Backup/backup.tar s3://server01_backup/backup.tar

Now let’s go ahead and create another bucket for web images called images:

s3cmd mb s3://images

Once we have a bucket for our web images, let’s upload a file to it:

s3cmd put --acl-public --guess-mime-type ~/Desktop/emerald.png s3://images/emerald.png

Because we used the --acl-public flag with the put verb, the file is now available to anyone in read-only form. The output of the command will have a line that indicates the URL that can then be used to access the file later, as can be seen here:

Public URL of the object is: http://images.s3.amazonaws.com/emerald.png

You can now call on the file as you would any other file, using the path.  If you saw the previous article on leveraging s3 with Final Cut Server then you’ll likely be interested in the fact that this can be automated through the scripts option of Final Cut Server.
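
For example, a minimal wrapper script along these lines could be called with the path to an exported file and push it up to the images bucket (the script name and argument handling here are just placeholder assumptions, not Final Cut Server specifics):

#!/bin/bash
# s3push.sh - hypothetical example: upload the file passed as the first argument
# to the images bucket and make it publicly readable
FILE="$1"
s3cmd put --acl-public --guess-mime-type "$FILE" s3://images/"$(basename "$FILE")"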

Next, let's say that you want to copy a nested directory's contents into your images bucket. You'd use the put verb with the --recursive option, then list the source and finally the target:

s3cmd put --recursive images s3://images/

Let's say that we're now done with our images bucket and ready to delete it. This can be done using the rb verb with s3cmd, which stands for "remove bucket."

s3cmd rb s3://images

s3cmd also has an option to synchronize, making it a pretty darn nice offsite replication solution. Let's say you're keeping a website on your local drive and want to sync it to Amazon nightly; just add the following to a cron job:

s3cmd sync  ./  s3://DocumentRoot/
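
For example, a crontab entry along these lines would run the sync every night at 2am (the time and the /path/to/DocumentRoot path are just placeholders for your own values):

0 2 * * * s3cmd sync /path/to/DocumentRoot/ s3://DocumentRoot/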

If you're going to use sync with a fairly well built out directory structure, also consider using the --dry-run flag for a little sanity checking before you go synchronizing a lot of data. You can also use the --include and --exclude flags to limit what s3cmd will synchronize.
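
For example, something like the following would show what would be transferred while skipping temporary files, without actually copying anything (the exclude pattern here is just an example):

s3cmd sync --dry-run --exclude '*.tmp' ./ s3://DocumentRoot/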