Hosting Web Sites in Amazon Web Services

Amazon S3 now allows administrators to host simple web sites. Previously, you could host images, videos and other files using S3 buckets, but now you can host full sites. To do so, you need only configure a webroot and some error documents. To get started:
  1. Log into the Amazon S3 Management Console
  2. Right-click on an Amazon S3 bucket
  3. Open the Properties panel
  4. Click on the Website tab
  5. Configure your webroot
  6. Configure error documents in the Website tab
  7. Click Save
Pretty easy, right? But what if you need to configure the php.ini file, add MIME types, etc.? Notice that at the start of this I said “simple.” I’m sure more features are to follow, but for now S3 is mostly appropriate for very simple sites.
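The same two settings (a webroot page and an error document) can also be written down as a small configuration document, which is handy if you'd rather script this than click through the console. Here is a minimal sketch, assuming the AWS command line tools are installed (they aren't covered in this article); the bucket name and both document names are made up for illustration. The function only prints the configuration, and the commented lines show how it might then be handed to aws s3api put-bucket-website:

```shell
#!/bin/sh
# Print a website configuration document for an S3 bucket.
# $1 = index document (the webroot page), $2 = error document.
website_config() {
  printf '{"IndexDocument":{"Suffix":"%s"},"ErrorDocument":{"Key":"%s"}}\n' "$1" "$2"
}

# index.html and error.html are placeholder names.
website_config index.html error.html

# Assumed usage with the AWS CLI (my-example-bucket is hypothetical):
# website_config index.html error.html > website.json
# aws s3api put-bucket-website --bucket my-example-bucket \
#   --website-configuration file://website.json
```

Keeping the configuration in a function like this means the same settings can be reused for more than one bucket.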

Amazon S3 File Size Limits

Back in November of 2008 I did an article on a way to use Amazon’s S3 to hook into Final Cut Server. At the time, though, S3 had a pretty big limitation: it wasn’t really suitable as an archive device for Final Cut Server because it couldn’t hold large video files. But today, Amazon announced that S3 now supports files of up to 5 terabytes using multipart upload (previously the maximum file size was 5 gigabytes). This finally means that files do not have to be broken up at the file system layer in order to back up to Amazon’s cloud. However, this does not mean that traditional disk-to-disk backup solutions can simply use S3 as a target for backups and archives, as large files need to be sent using multipart upload. The ability to store large files lets us finally use Amazon S3 in a way that is much simpler than it was previously, although it is still not as transparent as using a file system or URI path. Overall, this represents a great step for Amazon and I hope to see even more of this in the future!
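To get a feel for what multipart upload means for a big video archive, here is a rough sketch of the arithmetic. It assumes the limits in Amazon's multipart upload documentation as I understand them (at most 10,000 parts per upload and a minimum part size of 5 MB for every part but the last); the function names and the 200 GB example are mine, not Amazon's:

```shell
#!/bin/sh
# Limits assumed from Amazon's multipart upload documentation.
MAX_PARTS=10000
MIN_PART=$((5 * 1024 * 1024))   # minimum part size in bytes

# part_size BYTES: smallest part size (in bytes) that fits the file
# into MAX_PARTS parts without dropping below the minimum part size.
part_size() {
  # ceiling division of the file size by the maximum number of parts
  per_part=$(( ($1 + $MAX_PARTS - 1) / $MAX_PARTS ))
  if [ "$per_part" -lt "$MIN_PART" ]; then
    per_part=$MIN_PART
  fi
  echo "$per_part"
}

# part_count BYTES PART_SIZE: number of parts needed (rounded up).
part_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: a hypothetical 200 GB Final Cut Server archive.
archive=$((200 * 1024 * 1024 * 1024))
ps=$(part_size "$archive")
echo "part size: $ps bytes, parts: $(part_count "$archive" "$ps")"
```

A real upload tool would usually pick a rounder part size than this; the point is only that the part size has to grow with the file for very large archives.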

Programmatically Interacting with Google Apps

There are a number of ways that you can interact with Google Apps: there is the website, the new Google Cloud Connect and an API that allows you to integrate Google Apps with your own solutions. The API is available for Python and Java and can take some time to get used to, even though Google has done a good job of making it pretty straightforward (comparatively). Fortunately, there are a couple of tools that ease the learning curve a bit.

GoogleCL on Ubuntu

The first, and easiest, is GoogleCL. GoogleCL is a command line version of Google Apps that will allow you to interact with YouTube, Picasa, Blogger and of course Google Docs. To use GoogleCL you’re going to need python-gdata. If you’re using Ubuntu, you would use apt-get to install python-gdata:
apt-get install python-gdata
Once installed, you’ll want to then download the deb package from Google Code:
Once downloaded, install it using dpkg with the -i option (assuming you’re still using the same working directory):
dpkg -i googlecl_0.9.11-1_all.deb

GoogleCL on Mac OS X

GoogleCL is also available for the Mac. First, download the gdata-python-client (2.0.13 is the latest version) and then extract the file (ie – unzip gdata-2.0.13). Next, install it using Python with your working directory set to the previously extracted folder:
python setup.py install
Next up, let’s grab GoogleCL from the GoogleCL Google Code page using wget. Then extract it, hop into the newly extracted directory and run the Python installer:
python setup.py install

Using GoogleCL on Mac and Linux

Once GoogleCL has been installed, the use is the same between Mac OS X and Linux. Simply use the newly acquired google command (this is actually a Python front-end to the API at /usr/bin/google) followed by a service and then a verb. Verbs are based on services (not all services offer the same features and therefore do not have the same verbs). A list of services with their verbs includes the following.
docs – Allows for interaction with Google Docs, with verbs that include the following:
  • edit – Allows you to indicate an application to use as an editor for the given document (ie – vi).
  • delete – Delete a document on Google Docs.
  • list – List documents on Google Docs.
  • upload – Uploads the specified document (options include title, folder and format of the document being uploaded).
  • get – Downloads the specified document in the format specified using the format option.
blogger – Manage content stored using the Blogger service.
  • post – Allows you to post content (which then becomes a blog entry).
  • tag – Requires a title (for blog entries) and the tags that you would like to use with the post in question.
  • list – Shows posts (can be narrowed by blog, title and owner; useful with grep to constrain output).
  • delete – Removes the specified post.
picasa – Allows you to interact with the Picasa service for posting and obtaining images used with Google Apps.
  • get – Download specified albums.
  • create – Create an album.
  • list – List images.
  • list-albums – List albums.
  • tag – Tag images.
  • post – Add a photo to an album.
  • delete – Delete a photo or an album.
contacts – Manage contacts (given the lack of an edit option, use an add and then a delete to effect an edit).
  • list – Show contacts (can specify fields to constrain output).
  • list-groups – Show the groups for a user.
  • add – Add a contact.
  • add-groups – Create a group of contacts.
  • delete-groups – Remove a group of contacts.
  • delete – Remove a single contact.
calendar – Manage calendars.
  • add – Create a calendar entry.
  • list – Show all events on a given calendar.
  • today – Show calendar events over the next 24 hour period.
  • delete – Remove calendar events.

Beyond GoogleCL

Let’s put this into perspective. Let’s say I have an application, and that application can run a simple shell command. Then, let’s say I create a calendar event in that application. The application could send a command to the shell with a variable. If I had calendar information to create such as “Meeting with KK tomorrow at 9am” then I could send a command as follows:
google calendar add "Meeting with KK tomorrow at 9am"
This would cause the event to appear on my calendar and sync to any devices that were configured to work with my calendar. But if I were to issue this command on the server side, it would create every event for the same user, which is likely not very helpful for most organizations that have more than one calendar and/or user. As mentioned, /usr/bin/google is a Python script. It makes use of python-gdata, which provides more direct access to the Google Apps API and, as such, allows for far more complex logic than the GoogleCL front-end does. The google script does give savvy developers a look at how Google intends for many of their methods to be used and even allows you to borrow a line or two of code here and there. Simple logic can be turned into code quickly using GoogleCL, but you will quickly outgrow what can be done with GoogleCL and move into using the API more directly if you have any projects of substance!
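To make the “application sends a command to the shell with a variable” idea a little more concrete, here is a minimal sketch of a wrapper that an application could call. The add_event function and the DRY_RUN switch are my own inventions for illustration; the only GoogleCL command used is the google calendar add command shown earlier:

```shell
#!/bin/sh
# add_event TEXT: hand an event description to GoogleCL.
# With DRY_RUN=1 the command is printed rather than run, which is
# handy for testing on a machine without GoogleCL installed.
add_event() {
  if [ -z "$1" ]; then
    echo "usage: add_event \"event description\"" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'google calendar add "%s"\n' "$1"
  else
    # Quoting "$1" keeps the whole description as one argument.
    google calendar add "$1"
  fi
}

# Example: the application passes its event text in a variable.
DRY_RUN=1
event="Meeting with KK tomorrow at 9am"
add_event "$event"
```

Quoting the variable is the important part: without it, the shell would split the description into separate words before GoogleCL ever saw it.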

Using the CrashPlan Pro REST API

CrashPlan Pro Server is a pretty cool tool with a lot of great features that can be used to back up client computers. There are a lot of things that CrashPlan Pro is good at out of the box, but there are also a lot of other things that CrashPlan Pro wasn’t intended for that it could be good at, given a little additional flexibility. The REST API that CrashPlan Pro uses provides a little flexibility and as with most APIs I would expect it to provide even more as time goes on. I often see people run away screaming when REST comes up, thinking they’re going to have to learn some pretty complex scripting. And while the scripting can be complex, it doesn’t necessarily have to be. You can find a lot of handy information about the options available in the REST API in CrashPlan’s documentation. The very first example command that CrashPlan gives is a query for active users.
Now, to use this in a very simple script, let’s look at it with curl. You are going to need to authenticate, so we’re going to inject that into the URL in much the same way that we would with something like, let’s say, WebDAV, SSH or FTP. If the server name were foundation.lan, the user name were daneel and the password were seldonrulez, then the curl command would look like so (you could use the -u option to inject the authentication information, but as you’ll see later I’d like to make those a bit less complex):
curl 'http://daneel:seldonrulez@foundation.lan:4280/rest/users?status=Active'
Note: The default port for the web administration in CrashPlan Pro is 4280. This command simply outputs a list of Active users on the server. It outputs only Active users because that’s what we asked for: reading the URL from left to right after /rest/, we query users, use the status attribute and ask only for users whose status is Active. We could just as easily have requested all users by using the following (which just removes ?status=Active):
curl http://daneel:seldonrulez@foundation.lan:4280/rest/users
Each user has a unique id attribute. These are assigned in ascending order, so we could also query for the user with an ID of 3 by simply following users in the URL with that unique ID:
curl http://daneel:seldonrulez@foundation.lan:4280/rest/users/3
We could also query for all users with a given attribute, such as orgId (note that these attributes are case sensitive unlike many other things that start with http). For example, to find users with an orgId of 3:
curl 'http://daneel:seldonrulez@foundation.lan:4280/rest/users?orgId=3'
The API doesn’t just expose looking at users though. You can look at Organizations (aka orgs), targets (aka mountPoints), server statistics (aka serverStats) and Computers (aka computers). These can be discovered by running the following command:
curl -i http://daneel:seldonrulez@foundation.lan:4280/rest/
To then see each Organization:
curl http://daneel:seldonrulez@foundation.lan:4280/rest/orgs
And to see each Computer:
curl http://daneel:seldonrulez@foundation.lan:4280/rest/computers
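You can also perform compound searches fairly easily. For example, let’s say that we wanted to see only the computers that belong to the user with an ID of 3 and that have a status of Active (the URL is quoted so that the shell doesn’t interpret the &):
curl 'http://daneel:seldonrulez@foundation.lan:4280/rest/computers?userId=3&status=Active'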
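Since every one of these queries follows the same pattern (a resource after /rest/, then optional attribute=value filters), the URL building is easy to wrap in a function. The following is only a sketch: cp_url and cp_get are names I made up, the server and account are the same fictional ones used above, and the filter values are not URL-encoded, so it only suits simple values like the ones shown here. It also passes the credentials with curl's -u option rather than embedding them in the URL:

```shell
#!/bin/sh
# Connection details from the examples above.
CP_HOST="foundation.lan"
CP_PORT="4280"
CP_USER="daneel"
CP_PASS="seldonrulez"

# cp_url RESOURCE [key=value ...]: build a REST URL, joining any
# key=value filters with & after a leading ?.
cp_url() {
  url="http://${CP_HOST}:${CP_PORT}/rest/$1"
  shift
  sep="?"
  for filter in "$@"; do
    url="${url}${sep}${filter}"
    sep="&"
  done
  echo "$url"
}

# cp_get RESOURCE [key=value ...]: fetch a resource with curl,
# passing the credentials with -u instead of inside the URL.
cp_get() {
  curl -s -u "${CP_USER}:${CP_PASS}" "$(cp_url "$@")"
}

# Print the URL for the compound search (no request is sent here).
cp_url computers userId=3 status=Active
```

With this in place, a call like cp_get users status=Active would run the first query from this article without any hand-built URL.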
These basic incantations of curl are simply getting information. The request type could also be specified explicitly using the -X option (or --request if you like to type a lot) to indicate the type of request we’re sending (continuing on with our Code42 sci-fi inspired example):
curl -X GET -H 'Content-type: application/json' http://daneel:seldonrulez@foundation.lan:4280/rest/orgs
The important thing about being able to indicate the type of request is that we can do more than GET: we can also POST and PUT. We also used the -H option to indicate the type of data, which we’re specifying as application/json (per the output of a curl -i command against the server’s REST API URL). POST is used to create objects in the database whereas PUT is used to update objects in the database. For example, creating a user could look like this (curl sends a POST when given -d):
curl -i -H 'Content-Type: application/json' -d '{"username": "charlesedge", "password": "test", "firstName": "Charles", "lastName": "Edge", "orgId": "3"}' http://daneel:seldonrulez@foundation.lan:4280/rest/users
Once you are able to write data, you will then be able to script mass events, such as creating new users based on a dscl loop using groups, removing users at the end of a school year (PUT {"status": "Deactivated"}), mass changing orgIds based on other variables and basically fully integrating CrashPlan Pro into the middleware that your environment might already employ.
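As a rough sketch of what one of those mass events could look like, here is a loop that deactivates a list of users. Some assumptions to be loud about: I'm assuming the PUT is sent to the same /rest/users/ID address used for looking up a single user, the user IDs are made up, and the deactivate_user function and DRY_RUN switch are my own. With DRY_RUN=1 it only prints the curl commands, so you can check them before letting it loose on a real server:

```shell
#!/bin/sh
# Same fictional server and credentials as the earlier examples.
CP_BASE="http://foundation.lan:4280/rest"
CP_AUTH="daneel:seldonrulez"

# deactivate_user ID: send a PUT that sets the user's status.
# Assumption: the update goes to /rest/users/ID.
# With DRY_RUN=1 the curl command is printed instead of being run.
deactivate_user() {
  payload='{"status": "Deactivated"}'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf "curl -X PUT -u %s -H 'Content-Type: application/json' -d '%s' %s/users/%s\n" \
      "$CP_AUTH" "$payload" "$CP_BASE" "$1"
  else
    curl -X PUT -u "$CP_AUTH" -H 'Content-Type: application/json' \
      -d "$payload" "$CP_BASE/users/$1"
  fi
}

# Example: deactivate a list of user IDs (made up for illustration).
DRY_RUN=1
for id in 7 8 9; do
  deactivate_user "$id"
done
```

In practice the list of IDs would come from somewhere else, such as the output of an earlier query or a directory services lookup.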
Perl, Python, Ruby and PHP come with a number of options specifically designed for working with REST, which makes more complicated scripting much easier (such as with PHP’s curl_setopt); however, these are mostly useful if you already know those languages and the point of this article was to stay in shell scripting land. This allows you to knock out simple tasks quickly, even if the good people at Code 42 didn’t think to add the specific features to their software that you might have in mind. Once you start to get into scripting more complex events, look to the Python examples at the bottom of the API Architecture page to get ya’ kickstarted!

Gmail + IE6

Got an email today informing me that Gmail will be dropping support for Internet Explorer 6. Nice of them to let us know rather than randomly killing support for it ’cause it’s old as crap like most vendors do. A win for Google there I’d say. Point of this article being, if you use IE 6, just stop. And if you’re an enterprise admin who doesn’t think you can pull off a massive IE 6 upgrade, this is Google’s way of having an intervention for ya’… MSI installer + a GPO = happier users anyway (be that MSI a newer IE, Chrome, Firefox or Safari). PS – Over 5% of visitors to this site come here on IE 6 on weekdays, a number cut in half on weekends.