Hosting Web Sites in Amazon Web Services

Amazon S3 now allows administrators to host simple web sites. Previously, you could host images, videos and other files using S3 buckets, but now you can host full sites. To do so, you need only configure a webroot and some error documents. To get started:
  1. Log into the Amazon S3 Management Console
  2. Right-click on an Amazon S3 bucket
  3. Open the Properties panel
  4. Click on the Website tab
  5. Configure your webroot
  6. Configure error documents in the Website tab
  7. Click Save
Pretty easy, right? But what if you need to configure the php.ini file, add MIME types and so on? Notice that at the start of this I said “simple.” I’m sure more features are to follow, but for now S3 is mostly appropriate for very simple sites.
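If you would rather script the console steps above, here is a minimal sketch in Python. It assumes the boto3 library and its `put_bucket_website` call, neither of which appears in the original steps. The document names `index.html` and `error.html` are placeholders, and the two helper functions are hypothetical names made up for illustration.

```python
def build_website_configuration(index_document="index.html",
                                error_document="error.html"):
    """Build the settings the Website tab asks for: a webroot (index)
    document and an error document. The file names are placeholders."""
    if not index_document or "/" in index_document:
        raise ValueError("index document must be a plain file name")
    return {
        "IndexDocument": {"Suffix": index_document},
        "ErrorDocument": {"Key": error_document},
    }


def enable_website_hosting(bucket_name, configuration):
    """Apply the configuration to a bucket.

    Assumption: boto3 is installed and AWS credentials are configured.
    The import is local so the helper above works without boto3.
    """
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(Bucket=bucket_name,
                          WebsiteConfiguration=configuration)
```

Calling `enable_website_hosting("my-bucket", build_website_configuration())` with a bucket you own (the name here is made up) should be roughly equivalent to clicking Save in the console.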

NAS, Clouds & Backup

NAS (Network Attached Storage) devices are a popular alternative for providing centralized file services to smaller environments. This includes devices such as the Seagate BlackArmor, the DroboShare NAS and the Netgear ReadyNAS Pro. These are inexpensive compared to an actual server, they require less management and they often come with some pretty compelling features. But one of the primary reasons to buy a NAS can end up being a potential pain point as well: they require less management than a server because they can’t do as much as a server can.
Take replication, for example. Most have NAS-to-NAS replication built in. However, that replication depends on having two of the same device. But what if you just have a machine on the other side of the replication, want to back the NAS up remotely in compressed form or want to back up to a cloud environment? Well, if it’s not the same daemon then you’re typically stuck with CIFS, NFS, HTTPS (WebDAV) or FTP. The devices don’t typically give you the option to push directly from them, nor to run a daemon that a non-proprietary device can connect to directly, so you’d have to use a client to do the offsite sync.
One example of how to do this would be to use JungleDisk and an Amazon S3 account. The client would mount the Amazon S3 storage through JungleDisk, along with the NAS storage (all share points). You would then use a tool such as ChronoSync, Retrospect (Duplicate scripts, not Backup scripts, by the way) or even rsync to back up the device over CIFS. It’s not pretty, and it adds latency and management, but it would work. The reason you would synchronize is that if you attempt to back up (a la Retrospect Backup scripts) then you’d send big, monolithic files over the wire. The smaller the increments of data you can send over the wire, the better. Another tool that can do that type of sync is File Replication Pro. It actually works on blocks instead of files, pushing an even smaller increment of data over the wire.
There are certainly other services. You could even open up the firewall (for just the specific ports and IP addresses requiring connectivity, though that is always a potential security risk) and have a remote backup service come in and pull the data over FTP, CIFS or WebDAV (if you want to stick with a cloud backup solution), but those types of services are a bit more difficult to find. The same is pretty much true for cloud-based storage, with the exception that you’re looking for either a built-in feature or an API that allows you to develop your own. The moral of this story: if you use a NAS or a cloud-based solution and you want to back your data up, your options are limited. Keep this in mind when you decide to purchase a NAS rather than, let’s say, Mac OS X Server running on a Mac mini with some Direct Attached Storage (DAS) connected to it.
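To make the file-level synchronization idea above concrete, here is a rough sketch in Python of what a sync tool does between two mounted folders, say a NAS share and a JungleDisk mount. It is not how ChronoSync, Retrospect or rsync actually work internally. The function names and the two-second timestamp slack are my own assumptions for illustration.

```python
import os
import shutil


def needs_copy(src_path, dst_path, mtime_slack=2.0):
    """Return True if the destination is missing, a different size, or
    older than the source. The slack (an assumption) tolerates the coarse
    timestamps some network file systems report."""
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    if src.st_size != dst.st_size:
        return True
    return src.st_mtime - dst.st_mtime > mtime_slack


def sync_tree(src_root, dst_root):
    """Copy only new or changed files; return the relative paths copied."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.join(target_dir, name)
            if needs_copy(src_path, dst_path):
                # copy2 keeps the modification time, so the next run
                # sees this file as unchanged
                shutil.copy2(src_path, dst_path)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Run it twice against the same pair of folders and the second pass should copy nothing, which is the whole point: only the files that changed cross the wire, rather than one monolithic backup file.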

Flow: Amazon S3, iDisk

Flow is a nice little FTP client. But it also supports WebDAV and SFTP, as well as Amazon’s S3 and mounting an iDisk from a MobileMe account. Unlike JungleDisk, it doesn’t seem to mount S3 as an actual disk in Mac OS X, but it can be used to take files from iDisk to S3, which is fairly interesting. Flow also supports discovering all of the local services over Bonjour, which can be pretty helpful. Overall, it’s a nice little application that’s pretty sleek and I look forward to seeing where they go with it.

Looking at Amazon's Cloud

There is a lot of talk about “the cloud” in the IT trade magazines and in general at IT shops around the globe. I’ve used Amazon S3 in production for some web, offsite virtual tape libraries (just a mounted location on S3) and a few other storage uses. I’m not going to say I love it for every use I’ve seen it put to, but it can definitely get the job done when used properly. I’m also not going to say that I love the speeds of S3 compared to local storage, but that’s kind of a given now, isn’t it… One of the more niche uses has been to integrate it into Apple’s Final Cut Server. In addition to S3 I’ve experimented with CloudFront for web services (which seems a little more like Akamai than S3) and done a little testing of MapReduce for some log crunching. The MapReduce testing has thus far been futile compared to just using EC2, but it does provide an effective option if used properly. Overall, I like the way the Amazon Machine Images (AMIs, the templates the VMs boot from) work and I can’t complain about the command line environment they’ve built, which I have managed to script against fairly easily.
The biggest con thus far (IMHO) about S3 and EC2 is that you can’t test them out for free (or at least you couldn’t when I started testing them). I still get a bill for around 7 cents a month for some phantom storage I can’t track down on my personal S3 account, but it’s not enough for me to bother to call about… But if you’re looking at Amazon for storage, I’d just make sure you’re using the right service. If you’re looking at them for compute firepower then fire up a VM using EC2, read up on their CLI environment and enjoy. Beyond that, it’s just a matter of figuring out how to build out a scalable infrastructure using pretty much the same topology as if they were physical boxen. I think the reason I’m not seeing a lot of people jumping on EC2 is the pricing.
It’s practically free to test, but I think it’s one of those things where a developer has a new app they want to take to market and EC2 gives them a way to do that. But then the developer looks at potentially paying 4x the intro amount in peak times for processing power (if a VM is always on then you would be going from $72 to $288 per month per VM, without factoring in data transfer to and from the VM at $0.10 to $0.17 per GB), gets worried and just goes with whatever tried-and-true route they’ve always used to take it to market. Or they think that it’s just going to do everything for them, and then are shocked that it’s just a VM and get turned off… With all of these services you have to be pretty careful with transfer rates, etc.
I haven’t found a product to do this yet, but what I’d really like to have is something like vSphere/vCenter or MS VMM that could provision, move and manage VMs, whether they sit on Amazon, a host OS in my garage or a bunch of ESX hosts in my office, or a customer’s office for that matter, preferably with a cute sexy meter to tell me how much I owe for my virtual sandboxes.
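The arithmetic behind those numbers is easy to sketch. The Python below is a back-of-the-envelope calculator, not Amazon’s billing logic. The hourly rates of $0.10 and $0.40 are my inference from the $72 and $288 monthly figures over a 30-day month, and the 100 GB of transfer is a made-up example.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month with the VM always on


def monthly_vm_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of running one VM for the given number of hours."""
    return hourly_rate * hours


def monthly_transfer_cost(gigabytes, rate_per_gb):
    """Cost of moving data to or from the VM at a flat per-GB rate."""
    return gigabytes * rate_per_gb


if __name__ == "__main__":
    base = monthly_vm_cost(0.10)  # inferred intro rate, in dollars per hour
    peak = monthly_vm_cost(0.40)  # 4x the intro rate
    low = monthly_transfer_cost(100, 0.10)   # hypothetical 100 GB
    high = monthly_transfer_cost(100, 0.17)
    print(f"VM: ${base:.2f} to ${peak:.2f} per month")
    print(f"Transfer for 100 GB: ${low:.2f} to ${high:.2f}")
```

Swap in your own hours and transfer volume to see how quickly the data transfer line catches up with the VM itself.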

Final Cut Server: Using Amazon S3 for Archival

Final Cut Server allows you to archive the primary representation (or the original file) for assets that are cataloged. When you do so, the proxy clips (low resolution versions) of your assets still live on the Final Cut Server. However, the primary representation, once moved to your archive device, can then be archived off to another form of media. There are a variety of strategies to manage archived media. The one I will describe here is using the Amazon S3 storage service at a cost of approximately $.12 to $.15 per gigabyte. As a conduit to and from Amazon S3 we will use the Jungle Disk application, which uses the Amazon S3 API to provide a mount point to Mac OS X.
Before you get started, first create an Amazon account (or enable Amazon Web Services for your existing Amazon account). Once you have enabled Web Services, click on the link that will be emailed to you that allows you to create an Access Identifier. Also keep in mind that files cannot be larger than 5GB each.
To get started, download Jungle Disk. Once downloaded, run the installer. At the welcome screen click on Next. At the Jungle Disk Account Information screen enter the Access Identifier and the Secret Key for your user account. Next, tell Jungle Disk to use the storage from Amazon as a Network Drive. Here, I gave this drive a name of FCSBackup. Next, create a new bucket (or use one you have already created). To create a new bucket, click on Next. At the Bucket Setup screen provide a name for your bucket of storage within S3. I called my bucket fcsvrbackup. Here you can use standard or high encryption. Speeds will be reduced with high encryption but the data will be more secure. Click Next when you are satisfied with your settings and then click on Finish to complete the installation.
Next, for speed we’re going to do a little quick tuning. Open the Jungle Disk Configuration application and then click on Network Drive for the fcsvrbackup bucket. Then increase the maximum cache size and check the box for Upload files in the background for faster performance.
Next, open /Volumes and verify that you see your FCSBackup volume (or whatever you decided to name it). Alternately, you can use the Bucket menu from within Jungle Disk Monitor to click on Show Network Drive in Finder. Once you have verified that your mount is there, test copying data to the folder to verify that you have full write access.
Once you are finished, open the Final Cut Server System Preference pane. Then click on the plus icon (+) to bring up your Device Setup Assistant. Here, click on the Local Device type and click on Continue. Next, open a Finder window and open /Volumes/ (Command-Shift-G). Now drag the FCSBackup volume over to the location field in the Device Setup Assistant and provide a name for Final Cut Server to refer to your device by (I used Amazon Backup here). Now click Continue. Next, check the box for Enable as an Archive Device and click on the Continue button. At the next screen, click Finish.
Now go to your trusty Final Cut Server client application and control-click (or right-click if you’re so inclined) on an asset. Here, you will click on the Archive item in the contextual menu. Now, if you go to the FCSBackup volume you should see the file you decided to archive. Archived files are stored in a folder that corresponds to the device ID that Final Cut Server has for your “device”. Only the primary representation has been moved at this time, so your proxy media for these files is still in your proxy bundle. Now, click on the asset within the Final Cut Server client application and then perform a Get Info (Command-I). You will now see the relative path on your device where the file lives. You can now unmount the FCSBackup drive and you will still be able to access the file. Once you have uploaded some files, tap into Amazon and check out how much they’ve charged you…
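If you want to script the sanity checks from the walkthrough above (is the volume mounted, can I write to it, and is anything over the 5GB limit), here is a small Python sketch. The function names are hypothetical, `/Volumes/FCSBackup` is just the name I used above, and treating 5GB as 5 × 1024³ bytes is an assumption on my part.

```python
import os
import uuid

MAX_FILE_BYTES = 5 * 1024 ** 3  # assumption: "5GB" read as 5 x 1024^3 bytes


def has_write_access(volume_path):
    """Write and delete a small scratch file to prove the mount is usable."""
    if not os.path.isdir(volume_path):
        return False  # volume is not mounted (or the path is wrong)
    probe = os.path.join(volume_path, ".write-test-" + uuid.uuid4().hex)
    try:
        with open(probe, "wb") as handle:
            handle.write(b"ok")
        os.remove(probe)
    except OSError:
        return False
    return True


def oversized_files(root, limit=MAX_FILE_BYTES):
    """List files under root that are larger than the per-file limit."""
    too_big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit:
                too_big.append(path)
    return sorted(too_big)
```

For example, `has_write_access("/Volumes/FCSBackup")` should come back True once the drive is mounted, and running `oversized_files` over the folder that holds your primary representations tells you which assets to leave out of the archive.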