Tag Archives: Storage

iPhone

iPad + Box.net = Win

Box.net is a cloud-based file sharing service that I used extensively in my last book. Similar to dropbox.com, Box.net allowed my publishers and me to automate our workflow with regard to the publishing process, but more importantly, I was actually able to do much of the review and exchange of files from the iPad, which was really nice given that the book was on iOS. I’ve been working with a few companies over the past few weeks on coming up with various strategies for cloud interoperability, and Box.net has come up a few times in this regard. Looks like I’m not the only one!

Business

EMC + Isilon = ?

EMC is buying Isilon for $2.25 billion. They want the video market, which seems to just be growing and growing. EMC stock dipped a little on the news, which is not surprising because Isilon isn’t worth what EMC is paying for it. What does this mean for the video market? More uncertainty. EMC has been an acquisition marathon runner since 2002, buying up Avamar, Documentum, Epoch, McData, Iomega, Archer, Greenplum, Bus-Tech, Kashya, Dantz, Mozy, Data Domain and even VMware (not to mention a bunch of other companies).

So what does this mean for Isilon’s product line moving forward? If you look at how the acquisition of Dantz and Iomega sparked the Insignia line at EMC and how profits from those lines jumped well over 60%, it isn’t hard to think that Isilon will almost instantly become more profitable. Of course, the Mac Retrospect software was sold off and now has an uncertain future… One would like to think that the combination of EMC’s wide variety of technologies and Isilon will result in even more environments that Isilon can play in and even more technical advances to the product line. But I guess we’ll see what happens there…

Xsan

Xsan TCO

I recently read an article in CIO magazine about the cost per gig per month. The article quoted Google at about 6 cents per gig per month. I use Amazon for a few projects, which runs about 12 cents per gig per month. I decided to look at what Xsan storage costs per gigabyte per month, including labor and hardware. Averaging out 30 installs that we did over the past year came to about 7.2 cents per gig per month, as opposed to around $2.00 per gig per month, which is pretty average for many SAN solutions. Now, Xsan does have its drawbacks compared to a lot of other truly enterprise-class storage solutions (no snapshots, no LUN redundancy, etc.), but provided you build it properly, use it for the purposes for which it is actually intended and therefore keep labor costs down over a 3 year cycle, you can get similar TCO numbers to what you might end up paying for other solutions.
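To make the arithmetic concrete, here is a minimal sketch of a cost-per-gig-per-month calculation. The function name and the dollar and capacity figures are hypothetical, made up for illustration; they are not the numbers from the 30 installs mentioned above.

```python
def cost_per_gb_month(hardware_cost, labor_cost, usable_gb, months=36):
    """Return total cost of ownership per gigabyte per month.

    All inputs are assumptions supplied by the caller; months defaults
    to a 3 year cycle.
    """
    if usable_gb <= 0 or months <= 0:
        raise ValueError("usable_gb and months must be positive")
    return (hardware_cost + labor_cost) / (usable_gb * months)


# Hypothetical install: $40,000 in hardware, $12,000 in labor,
# 20,000 GB usable, amortized over 36 months.
example = cost_per_gb_month(40_000, 12_000, 20_000)
print(f"${example:.3f} per GB per month")
```

Running the same function against a quote from a hosted service (plus whatever bandwidth costs apply) gives a like-for-like number to compare.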

Having said this, the larger Xsans typically require more infrastructure and features, which can lead to around double the cost per month per gig.  For example, introducing Cloverleaf or Vmirror into the equation will typically require us to double up storage costs and require bigger and better switches. 

I will not say that a cloud storage service such as Google or Amazon doesn’t have its place. It absolutely does: offline storage, web storage, archiving when you have an existing Xsan but can’t spring for the tape drive, Final Cut Server archival (see my previous post on using that), or if you travel a lot (like me), etc. But before you jump on the Storage as a Service bandwagon, run the numbers very carefully. If it makes sense on a per-use basis then absolutely go for it, but try to factor everything into the process (especially the data access speed over your WAN pipe and the additional load that will be placed on said pipe).

Windows Server

iSCSI Target Creation

The iSCSI Initiator that we use for connecting Windows to iSCSI targets has a friend.  It’s called Microsoft Windows Storage Server, which you can use to turn a DAS RAID in a Windows box into a LUN for iSCSI.  Good stuff.  Check out the data sheet here:

download.microsoft.com/download/d/8/4/d84b1c50-e0bb-45ba-b2f4-356f4f456a88/WUDSS%20Datasheet_Final.doc

Now that’s not to say they’re the only game in town.  iSCSI Target is also a feature of OpenSolaris:

http://opensolaris.org/os/project/iscsitgt/

And there’s a nifty little Open Source Project called iSCSI Enterprise Target:

http://sourceforge.net/projects/iscsitarget/?abmode=1

Mac OS X

Solid State Storage for the Masses

I originally posted this at http://www.318.com/TechJournal

The new MacBook Air was introduced at Macworld with the option for a 64GB solid-state drive. Toshiba is also now offering solid-state drives in 32GB, 64GB and 128GB sizes. The drives still seem to be lagging in adoption due to high costs, but they offer more durability, faster boot times and lower power requirements, which should all lead to higher adoption over the next two years.

Toshiba will also begin making Solid-state SATA drives in May that can be used in desktop systems.

Mac OS X Windows XP

Using Trash for Storage

I’m not sure why this keeps coming up, but you don’t want to use your trash (whether for Entourage, Outlook, Mac OS X or the Recycle Bin in Windows) as a place to store files, emails or anything else you’d be bummed out about losing. Keep in mind that trash can be taken away at any given moment…

Xsan

Primordial Storage

Primordial storage refers to unallocated storage capacity on a storage device. Storage capacity can be allocated from primordial pools to create storage pools. This means that primordial pools are disk/device sources for allocation of storage pools. In Xsan primordial pools aren’t used, but there is often unused capacity in the form of LUNs that are referred to as primordial at times. This is especially true on a Promise RAID, where you might have certain LUNs that are smaller than the potential size of others and therefore might end up with disks left over, which can be mapped and used as near-line storage later. This term, primordial, can be used to refer to those.

Articles and Books Xsan

Article on EMC Channel Manager Retiring

Another article on EMC I was quoted in:

http://www.crn.com/storage/197006487?pgno=2

Xsan

Practical ILM

I originally posted this at http://www.318.com/TechJournal

The amount of data used by Small Businesses is on target to rise 30% to 35% in 2006. Sarbanes-Oxley, HIPAA and SEC Rule 17a-4 have introduced new regulations on the length of time data must be kept and in what format. Not only must data be kept, it must be backed up and secured. These factors have the cost of data storage for the Small Business increasing exponentially.

Corporations valued at more than 75 million dollars are generating 1.6 billion gigabytes of data per year. Small and medium sized companies can reap the benefits of developments being made with larger corporations. Methods for classifying data are among these.

Information Lifecycle Management (ILM) is a process for maximizing information availability and data protection while minimizing cost. It is a strategy for aligning your IT infrastructure with the needs of your business based on the value of data. Administrators must analyze the trade-offs between cost and availability of data in tiers by differentiating production or transactional data from reference or fixed content data.

ILM includes the policies, practices, services and tools used to align business practices with the most appropriate and cost-effective data structures. Once data has been classified into tiers then storage methods can be chosen that are in line with the business needs of each organization. The policies to govern these practices need to be clearly documented in order to keep everyone working towards the same goals.

Storage Classification

Online storage is highly available with fast and redundant drives. The XRAID and Xsan are considered online storage, which is best used for production data as it is dynamic in nature. This can include current projects and financial data. This data must be backed up often and be rapidly restored in the event of a loss. It is not uncommon to use an XRAID to back up another XRAID for immediate restoration of files and a Tape Library to maintain offsite backups of the XRAID.

Offline storage is used for data retained for long periods of time and rarely accessed. Data often found on offline media includes old projects and archived email. Media used for offline storage is often the same as media used for backup, such as tape drives and optical media. When referring to offline storage we refer to archives, not backups. Archives are typically static whereas backups typically change with each backup run. Offline storage still needs to be redundant or backed up, but the schedules for backup are often more lax than those of other classifications of storage. In a Small or Medium Sized company offline media is often backed up, or duplicated, to the same type of media that it is housed on. There may be two copies of a tape (one onsite and one offsite) or two copies of DVDs that the data has been burned onto, with each copy stored in a different physical location.

Near-line storage bridges the gap between online and offline storage by providing faster data access than archival storage at a lower cost than primary storage. Firewire Drives are often considered near-line storage because they are slower and usually not redundant. Near-line can refer to recent projects, old financial data, office forms that are updated rarely and backups of online storage to be made readily available for rapid recovery. Backup of Near-line storage will probably be to tape.

Data Classification

Mission Critical data is typically stored in online storage. This data is the day-to-day production data that drives information-based businesses. This includes the jobs being worked on by designers, the video being edited for commercials and movies, accounting data, legal data (for law firms) and current items within an organization’s groupware system.

For the small business, Vital and Sensitive data are often one and the same. Vital data is data that is used in normal business practices but can be down for minutes or longer. Sensitive data is often accounting data that a company can live without for a short period of time, but will need to be restored in the event of a loss in a short amount of time. Small businesses will typically keep Vital and Sensitive data on the same type of media but may have different backup policies for it. For example, a company may choose to encrypt sensitive data and not vital data.

Non-Critical data includes items such as digital records and personal data files of network users. Non-Critical data could also include a duplicate of Mission Critical data from online storage. Non-Critical data often resides on near-line or offline media (as is the case with email archives). Non-Critical data primarily refers to data kept as part of a company’s risk management strategy or for regulatory compliance. This includes old emails, financial records and the like.

Classification Methods

The chronological method for classifying data is often one of the easiest and most logical. For example, a design firm may keep their mission critical current jobs on an Xraid, vital jobs less than three months old on a Firewire drive attached to a server and non-critical jobs older than three months on backup tapes or offline Firewire drives. It would not be possible to implement this classification without having the data organized into jobs first. Another way to look at this method is that data over 180 days old automatically gets archived.
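As a rough illustration of the chronological method, the sketch below sorts jobs into the three tiers from the design firm example. The function name, the `is_current` flag and the 90 day cutoff (standing in for "three months") are assumptions made for this example, not part of any particular backup product.

```python
from datetime import date


def classify_job(last_worked_on, today, is_current, archive_after_days=90):
    """Return (data classification, storage tier) for a job.

    Hypothetical rule set: current jobs stay online, jobs idle for less
    than archive_after_days go near-line, everything older goes offline.
    """
    if is_current:
        return ("mission critical", "online")
    age_days = (today - last_worked_on).days
    if age_days < archive_after_days:
        return ("vital", "near-line")
    return ("non-critical", "offline")


today = date(2006, 6, 30)
print(classify_job(date(2006, 6, 1), today, is_current=True))
print(classify_job(date(2006, 5, 15), today, is_current=False))
print(classify_job(date(2005, 12, 1), today, is_current=False))
```

Changing `archive_after_days` to 180 gives the "data over 180 days old automatically gets archived" variation.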

The characteristic method of data organization means that data with certain characteristics can be archived. This can be applied to accounting and legal firms. Whether a client is active or not simply represents a characteristic. If a type of clothing is in style or not represents another possible characteristic. Provided that data is arranged or labeled by characteristic, it is possible to archive using a certain characteristic as a variable or metadata. Many small and medium sized companies are not using metadata for files yet, so a good substitution can be using a file name to denote attributes of the file’s data.
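Using a file name in place of metadata could look something like the sketch below. The `client_status_description` naming convention, the function names and the sample file names are all hypothetical, chosen only to show the idea of archiving on a characteristic.

```python
from pathlib import PurePath


def attributes_from_name(filename):
    """Split a name like 'acme_inactive_q3-ledger.xls' into attributes.

    Assumes a made-up convention: client_status_description.extension
    """
    stem = PurePath(filename).stem
    parts = stem.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"{filename!r} does not follow client_status_description")
    client, status, description = parts
    return {"client": client, "status": status, "description": description}


def should_archive(filename):
    """Archive anything whose client is marked inactive in the file name."""
    return attributes_from_name(filename)["status"] == "inactive"


for name in ("acme_inactive_q3-ledger.xls", "globex_active_retainer.doc"):
    print(name, "->", "archive" if should_archive(name) else "keep online")
```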

The hierarchical method of data organization means that files or folders within certain areas of the file system can be archived. For example, if a company decides to close down their Music Supervision department then the data stored in the Music Supervision share point on the server could be archived.

Service Level Agreements

The final piece of the ILM puzzle is building a Service Level Agreement for data management within a company. This is where the people that use each type of data within an organization sit down with IT and define how readily available that data needs to be and how often that data needs to be backed up.

In a Small Business it is often the owners of companies that make this decision. In many ways, this makes coming to terms with a Service Level Agreement easier than in a larger organization. The owner of a small business is more likely to have a picture of what the data can cost the company. When given the cost difference between online and near-line storage, small business owners are likely to make concessions more easily than managers of larger organizations who do not have as much of an ownership mentality towards a company.

Building a good Service Level Agreement means answering questions about the data, asked per classification. Some of the most important questions are:

How much data is there?
How readily available does the data need to be?
How much does this cost the company, including backups?
Given the type of storage used to house this data, how much is it costing the company?
If nearly half the data can be moved to near-line storage, what will the savings be to the company?
In the event of a loss, how far back in time is the company willing to go for retrieval?
Is the data required to be in an inalterable format for regulatory purposes?
How fast must data be restored in the event of a loss?
How fast must data be restored in the event of a catastrophe?
Will client systems be backed up? If so, what on each client system will be backed up?

Information Lifecycle Management

Most companies will use a combination of methods to determine their data classification. Each classification should be mapped to a type of storage by building an SLA. Once this is done, software programs such as BRU or Retrospect can be configured for automated archival and backups. The backup/archival software chosen will be the component that implements the SLA, so it should fulfill the requirements of the ILM policies put into place.

The schedules for archival and backups should be set in accordance with the business’s needs. Some companies may choose to keep the same data in online storage for longer than other companies in the same business because they have invested more in online storage or because they reference the data often for other projects. The business logic of the organization will drive the schedule using the SLA as a roadmap.

Setting schedules means having documentation for what lives where and for how long. Information Lifecycle Management means bringing the actual data locations in line with where the data needs to be. Once this has been done, the cost to house and back up data becomes more quantifiable and cost efficient. The SLA is meant to be a guideline and should be revisited at roadblocks and intervals along the way. Checks and balances should be put into place to ensure that the actual data management situation accurately reflects the SLA.

ILM and regulatory compliance are more about people and business process than about required technology changes. The lifecycle of data is important to understand. As storage requirements spiral out of control, administrators of small and medium sized organizations can look to the methods of Enterprise networking for handling storage requirements with scalability and flexibility.

Xsan

The Anatomy of a RAID

Originally posted at http://www.318.com/TechJournal

The acronym RAID can often be misleading as it has had multiple meanings over the years. RAID originally stood for a redundant array of inexpensive disks. The acronym is now also known as a redundant array of independent disks, as not all RAID disks are inexpensive. RAID refers to a hard drive storage mechanism using multiple hard drives to share or replicate data among the drives. In some cases this can mean having data that is written to a single logical drive stored on multiple drives so there is redundancy of the data, or RAID can be used to maximize throughput to drives by aggregating possible speeds of RAID member drives. A key advantage of RAID is the ability to combine drives into an array with more capacity, reliability, speed, or a combination of these, than was affordably available in a single device.

Through the remainder of this article we will be looking at different types of RAID and what each can do. But before we look into RAIDs let’s look at a JBOD. In a JBOD (Just a Bunch of Disks), which is also often called a concatenated RAID in OS X, you can use multiple drives to merge data into one volume. You can take 2 drives of 2 Terabytes each and 4 drives of 1 Terabyte each and merge them into one volume of 8 Terabytes. In this scenario you would end up with no fault tolerance in your environment, but you would be able to make use of low cost drives, such as LaCies, to create a single volume. However, if any of the drives in a JBOD fail the full volume will fail. This leads us to use this type of setup primarily for volatile data, such as a disk-to-disk backup solution.

A RAID0 is similar to a JBOD; however, RAID0 requires all drives in the array to be identical in size. Provided the drives are the same in size, RAID0 offers the fastest speeds available in a RAID. These are often used for high definition video editing and volumes housing databases that require a lot of speed. RAID0 does not offer any redundancy of data. If one member in the array fails, then just as with a JBOD the volume will fail as well. However, RAID0 is a fast and inexpensive way to get large amounts of fast storage.

RAID1 is often known as a mirror. In RAID1, all data written to one disk is then duplicated onto a second disk of an identical size. In a mirror, if one member of the set of disks were to fail then the volume would continue to be accessible for read/write operations. RAID1 offers amongst the best protection against data loss available in RAID scenarios, but at the highest cost. For every byte of data stored on a RAID1 volume there must be an equal byte used for redundancy. As high end disks have become more and more expensive, the development of more complex RAID strategies helps to maximize our ability to make use of a variety of solutions.

In RAID3 you would end up with one member of each array as a static parity drive. This drive stores parity information calculated from the stripes on each of the other drives, and if one of them crashes, the contents of the failed drive can be rebuilt from the parity and the surviving drives. The parity also causes a slight loss of speed compared to a large RAID0 volume. RAID0 in the truest sense of the word (no parity) would net you 100% of the usable space.
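Parity in this context is typically an XOR across the data drives. The sketch below is a toy illustration of that idea, not how any particular RAID controller is implemented: the drive contents are made-up four-byte blocks, and a real array stripes data in far larger units.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data drives, each holding one small block.
data_drives = [b"ABCD", b"EFGH", b"IJKL"]

# The dedicated parity drive stores the XOR of the data drives.
parity = xor_blocks(data_drives)

# Simulate losing the second drive, then rebuild it from the
# surviving drives plus the parity drive.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_drives[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity yields the missing block, which is why a single-parity array can survive the loss of any one drive.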

Parity information is stored striped across all of the drives in RAID5, not just one. In RAID3, parity information is stored on a dedicated parity drive. But even in RAID3 you shouldn’t be able to make the smallest drive the hot spare. In fact you can typically only build a RAID0 out of drives of different sizes (which isn’t much of a RAID but more of a JBOD) unless you slim all drives down to the smallest drive size manually. Thus, a RAID5 + Hot Spare array of five 40GB drives would end up being a RAID5 volume of four 40GB drives. With three 40GB drives in the RAID5 (for example, four drives with one pulled as a hot spare) you would end up with an 80GB volume. This nets a 33% loss of space. If all four drives were in the RAID then you would get a RAID5 volume of 120GB, netting a 25% loss. If five drives at 40GB were in the RAID you would end up with a 160GB volume, resulting in only a 20% loss. And so on. The parity information is stored on all drives, so any single drive can go down and the contents of the RAID will be rebuilt based on the parity stored on each of the remaining drives.

RAID6 offers even more redundancy by writing two sets of parity information across the members of the array. This allows for two drives in the RAID to crash without losing data. RAID6 comes with more cost than most other RAIDs, both in RAID hardware and hard drives, and so is used much more rarely.
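The capacity figures above follow from a simple rule: a parity RAID gives up one drive’s worth of space per set of parity. The sketch below works through that arithmetic for equally sized drives; the function names and the table of minimum drive counts are assumptions made for the example, and hot spares are not counted as members of the array.

```python
# (drives' worth of parity, minimum member count) per RAID level.
RAID_LEVELS = {
    "RAID0": (0, 2),
    "RAID3": (1, 3),
    "RAID5": (1, 3),
    "RAID6": (2, 4),
}


def usable_capacity_gb(level, drive_count, drive_size_gb):
    """Usable space for an array of identical drives (hot spares excluded)."""
    parity_drives, minimum = RAID_LEVELS[level]
    if drive_count < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives")
    return (drive_count - parity_drives) * drive_size_gb


def space_loss(level, drive_count, drive_size_gb):
    """Fraction of raw capacity given up to parity."""
    raw = drive_count * drive_size_gb
    return 1 - usable_capacity_gb(level, drive_count, drive_size_gb) / raw


for members in (3, 4, 5):
    usable = usable_capacity_gb("RAID5", members, 40)
    loss = space_loss("RAID5", members, 40)
    print(f"{members} x 40GB RAID5: {usable}GB usable, {loss:.0%} lost to parity")
```

The more members in the array, the smaller the share of space that goes to parity, which is the trend the 33%, 25% and 20% figures show.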