Xsan

Practical ILM

I originally posted this at http://www.318.com/TechJournal

The amount of data used by Small Businesses is on target to rise 30% to 35% in 2006. Sarbanes-Oxley, HIPAA and SEC Rule 17a-4 have introduced new regulations on the length of time data must be kept and in what format. Not only must data be kept, it must be backed up and secured. Together, these factors are rapidly driving up the cost of data storage for the Small Business.

Corporations valued at more than 75 million dollars are generating 1.6 billion gigabytes of data per year. Small and medium sized companies can reap the benefits of the developments being made for these larger corporations. Methods for classifying data and the storage it lives on are one such development.

Information Lifecycle Management (ILM) is a process for maximizing information availability and data protection while minimizing cost. It is a strategy for aligning your IT infrastructure with the needs of your business based on the value of data. Administrators must analyze the trade-offs between cost and availability of data in tiers by differentiating production or transactional data from reference or fixed content data.

ILM includes the policies, practices, services and tools used to align business practices with the most appropriate and cost-effective data structures. Once data has been classified into tiers then storage methods can be chosen that are in line with the business needs of each organization. The policies to govern these practices need to be clearly documented in order to keep everyone working towards the same goals.

Storage Classification

Online storage is highly available with fast and redundant drives. The XRAID and XSAN are considered online storage, which is best used for production data as it is dynamic in nature. This can include current projects and financial data. This data must be backed up often and must be rapidly restorable in the event of a loss. It is not uncommon to use an XRAID to back up another XRAID for immediate restoration of files and a Tape Library to maintain offsite backups of the XRAID.

Offline storage is used for data retained for long periods of time and rarely accessed. Data often found on offline media includes old projects and archived email. Media used for offline storage is often the same as media used for backup, such as tape and optical media. When referring to offline storage we refer to archives, not backups. Archives are typically static, whereas backups typically change with each backup run. Offline storage still needs to be redundant or backed up, but the backup schedules are often more lax than those for other classifications of storage. In a Small or Medium Sized company offline media is often backed up, or duplicated, to the same type of media that it is housed on. There may be two copies of a tape (one onsite and one offsite) or two copies of the DVDs that the data has been burned onto, with each copy stored in a different physical location.

Near-line storage bridges the gap between online and offline storage by providing faster data access than archival storage at a lower cost than primary storage. Firewire Drives are often considered near-line storage because they are slower and usually not redundant. Near-line can refer to recent projects, old financial data, office forms that are updated rarely and backups of online storage to be made readily available for rapid recovery. Backup of Near-line storage will probably be to tape.

Data Classification

Mission Critical data is typically stored in online storage. This data is the day-to-day production data that drives information-based businesses. This includes the jobs being worked on by designers, the video being edited for commercials and movies, accounting data, legal data (for law firms) and current items within an organization's groupware system.

For the small business, Vital and Sensitive data are often one and the same. Vital data is data that is used in normal business practices but can be unavailable for minutes or longer. Sensitive data is often accounting data that a company can live without for a short period of time, but that must be restored quickly in the event of a loss. Small businesses will typically keep Vital and Sensitive data on the same type of media but may have different backup policies for each. For example, a company may choose to encrypt sensitive data and not vital data.

Non-Critical data includes items such as digital records and personal data files of network users. Non-Critical data could also include a duplicate of Mission Critical data from online storage. Non-Critical data often resides on near-line or offline media (as is the case with email archives). Non-Critical data primarily refers to data kept as part of a company's risk management strategy or for regulatory compliance. This includes old emails, financial records and similar items.

Classification Methods

The chronological method for classifying data is often one of the easiest and most logical. For example, a design firm may keep their mission critical current jobs on an Xraid, vital jobs less than three months old on a Firewire drive attached to a server and non-critical jobs older than three months on backup tapes or offline Firewire drives. It would not be possible to implement this classification without having the data organized into jobs first. Another way to apply this method is a simple age cutoff, for example automatically archiving any data over 180 days old.
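The design-firm example can be sketched in a few lines of Python. This is only an illustration: the tier names, the `classify_job` function and the 90-day approximation of "three months" are my assumptions, not part of any backup product.

```python
from datetime import date

# Hypothetical tier labels based on the design-firm example.
ONLINE = "online"        # current jobs, e.g. on the Xraid
NEAR_LINE = "near-line"  # recently finished jobs, e.g. on a Firewire drive
OFFLINE = "offline"      # older jobs, e.g. on tape or shelved drives

# Assumption: "three months" is approximated as 90 days.
NEAR_LINE_MAX_AGE_DAYS = 90


def classify_job(is_current: bool, last_modified: date, today: date) -> str:
    """Return the storage tier for a job based on its status and age."""
    if is_current:
        return ONLINE
    age_days = (today - last_modified).days
    if age_days < NEAR_LINE_MAX_AGE_DAYS:
        return NEAR_LINE
    return OFFLINE


if __name__ == "__main__":
    today = date(2006, 6, 1)
    print(classify_job(True, date(2006, 5, 30), today))
    print(classify_job(False, date(2006, 5, 1), today))
    print(classify_job(False, date(2006, 1, 1), today))
```

Note that the function takes a job, not a file, as its unit of work, which is why the data has to be organized into jobs before this method can be used.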

The characteristic method of data organization means that data with certain characteristics can be archived. This can be applied to accounting and legal firms. Whether a client is active or not simply represents a characteristic. Whether a type of clothing is in style or not represents another possible characteristic. Provided that data is arranged or labeled by characteristic, it is possible to archive using a certain characteristic as a variable or metadata. Many small and medium sized companies are not using metadata for files yet, so a good substitute can be using the file name to denote attributes of the file's data.
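As a rough sketch of the file name approach, assume a made-up naming convention of `client_status_title.ext`, where status is either `active` or `inactive`. The convention and the function names below are hypothetical; any scheme works as long as it is applied consistently.

```python
from pathlib import PurePath


def parse_attributes(filename: str):
    """Split a name like 'acme_inactive_2005-taxes.pdf' into its attributes.

    Returns None when the name does not follow the assumed convention.
    """
    stem = PurePath(filename).stem          # drop the extension
    parts = stem.split("_", 2)              # client, status, rest of the name
    if len(parts) != 3:
        return None
    client, status, title = parts
    return {"client": client, "status": status, "title": title}


def should_archive(filename: str) -> bool:
    """Archive only files whose name marks the client as inactive."""
    attributes = parse_attributes(filename)
    return attributes is not None and attributes["status"] == "inactive"
```

Files that do not follow the convention are left alone rather than archived, which is the safer default when naming is enforced by habit and not by software.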

The hierarchical method of data organization means that files or folders within certain areas of the file system can be archived. For example, if a company decides to close down their Music Supervision department then the data stored in the Music Supervision share point on the server could be archived.
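A minimal sketch of the hierarchical method follows, assuming a hypothetical share point path of `/Shares/Music Supervision`. It compares path components and not raw strings, so a similarly named folder is not swept up by accident.

```python
from pathlib import PurePosixPath

# Hypothetical share point that has been marked for archival.
ARCHIVED_AREAS = [PurePosixPath("/Shares/Music Supervision")]


def in_archived_area(path: str) -> bool:
    """Return True if the path is an archived area or lives inside one."""
    candidate = PurePosixPath(path)
    return any(
        area == candidate or area in candidate.parents
        for area in ARCHIVED_AREAS
    )
```

With this check, `/Shares/Music Supervision/2005/cues.xls` qualifies for archival while `/Shares/Music Supervision Old/cues.xls` does not, even though both start with the same characters.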

Service Level Agreements

The final piece of the ILM puzzle is building a Service Level Agreement for data management within a company. This is where the people that use each type of data within an organization sit down with IT and define how readily available that data needs to be and how often that data needs to be backed up.

In a Small Business it is often the owners of the company who make this decision. In many ways, this makes coming to terms with a Service Level Agreement easier than in a larger organization. The owner of a small business is more likely to have a picture of what the data can cost the company. When given the cost difference between online and near-line storage, small business owners are likely to make concessions more readily than managers of larger organizations, who do not have as much of an ownership mentality towards a company.

Building a good Service Level Agreement means answering questions about the data, asked per classification. Some of the most important questions are:

- How much data is there?
- How readily available does the data need to be?
- How much does this cost the company, including backups?
- Given the type of storage used to house this data, how much is it costing the company?
- If nearly half the data can be moved to near-line storage, what will the savings be to the company?
- In the event of a loss, how far back in time is the company willing to go for retrieval?
- Is the data required to be in an inalterable format for regulatory purposes?
- How fast must data be restored in the event of a loss?
- How fast must data be restored in the event of a catastrophe?
- Will client systems be backed up? If so, what on each client system will be backed up?
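The answers to these questions can be written down as one record per classification, and the cost questions reduce to simple arithmetic. The sketch below is one possible layout; the field names and every figure used with it are placeholders to be replaced with a company's own numbers.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    """One SLA entry per data classification (all fields are illustrative)."""
    classification: str
    size_gb: float
    storage_tier: str
    cost_per_gb_month: float     # storage plus backup cost, per GB per month
    max_data_loss_hours: int     # how far back the company is willing to go
    restore_target_hours: int    # how fast data must be restored after a loss
    must_be_inalterable: bool    # regulatory requirement for a fixed format

    def monthly_cost(self) -> float:
        return self.size_gb * self.cost_per_gb_month


def monthly_savings(size_gb: float, fraction_moved: float,
                    current_cost: float, cheaper_cost: float) -> float:
    """Savings from moving a fraction of the data to a cheaper tier."""
    return size_gb * fraction_moved * (current_cost - cheaper_cost)
```

For example, with 1,000 GB of data, half of it moved, and placeholder costs of 2.00 and 0.50 per GB per month, `monthly_savings(1000, 0.5, 2.0, 0.5)` works out to 750 per month.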

Information Lifecycle Management

Most companies will use a combination of methods to determine their data classification. Each classification should be mapped to a type of storage by building an SLA. Once this is done, software programs such as BRU or Retrospect can be configured for automated archival and backups. The backup/archival software chosen will be the component that implements the SLA, so it should meet the requirements of the ILM policies put into place.
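Before touching the backup software, it can help to write the mapping down in a tool-neutral form. The table below is a sketch only: it is not BRU or Retrospect configuration syntax, the tier for each classification follows the examples given earlier, and the backup intervals are assumptions chosen for illustration.

```python
# Tool-neutral policy table. Backup intervals are illustrative assumptions.
SLA_POLICY = {
    "mission-critical": {"tier": "online", "backup_every_hours": 24, "encrypt": False},
    "vital": {"tier": "near-line", "backup_every_hours": 168, "encrypt": False},
    "sensitive": {"tier": "near-line", "backup_every_hours": 168, "encrypt": True},
    "non-critical": {"tier": "offline", "backup_every_hours": 720, "encrypt": False},
}


def policy_for(classification: str) -> dict:
    """Look up the policy for a classification such as 'Mission Critical'."""
    key = classification.strip().lower().replace(" ", "-")
    if key not in SLA_POLICY:
        raise KeyError(f"No SLA entry for classification: {classification!r}")
    return SLA_POLICY[key]
```

Raising an error for an unknown classification is deliberate: data that has not been classified should be noticed, not silently given a default.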

The schedules for archival and backups should be set in accordance with the needs of the business. Some companies may choose to keep the same data in online storage for longer than other companies in the same business because they have invested more in online storage or because they reference the data often for other projects. The business logic of the organization will drive the schedule using the SLA as a roadmap.

Setting schedules means having documentation for what lives where and for how long. Information Lifecycle Management means bringing the actual data locations in line with where the data needs to be. Once this has been done, the cost to house and back up data becomes more quantifiable and cost efficient. The SLA is meant to be a guideline and should be revisited at roadblocks and at regular intervals along the way. Checks and balances should be put into place to ensure that the actual data management situation accurately reflects the SLA.
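One simple form of check is to compare an inventory of where data actually lives against where the SLA says it should live. The sketch below assumes the inventory is kept as a list of (name, classification, actual tier) entries; the names, the job numbers and the expected-tier table are hypothetical.

```python
# Expected tier per classification, as agreed in the SLA (illustrative).
EXPECTED_TIER = {
    "mission-critical": "online",
    "vital": "near-line",
    "non-critical": "offline",
}


def find_mismatches(inventory):
    """Return (name, actual_tier, expected_tier) for every misplaced item.

    inventory is an iterable of (name, classification, actual_tier) tuples.
    """
    mismatches = []
    for name, classification, actual_tier in inventory:
        expected_tier = EXPECTED_TIER[classification]
        if actual_tier != expected_tier:
            mismatches.append((name, actual_tier, expected_tier))
    return mismatches


if __name__ == "__main__":
    inventory = [
        ("Job 1041", "mission-critical", "online"),
        ("Job 0977", "non-critical", "online"),  # still on expensive storage
    ]
    for name, actual, expected in find_mismatches(inventory):
        print(f"{name}: on {actual} storage, SLA says {expected}")
```

Running a report like this at each SLA review shows where the actual situation has drifted from the agreement.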

ILM and regulatory compliance are more about people and business process than about required technology changes. The lifecycle of data is important to understand. As storage requirements spiral out of control, administrators of small and medium sized organizations can look to the methods of Enterprise networking for handling storage requirements with scalability and flexibility.