Resolve Error 1006.0005 For Qlogic Switches

Error 1006.0005 can appear on a Qlogic fibre channel switch when using ACL zones. If you don't need ACL zones, the easiest fix is to swap the offending zone back to a soft zone. To do so, open the Qlogic switch manager and use the Edit menu to select "Edit Zoning…" From the zone editor, right-click on the zone you want to change and click on Set Zone Type. From the Set Zone Type pop-up, click on the option for Soft. Save the zoning and, provided that you can actually use soft zones, you're done. Now, what if you can't use soft zoning? In that case, I find that this error specifically comes up when you have a device in both a soft and an ACL-based zone. To rectify that, either switch the soft zone to ACL or define the port in the ACL zone and the WWN in the soft zone.

One-Liner Script To Check If Xsan Is Installed

The following will tell you whether Xsan has been installed on a client system. Here we're checking whether the file exists using the [ ] test for a file (I always quote paths that aren't variables when doing this type of thing) and then echoing a response that it does:

[ -f "/Library/Preferences/Xsan/uuid" ] && echo "Xsan is installed"

If the file exists, we could also perform some other tasks, or use an else and make changes, like copying an authorization and fsnameservers file into the directory when installing StorNext clients on OS X. The way I would likely do this, if I were saying "if the uuid file doesn't exist, do a task," would be:

[ ! -f "/Library/Preferences/Xsan/uuid" ] && echo "Xsan is not installed"

In the above example, placing the ! (bang) in front acts as a negation operator, so these two lines are basically the opposite of one another.
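
Taking that a step further, here's a minimal sketch of the if/else form. It assumes the authorization file in question is .auth_secret, and the /tmp source paths are just placeholders for wherever you keep copies of those files:

#!/bin/bash
# Sketch: if the Xsan uuid file is missing, stage StorNext client config files.
# Source paths under /tmp are placeholders for wherever you keep your copies.
if [ -f "/Library/Preferences/Xsan/uuid" ]; then
  echo "Xsan is installed"
else
  echo "Xsan is not installed"
  mkdir -p "/Library/Preferences/Xsan"
  cp "/tmp/fsnameservers" "/Library/Preferences/Xsan/fsnameservers"
  cp "/tmp/.auth_secret" "/Library/Preferences/Xsan/.auth_secret"
fi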

Rorke Aurora Galaxy and Xsan

With Apple bundling Xsan into Lion and opening up more storage options than before, it seems like time to start exploring alternatives to Promise Vtraks for Xsan storage. ActiveStorage makes a very nice RAID chassis and should be shipping metadata controller appliances soon. I've discussed both here before and they make for very nice kit. But in order to have an 'ecosystem' you really need a little biodiversity, and the Xsan environment needs to become more of an ecosystem and less of a vendor lock-in situation. So another option that I'd like to discuss is the Rorke Aurora Galaxy. These little firecrackers have a lot of potential upside:
  • Four 8Gbps Fibre Channel controllers (in the form of Atto Celerity cards)
  • 36 drive bays
  • 3TB drive modules
  • Low power requirements
  • Linux, so you have root access
  • For those who have used Webmin, the NumaRAID plug-in will seem familiar
  • Great tech support
  • More PCI slots, so upgradeable with more cards, etc.
36 drive bays at 3TB per bay means 108TB of raw storage running at 32Gbps per chassis. I recently had the chance to put a pair of these things through their paces. Using a combination of vMeter and the Qlogic Enterprise Fabric Suite Manager, we added stream after stream, and when all of the clients were running multicam edits at an aggregate throughput of well over 50Gbps, we still hadn't found a point where we started to drop frames. We simply ran out of clients, streams, media, etc., so we stopped testing there. Watching all the statistics on the RAIDs and clients, though, I do not doubt that we could have saturated a good 60Gbps.

When the RAIDs show up they have 3 LUNs baked into them. Given the size of the RAID and how Xsan likes to have LUNs added, it seemed prudent to convert those 3 LUNs into 4. You manage the RAID using Webmin, which runs on a default port of 10000. The default IP address is 192.168.1.129, so use the address:
http://192.168.1.129:10000
When you see a login screen, the default username and password is admin/password. Click on NumaRAID in the upper left corner of the screen to see details of the chassis, with the RAIDs listed first and the LUNs second. Click on a RAID to delete the RAID and its associated LUNs. When you delete LUNs, you need to restart the RAID controller for clients with Apple LSI cards to see the change. This can be done by stopping the nr_target service:

service nr_target stop

Then start it back up:

service nr_target start

You then have a choice of how to create the RAIDs. You carve a RAID up to create LUNs, so you can create 1 or 2 RAID 6 sets and carve those up into 4 LUNs. With 4 fibre channel ports, we initially associated 1 LUN per port. However, on testing failover, we realized that with that kind of 1:1 mapping you get no fibre channel failover. While it is unlikely that a fibre channel port itself will fail, there are often plenty of points of failure on the cables, which may pass through fibre channel patch panels and other potential trouble spots en route to their switch. Therefore, we associated two ports with each LUN. This represented about a 6% boost in performance over not masking any LUNs to ports, although about 3% less than the 1:1 mapping scheme. If you use 4 LUNs you can safely give each LUN about 4GB of RAM cache, although I could not measure any performance difference versus the factory default per-LUN cache of 1GB. You will also want to set the ports as targets in your switch, which may otherwise apply RSCN suppression, since most switches tend to think an ATTO card is an initiator (and it often is).

When you save things in the Webmin GUI, you are writing to an XML file stored at /usr/libexec/webmin/NumaRAID/nrconfig.xml. Once one of these is in production, I would personally make a backup of this file before and after any changes. As an example, you can cat nrconfig.xml and grep for certain items, such as lun:

cat nrconfig.xml | grep -i lun

The number of LUNs that you see here should match up with the output of lsscsi:

lsscsi -s

You can also use lsscsi to see the other hosts on the network and confirm that your zoning is open:

lsscsi -H

You can also change various settings, such as the raid_cache size, in the nrconfig.xml file. When you make a change in the XML file, make sure to back it up first, and when you bring the nr_target service back up, watch /var/log/messages, where the Aurora Galaxy writes its logs:

tail -f /var/log/messages

From your clients, look at /dev. You should see one device node per LUN, per port that the LUN is available on in each chassis. If you do not mask the LUNs, you would do well to create a zone for each port of the RAID, much as you might do with Linux clients of your Xsan. If you use soft zoning and build groups, this actually doesn't take much time at all. There isn't any special foo for volume creation. Once the clients can see the LUNs, treat the new volume as a bigger, less power-hungry version of your existing storage. Note that while Linux clients can often get away with non-port-based zoning, the Rorke will likely show up in Xsan Admin twice on at least one LUN (but not more than 2 LUNs per chassis) if it's not zoned just right, which can cause high PIO HiPriWr stats. Once it's in production you'll want to manage it; the IP and user/password can be changed pretty easily.
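
Pulling those pieces together, here's a minimal sketch of a change workflow on the Aurora, using the service name, config path, and log location above; the backup destination is arbitrary:

# Back up the NumaRAID config before touching anything (destination is arbitrary)
cp /usr/libexec/webmin/NumaRAID/nrconfig.xml /root/nrconfig.xml.$(date +%Y%m%d-%H%M%S)

# Stop the target service, make the change (in Webmin or the XML), then start it again
service nr_target stop
service nr_target start

# Watch the logs while the service comes back up
tail -f /var/log/messages

# Confirm the LUNs in the config still match what lsscsi reports
grep -i lun /usr/libexec/webmin/NumaRAID/nrconfig.xml
lsscsi -s
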
Beyond that, there isn't an SNMP MIB specifically for the Aurora. However, it is just Linux, so a standard snmpwalk against a Linux host from your monitoring solution should give you all you need to know regarding possible hardware failure.

And the fact that it's Linux brings up an interesting point. This device is made from off-the-shelf hardware (good off-the-shelf hardware too, not some crappy Fry's motherboard that was on sale 'cause they spilled a 1.75 liter of Crown and Coke on it while partying in the back). It's a computer with a bunch of drives in it, and many may be concerned about putting a "software RAID" into production. But think about it this way: all RAID controllers are software. The question is whether the software is baked into an EEPROM chip or firmware, or loaded on top of a generic kernel.

There are certainly a few things you should not do with an appliance-oriented Linux distro. Running a torrent off it, for example; but if you need to be told that, then you should have your RAIDs taken away… While you might be able to get away with it, don't run Samba on these (if they're going into an Xsan/StorNext/HyperFS environment they're gonna have a virtualized file system anyway). Don't install StorNext ("am I a target or an initiator? I'm so confused I think I'll just fail repeatedly"). Don't install rootkit detection or even a lot of other security tools. If antivirus, Snort, or host-based IDS tools are required, just unplug the network cable. This is an appliance. You can treat it as such and pretend that you only have the Webmin access (I just can't help but start tinkering around under the hood…).

Finally, I don't have long-term statistics on how these will hold up. I've never heard anyone complain terribly about them, but this model is still pretty new. Given the disk density I'm curious to see how things will go. Rorke will sell you a parts kit in the form of an unlicensed, diskless chassis. Given my experience with other vendors I'd recommend getting one, but they didn't seem too insistent that it was a requirement.

Installing Windows Clients for Xsan & StorNext

There are a lot of environments that attach Windows client computers to an Xsan or StorNext filesystem. In the past I've looked at using different versions of StorNext to communicate with Xsan, but in this article we're actually going to take a look at Quantum's StorNext FX2 client software. Before getting started, you'll want to have the StorNext media, have the serial number added to the metadata controllers, have the HBA (fibre channel card) installed, have the fibre patched into the HBA, have the IP addresses for the metadata controllers documented, and have a copy of the .auth_secret file obtainable from the metadata controllers once they've been properly licensed.

To get started, first install the HBA drivers. This will be different for each brand of card, but in most cases it will be a simple installer. Once the installer has run and the system rebooted (not all HBAs require reboots), you will see the Generic SCSI Array Device installed and the system will start recognizing the LUNs that comprise the volume. Make sure not to configure a new filesystem for the LUNs, especially if they are already in use.

Once you can see the hardware infrastructure you are ready to install the software. If you're using a 64-bit version of Windows, contact StorNext for the installers; otherwise you should be able to use the ones downloaded from the website. Always start with the latest drivers rather than those distributed on the media with the StorNext license. Once you have the installer ready, click on it and you will see the StorNext Installation screen. Here, provided it is a new installation, the first item will say Install StorNext. You also have options on this screen to Upgrade, Reinstall, Remove and Configure StorNext in the future. At the StorNext Component selection screen, you will be able to select whether to install the Help Files (FAQs) and/or the StorNext FX2 client. Here, you can probably leave both enabled and simply click on the Next>> button.

You should now see your LUNs. Open the Windows shell environment and cd to the C:\Program Files\StorNext\bin directory. Run cvlabel -l and verify that all of the LUNs needed for the volumes you will be working with are present. If they are not, check your zoning and physical infrastructure. Still using the Windows shell environment, cd into the C:\Program Files\StorNext\config directory. Edit the fsnameservers file and enter the IP addresses for your metadata controllers in the order they appear in the fsnameservers list on your metadata controllers. You will also need to copy the .auth_secret file from the metadata controllers to the client computer; by default, this is copied into the C:\Program Files\StorNext\config directory as well. Next, reboot.

Provided that we can see our LUNs, we should also be able to see our volumes through cvadmin. To verify, we can use the cvadmin command in much the same way we do in StorNext for Linux or Xsan:

cvadmin -e select

Provided all of the volumes appear and each has a valid controller (indicated with an *), we can then go into the StorNext Client Configuration tool to complete the client configuration. To do so, open Client Configuration from Start -> Program Files -> StorNext. Once open, you should see the volumes shown in cvadmin and they should be listed as available. Windows accesses volumes through what are known as drive mappings.
A drive map is an alphabetical representation of a location. These locations can be folders within a file system, network volumes or direct attached volumes. Once the configuration is complete, Windows will treat the drive letter being configured as a local volume. Select Tools -> Properties and you will then be able to map each volume to the appropriate drive letter. It is also possible to mount a volume into a given directory; however, this is comparatively rare. Repeat this process until all volumes are mapped and click on Apply. Now you can mount your new volumes. To do so, click on them back at the main screen and you will be prompted to mount. Click Yes. That's pretty much it. Have fun.
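
If you'd like to verify those steps from the Windows shell in one pass, a minimal sketch might look like this (paths are the defaults mentioned above):

cd "C:\Program Files\StorNext\bin"

REM List the LUNs this client can see; everything backing your volumes should appear
cvlabel -l

REM List the volumes; those marked with an * have a valid controller
cvadmin -e select

REM Check the files copied from the metadata controllers
type "C:\Program Files\StorNext\config\fsnameservers"
dir /a "C:\Program Files\StorNext\config"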

Installing a Vtrak for Windows

If you are installing a Vtrak from Apple on Microsoft Windows, you can download the drivers from Promise here: http://www.promise.com/support/download/download_eng.asp Having said that, you can use either the Promise drivers or generic drivers if you're using the Promise as a target and connecting via StorNext to LUNs that are managed by Xsan, because StorNext will manage the LUNs. To see the LUNs, check Windows Device Manager.
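
If you'd rather check from a command prompt than in Device Manager, a generic Windows query like the following should list the attached disks (this is not Promise-specific):

REM Each Vtrak LUN should show up as a disk drive
wmic diskdrive get model,size,status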

Debug Logging in StorNext

If you create a folder in C:\Program Files\StorNext called debug, then after you restart the FSS, StorNext will create a file called C:\Program Files\StorNext\debug\nssdebug.out, which contains very verbose logs from the perspective of the StorNext system. This can be useful, for example, in debugging connectivity issues with other StorNext systems and/or Xsan.
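
As a quick sketch of those steps from a Windows command prompt; the FSS restart itself happens through the StorNext services, so it's only noted as a comment here:

REM Create the debug folder that triggers verbose logging
mkdir "C:\Program Files\StorNext\debug"

REM Restart the FSS (for example via the StorNext services), then view the log
type "C:\Program Files\StorNext\debug\nssdebug.out"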

StorNext Command Line for Windows

StorNext for Windows comes with many of the same commands that are available with Xsan on Mac OS X. Located by default in the C:\Program Files\StorNext\bin directory, the cv* commands can be used in much the same way as on a Mac, which helps with troubleshooting. For example, if you are having problems getting a volume to mount, even though it shows up when you go to map the drive in Client Configuration, you can use cvlabel -l (assuming your working directory is the StorNext bin directory) to see the LUNs that are accessible by your host. If you cannot see your LUNs then you also cannot map a drive to those same LUNs (it will appear to work in the Client Configuration utility, but you will not be able to see the volumes in Windows Explorer or from a command prompt). Once you have confirmed that you can see Xsan LUNs from StorNext and that you can communicate with the metadata controller, go ahead and stop and start the FSS to see if the volume then appears in Windows Explorer. If you're using StorNext systems as actual metadata controllers then there are a number of other commands that you can leverage, again similarly to how you would do so with Xsan. For example, to start a volume, you would use the cvadmin command followed by start and then the name of the volume. If your volume were named bighonkinvolume you would use the following:

cvadmin start bighonkinvolume
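
To put that troubleshooting flow in one place, a minimal sketch from the Windows shell might be (bighonkinvolume is just the example name from above):

cd "C:\Program Files\StorNext\bin"

REM Can this host see the LUNs behind the volume?
cvlabel -l

REM Which volumes are visible, and which metadata controller hosts each one?
cvadmin -e select

REM On a StorNext metadata controller, start a volume that isn't running
cvadmin start bighonkinvolume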

Graphically Viewing Xsan Utilization

As I've covered, du and df are great tools for isolating disk utilization, both for HFS+ and for Xsan. When dealing with end users, though, it sometimes helps to show them information graphically. Another tool I've covered (although not comprehensively) is Disk Inventory X. A connection I had never tried to make until recently is using Disk Inventory X to find the "big fish" in terms of volume utilization on Xsan. When you fire up Disk Inventory X it will ask you to select a volume, or you can click on the Select Folder: button to browse to a folder. Disk Inventory X will then catalog the contents and show you a graphical representation of how the files on your volume are using up your capacity. What I'm finding is that when you sit people down in front of a SAN that is, let's say, 95% full, it's hard for them to fathom, just looking at the Finder or at du output, what they can remove versus what they should keep. If you come to them with a graphical representation of how the file system is currently being utilized, they can see the files that are huge (aka the "big fish") and determine whether there is real business value in keeping them. This is actually more true of Xsan than of most other file systems, because most Xsan volumes hold fewer files in total, but the individual files are usually substantial in size. So if there are 2 year old 1080p clips that haven't been touched, taking up tons of space, it's pretty easy to isolate that. Disk Inventory X also has another aspect that makes it easy to fix these space issues. Let's say that you see 20 files, each about 10 gigabytes, and you have determined that you can just remove them. Using Disk Inventory X you can click on each one and then use the Command-Delete keystroke to remove them. Disk Inventory X will then automatically adjust the graphical representation to show you the new layout; albeit until you empty the trash you'll see the files there… Another tool that I've been tinkering around with is the open source package TreePie, available on SourceForge. TreePie can show very similar data, if you only have StorNext Windows systems to view it from.
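
And for comparison with the du approach mentioned at the top of this section, a quick way to surface the same "big fish" from the command line might look like this; /Volumes/MyXsanVolume is a placeholder for your SAN volume's mount point:

# Summarize the first level of the SAN volume in kilobytes, largest entries last
du -sk /Volumes/MyXsanVolume/* | sort -n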