NetApp Failovers

Each controller of a NetApp FAS will typically have two network interfaces. Provided I have two storage controllers (and I usually do), I typically prefer to set up a NetApp in an automated failover scenario. A NetApp active/active configuration consists of two storage nodes whose controllers are connected to each other either directly or through switches. The nodes are connected through a cluster adapter or an NVRAM adapter, which allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner, mirroring the data for each other’s nonvolatile RAM (NVRAM).

Before configuring the filers for an active/active clustered failover, first verify that the dates are in sync between the nodes (if you’re using multiple nodes) using the date command. If they are not, then configure NTP using the options command. For example, the following uses time.nist.gov as one NTP host and 10.0.0.44 as another, setting the timed.servers option:

options timed.servers time.nist.gov,10.0.0.44
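
If you script this step, the comma-separated server list is easy to get wrong by hand. The following is a minimal Python sketch that only builds the command line shown above. The function name build_timed_servers_command is hypothetical, and the helper does not connect to a filer or run anything.

```python
def build_timed_servers_command(servers):
    """Return the 'options timed.servers' line for a list of NTP hosts.

    Hypothetical helper: it only formats text and never contacts a filer.
    """
    # Drop surrounding whitespace and ignore empty entries.
    cleaned = [s.strip() for s in servers if s.strip()]
    if not cleaned:
        raise ValueError("at least one NTP server is required")
    # The servers are joined with commas and no spaces, as in the example above.
    return "options timed.servers " + ",".join(cleaned)


print(build_timed_servers_command(["time.nist.gov", "10.0.0.44"]))
# prints: options timed.servers time.nist.gov,10.0.0.44
```

You would still paste or send the resulting line to each node yourself, so both controllers end up with the same server list.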

Other timed options include timed.sched, which sets the schedule on which the time is updated to correct for skew. There is also timed.proto, which selects the time protocol to use, such as ntp or rtc.

Once the clocks are verified, you can move on to setting up the cf engine. When configuring clustering on the filers, you will use the cf command. The following command reports the configuration as well as the current state of the cf engine:

cf status

Provided that cf is currently disabled, the following command will go ahead and enable it:

cf enable

In order to initiate a failover event you can use the following command (or start unplugging some cables;):

cf takeover

If you are testing by unplugging cables then it is worth mentioning that the takeover and giveback processes are initiated after 30 seconds of not hearing from the partner interface. Older releases of the firmware can require an additional 45 seconds to complete the takeover/giveback. If you see an error that an interface “cannot be configured: address does not match any partner interface” then you might have a problem with the IP configuration of one of the controllers, for example a missing partner IP address. The easiest way to remedy that is to simply rerun the setup command and zip through the wizard, defining the partner IP in the process.
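
For cable-pull tests it helps to know how long to wait before deciding that something is wrong. As a rough sketch using only the figures quoted above (30 seconds of silence before the process starts, plus up to 45 more seconds on older firmware), the wait can be worked out as follows. The constant and function names are made up for illustration, and real timings will depend on your release.

```python
# Figures taken from the paragraph above; treat them as rough guidance only.
DETECTION_SECONDS = 30           # silence from the partner before takeover/giveback starts
OLD_FIRMWARE_EXTRA_SECONDS = 45  # additional time quoted for older firmware releases


def expected_wait_seconds(old_firmware):
    """Rough time to wait after pulling a cable before suspecting a problem."""
    extra = OLD_FIRMWARE_EXTRA_SECONDS if old_firmware else 0
    return DETECTION_SECONDS + extra


print(expected_wait_seconds(old_firmware=False))  # prints 30
print(expected_wait_seconds(old_firmware=True))   # prints 75
```

So on an older release, give it about a minute and a quarter before you start checking the partner IP configuration.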

Once a failover event occurs you can fail the controllers back to the original configuration using the cf command with the giveback option, as follows:

cf giveback
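
The commands covered so far make up one full test cycle: check status, enable cf if it is disabled, take over, then give back. Here is a small Python sketch of that ordering. The function failover_test_plan is hypothetical and only returns the commands as text, so you still need to run them on the console yourself and wait for the takeover to finish before issuing the giveback.

```python
def failover_test_plan(cf_enabled):
    """Return the console commands for one takeover/giveback test, in order.

    Hypothetical helper: it produces text only and does not talk to a filer.
    """
    plan = ["cf status"]  # always look at the current state first
    if not cf_enabled:
        plan.append("cf enable")  # only needed when cf is currently disabled
    plan += ["cf takeover", "cf giveback"]
    return plan


for command in failover_test_plan(cf_enabled=False):
    print(command)
```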

At some point you may choose to turn off clustering. To do so, use the following command:

cf disable