Chapter 2. Hardware Components of a Linux FailSafe Cluster

Linux FailSafe Configurations, Topologies, Reset Model, and Storage Models

Linux FailSafe has been expanded to include configurations with up to four nodes. This section explains Linux FailSafe configurations, topologies, and network connections.

Linux FailSafe Configurations: Reset and Failover Models

This subsection covers failover models, the reset model, and how to configure for reset and failover.

Failover Models

The Linux FailSafe software uses heartbeats on an Ethernet network to determine that a node has failed. (This network is discussed in “Heartbeat Network and Ethernet Hub”.) If a cluster node fails, the failover model determines how the applications (resources) on the failed node are handled so that they remain available to users:

  • Star: failover is to a backup node.

  • Hub/switched: failover is to the other node.

Multiple heartbeat networks are recommended.

Figure 2-1. Failover Models

Linux FailSafe software is used to determine the node to which the resource fails over, as explained in the Linux FailSafe Administrator's Guide.

Reset Model

Linux FailSafe uses serial reset lines to reboot a failed node in the cluster:

  • Ring reset: the system control port of each node is directly connected to a tty port on another node.

    1. In a two-node system, the system control port on each node is connected to a tty port on the other node.

    2. In a ring system with more than two nodes (up to four nodes), a tty port on each node is cabled to the system control port on the adjacent node; in this model, reset is unidirectional.

Configuring for Reset and Failover

A Linux FailSafe configuration can use either type of failover model. Table 2-1 summarizes Linux FailSafe failover models, reset model, and connections.

Table 2-1. Failover Models, Reset Model, and Connections

Failover      Reset  Nodes   Reset Connections
Star          Ring   2 to 4  Direct link (tty to system control port) between nodes
Hub/switched  Ring   2 to 4  Direct link (tty to system control port) between nodes, unidirectional


Configuration Types

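This subsection describes the following specific Linux FailSafe configurations: a two-node configuration, and a four-node configuration with ring reset and hub/switched failover.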

In each section, diagrams show the pertinent networks and connections.

Two-Node Configuration

Figure 2-2 diagrams a Linux FailSafe two-node configuration.

Figure 2-2. Two-Node Configuration, Node-to-Node Reset

In the configuration in Figure 2-2, the two nodes listen to each other's heartbeat on the heartbeat network, which must be Ethernet. Each can power-cycle the other on the reset network's two serial lines.

Four-Node Configuration: Ring Reset and Hub/Switched Failover

Figure 2-3 diagrams a Linux FailSafe four-node ring configuration. This configuration uses hub/switched failover and ring reset.

Figure 2-3. Four-Node Configuration: Ring Reset and Hub/Switched Failover

Heartbeat is via an Ethernet connection from the nodes to the Ethernet hub.

Linux FailSafe Networks

The Linux FailSafe cluster has these networks:

  • Heartbeat network (Ethernet) connection between nodes: the keep-alive heartbeat for monitoring node status and Linux FailSafe control messages.

    Heartbeat is via an Ethernet connection directly between the nodes, or from the nodes to an Ethernet hub.

  • Reset network (serial) for power cycling a node: if a node fails, this serial connection provides for reset. This network is also referred to as the control network.

    Depending on the configuration, the failed node is power cycled by a surviving node in the cluster.

  • Shared Fibre Channel connection to storage

    Fibre Channel hub devices may be part of a Fibre Channel RAID or JBOD connection.

  • Public (Ethernet) network interface(s) connecting the cluster node to clients and the outside world

    If a node has multiple public network interfaces, they must be on different subnets.
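The subnet rule for public interfaces can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module; the function name and the example addresses and netmasks are illustrative assumptions and are not part of Linux FailSafe.

```python
import ipaddress
from itertools import combinations

def interfaces_share_subnet(interfaces):
    """Return pairs of interface names whose addresses fall in the same subnet.

    `interfaces` maps an interface name to an "address/netmask" string.
    """
    networks = {
        name: ipaddress.ip_interface(spec).network
        for name, spec in interfaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]

# Hypothetical addresses taken from the reserved documentation ranges.
ok = {"eth0": "192.0.2.10/255.255.255.0", "eth1": "198.51.100.10/255.255.255.0"}
bad = {"eth0": "192.0.2.10/255.255.255.0", "eth1": "192.0.2.11/255.255.255.0"}

print(interfaces_share_subnet(ok))    # no conflicting pairs expected
print(interfaces_share_subnet(bad))   # eth0 and eth1 share a subnet
```

An empty result means that every public interface on the node is on its own subnet.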

The following subsections discuss these networks (except the public network) further.

Heartbeat Network and Ethernet Hub

The heartbeat network is the keep-alive heartbeat for monitoring node status and Linux FailSafe control messages. This network is an Ethernet connection directly between the nodes, or from the nodes to the Ethernet hub (heartbeat hub).

When the failure of a node is detected on the heartbeat network, one of the other nodes in the cluster uses the serial reset network to power-cycle the failed node. The storage of the failed node is taken over by another node in the cluster.
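Failure detection of this kind can be pictured as a timeout on the most recent heartbeat received from each node. The sketch below only illustrates that idea; the class name, the timeout value, and the use of explicit timestamps are assumptions and do not describe how Linux FailSafe implements its heartbeat.

```python
class HeartbeatMonitor:
    """Track the last heartbeat time per node and report nodes that went silent."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout  # seconds of silence tolerated (assumed value)
        self.last_seen = {node: 0.0 for node in nodes}

    def record(self, node, now):
        """Note that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )

monitor = HeartbeatMonitor(["node1", "node2"], timeout=5.0)
monitor.record("node1", now=10.0)
monitor.record("node2", now=10.0)
monitor.record("node1", now=14.0)  # node2 stays silent from here on
print(monitor.failed_nodes(now=16.0))
```

In this illustration, a node reported by failed_nodes would be the candidate for a reset over the serial network, followed by takeover of its storage.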

For the connection to the Ethernet hub or network, RJ45-RJ45 null modem cables (9290131) are included with the option. These cables are 20 feet long; the customer might have ordered the optional 40-foot cable (9290132).

Reset Network

When the heartbeat network carries information that a node has failed, the failed node is power-cycled by a surviving node in the cluster.

The reset network consists of serial connections to each node's system control port.

After a successful reset, one of the surviving nodes in the cluster takes over the resources owned by the failed node. The node that resets the failed node need not be in the cluster.

Determining failover -- directing a particular node to take over the function of the failed node and its access to storage -- is not necessarily done by the same entity that does the reset.

Cabling the Ethernet Networks

This section explains how to set up the nodes in the Linux FailSafe cluster. It discusses software installation and interface board installation. It explains how to cable the heartbeat Ethernet connection for Linux FailSafe, as well as the public Ethernet connection.


Note: Before installing a Linux FailSafe system, make sure that the installation site meets the operating limits and AC power requirements.

The following equipment is required for installation:

  • Laptop or ASCII terminal

  • Phillips and small flat-blade screwdrivers

  • Installation guides for the component systems (see “About This Guide”)

Cabling the Heartbeat Ethernet Network

The network between the cluster nodes supplies the heartbeat of each node to other nodes or equipment monitoring system status, as well as other Linux FailSafe information.

Figure 2-4 shows the Ethernet connector on the SGI 1200 panel. The server might also have one or more ENET boards.

Figure 2-4. SGI 1200 Deskside Server Ethernet Ports (Rear of Chassis)

To cable the heartbeat network, attach an end of the null modem Ethernet cable supplied with the Linux FailSafe system (part number 018-0700-001) to an Ethernet port on each host module. Depending on the configuration, attach the other end to one of the following:

  • Two-node configuration: the other node's Ethernet port

  • Other configurations: Ethernet hub

Figure 2-5. Two SGI 1450s Connected Back-to-Back

Figure 2-6 shows cabling for an Ethernet hub.

Figure 2-6. Cabling the Ethernet Hub for the Heartbeat Network

Cabling the Public Network

On each node, connect the public network drop cable to an Ethernet port. If the configuration uses an Ethernet hub for the public network, cable the nodes to the hub and attach the public network drop cable to the hub.

Cables for the public Ethernet network are not included in the Linux FailSafe marketing codes. The part number for this cable is 9290131.

Testing the Public Network Interface

For each public network on each node in the cluster, enter

# /usr/etc/ping nodeIPaddress

where nodeIPaddress is the IP address of the node. Typical ping output should appear, such as

PING IPaddress
64 bytes from 190.x.x.x: icmp_seq=0 ttl=254 time=3 ms
64 bytes from 190.x.x.x: icmp_seq=1 ttl=254 time=2 ms
64 bytes from 190.x.x.x: icmp_seq=2 ttl=254 time=2 ms

If ping fails, follow these steps:

  1. Verify that the network interface was configured up using ifconfig; for example:

    # /usr/etc/ifconfig eth0
    eth0 Link encap:Ethernet HWaddr 00:C0:4F:58:6E:B9
    inet addr: 190.x.x.x Bcast:190.x.x.x Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:1254523 errors:0 dropped:0 overruns:0 frame:0
    TX packets:1565980 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:100
    Interrupt:11 Base address:0xdc00

    The UP in the output indicates that the interface was configured up.

  2. Verify that the cables are correctly seated.

Repeat this procedure on each node.
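When the procedure has to be repeated across several nodes, the two checks can be automated by inspecting captured command output. The following is a minimal sketch under that assumption: the function names are hypothetical, and the sample strings simply mirror the ping and ifconfig output shown in this section rather than output from a live system.

```python
import re

def count_ping_replies(ping_output):
    """Count reply lines of the form '64 bytes from ...: icmp_seq=N ...'."""
    pattern = r"^\d+ bytes from .*icmp_seq=\d+"
    return len(re.findall(pattern, ping_output, re.MULTILINE))

def interface_is_up(ifconfig_output):
    """Return True if some line of the ifconfig output begins with the UP flag."""
    return any(
        line.split()[:1] == ["UP"] for line in ifconfig_output.splitlines()
    )

SAMPLE_PING = """PING IPaddress
64 bytes from 190.x.x.x: icmp_seq=0 ttl=254 time=3 ms
64 bytes from 190.x.x.x: icmp_seq=1 ttl=254 time=2 ms
64 bytes from 190.x.x.x: icmp_seq=2 ttl=254 time=2 ms
"""

SAMPLE_IFCONFIG = """eth0 Link encap:Ethernet HWaddr 00:C0:4F:58:6E:B9
    inet addr: 190.x.x.x Bcast:190.x.x.x Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
"""

print(count_ping_replies(SAMPLE_PING), interface_is_up(SAMPLE_IFCONFIG))
```

A reply count of zero together with a missing UP flag points to interface configuration; a reply count of zero with the interface up points to cabling.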

Configuring and Testing Heartbeat Network Connectivity

To configure the heartbeat network interface, follow instructions on network interface and IP address configuration in the Linux FailSafe Administrator's Guide. Configure the interface on each node in the cluster.

To test connectivity for a cluster, use ping as described in “Testing the Public Network Interface”.

Cabling the Reset Network (Serial Connection)

If a node fails, a serial connection enables another node to power-cycle it. As of now, the serial connection has to be direct.

After you set up the heartbeat and public Ethernet connections for the nodes, as explained in “Cabling the Ethernet Networks”, cable the serial reset network. This section explains how to cable this reset network for various configurations.

Ports for the Serial Connection

The serial connection is made in one way:

  • Ring reset: system control (EMP) port on each node to tty port on another node

Figure 2-7 and Figure 2-8 show the tty serial ports on the SGI 1200 and 1450.

Figure 2-7. SGI 1200 Server tty Ports

Figure 2-8. SGI 1450 Server tty Ports

Cabling the Serial Connection: Ring Reset

For ring reset, follow these steps to cable the serial connection:

  1. For an SGI 1200 or 1450, make sure that nothing is connected to the serial port.

  2. Make sure that the nodes are arranged so that the serial cables included in the kits will reach the EMP ports and the tty ports.

  3. Cable the serial connection.

    For a two-node cluster, cable each node's EMP port to the other node's tty port.

    For a four-node cluster:

    • Connect the EMP port on the first node to a tty port on the second node.

    • Connect the EMP port on the second node to a tty port on the third node.

    • Connect the EMP port on the third node to a tty port on the fourth node.

    • Connect the EMP port on the fourth node to a tty port on the first node.
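The cabling pattern in step 3 is the same for any supported cluster size: the EMP port of each node goes to a tty port on the next node, and the last node wraps around to the first. The sketch below generates that cable list; the function name and the node names are hypothetical and serve only as an illustration.

```python
def ring_reset_cabling(nodes):
    """Return (emp_node, tty_node) pairs for ring reset.

    The EMP port of each node is cabled to a tty port on the next node in the
    list; the last node is cabled back to the first.
    """
    if not 2 <= len(nodes) <= 4:
        raise ValueError("ring reset is described for two to four nodes")
    count = len(nodes)
    return [(nodes[i], nodes[(i + 1) % count]) for i in range(count)]

for emp_node, tty_node in ring_reset_cabling(["node1", "node2", "node3", "node4"]):
    print(f"EMP port on {emp_node} -> tty port on {tty_node}")
```

For two nodes the same function yields the two crossed cables described for a two-node cluster.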

Figure 2-9 shows the serial connection between two SGI 1200 servers in a two-node configuration.

Figure 2-9. Serial Connection and Heartbeat Ethernet Connection, SGI 1200 Servers, Two-Node Configuration

Figure 2-10 shows the serial connection between two SGI 1450 servers in a two-node configuration.

Figure 2-10. Serial Connection and Heartbeat Ethernet Connection, SGI 1450 Servers, Two-Node Configuration

Figure 2-11 shows the serial connection between an SGI 1450 server and an SGI 1200 deskside server in a two-node configuration.

Figure 2-11. Serial Connection and Heartbeat Ethernet Connection, SGI 1200 and SGI 1450 Servers, Two-Node Configuration

Figure 2-12 diagrams a Linux FailSafe four-node cluster with ring reset.

Figure 2-12. Four-Node Cluster Ring Reset

Testing the Serial Connection

The cluster manager has two mechanisms for checking the serial interface.

  • For a node using the cluster reset services daemon, use admin ping node nodename; for example,

    cmgr> admin ping node fs0

  • For a node that is not using the cluster reset services daemon, use admin ping standalone node nodename; for example,

    cmgr> admin ping standalone node fs0

Testing the Serial Connection for a Node in a Cluster

To test the serial connection for an individual server, use admin ping node nodename.

The test exits when it finds the first error and writes an error message to standard output. Error messages include the name of the server that fails to respond.

Follow these steps:

  1. Make sure the cluster nodes are powered on. Power on the serial multiplexer.

  2. On a cluster node, start the cluster manager:

    cmgr

  3. If necessary, stop Linux FailSafe on each node:

    cmgr> stop ha_services for cluster clustername

    where clustername is the cluster you are testing. Wait until the node has successfully transitioned to standby state and the FailSafe processes have exited. This process can take a few minutes.

  4. Enter the following:

    cmgr> admin ping node nodename

    For example:

    cmgr> admin ping node sys2

    Sample output follows:

    ping operation successful

  5. If the command fails, make sure all the cables are seated properly and rerun the command.
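For a cluster with several nodes, it can help to prepare the list of ping commands in advance and to check each result against the success message. The following is a small sketch under those assumptions; the helper names are hypothetical, and the code only builds and inspects text. It does not run cmgr.

```python
def admin_ping_commands(nodes, standalone=False):
    """Build one cmgr 'admin ping' command line per node name."""
    verb = "admin ping standalone node" if standalone else "admin ping node"
    return [f"{verb} {name}" for name in nodes]

def ping_succeeded(cmgr_output):
    """Return True if the output contains the documented success message."""
    return "ping operation successful" in cmgr_output

for command in admin_ping_commands(["fs0", "sys2"]):
    print(f"cmgr> {command}")
```

Each generated line is entered at the cmgr prompt in step 4; any output for which ping_succeeded returns False calls for the cable check in step 5.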

Testing the Serial Connection for a Standalone Node

For a server that is not using the cluster reset services daemon, use admin ping standalone node nodename; for example:

cmgr> admin ping standalone node nodename

Sample failure output follows:

Internal error : crad is running, cannot perform system controller
operation in the standalone mode.

Failed to admin ping

admin command failed

Sample successful output follows:

ping operation successful

Installing and Configuring VACM

The following are the basic steps to enable crsd to use VACM for reset and to use crossover cables for the reset connection. These steps have to be done on each 1450 node.

  1. Bring the machine down to the BIOS.

    1. Ensure that the EMP port is enabled.

    2. If you plan to use the console serial port to control the EMP port, ensure that console redirection is disabled. Otherwise, the node cannot boot unless the serial cable is disconnected.

    For BIOS details, see Chapter 3, Configuring Software and Utilities of the SGI 1450 Server User's Guide.

    SGI Linux servers ship configured with EMP on and console redirection on.

  2. Install the following VACM RPMs:

    vacm-lib-2.0.0beta
    vacm-vash-2.0.0beta
    vacm-2.0.0beta

  3. Configure nexxus, the VACM daemon, using vash on the controlling node. It is assumed that nexxus is already running, the BIOS is configured correctly, and the serial cable is in place.

    hostname / nodename / ip address
    host1    node1    150.166.8.57    (Controlling Node)
    host2    node2    150.166.8.58    (Controlled Node)

    1. Start nexxus the first time (must be root).

      host1# nexxus &

    2. Log in to localhost (as any system user) with the default nexxus user name blum and password frub.

      host1# vash -c localhost -u blum -p frub
        NEXXUS_READY
      vash$

    3. Optionally, change the login name from blum to barf and the password from frub to foobar.

      vash$ ipc localhost nexxus:admin_rename:blum:barf
        NEXXUS:1:JOB_STARTED
        NEXXUS:1:JOB_COMPLETED
      vash$ ipc localhost nexxus:admin_chg_passwd:frub:foobar
        NEXXUS:1:JOB_STARTED
        NEXXUS:1:JOB_COMPLETED

    4. Enable remote machine(s) to run vash/hoover commands (by IP_address:netmask_bits).

      vash$ ipc localhost nexxus:admin_add_addr_acl_rule:blum:allow:150.166.8.0:16
        NEXXUS:1:JOB_STARTED
        NEXXUS:1:JOB_COMPLETED

    5. Add the group failsafe and make blum an admin for it.

      vash$ ipc localhost nexxus:group_add:failsafe
        NEXXUS:3:JOB_STARTED
        NEXXUS:3:JOB_COMPLETED
      vash$ ipc localhost nexxus:group_add_admin:failsafe:blum
        NEXXUS:4:JOB_STARTED
        NEXXUS:4:JOB_COMPLETED

    6. Add the controlled node (host2/node2) to the group failsafe.

      vash$ ipc localhost nexxus:node_add:node2:failsafe
        NEXXUS:6:JOB_STARTED
        NEXXUS:6:JOB_COMPLETED

    7. Enable blum to run emp commands.

      vash$ ipc localhost nexxus:admin_add_mod_acl_rule:blum:allow:emp:*
        NEXXUS:2:JOB_STARTED
        NEXXUS:2:JOB_COMPLETED

    8. Add the host node2 to the emp module (the IPMI interface). There is no EMP password on node node2 (“NONE”).

      vash$ ipc localhost emp:configuration:node2:/dev/tts/0:NONE
        NEXXUS:5:JOB_STARTED
        NEXXUS:5:JOB_COMPLETED

    9. Now you can verify the IPMI status (this is what the crsd ping command maps to).

      vash$ ipc localhost emp:node_status:node2
        EMP:7:JOB_STARTED
        EMP:7:NOSTATUS:/dev/tts/0:DETECTED
        EMP:7:JOB_COMPLETED

    10. Exit vash.

      vash$ exit
      host1#

    11. To clean up a botched config:

      root# killall nexxus
      root# > /usr/lib/vacm/vacm_configuration
      root# nexxus &

    12. And then start from scratch.

  4. After the rest of FailSafe configuration and setup is completed, configure FailSafe for host1 (node1) to reset host2 (node2).

    cmgr> show node node2
    Logical Machine Name: node2
    Hostname: node2
    Is FailSafe: true
    Nodeid: 2
    Reset type: powerCycle
    System Controller: vacm
    System Controller status: enabled
    System Controller owner: BackupNode
    System Controller owner device: host1,blum,frub
    System Controller owner type: tty
    ...
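Because the VACM steps have to be repeated on each 1450 node, the command lines for steps 4 through 9 of the nexxus configuration can be generated from a few parameters. The sketch below only formats text: the message strings are copied from the steps above, while the function names, the parameter names, and the choice to send every message with the ipc localhost prefix are assumptions made for illustration. It does not talk to nexxus.

```python
def vacm_setup_commands(admin, group, node, tty_device, acl_network, acl_bits,
                        emp_password="NONE", nexxus_host="localhost"):
    """Return the vash command lines for steps 4 through 9, in order."""
    messages = [
        f"nexxus:admin_add_addr_acl_rule:{admin}:allow:{acl_network}:{acl_bits}",
        f"nexxus:group_add:{group}",
        f"nexxus:group_add_admin:{group}:{admin}",
        f"nexxus:node_add:{node}:{group}",
        f"nexxus:admin_add_mod_acl_rule:{admin}:allow:emp:*",
        f"emp:configuration:{node}:{tty_device}:{emp_password}",
        f"emp:node_status:{node}",
    ]
    return [f"ipc {nexxus_host} {message}" for message in messages]

def job_completed(response):
    """Return True if any line of a nexxus response ends with JOB_COMPLETED."""
    return any(
        line.strip().endswith("JOB_COMPLETED") for line in response.splitlines()
    )

commands = vacm_setup_commands(
    admin="blum", group="failsafe", node="node2",
    tty_device="/dev/tts/0", acl_network="150.166.8.0", acl_bits=16,
)
for command in commands:
    print(f"vash$ {command}")
```

The printed lines match the example values used in this section (blum, failsafe, node2); substitute the values for each controlling and controlled node pair.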