Linux FailSafe Administrator's Guide
(document number: 007-4322-002 / published: 2001-02-28)
To set up a Linux FailSafe system, you configure the cluster that will support the highly available services. This requires the following steps:

The following subsections describe these tasks.

A cluster node is a single Linux image. Usually, a cluster node is an individual computer. The term node is also used in this guide for brevity.

The pool is the entire set of nodes available for clustering.

The first node you define must be the local host, which is the host you have logged into to perform cluster administration.

When you are defining multiple nodes, it is advisable to wait a minute or so between node definitions. When a node is added to the configuration database, the contents of the configuration database are also copied to the node being added. The node definition operation is complete when the new node configuration has been added to the database, at which point the database configuration is synchronized. If you define two nodes one after another, the second operation might fail because the first database synchronization is not complete.

To add a logical node definition to the pool of nodes that are eligible to be included in a cluster, you must provide the following information about the node:

Logical name: This name can contain letters and numbers but not spaces or pound signs, and it can be no more than 255 characters long. Any legal hostname is also a legal node name. For example, for a node whose hostname is “venus.eng.company.com” you can use a node name of “venus”, “node1”, or whatever is most convenient.
Hostname: The fully qualified name of the host, such as “server1.company.com”. Hostnames cannot begin with an underscore, include any whitespace, or be longer than 255 characters. This hostname should be the same as the output of the hostname command on the node you are defining. The IP address associated with this hostname should not be the same as any IP address you define as highly available when you define a Linux FailSafe IP address resource. Linux FailSafe will not accept an IP address (such as “192.0.2.22”) for this input.

Node ID: This number must be unique for each node in the pool and be in the range 1 through 32767.

System controller information: If the node has a system controller and you want Linux FailSafe to use the controller to reset the node, you must provide the following information about the system controller:

  Type of system controller: chalL, msc, mmsc
  System controller port password (optional)
  Administrative status, which you can set to determine whether Linux FailSafe can use the port: enabled, disabled
  Logical node name of the system controller owner (that is, the system that is physically attached to the system controller)
  Device name of the port on the owner node that is attached to the system controller
  Type of owner device: tty
Control networks: A list of the networks used for heartbeats, reset messages, and other Linux FailSafe messages. For each network, provide the following:

  Hostname or IP address. This address must not be the same as any IP address you define as highly available when you define a Linux FailSafe IP address resource, and it must be resolvable through the /etc/hosts file.
  Flags (hb for heartbeats, ctrl for control messages, priority). At least two control networks must use heartbeats, and at least one must use control messages.

Linux FailSafe requires multiple heartbeat networks. Usually a node sends heartbeat messages to another node on only one network at a time. However, a node might sometimes send heartbeat messages to another node on multiple networks simultaneously. This happens when the sender node does not know which networks are up and which are down. This is a transient state; eventually the heartbeat traffic converges on the highest-priority network that is up. Note that at any given time, different pairs of nodes might be using different networks for heartbeats.

Although all nodes in a Linux FailSafe cluster should have two control networks, it is possible to define a node with only one control network when adding it to the pool.
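You can check the node rules above before entering them in the GUI or CLI. The following Python sketch is illustrative only. The names NodeDefinition, ControlNetwork, check_node and duplicate_node_ids are hypothetical, and the IP-address test is a simple dotted-quad pattern. The two-heartbeat check is reported as a recommendation, because a node can be added to the pool with a single control network.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical types used only for this sketch.
@dataclass
class ControlNetwork:
    address: str      # hostname or IP address of the interface
    heartbeat: bool   # "hb" flag
    ctrl_msgs: bool   # "ctrl" flag
    priority: int

@dataclass
class NodeDefinition:
    name: str
    hostname: str
    node_id: int
    networks: List[ControlNetwork] = field(default_factory=list)

# Rough dotted-quad test; FailSafe does not accept an IP address as the hostname.
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def check_node(node: NodeDefinition, ha_addresses: Iterable[str] = ()) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    ha = set(ha_addresses)  # exact string match only; no name resolution is done
    if not node.name or len(node.name) > 255 or any(c.isspace() or c == "#" for c in node.name):
        problems.append("logical name must be 1-255 characters without spaces or '#'")
    if (not node.hostname or node.hostname.startswith("_") or len(node.hostname) > 255
            or any(c.isspace() for c in node.hostname)):
        problems.append("hostname must not start with '_', contain whitespace, or exceed 255 characters")
    if _IPV4.match(node.hostname):
        problems.append("hostname must be a name, not an IP address")
    if not 1 <= node.node_id <= 32767:
        problems.append("node ID must be in the range 1 through 32767")
    for net in node.networks:
        if net.address in ha:
            problems.append(f"control network {net.address} reuses a highly available address")
    # Recommended rather than enforced: a node may be added with one control network.
    if sum(net.heartbeat for net in node.networks) < 2:
        problems.append("recommended: at least two control networks should carry heartbeats")
    if not any(net.ctrl_msgs for net in node.networks):
        problems.append("at least one control network must carry control messages")
    return problems

def duplicate_node_ids(nodes: Iterable[NodeDefinition]) -> List[int]:
    """Node IDs must be unique across the pool; return any ID used more than once."""
    seen, dups = set(), set()
    for n in nodes:
        if n.node_id in seen:
            dups.add(n.node_id)
        else:
            seen.add(n.node_id)
    return sorted(dups)
```

The checks return messages rather than raising exceptions. This lets you report every problem in one pass before you start defining nodes, which matters given the advice to pause between node definitions.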
To define a node with the Cluster Manager GUI, perform the following steps:

1. Launch the FailSafe Manager.
2. On the left side of the display, click on the “Nodes & Cluster” category.
3. On the right side of the display, click on the “Define a Node” task link to launch the task.
4. Enter the selected inputs on this screen.
5. Click on “Next” at the bottom of the screen and continue entering information on the second screen.
6. Click on “OK” at the bottom of the screen to complete the task, or click on “Cancel” to cancel.
Use the following command to add a logical node definition:

Entering this command specifies the name of the node you are defining and puts you in a mode that enables you to define the parameters of the node. These parameters correspond to the items defined in Section 5.4.1. The following prompts appear:

Enter commands, when finished enter either "done" or "cancel"
A?

When this node-name prompt appears, you enter the node parameters in the following format:

set hostname to B
set nodeid to C
set sysctrl_type to D
set sysctrl_password to E
set sysctrl_status to F
set sysctrl_owner to G
set sysctrl_device to H
set sysctrl_owner_type to I
add nic J
You use the add nic J command to define the network interfaces; enter this command once for each network interface you want to define. When you enter this command, the following prompt appears:

Enter network interface commands, when finished enter "done" or "cancel"
NIC - J?

When this prompt appears, you use the following commands to specify the flags for the control network:

set heartbeat to K
set ctrl_msgs to L
set priority to M
After you have defined a network interface, you can use the following command from the node-name prompt to remove it:

When you have finished defining a node, enter done.

The following example defines a node called cm1a with one control network:

cmgr> define node cm1a
Enter commands, when finished enter either "done" or "cancel"
cm1a? set hostname to cm1a
cm1a? set nodeid to 1
cm1a? set sysctrl_type to msc
cm1a? set sysctrl_password to [ ]
cm1a? set sysctrl_status to enabled
cm1a? set sysctrl_owner to cm2
cm1a? set sysctrl_device to /dev/ttyd2
cm1a? set sysctrl_owner_type to tty
cm1a? add nic cm1
Enter network interface commands, when finished enter "done" or "cancel"
NIC - cm1 > set heartbeat to true
NIC - cm1 > set ctrl_msgs to true
NIC - cm1 > set priority to 0
NIC - cm1 > done
cm1a? done
cmgr>
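A node definition is a fixed sequence of set and add nic commands, so it can be convenient to generate that sequence from a table of node data. The sketch below uses a hypothetical helper, render_define_node, which is not part of Linux FailSafe. It returns the command lines typed in the cm1a example, without the prompts. It assumes you enter or paste the lines at the cmgr prompt yourself, because this section does not describe how to feed commands to cmgr from a file.

```python
from typing import Dict, List, Optional, Sequence, Union

NicSpec = Dict[str, Union[str, bool, int]]

# Order follows the sysctrl_* commands listed in this section.
_SYSCTRL_KEYS = ("type", "password", "status", "owner", "device", "owner_type")

def render_define_node(name: str, hostname: str, node_id: int,
                       nics: Sequence[NicSpec],
                       sysctrl: Optional[Dict[str, str]] = None) -> List[str]:
    """Hypothetical helper: return the cmgr command lines for one node definition."""
    lines = [f"define node {name}",
             f"set hostname to {hostname}",
             f"set nodeid to {node_id}"]
    for key in _SYSCTRL_KEYS:
        if sysctrl and key in sysctrl:  # omitted keys (e.g. an unset password) are skipped
            lines.append(f"set sysctrl_{key} to {sysctrl[key]}")
    for nic in nics:
        lines.append(f"add nic {nic['address']}")
        lines.append(f"set heartbeat to {str(nic['heartbeat']).lower()}")
        lines.append(f"set ctrl_msgs to {str(nic['ctrl_msgs']).lower()}")
        lines.append(f"set priority to {nic['priority']}")
        lines.append("done")  # leaves the NIC sub-mode
    lines.append("done")      # completes the node definition
    return lines
```

For example, render_define_node("cm1a", "cm1a", 1, [{"address": "cm1", "heartbeat": True, "ctrl_msgs": True, "priority": 0}], {"type": "msc", "status": "enabled", "owner": "cm2", "device": "/dev/ttyd2", "owner_type": "tty"}) produces the commands from the cm1a example, apart from the password line.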
If you have invoked the Cluster Manager CLI with the -p option, or you entered the set prompting on command, the display appears as in the following example:

cmgr> define node cm1a
Enter commands, when finished enter either "done" or "cancel"
Nodename [optional]? cm1a
Node ID? 1
Do you wish to define system controller info[y/n]:y
Sysctrl Type <null>? (null)
Sysctrl Password[optional]? ( )
Sysctrl Status <enabled|disabled>? enabled
Sysctrl Owner? cm2
Sysctrl Device? /dev/ttyd2
Sysctrl Owner Type <tty>? (tty)
Number of Network Interfaces ? (1)
NIC 1 - IP Address? cm1
NIC 1 - Heartbeat HB (use network for heartbeats) <true|false>? true
NIC 1 - Priority <1,2,...>? 0
NIC 2 - IP Address? cm2
NIC 2 - Heartbeat HB (use network for heartbeats) <true|false>? true
NIC 2 - (use network for control messages) <true|false>? false
NIC 2 - Priority <1,2,...>? 1
After you have defined a cluster node, you can modify or delete the node with the Cluster Manager GUI or the Cluster Manager CLI. You must remove a node from a cluster before you can delete the node.

To modify a node with the Cluster Manager GUI, perform the following steps:

1. Launch the FailSafe Manager.
2. On the left side of the display, click on the “Nodes & Cluster” category.
3. On the right side of the display, click on the “Modify a Node Definition” task link to launch the task.
4. Modify the node parameters.
5. Click on “OK” at the bottom of the screen to complete the task, or click on “Cancel” to cancel.
You can use the following command to modify an existing node. After entering this command, you can execute any of the commands you use to define a node.

To delete a node with the Cluster Manager GUI, perform the following steps:

1. Launch the FailSafe Manager.
2. On the left side of the display, click on the “Nodes & Cluster” category.
3. On the right side of the display, click on the “Delete a Node” task link to launch the task.
4. Enter the name of the node to delete.
5. Click on “OK” at the bottom of the screen to complete the task, or click on “Cancel” to cancel.
After defining a node, you can delete it with the following command:

You can delete a node only if the node is not currently part of a cluster. This means that you must first modify any cluster that contains the node so that it no longer contains the node; only then can you delete it.

After you define cluster nodes, you can perform the following display tasks:

  display the attributes of a node
  display the nodes that are members of a specific cluster
  display all the nodes that have been defined

You can perform any of these tasks with the FailSafe Cluster Manager GUI or the Linux FailSafe Cluster Manager CLI.

The Cluster Manager GUI provides a convenient graphic display of the defined nodes of a cluster and the attributes of those nodes through the FailSafe Cluster View. You can launch the FailSafe Cluster View directly, or you can bring it up at any time by clicking on “FailSafe Cluster View” at the bottom of the “FailSafe Manager” display.

From the View menu of the FailSafe Cluster View, you can select “Nodes in Pool” to view all nodes defined in the Linux FailSafe pool. You can also select “Nodes In Cluster” to view all nodes that belong to the default cluster. Click any node's name or icon to view detailed status and configuration information about the node.

After you have defined a node, you can display the node's parameters with the following command:

A show node command on node cm1 would yield the following display:

cmgr> show node cm1
Logical Node Name: cm1
Hostname: cm1
Nodeid: 1
Reset type: reset
System Controller: msc
System Controller status: enabled
System Controller owner: cm2
System Controller owner device: /dev/ttyd2
System Controller owner type: tty
ControlNet Ipaddr: cm1
ControlNet HB: true
ControlNet Control: true
ControlNet Priority: 0
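To compare node definitions across machines, you can turn the show node output into a dictionary. This Python sketch assumes the "Field: value" layout of the cm1 example above, and other releases may label fields differently. The function name parse_show_node is hypothetical. Values are kept in lists on the assumption that a node with several control networks would repeat the ControlNet lines; the example shows only one network.

```python
from typing import Dict, List

def parse_show_node(output: str) -> Dict[str, List[str]]:
    """Turn 'Field: value' lines into {field: [values...]}, skipping cmgr> prompt lines."""
    fields: Dict[str, List[str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("cmgr>") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

Splitting on the first colon keeps values such as /dev/ttyd2 intact, and collecting lists avoids silently overwriting repeated fields.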
You can see a list of all of the nodes that have been defined with the following command:

You can see a list of all of the nodes that have been defined for a specified cluster with the following command:

cmgr> show nodes [in cluster A]

If you have specified a default cluster, you do not need to specify a cluster when you use this command; it displays the nodes defined in the default cluster.

There are several parameters that determine the behavior of the nodes in a cluster of a Linux FailSafe system. The Linux FailSafe parameters are as follows:

Tie-breaker node: The logical name of a machine used to compute node membership in situations where exactly 50% of the nodes in a cluster can talk to each other. If you do not specify a tie-breaker node, the node with the lowest node ID number is used. The tie-breaker node is a cluster-wide parameter.

It is recommended that you configure a tie-breaker node even if there is an odd number of nodes in the cluster, since one node may be deactivated, leaving an even number of nodes to determine membership.

In a heterogeneous cluster, where the nodes are of different sizes and capabilities, the largest node in the cluster with the most important application or the maximum number of resource groups should be configured as the tie-breaker node.

Node timeout: The timeout period, in milliseconds. If no heartbeat is received from a node in this period of time, the node is considered to be dead and is not considered part of the cluster membership. The node timeout must be at least 5 seconds (5000 milliseconds). In addition, the node timeout must be at least 10 times the heartbeat interval for proper Linux FailSafe operation; otherwise, false failovers may be triggered. Node timeout is a cluster-wide parameter.

Heartbeat interval: The interval, in milliseconds, between heartbeat messages. This interval must be greater than 500 milliseconds, and it must not be greater than one-tenth of the node timeout period. By default, this interval is set to one second. Heartbeat interval is a cluster-wide parameter.

The more frequent the heartbeats (smaller heartbeat interval), the greater the potential for slowing down the network. Conversely, the less frequent the heartbeats (larger heartbeat interval), the greater the potential for reducing the availability of resources.

Node wait time: The time, in milliseconds, that a node waits for other nodes to join the cluster before declaring a new cluster membership. If the value is not set for the cluster, Linux FailSafe assumes the value to be the node timeout times the number of nodes.

Powerfail mode: Indicates whether a special power failure algorithm should be run when no response is received from a system controller after a reset request. This can be set to ON or OFF. Powerfail is a node-specific parameter and should be defined for the machine that performs the reset operation.
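The following Python sketch illustrates the tie-breaker rules under stated assumptions; it is a simplified model for illustration only. The guide states only two things: the tie-breaker is consulted when exactly half of the nodes can communicate, and the lowest node ID is the default. The rule that a strict majority forms the membership on its own is an assumption of this sketch, not a description of the FailSafe membership algorithm. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

def effective_tie_breaker(node_ids: Dict[str, int], tie_breaker: Optional[str] = None) -> str:
    """Configured tie-breaker, or else the node with the lowest node ID (the documented default)."""
    if tie_breaker is not None:
        if tie_breaker not in node_ids:
            raise ValueError(f"unknown tie-breaker node: {tie_breaker}")
        return tie_breaker
    return min(node_ids, key=lambda name: node_ids[name])

def partition_wins(node_ids: Dict[str, int], partition: Iterable[str],
                   tie_breaker: Optional[str] = None) -> bool:
    """Simplified model (assumption): a strict majority forms the membership;
    an exact half does so only if it contains the tie-breaker node."""
    members = set(partition).intersection(node_ids)
    total = len(node_ids)
    if 2 * len(members) > total:
        return True
    if 2 * len(members) == total:
        return effective_tie_breaker(node_ids, tie_breaker) in members
    return False
```

In a four-node cluster split into two pairs, only the pair containing the tie-breaker (by default the node with the lowest ID) keeps the membership in this model. This is why the text recommends choosing the tie-breaker deliberately.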
To set Linux FailSafe parameters with the Cluster Manager GUI, perform the following steps:

1. Launch the FailSafe Manager from a menu or the command line.
2. On the left side of the display, click on the “Nodes & Cluster” category.
3. On the right side of the display, click on the “Set Linux FailSafe HA Parameters” task link to launch the task.
4. Enter the selected inputs.
5. Click on “OK” at the bottom of the screen to complete the task, or click on “Cancel” to cancel.
You can modify the Linux FailSafe parameters with the following command:

cmgr> modify ha_parameters [on node A] [in cluster B]

If you have specified a default node or a default cluster, you do not have to specify a node or a cluster in this command; Linux FailSafe will use the default. The following prompts appear:

Enter commands, when finished enter either "done" or "cancel"
A?

When this node-name prompt appears, you enter the Linux FailSafe parameters you wish to modify in the following format:

set node_timeout to A
set heartbeat to B
set run_pwrfail to C
set tie_breaker to D
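The timing constraints described earlier reduce to a few comparisons, which you can check before entering set node_timeout and set heartbeat values. This is a minimal Python sketch. The names check_timing and default_node_wait_ms are hypothetical, and all values are in milliseconds, as in the parameter descriptions.

```python
from typing import List

MIN_NODE_TIMEOUT_MS = 5000   # node timeout must be at least 5 seconds
HEARTBEAT_FLOOR_MS = 500     # heartbeat interval must be greater than this
DEFAULT_HEARTBEAT_MS = 1000  # documented default: one second

def check_timing(node_timeout_ms: int, heartbeat_ms: int = DEFAULT_HEARTBEAT_MS) -> List[str]:
    """Return the violated constraints for a node timeout / heartbeat interval pair."""
    problems: List[str] = []
    if node_timeout_ms < MIN_NODE_TIMEOUT_MS:
        problems.append("node timeout must be at least 5000 ms")
    if heartbeat_ms <= HEARTBEAT_FLOOR_MS:
        problems.append("heartbeat interval must be greater than 500 ms")
    # Equivalent to: heartbeat interval <= node timeout / 10.
    if node_timeout_ms < 10 * heartbeat_ms:
        problems.append("node timeout must be at least 10 times the heartbeat interval")
    return problems

def default_node_wait_ms(node_timeout_ms: int, num_nodes: int) -> int:
    """Node wait time assumed when none is set: node timeout times the number of nodes."""
    return node_timeout_ms * num_nodes
```

Note that with the default one-second heartbeat interval, the ten-times rule makes 10000 ms the smallest acceptable node timeout, even though the absolute minimum is 5000 ms.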
A cluster is a collection of one or more nodes coupled with each other by networks or other similar interconnects. In Linux FailSafe, a cluster is identified by a simple name. A given node may be a member of only one cluster.

To define a cluster, you must provide the following information:

  The logical name of the cluster, with a maximum length of 255 characters.
  The mode of operation: normal (the default) or experimental. Experimental mode allows you to configure a Linux FailSafe cluster in which resource groups do not fail over when a node failure is detected. This mode can be useful when you are tuning node timeouts or heartbeat values. When a cluster is configured in normal mode, Linux FailSafe fails over resource groups when it detects a failure in a node or resource group.
  (Optional) The email address to use to notify the system administrator when problems occur in the cluster (for example, root@system).
  (Optional) The email program to use to notify the system administrator when problems occur in the cluster (for example, /usr/bin/mail).

Specifying the email program is optional; you can specify only the notification address to receive notifications by mail. If an address is not specified, notification will not be sent.
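The cluster attributes, and the rule that a node may belong to only one cluster, can be modeled as shown below. This Python sketch is hypothetical: ClusterDefinition and check_clusters are not FailSafe names. It encodes only the behaviors stated above. Normal mode fails over resource groups and experimental mode does not. Notification mail is sent only when an address is set, whether or not a mail program is given.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class ClusterDefinition:
    name: str
    nodes: Set[str] = field(default_factory=set)
    ha_mode: str = "normal"            # "normal" (default) or "experimental"
    notify_addr: Optional[str] = None  # e.g. "root@system"
    notify_cmd: Optional[str] = None   # e.g. "/usr/bin/mail"; optional

    def fails_over_resource_groups(self) -> bool:
        return self.ha_mode == "normal"

    def sends_notifications(self) -> bool:
        # Only the address is required; the mail program may be left unset.
        return bool(self.notify_addr)

def check_clusters(clusters: List[ClusterDefinition]) -> List[str]:
    """Report invalid names or modes and any node that appears in more than one cluster."""
    problems: List[str] = []
    owner: Dict[str, str] = {}
    for c in clusters:
        if not c.name or len(c.name) > 255:
            problems.append(f"cluster name {c.name!r} must be 1-255 characters")
        if c.ha_mode not in ("normal", "experimental"):
            problems.append(f"cluster {c.name}: unknown mode {c.ha_mode!r}")
        for node in sorted(c.nodes):
            if node in owner:
                problems.append(f"node {node} is in both {owner[node]} and {c.name}")
            else:
                owner[node] = c.name
    return problems
```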
After you have added nodes to the pool and defined a cluster, you must provide the names of the nodes to include in the cluster.

To define a cluster with the Cluster Manager GUI, perform the following steps:

1. Launch the Linux FailSafe Manager.
2. On the left side of the display, click on “Guided Configuration”.
3. On the right side of the display, click on the “Set Up a New Cluster” task link to launch the taskset.
4. In the resulting window, click each task link in turn, as it becomes available. Enter the selected inputs for each task.
5. When finished, click “OK” to close the taskset window.
When you define a cluster with the CLI, you define the cluster and add nodes to the cluster with the same command.

Use the following Cluster Manager CLI command to define a cluster:

Entering this command specifies the name of the cluster you are defining and puts you in a mode that allows you to add nodes to the cluster. The following prompt appears:

When this prompt appears during cluster creation, you can specify nodes to include in the cluster, and you can specify an email address to which messages that originate in this cluster are directed.

You specify nodes to include in the cluster with the following command:

cluster A? add node C
cluster A?

You can add as many nodes as you want to include in the cluster.

You specify an email program to use to direct messages with the following command:

cluster A? set notify_cmd to B
cluster A?

You specify an email address to which messages are directed with the following command:

cluster A? set notify_addr to B
cluster A?

You specify a mode for the cluster (normal or experimental) with the following command:

cluster A? set ha_mode to D
cluster A?
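You can also generate the cluster subcommands from data. This hypothetical Python helper, render_cluster_commands, returns only the lines entered at the cluster A? prompt, ending with done. It does not include the command that opens the cluster definition, and the order of the set commands is an arbitrary choice of this sketch.

```python
from typing import Iterable, List, Optional

def render_cluster_commands(nodes: Iterable[str],
                            notify_addr: Optional[str] = None,
                            notify_cmd: Optional[str] = None,
                            ha_mode: Optional[str] = None) -> List[str]:
    """Hypothetical helper: lines to type at the cluster prompt, ending with 'done'."""
    cmds = [f"add node {n}" for n in nodes]
    if notify_cmd:
        cmds.append(f"set notify_cmd to {notify_cmd}")
    if notify_addr:
        cmds.append(f"set notify_addr to {notify_addr}")
    if ha_mode is not None:
        if ha_mode not in ("normal", "experimental"):
            raise ValueError(f"ha_mode must be normal or experimental, not {ha_mode!r}")
        cmds.append(f"set ha_mode to {ha_mode}")
    cmds.append("done")  # returns to the cmgr prompt
    return cmds
```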
When you are finished defining the cluster, enter done to return to the cmgr prompt.

After you have defined a cluster, you can modify the attributes of the cluster or you can delete the cluster. You cannot delete a cluster that contains nodes; you must move those nodes out of the cluster first.

To modify a cluster with the Cluster Manager GUI, perform the following procedure:

1. Launch the Linux FailSafe Manager.
2. On the left side of the display, click on the “Nodes & Cluster” category.
3. On the right side of the display, click on the “Modify a Cluster Definition” task link to launch the task.
4. Enter the selected inputs.
5. Click on “OK” at the bottom of the screen to complete the task, or click on “Cancel” to cancel.
To delete a cluster with the Cluster Manager GUI, perform the following procedure:

1. Launch the Linux FailSafe Manager.
2. On the left side of the display, click on the “Nodes & Cluster” category.
3. On the right side of the display, click on the “Delete a Cluster” task link to launch the task.
4. Enter the selected inputs.
5. Click on “OK” at the bottom of the screen to complete the task, or click on “Cancel” to cancel.
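This section describes several ordering rules for deletion: remove a node from its cluster before deleting it, empty a cluster before deleting it, and keep each node in at most one cluster. The following Python sketch summarizes these rules in a small in-memory model. It is a hypothetical illustration (PoolModel and its methods are not FailSafe names), not a description of how Linux FailSafe stores its configuration database.

```python
from typing import Dict, Optional, Set

class PoolModel:
    """Hypothetical in-memory model of the node and cluster rules in this section."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.clusters: Dict[str, Set[str]] = {}

    def define_node(self, name: str) -> None:
        self.nodes.add(name)

    def define_cluster(self, name: str) -> None:
        self.clusters.setdefault(name, set())

    def cluster_of(self, node: str) -> Optional[str]:
        for cluster, members in self.clusters.items():
            if node in members:
                return cluster
        return None

    def add_node(self, cluster: str, node: str) -> None:
        if node not in self.nodes:
            raise ValueError(f"{node} has not been defined in the pool")
        current = self.cluster_of(node)
        if current is not None:  # a node may be a member of only one cluster
            raise ValueError(f"{node} is already a member of cluster {current}")
        self.clusters[cluster].add(node)

    def remove_node(self, cluster: str, node: str) -> None:
        self.clusters[cluster].discard(node)

    def can_delete_node(self, node: str) -> bool:
        return self.cluster_of(node) is None

    def can_delete_cluster(self, name: str) -> bool:
        return not self.clusters[name]

    def delete_node(self, node: str) -> None:
        if not self.can_delete_node(node):
            raise ValueError(f"remove {node} from cluster {self.cluster_of(node)} first")
        self.nodes.discard(node)

    def delete_cluster(self, name: str) -> None:
        if not self.can_delete_cluster(name):
            raise ValueError(f"cluster {name} still contains nodes")
        del self.clusters[name]
```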
To modify an existing cluster, enter the following command:

Entering this command specifies the name of the cluster you are modifying and puts you in a mode that allows you to modify the cluster. The following prompt appears:

When this prompt appears, you can modify the cluster definition with the following commands:

cluster A? set notify_addr to B
cluster A? set notify_cmd to B
cluster A? add node C
cluster A? remove node D
cluster A?
When you are finished modifying the cluster, enter done to return to the cmgr prompt.

You can delete a defined cluster with the following command:

You can display defined clusters with the Cluster Manager GUI or the Cluster Manager CLI.

The Cluster Manager GUI provides a convenient display of a cluster and its components through the FailSafe Cluster View. You can launch the FailSafe Cluster View directly, or you can bring it up at any time by clicking on “FailSafe Cluster View” at the bottom of the “FailSafe Manager” display.

From the View menu of the FailSafe Cluster View, you can choose elements within the cluster to examine. To view details of the cluster, click on the cluster name or icon. Status and configuration information will appear in a new window. To view this information within the FailSafe Cluster View window instead, select Options; when you then click on the Show Details option, the status details will appear on the right side of the window.

After you have defined a cluster, you can display the nodes in that cluster with the following command:

You can see a list of the clusters that have been defined with the following command: