To use Linux FailSafe, you must understand the concepts in
this section.
A cluster node is a single Linux execution environment;
in other words, a single physical or virtual machine. In current Linux environments
this will always be an individual computer. For brevity, this guide uses the
term node in this sense, as opposed to any other meaning such as a
network node.
A pool is the entire set of nodes having membership
in a group of clusters. The clusters are usually close together and should
always serve a common purpose. A replicated cluster configuration database
is stored on each node in the pool.
A cluster is a collection of one or more nodes
coupled to each other by networks or other similar interconnections. A cluster
belongs to one pool and only one pool. A cluster is identified by a simple
name; this name must be unique within the pool. A particular node may be
a member of only one cluster. All nodes in a cluster are also in the pool;
however, not all nodes in the pool are necessarily in the cluster.
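The membership rules above can be sketched as a small data model. This is only an illustration: the Pool class and its method names are hypothetical and are not part of any Linux FailSafe interface. It checks the three constraints stated in the text: cluster names are unique within the pool, a cluster node must be a pool node, and a node may belong to only one cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Pool:
    """Hypothetical in-memory model of a pool (not a Linux FailSafe API)."""
    nodes: Set[str] = field(default_factory=set)
    # Maps each cluster name to the set of node names in that cluster.
    clusters: Dict[str, Set[str]] = field(default_factory=dict)

    def add_cluster(self, name: str) -> None:
        # A cluster name must be unique within the pool.
        if name in self.clusters:
            raise ValueError(f"cluster name {name!r} is already used in this pool")
        self.clusters[name] = set()

    def add_node_to_cluster(self, node: str, cluster: str) -> None:
        # All nodes in a cluster are also in the pool.
        if node not in self.nodes:
            raise ValueError(f"node {node!r} is not in the pool")
        # A particular node may be a member of only one cluster.
        for other, members in self.clusters.items():
            if other != cluster and node in members:
                raise ValueError(f"node {node!r} already belongs to {other!r}")
        self.clusters[cluster].add(node)
```

A pool node that has not been added to any cluster is still valid in this model, which matches the statement that not every pool node is in a cluster.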
A node membership is the list of nodes in a cluster
on which Linux FailSafe can allocate resource groups.
A process membership
is the list of process instances in a cluster that form a process group. There
can be multiple process groups per node.
A resource is a single physical or logical entity
that provides a service to clients or other resources. For example, a resource
can be a single disk volume, a particular network address, or an application
such as a web server. A resource is generally available for use over time
on two or more nodes in a cluster, although it can only be allocated to one
node at any given time.
Resources are identified by a resource name and a resource type. One
resource can be dependent on one or more other resources; if so, it will not
be able to start (that is, be made available for use) unless the resources
it depends on are also started. The resources it depends on must be part of
the same resource group and are identified in a resource dependency list.
A resource type is a particular class of resource.
All of the resources in a particular resource type can be handled in the same
way for the purposes of failover. Every resource is an instance of exactly
one resource type.
A resource type is identified by a simple name; this name should be
unique within the cluster. A resource type can be defined for a specific node,
or it can be defined for an entire cluster. A resource type definition for
a specific node overrides a clusterwide resource type definition with the
same name; this allows an individual node to override global settings from
a clusterwide resource type definition.
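The override rule can be pictured as a two-level lookup. The sketch below is an assumption about how such a lookup might behave, not the product's implementation: the function name, the dictionary layout, and the monitor_interval setting are all hypothetical, and the sketch treats the node-specific definition as replacing the clusterwide one as a whole.

```python
def resolve_resource_type(name, node, cluster_defs, node_defs):
    """Return the definition of resource type `name` that applies on `node`.

    cluster_defs: {type name: definition}               (clusterwide)
    node_defs:    {node name: {type name: definition}}  (node-specific)
    """
    node_specific = node_defs.get(node, {})
    if name in node_specific:
        # A node-specific definition overrides the clusterwide one.
        return node_specific[name]
    return cluster_defs[name]


# Hypothetical definitions; "monitor_interval" is an invented setting.
cluster_defs = {"IP_address": {"monitor_interval": 30}}
node_defs = {"node2": {"IP_address": {"monitor_interval": 10}}}
```

With these definitions, node2 sees its own IP_address definition, while every other node falls back to the clusterwide one.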
Like resources, a resource type can be dependent on one or more other
resource types. If such a dependency exists, at least one instance of each
resource type that it depends on must be defined. For example, a resource type
named Netscape_web might have resource type dependencies
on resource types named IP_address and volume. If a resource named web1
is defined with the Netscape_web resource type, then the resource group
containing web1 must also contain at least one resource of the type
IP_address and one resource of the type volume.
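The Netscape_web example can be expressed as a simple check over a resource group. The function below is a minimal sketch under assumed data structures (a mapping of resource name to resource type, and a mapping of resource type to the set of types it depends on); neither the function nor the layout comes from Linux FailSafe.

```python
def missing_dependency_types(group, type_dependencies):
    """Report, per resource, the required resource types absent from the group.

    group:             {resource name: resource type}
    type_dependencies: {resource type: set of resource types it depends on}
    """
    present = set(group.values())
    missing = {}
    for resource, rtype in group.items():
        absent = type_dependencies.get(rtype, set()) - present
        if absent:
            missing[resource] = absent
    return missing


# Type dependencies taken from the example in the text.
type_dependencies = {"Netscape_web": {"IP_address", "volume"}}
```

An empty result means every resource in the group has at least one resource of each type that its own type depends on.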
The Linux FailSafe software includes some predefined resource types.
If these types fit the application you want to make highly available, you
can reuse them. If none fit, you can create additional resource types by using
the instructions in the Linux FailSafe Programmer's Guide.
A resource name identifies a specific instance
of a resource type. A resource name must be unique for a given resource type.
A resource group is a collection of interdependent
resources. A resource group is identified by a simple name; this name must
be unique within a cluster. Table 1-1 shows an example
of the resources and their corresponding resource types for a resource group
named WebGroup.
Table 1-1. Example Resource Group
| Resource | Resource Type |
|---|---|
| 10.10.48.22 | IP_address |
| /fs1 | filesystem |
| vol1 | volume |
| web1 | Netscape_web |
If any individual resource in a resource group becomes unavailable for
its intended use, then the entire resource group is considered unavailable.
Therefore, a resource group is the unit of failover.
Resource groups cannot overlap; that is, two resource groups cannot
contain the same resource.
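The no-overlap rule is easy to verify mechanically. The sketch below assumes each resource is identified by a (resource name, resource type) pair, as the text describes; the function name, the data layout, and the second group name are hypothetical.

```python
from collections import defaultdict


def find_overlaps(groups):
    """Return resources that appear in more than one resource group.

    groups: {group name: iterable of (resource name, resource type) pairs}
    """
    owners = defaultdict(list)
    for group_name, resources in groups.items():
        for resource in resources:
            owners[resource].append(group_name)
    return {res: names for res, names in owners.items() if len(names) > 1}


groups = {
    "WebGroup": [("web1", "Netscape_web"), ("vol1", "volume")],
    "OtherGroup": [("vol1", "volume")],  # hypothetical second group
}
```

Here vol1 is reported because both groups claim it, which the rule above forbids.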
A resource dependency list is a list of resources
upon which a resource depends. Each resource instance must have resource dependencies
that satisfy its resource type dependencies before it can be added to a resource
group.
A resource type dependency list is a list of
resource types upon which a resource type depends. For example, the filesystem resource type depends upon the volume
resource type, and the Netscape_web resource type depends
upon the filesystem and IP_address resource
types.
For example, suppose a file system instance fs1 is
mounted on volume vol1. Before fs1 can
be added to a resource group, fs1 must be defined to depend
on vol1. Linux FailSafe knows only that a file system instance
must have one volume instance in its dependency list. This requirement is
inferred from the resource type dependency list.
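The fs1 and vol1 example amounts to comparing the types found in a resource's dependency list against the dependency list of its resource type. The following sketch illustrates that comparison; the function name and the dictionary layout are assumptions made for this example, not a Linux FailSafe interface.

```python
def can_add_to_group(resource, resource_types, dependency_lists, type_dependency_lists):
    """Check that a resource's dependencies satisfy its type's dependencies.

    resource_types:        {resource name: resource type}
    dependency_lists:      {resource name: list of resources it depends on}
    type_dependency_lists: {resource type: set of types it depends on}
    """
    rtype = resource_types[resource]
    # Types of the resources this resource has been defined to depend on.
    dependency_types = {resource_types[d] for d in dependency_lists.get(resource, ())}
    required = type_dependency_lists.get(rtype, set())
    return required <= dependency_types


resource_types = {"fs1": "filesystem", "vol1": "volume"}
type_dependency_lists = {"filesystem": {"volume"}}
```

With an empty dependency list for fs1 the check fails; once fs1 is defined to depend on vol1, it passes.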
A failover is the process of allocating a resource
group (or application) to another node, according to a failover policy. A
failover may be triggered by the failure of a resource, a change in the node
membership (such as when a node fails or starts), or a manual request by the
administrator.
A failover policy is the method used by Linux
FailSafe to determine the destination node of a failover. A failover policy
consists of the following:
- Failover domain
- Failover attributes
- Failover script
Linux FailSafe uses the failover domain output from a failover script
along with failover attributes to determine on which node a resource group
should reside.
The administrator must configure a failover policy for each resource
group. A failover policy name must be unique within the pool. Linux FailSafe
includes predefined failover policies, but you can define your own failover
algorithms as well.
A failover domain is the ordered list of nodes
on which a given resource group can be allocated. The nodes listed in the
failover domain must be within the same cluster; however, the failover domain
does not have to include every node in the cluster.
The administrator defines the initial failover domain when creating
a failover policy. This list is transformed into a run-time failover domain
by the failover script; Linux FailSafe uses the run-time failover domain along
with failover attributes and the node membership to determine the node on
which a resource group should reside. Linux FailSafe stores the run-time failover
domain and uses it as input to the next failover script invocation. Depending
on the run-time conditions and contents of the failover script, the initial
and run-time failover domains may be identical.
In general, Linux FailSafe allocates a given resource group to the first
node listed in the run-time failover domain that is also in the node membership;
the point at which this allocation takes place is affected by the failover
attributes.
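The general selection rule can be sketched in a few lines. This is an illustration only: select_node is a hypothetical name, and the sketch ignores failover attributes, which affect when the allocation actually takes place.

```python
def select_node(runtime_domain, node_membership):
    """Return the first node in the run-time failover domain that is also
    in the node membership, or None if no listed node is available."""
    for node in runtime_domain:
        if node in node_membership:
            return node
    return None
```

For a run-time failover domain of nodeA, nodeB, nodeC and a node membership that contains only nodeB and nodeC, the sketch selects nodeB.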
A failover attribute is a string that affects
the allocation of a resource group in a cluster. The administrator must specify
system attributes (such as Auto_Failback or Controlled_Failback), and can optionally supply site-specific attributes.
A failover script is a shell script that generates
a run-time failover domain and returns it to the Linux FailSafe process. The
Linux FailSafe process ha_fsd applies the failover attributes
and then selects the first node in the returned failover domain that is also
in the current node membership.
The following failover scripts are provided with the Linux FailSafe
release:
- ordered, which never changes the initial
failover domain. When using this script, the initial and run-time failover
domains are equivalent.
- round-robin, which selects the resource
group owner in a round-robin (circular) fashion. This policy can be used for
resource groups that can be run on any node in the cluster.
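The difference between the two scripts can be modeled as two functions that produce a run-time failover domain. The real failover scripts are shell scripts, and their interface is not shown here; the Python functions below are hypothetical, and modeling round-robin as a one-position rotation of the stored run-time domain is an assumption, not the documented behavior of the shipped script.

```python
def ordered(initial_domain, stored_runtime_domain):
    """Model of the ordered script: the run-time failover domain is always
    the initial failover domain, unchanged."""
    return list(initial_domain)


def round_robin(initial_domain, stored_runtime_domain):
    """Assumed model of round-robin: rotate the stored run-time domain by one
    position so that the next node in the circle is listed first."""
    current = list(stored_runtime_domain or initial_domain)
    if not current:
        return []
    return current[1:] + current[:1]
```

Because the stored run-time domain is passed back in on the next invocation, repeated calls to round_robin in this model cycle through every node before returning to the original order.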
If these scripts do not meet your needs, you can create a new failover
script using the information in this guide.
The action scripts are the set of scripts that
determine how a resource is started, monitored, and stopped. There must be
a set of action scripts specified for each resource type.
The following is the complete set of action scripts that can be specified
for each resource type:
- exclusive, which verifies that a resource
is not already running
- start, which starts a resource
- stop, which stops a resource
- monitor, which monitors a resource
- restart, which restarts a resource on the
same server after a monitoring failure occurs
The release includes action scripts for predefined resource types. If
these scripts fit the resource type that you want to make highly available,
you can reuse them by copying them and modifying them as needed. If none fits,
you can create additional action scripts by using the instructions in the Linux FailSafe Programmer's Guide.