Linux FailSafe Administrator's Guide
(document number: 007-4322-002 / published: 2001-02-28)
When you configure the components
of a Linux FailSafe system, you configure various timeout values and monitoring
intervals that determine the application downtime of a highly-available system
when there is a failure. To determine reasonable values to set for your system,
consider the following equation:

application downtime = failure detection + time to handle failure + failure recovery

Failure detection depends on the type of failure that is detected:

When a node goes down, the node failure is detected
after the node timeout expires; this is an HA parameter that you can modify. All failures
that translate into a node failure (such as heartbeat failure and OS failure)
fall into this failure category. Node timeout has a default value of 15 seconds.
For information on modifying the node timeout value, see Section 5.4.4.

When there is a resource failure, it is detected as a monitor failure
of the resource. The amount of time this takes is determined by the following:

    The monitoring interval for the resource type
    The monitor timeout for the resource type
    The number of restarts defined for the resource type, if the restart mode is configured on

For information on setting values for a resource type, see Section 5.5.6.
Reducing these values will result in a shorter failover time, but it
could also significantly increase the overhead that Linux FailSafe imposes
on system performance and could lead to false failovers.

The time to handle a failure is something that the user cannot
control. In general, this should take a few seconds.

The failure recovery time is determined by the total time it takes for
Linux FailSafe to perform the following:

    Execute the failover policy script (approximately five seconds).
    Run the stop action script for all resources in the resource group. This is not required for node failure; the failing node will be reset.
    Run the start action script for all resources in the resource group.
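
As a rough illustration of the downtime equation in this section, the following Python sketch adds up its three terms. The function names, parameters, and example script durations are hypothetical and are not part of Linux FailSafe; only the 15-second default node timeout and the approximately five-second failover policy script time come from this guide. The 3-second handling time is an assumed stand-in for "a few seconds", and the sketch assumes that action scripts run one after another, which may not match how a given resource group behaves.

```python
# Rough downtime estimate based on the equation in this section.
# All names here are illustrative assumptions, not Linux FailSafe APIs.

NODE_TIMEOUT_DEFAULT = 15.0   # seconds; default node timeout stated in this guide
FAILOVER_POLICY_TIME = 5.0    # seconds; approximate figure stated in this guide
ASSUMED_HANDLING_TIME = 3.0   # seconds; assumption standing in for "a few seconds"


def recovery_time(start_times, stop_times, node_failure):
    """Failure recovery = failover policy script + stop scripts + start scripts.

    Stop scripts are skipped for a node failure because the failing node is reset.
    Script durations are summed, assuming the scripts run sequentially.
    """
    stop_total = 0.0 if node_failure else sum(stop_times)
    return FAILOVER_POLICY_TIME + stop_total + sum(start_times)


def estimate_downtime(detection_time, start_times, stop_times, node_failure,
                      handling_time=ASSUMED_HANDLING_TIME):
    """Downtime = failure detection + time to handle failure + failure recovery."""
    return detection_time + handling_time + recovery_time(
        start_times, stop_times, node_failure)


if __name__ == "__main__":
    # Hypothetical resource group with two resources (durations in seconds).
    start_times = [4.0, 6.0]
    stop_times = [2.0, 3.0]

    # Node failure: detection time is the node timeout.
    node_case = estimate_downtime(NODE_TIMEOUT_DEFAULT, start_times, stop_times,
                                  node_failure=True)
    print(f"node failure: about {node_case:.0f} s")

    # Resource failure: detection depends on the monitoring interval, monitor
    # timeout, and restart count; the 20 s used here is a made-up figure.
    resource_case = estimate_downtime(20.0, start_times, stop_times,
                                      node_failure=False)
    print(f"resource failure: about {resource_case:.0f} s")
```

Plugging in measured durations for your own start and stop action scripts shows which term dominates and whether lowering the node timeout or the monitoring values would noticeably shorten the total.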