High Availability Clustering

High availability clustering is a form of clustering that keeps applications available in mission-critical computing environments. A high availability cluster consists of two or more computing nodes, usually sharing a common data store, that keep critical applications such as databases or accounting and business management systems continuously available when one or more hardware or software resources fail.

In a high availability cluster (HAC), any application that needs to be kept available is controlled by a resource manager. The resource manager handles configuration, startup, shutdown, monitoring and migration of each application or service that must remain available and accessible. It also manages communication channels, including network configuration, IP address allocation and the migration of addresses between nodes within the HAC.
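The resource manager's monitor-and-migrate role can be illustrated with a minimal sketch. This is a toy model and not the behaviour of any particular cluster product: the names `Node`, `Resource` and `ResourceManager`, the node names and the address 192.0.2.10 are all hypothetical, and a node's health is a simple flag rather than the result of a real health check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True                       # stand-in for a real health check
    running: set = field(default_factory=set)  # names of resources hosted here


@dataclass
class Resource:
    name: str        # e.g. "database"
    service_ip: str  # address that clients connect to


class ResourceManager:
    """Toy resource manager: starts, monitors and migrates resources."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.placement = {}  # resource name -> Node currently hosting it
        self.ip_owner = {}   # service IP -> name of the node holding it

    def start(self, resource):
        # Start the resource on the first healthy node and move its IP there.
        for node in self.nodes:
            if node.healthy:
                node.running.add(resource.name)
                self.placement[resource.name] = node
                self.ip_owner[resource.service_ip] = node.name
                return node
        raise RuntimeError("no healthy node available")

    def monitor(self, resources):
        # One monitoring pass: migrate every resource hosted on a failed node.
        moved = []
        for resource in resources:
            current = self.placement.get(resource.name)
            if current is None or current.healthy:
                continue
            current.running.discard(resource.name)
            target = self.start(resource)
            moved.append((resource.name, current.name, target.name))
        return moved


if __name__ == "__main__":
    node_a, node_b = Node("node-a"), Node("node-b")
    manager = ResourceManager([node_a, node_b])
    database = Resource("database", "192.0.2.10")

    manager.start(database)
    node_a.healthy = False              # simulate a hardware failure
    print(manager.monitor([database]))  # lists the migrations performed
    print(manager.ip_owner)             # the service IP follows the resource
```

In this sketch the service IP address moves together with the application, so clients keep using the same address after a failover.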

Applying HAC

HAC concepts can be applied to numerous application areas including:

  • Web & FTP servers
  • File servers
  • ERP & CRM systems
  • Databases
  • Email servers

To share or not to share

All high availability clustering (HAC) requires that the data being served be available to every node that participates in the cluster. Until recently, this meant a shared storage solution built from expensive external RAID arrays on a shared SCSI bus or a Fibre Channel fabric. A newer, low-cost approach is to use block level replication between the nodes that participate in the cluster. This is known as the shared nothing approach.

Advantages of shared storage HAC

  • Simplified volume management
  • High performance
  • Ease of migration
  • Greater options in RAID level allocation

Advantages of shared nothing HAC

  • Low acquisition costs
  • Lower technical knowledge entry barrier
  • Common and familiar component and technology set
  • Better for disaster recovery scenarios

Block level replication with DRBD

The Distributed Replicated Block Device (DRBD) is a software stack, comprising a kernel module, a system daemon and management utilities, that allows two nodes in a cluster to replicate individual disk blocks between one another. This block replication takes place over a dedicated standard TCP/IP network between the cluster nodes, typically made up of 1 Gbit Ethernet links. Replication can happen either synchronously or asynchronously, depending on the desired performance metrics and the disparity between the nodes that make up the cluster.
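The difference between synchronous and asynchronous replication can be sketched as follows. This is a conceptual model only and does not reflect DRBD's actual implementation or interfaces: the class `ReplicatedDevice` is hypothetical, and in-memory dictionaries stand in for the disks on each node and for the network between them.

```python
from collections import deque


class ReplicatedDevice:
    """Toy two-node block device with synchronous or asynchronous replication."""

    def __init__(self, synchronous=True):
        self.synchronous = synchronous
        self.local = {}         # block number -> data on the local node
        self.peer = {}          # block number -> data on the peer node
        self.pending = deque()  # writes not yet applied on the peer

    def write(self, block, data):
        self.local[block] = data
        if self.synchronous:
            # Synchronous: the peer holds the block before the write returns.
            self.peer[block] = data
        else:
            # Asynchronous: the write returns at once; the block is sent later.
            self.pending.append((block, data))

    def flush(self):
        # Send all queued blocks to the peer, oldest first.
        while self.pending:
            block, data = self.pending.popleft()
            self.peer[block] = data

    def in_sync(self):
        return self.local == self.peer


if __name__ == "__main__":
    sync_dev = ReplicatedDevice(synchronous=True)
    sync_dev.write(0, b"ledger")
    print(sync_dev.in_sync())   # peer already holds the block

    async_dev = ReplicatedDevice(synchronous=False)
    async_dev.write(0, b"ledger")
    print(async_dev.in_sync())  # peer lags until the queue is flushed
    async_dev.flush()
    print(async_dev.in_sync())
```

The sketch shows the trade-off in miniature: a synchronous write waits for the peer, so a failover loses no acknowledged data, while an asynchronous write returns sooner but leaves a window in which the peer is behind.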

Because replication takes place over standard TCP/IP, the nodes in the cluster can be situated in different racks, rooms, cities or countries. This allows you to deploy a disaster recovery solution at very low cost.

More Information

For further details regarding high availability clustering, or for assistance with design and configuration of a cluster for your project, please email sales@xinit.com or use the on-line contact form to register your details.