|
In its simplest form, a cluster is two or more computers that work together to provide computational resources. The idea behind high performance clusters is to join the computing resources of the nodes involved to provide higher scalability and greater combined computing power. So, rather than a simple client making requests of one or more servers, clusters utilize multiple machines to provide a more powerful computing environment through a single system image.
High Performance Computing (HPC) clusters are designed to use parallel computing to apply more processor power to the solution of a problem. HPC clusters typically have a large number of computers (often called nodes), and clusters of hundreds of nodes are not uncommon. The goal is to provide the image of a single system by managing, operating, and coordinating a large number of discrete computers. There are many examples of scientific and statistical computing using multiple low-cost processors in parallel to perform large numbers of operations. This is referred to as parallel computing or parallelism. Thomas Sterling, in his paper entitled How to Build a Beowulf, stated, “Parallelism is the ability of many independent threads of control to make progress simultaneously toward the completion of a task.”
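As a rough illustration of this idea, the sketch below splits one large summation into independent chunks and hands each chunk to a separate worker. It is a single-machine stand-in, not cluster code: Python threads play the role of nodes, and the names `split_range`, `partial_sum_of_squares` and `parallel_sum_of_squares` are hypothetical. On a real HPC cluster the chunks would be sent to separate machines through a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(total, parts):
    """Split range(total) into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(total, parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one 'node': sum i*i over its own chunk only."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(total, workers=4):
    """Scatter chunks to workers, then combine the partial results."""
    chunks = split_range(total, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000, workers=4))
```

The point of the sketch is the decomposition (scatter the work, compute independently, gather the results), not the speed: threads in the standard Python interpreter do not accelerate CPU-bound work the way separate cluster nodes would.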
It is important to note at this point that only parallel applications can realize the full benefit of an HPC cluster. If each part of the application's computation must run in series (i.e. one after the other), there is little benefit, because one node has to wait for another to complete its task. However, if the application can be divided into parts that run on different machines at the same time, it can see large benefits from an HPC solution.
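This limit is commonly quantified with Amdahl's law, which the text above does not name but which captures the same argument: if only a fraction of the work can run in parallel, the serial remainder caps the achievable speedup no matter how many nodes are added. The following is a minimal sketch; the function name `amdahl_speedup` and the sample fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0.0 to 1.0)
    nodes: number of nodes the parallel part is spread across (>= 1)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part shrinks by 1/nodes.
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Illustrative fractions only, not measurements of any real application.
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(fraction, round(amdahl_speedup(fraction, 100), 2))
```

A fully serial job (fraction 0.0) gets a speedup of exactly 1 regardless of node count, which is the case described above where each node simply waits for the previous one.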
The design of high performance clusters is a challenging process. Clusters introduce many complexities, such as the operational considerations of dealing with a potentially large number of computers as a single resource. Clusters of computers should be somewhat self-aware; that is, the work being done on a specific node often must be coordinated with the work being done on other nodes. This can result in complex connectivity configurations and sophisticated inter-process communications between the nodes of a cluster. In addition, the sharing of data between the nodes of a cluster through a common file system is almost always a requirement.
OSS CLUSTER BENEFITS
-
Improve Parallel Application Compute Times
As organizations grow, they often find that their existing infrastructure no longer meets their performance requirements. They need to look for ways to improve application compute times so that information can be obtained more quickly to serve the needs of the users. The traditional way of addressing this problem was to scale the hardware vertically, i.e. adding more processing power to a single server or purchasing a larger server. This often leads to large hardware purchases, as larger, higher performing servers are typically disproportionately more expensive. HPC clusters, on the other hand, use relatively cheap hardware and scale it horizontally to harness the combined compute power.
-
Improved Service Levels
A more efficient infrastructure is generally more effective. High Performance Computing solutions also enable the applications that drive the integrated enterprise to deliver increased data access, higher levels of availability, and faster response times to end users. This happens because the nodes in the cluster are used in an optimised way, with better management and up-to-date technologies.
OSS CLUSTER COMPONENTS
-
Cluster Hardware
The choice of hardware depends very much on the application that will run on it. However, Intel based clusters have proven to give the best price/performance value on many occasions and should be considered by the agency. The servers can be connected together over a high performance network such as Fast Ethernet, Gigabit Ethernet or Myrinet to act as a cluster of compute nodes.
Terminal servers can be used to bring the remote console of each node onto a common terminal, while storage devices based on Storage Area Networks (SAN) or Network Attached Storage (NAS) can be used for the high data storage needs of the cluster. Many new hardware technologies also exist today to reduce the complexity of the entire cluster, which makes troubleshooting easier. Daisy chaining and light path diagnostics, for example, are new technologies that reduce the number of cables required to manage the nodes and make hardware problems easier to identify.
-
Cluster Software and Applications
Linux has become the operating system of choice for most high performance computing systems, since it runs on low cost Intel servers and a wide variety of open source applications, compilers and libraries are available to build a complete high performance parallel computing platform. Open source tools and applications such as MPI, Maui, MPICH, OpenSSH, OpenSSL, PBS and PVM can be used to meet the requirements of a high performance computer, such as job scheduling, job distribution, message passing and compiling of applications.
Good cluster management software should also be used for cluster installation and management. Such software helps system administrators perform a parallel installation of Linux on all the compute nodes at once from a common server. The ability to monitor the activities of all the nodes and to control their hardware (power on and off) from a central console makes administration much easier. Cluster management software should also let system administrators add or plug in custom commands or scripts, making it more flexible and giving them more control over the functionality of the cluster.
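To make the idea of running a custom command on all nodes from a central server concrete, here is a minimal sketch. It is not how any particular cluster management product works; it assumes the nodes are reachable with the OpenSSH client using key-based login, and the function names `run_over_ssh` and `run_on_all_nodes` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_over_ssh(node, command):
    """Run `command` on `node` via the OpenSSH client; return (exit_code, output)."""
    result = subprocess.run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return result.returncode, result.stdout.strip()


def run_on_all_nodes(nodes, command, runner=run_over_ssh, max_parallel=16):
    """Run one command on every node concurrently and collect results per node."""

    def attempt(node):
        try:
            return runner(node, command)
        except Exception as exc:
            # An unreachable or slow node must not abort the other nodes.
            return None, f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        outcomes = list(pool.map(attempt, nodes))
    return dict(zip(nodes, outcomes))
```

With hypothetical host names, a call such as `run_on_all_nodes(["node01", "node02"], "uptime")` would return a dictionary mapping each node to its exit code and output. The `runner` parameter is there so that a different transport, or a stub for testing, can be substituted without changing the fan-out logic.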
OSS CLUSTER SUCCESS FACTORS
-
Minimize licensing costs:
The licensing cost of commercial software in a cluster can be very high if it is charged per node. Commercial operating systems are an example of this. Therefore, the use of OSS alternatives, like Linux, would significantly reduce overall cost.
-
Minimize operational costs:
Hardware should be chosen so as to minimize the footprint of the entire solution, which helps save on the space and power costs of the system. If the cluster occupies a large space, its operational cost will be high.
-
Proper datacenter planning:
A proper assessment of the datacenter should be carried out to ensure that it meets the weight and heat dissipation requirements of the proposed cluster. If it does not, the necessary remedial measures should be taken.
-
Detailed technical study:
A detailed study should be performed before finalizing the hardware and software of the OSS solution. This should ensure that the proposed OSS alternative provides the same functionality as its commercial counterpart, or more. It should also confirm that existing agency applications can integrate with it without problems.
-
Implement cluster management solutions:
The use of technologies such as cable chaining and light path diagnostics will reduce the cost of system administration by giving system administrators easier access to the servers and simpler fault finding. Additionally, cluster management software should be deployed to provide essential administration features such as remote and parallel operating system installation, remote hardware control and remote monitoring.
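The datacenter assessment described under the planning factor above can be started with simple arithmetic. The sketch below is a first-pass check only, under stated assumptions: every parameter name and every figure in the example call is hypothetical, racks, switches and storage are ignored, and all electrical power drawn by the nodes is assumed to end up as heat (1 watt is roughly 3.412 BTU per hour).

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # approximate conversion from watts to BTU/h


def assess_datacenter_fit(
    node_count,
    node_weight_kg,
    node_power_watts,
    floor_limit_kg,
    power_limit_watts,
    cooling_limit_btu_per_hour,
):
    """Compare the cluster's total weight, power draw and heat with room limits."""
    total_weight_kg = node_count * node_weight_kg
    total_power_watts = node_count * node_power_watts
    # Assumption: all power consumed by the nodes is dissipated as heat.
    total_heat_btu_per_hour = total_power_watts * WATTS_TO_BTU_PER_HOUR
    return {
        "total_weight_kg": total_weight_kg,
        "total_power_watts": total_power_watts,
        "total_heat_btu_per_hour": total_heat_btu_per_hour,
        "weight_ok": total_weight_kg <= floor_limit_kg,
        "power_ok": total_power_watts <= power_limit_watts,
        "cooling_ok": total_heat_btu_per_hour <= cooling_limit_btu_per_hour,
    }


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    report = assess_datacenter_fit(
        node_count=64,
        node_weight_kg=20.0,
        node_power_watts=300.0,
        floor_limit_kg=1000.0,
        power_limit_watts=20000.0,
        cooling_limit_btu_per_hour=50000.0,
    )
    for key, value in report.items():
        print(key, value)
```

Any `False` flag in the result marks an area where the datacenter would need remedial measures before the cluster is installed. Real limits should come from the facility's own specifications rather than from figures like the ones used here.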
|