Nirmal Sharma
Increasing the Availability of Virtualised Applications and Services
22 October 2009

By using a virtualized cluster computing environment with failover, you can improve server availability without using as many physical computers. A group of independent computers can work together to increase the availability of virtualised applications and services. If one of the cluster nodes fails, another node takes over and continues to provide the service without disruption. Nirmal Sharma explains the failover process under Hyper-V and how to improve the performance of a failover.

This article explains the internal process behind the Hyper-V Virtual Machine Resource DLL and the functions used to interact with cluster components to improve the failover process for virtual machines.

Most of the article is about the Hyper-V Resource DLL. It does not show how to cluster Virtual Machines or how to configure Quick Migration for Virtual Machines in Hyper-V. Instead, it focuses on the Hyper-V Resource DLL for Virtual Machines and the failover process for Virtual Machines running on a Hyper-V Server.

Terms

Before we move ahead, let us define some important terms that we will be using.


Cluster Service
The Cluster Service is the main component of the Clustering Software which handles the communication between Resource Monitor and its managers. All the clustering managers run under the Cluster Service.
Resource Monitor
The Resource Monitor is part of the Clustering Software. This runs under the Cluster Service (Clussvc.exe) to handle the communications between the Resource DLL and the Clustering Software.
Resource DLL
The Resource DLL ships with cluster-aware applications. The functions executed by the Clustering Software are supported by the Resource DLL. The main function of the Resource DLL is to report the status of the application resources to the Clustering Software and execute the functions from its library as and when needed.
Cluster Configuration Database
The Cluster Configuration Database is a registry hive that contains the state of the cluster. It is located at HKLM\Cluster in the registry.
Resources
A resource is an entity that can provide a service to a client and can be taken offline and brought online by the Clustering Software. A resource must have an associated Resource DLL so that the Resource Monitor can communicate with the resource through this DLL. The Virtual Machines running on Hyper-V can be configured as a Resource in the Cluster. The Resource DLL for Virtual Machines is VMCLUSRES.DLL.

Windows Clustering

Microsoft introduced its first version of clustering software in Windows NT 4.0 Enterprise Edition. Microsoft has significantly improved the clustering software in Windows 2000, Windows Server 2003 and Windows Server 2008. There are two types of clustering technologies: Server Cluster (formerly known as MSCS) and Network Load Balancing Cluster (NLB). MSCS or Server Cluster is basically used for High Availability. NLB is of course used to load balance the TCP/IP traffic. The MSCS or Server Cluster capability is also known as Failover Clustering. The support for Virtual Machines running on Hyper-V in a cluster is available only with Failover Clustering.

Virtual Machines and High Availability

Support for Clustering Virtual Machines was introduced in Windows Server 2008 running the Hyper-V Role and has been continued in the versions that followed.

Windows Clustering includes many components such as Cluster Service, Resource Monitors, Node Manager, Membership Manager, Event Log Processor, Failover Manager, and Cluster Database Manager. The whole purpose of Failover clustering is to provide high availability of application resources. Clustering doesn’t get involved in deciding how much CPU and Memory should be utilized by an application.

An application running in the clustering environment must be cluster-aware. A cluster-aware application supports the functions executed by the cluster service or its components as shown in Figure 1.1. There is no way for Cluster Service to know about the availability of resources of an application in the cluster unless the application is cluster-aware. For example, if a node holding the application resources fails, the Cluster Service running on that node must be notified in order to start the failover process for the application’s resources. Cluster Service does this by receiving the responses from the Resource Monitor. The Resource Monitor tracks the Virtual Machines with the help of Resource DLLs provided by Hyper-V Role.

You cannot cluster Virtual Machines running on Virtual Server. The Virtual Machines running on Virtual Server do not provide any Resource DLL that can be used with the clustering software to make them highly available. On the other hand, the Virtual Machines running on Hyper-V are fully cluster-aware Virtual Machines that respond to all functions executed by the Cluster Service. The Resource DLL of Hyper-V Virtual Machines, which supports all of these functions, is VMCLUSRES.DLL. It is the only Resource DLL that the Hyper-V Role provides for its Virtual Machines in the cluster, and we will discuss it in detail in this article.

Tip: A Resource DLL is a separate application component that is specifically written to support cluster functions (for example, Open, Terminate, Online, Offline, Retry and so on).

The Resource Monitor of the Clustering Software tracks the availability of Hyper-V Virtual Machines through VMCLUSRES.DLL by performing two checks: IsAlive and LooksAlive. The implementation of these checks is application specific, which is why cluster-aware applications are expected to provide their own Resource DLL. The Cluster Service doesn’t need to know about application-specific functions. It just executes the functions provided by the Resource DLLs. Hyper-V implements many other functions in its Resource DLL. These functions, shown in Figure 1.1, are specific to Hyper-V Virtual Machines and are not related to clustering in any way.

Tip: The two basic checks (IsAlive and LooksAlive) are supported by every Resource DLL or cluster-aware application.
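The division of labour described above can be illustrated with a short Python sketch. It is not the actual Resource DLL interface: the class names, method names and the state table are hypothetical, and the bodies of the two checks are placeholders, since the real checks are application specific. The point is only that the caller invokes generic entry points and never looks at Virtual Machine internals.

```python
from abc import ABC, abstractmethod


class ResourceDll(ABC):
    """Hypothetical stand-in for the two checks every Resource DLL supports."""

    @abstractmethod
    def is_alive(self, resource_id):
        """Application-specific health check."""

    @abstractmethod
    def looks_alive(self, resource_id):
        """Second application-specific health check."""


class VmResourceDll(ResourceDll):
    """Toy VM-specific implementation; the state table is invented for the demo."""

    def __init__(self, vm_states):
        self.vm_states = vm_states

    def is_alive(self, resource_id):
        # Placeholder logic: healthy only if the VM is recorded as Online.
        return self.vm_states.get(resource_id) == "Online"

    def looks_alive(self, resource_id):
        # Placeholder logic: healthy if the VM is known and not marked Failed.
        return self.vm_states.get(resource_id, "Failed") != "Failed"


def poll(dll, resource_id):
    """What a monitor might do: call the generic checks and collect the results."""
    return {
        "is_alive": dll.is_alive(resource_id),
        "looks_alive": dll.looks_alive(resource_id),
    }


dll = VmResourceDll({"vm1": "Online", "vm2": "Failed"})
print(poll(dll, "vm1"))
print(poll(dll, "vm2"))
```

Because `poll` depends only on the abstract interface, a different application could supply its own checks without any change to the monitoring side.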

 

FIGURE 1.1–Cluster Components and Hyper-V Cluster Resource DLL.

As Figure 1.1 shows, VMCLUSRES.DLL is installed when the Hyper-V Role is first enabled. Before you can cluster Virtual Machines running on Hyper-V, you need to install the Failover Clustering Software on a 64-bit edition of Windows Server 2008 or 2008 R2. After the installation is complete, click “Services and Applications” in Failover Cluster Management and then select “Virtual Machines” as the cluster resource.

Tip: If you don’t see “Virtual Machines” then try running the following commands. This DLL must be registered before you can cluster Virtual Machines.

Regsvr32.exe /u VMCLUSRES.DLL
Regsvr32.exe VMCLUSRES.DLL


The above commands re-register VMCLUSRES.DLL with the Failover Clustering Software.

The next DLL is VMCLUSEX.DLL. This DLL works as a proxy between the Cluster Administrator and the Hyper-V Manager. Its main function is to provide interfaces to configure and control Virtual Machine configuration parameters and screens. If this DLL is missing or corrupted, you can’t access Virtual Machines. VMCLUSEX.DLL doesn’t implement any cluster-specific control functions. As an example, when you right-click a Virtual Machine resource in the Failover Cluster Manager, you get the “Bring this Virtual Machine Online” option to start the Virtual Machine. The same is reflected in Hyper-V Manager, where you will also see the Virtual Machine starting.

VMMS.EXE, which is the main process of Hyper-V, needs to know the status of the Virtual Machines running on the Hyper-V Server. The Resource DLL is written to report the status of the Virtual Machines in a cluster to VMMS.EXE. VMMS.EXE, in turn, shows the status of each Virtual Machine in Hyper-V Manager.

VMCLUSRES.DLL, which sits between the Resource Monitor and the Virtual Machines, plays an important role in the failover process. Without this DLL, Hyper-V cannot function as a cluster-aware application.

Tip: Malicious code running on your system may corrupt the DLL files. If VMCLUSRES.DLL is corrupted, you can do one of the following:

  1. Re-run the Hyper-V Setup (disabling and enabling the role).
  2. Copy VMCLUSRES.DLL from a working computer.

Figure 1.1 above also shows the functions defined in VMCLUSRES.DLL. The Hyper-V Virtual Machine-specific functions are mapped to the cluster-specific functions. For example, the cluster’s IsAlive and LooksAlive functions are mapped to VM IsAlive and VM LooksAlive respectively. However, there are no static mappings defined within VMCLUSRES.DLL. VMCLUSRES.DLL knows which function to execute. In the same way, the other Virtual Machine functions are mapped to the related cluster functions, as shown in Figure 1.1.

The VM IsAlive and VM LooksAlive functions are executed by VMCLUSRES.DLL at a predefined interval. Most of the monitoring is done by performing a VM IsAlive query. VM IsAlive is implemented in such a way that it performs all the checks for Hyper-V Virtual Machines. It checks to make sure that:

  • All Virtual Machines in the cluster are online.
  • The Virtual Machines are configured with the correct dependencies.
  • The registry entries for the Virtual Machine resources are configured correctly.

VM LooksAlive is used to perform a thorough check on the Virtual Machines in the cluster. This check includes the configuration of the Virtual Machine, the location of the Virtual Machine configuration file (XML), the VHD location and so on, so it might take some time for LooksAlive to complete and report the status back to the Resource Monitor. To avoid delays in reporting, the Resource Monitor depends on the results reported by IsAlive, which is configured to execute every 5 seconds by default. IsAlive only checks the status of the Virtual Machine in the cluster (for example, Online or Failed). Based upon that status, the Resource Monitor takes action.

Think of a situation where only LooksAlive is used to get the status of Virtual Machines in the cluster. This could result in more downtime for the Virtual Machines, because LooksAlive calls are executed only every 60 seconds. You could ask why the LooksAlive interval is not simply decreased. If you did that, you would see performance issues on the cluster.

Please note that the Resource Monitor component of the Clustering Software executes IsAlive and LooksAlive queries against the whole Cluster Group. It is the responsibility of the Resource DLL (VMCLUSRES.DLL) to execute VM IsAlive and VM LooksAlive against its Virtual Machine resources. By default, the IsAlive check is performed every 5 seconds and the LooksAlive check is performed every 60 seconds, as shown in Figure 1.2 below.

 

FIGURE 1.2: IsAlive and LooksAlive Interval of Virtual Machine Resource

The default intervals can be changed per Virtual Machine to improve the failover response time, as shown above in Figure 1.2.

In previous versions of Windows Clustering, it was not possible to define the IsAlive and LooksAlive intervals per resource. Starting with Windows Server 2008, they can be defined for each resource individually.
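To get a feel for what the intervals mean in practice, here is a small Python sketch that counts how often each check would fire in a given window. The function names are hypothetical and the model is deliberately simplified (whole seconds, no drift); the interval values are the defaults quoted in this article.

```python
def checks_due(elapsed_seconds, intervals):
    """Return the names of the checks whose interval expires at this second."""
    return [
        name
        for name, every in intervals.items()
        if elapsed_seconds > 0 and elapsed_seconds % every == 0
    ]


def count_checks(window_seconds, intervals):
    """Count how many times each check fires within the window."""
    counts = {name: 0 for name in intervals}
    for second in range(1, window_seconds + 1):
        for name in checks_due(second, intervals):
            counts[name] += 1
    return counts


# Per-resource intervals in seconds, using the defaults quoted in the article.
DEFAULTS = {"IsAlive": 5, "LooksAlive": 60}

print(count_checks(60, DEFAULTS))
# A resource tuned for faster detection (an illustrative value, not a recommendation).
print(count_checks(60, {"IsAlive": 1, "LooksAlive": 60}))
```

Shortening an interval reduces the time before a failure is noticed, at the cost of running the check more often.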

When you set up a cluster for the first time, the Cluster Service running on the node takes a snapshot of the cluster configuration and saves it in the HKLM\Cluster key. This key contains the cluster configuration, such as the resource names, their GUIDs, the node holding the resources and their status. This is generally called the cluster configuration database. As an example, for Virtual Machines it includes the following:

 

The PersistentState value keeps the status of the Resources or Virtual Machines in the Cluster. A PersistentState of 1 means Online and 0 means Offline. The Status column shown above is just for your reference and is not stored as a registry entry.

This is also shown in the Cluster Registry hive:

 

FIGURE 1.3: PersistentState Entry in the Cluster Registry for Virtual Machine.

As you can see in Figure 1.3, the PersistentState registry entry value of Virtual Machine “Test Cluster VM” is 1 which indicates that the Virtual Machine is Online in the cluster.
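The meaning of PersistentState can be expressed as a simple lookup. The Python sketch below does not read the real registry; the snapshot dictionary is a hypothetical stand-in for the values stored under HKLM\Cluster, and “Another VM” is an invented entry added for contrast.

```python
# Mapping described in the article: 1 means Online, 0 means Offline.
PERSISTENT_STATE_LABELS = {1: "Online", 0: "Offline"}


def describe_persistent_state(value):
    """Translate a PersistentState value into a readable status."""
    return PERSISTENT_STATE_LABELS.get(value, "Unknown")


def summarize(snapshot):
    """Build a name -> status view from a name -> PersistentState snapshot."""
    return {name: describe_persistent_state(value) for name, value in snapshot.items()}


# Hypothetical snapshot; only "Test Cluster VM" appears in the article.
snapshot = {"Test Cluster VM": 1, "Another VM": 0}
print(summarize(snapshot))
```

The readable status is derived on demand, which mirrors the point above that only the numeric value is stored.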

Before the Resource Monitor executes any cluster function against the Virtual Machines or Cluster Groups, it looks at the cluster configuration database to check the status of all resources and their GUIDs. For example, let us say we have a cluster group named “Hyper-V VMs”, and all the Hyper-V Virtual Machines reside in this group. When the IsAlive interval expires (5 seconds by default), the Resource Monitor executes the IsAlive call against the “Hyper-V VMs” Cluster Group. It hands over the Resource GUID and status to the Hyper-V Virtual Machine Resource DLL (VMCLUSRES.DLL). VMCLUSRES.DLL in turn executes the VM IsAlive call to check the availability of the Virtual Machines. Please note that VMCLUSRES.DLL doesn’t really know the stored status of the Virtual Machines. It is the Resource Monitor that supplies this information to VMCLUSRES.DLL.

Next we look at VM Open, VM Close, VM Online and VM Offline. These functions are called whenever Virtual Machines are moved across Hyper-V Servers, taken offline or brought online. For example, you might want to take a Virtual Machine offline for maintenance on a Hyper-V node. In that case, the Resource Monitor executes the Offline function and in turn VMCLUSRES.DLL executes the VM Offline function to take the Virtual Machine offline. The change is reported to the VMMS.EXE process in the background so that it is aware of the Virtual Machine status. We will discuss these functions later in this article. As a whole, these functions are executed by the Cluster Service and supported by the Hyper-V Resource DLL. That is why Hyper-V is known as purely cluster-aware virtualization software.

The Resource Monitor determines the state of Virtual Machines by checking the PersistentState value in the registry. This value is either 1 (Online) or 0 (Offline). For example, if you stop a Virtual Machine on a cluster node, the value 0 is set for that resource in the registry. If you stop the Virtual Machine using the command line or Hyper-V Manager, the value is still updated in the Cluster Configuration Database. This is because the Hyper-V Resource DLL and VMMS.EXE always talk to each other to get the status of Virtual Machines and update the Cluster Configuration Database accordingly. When you stop a Virtual Machine using a command line or WMI script, you are actually interacting with the VMMS.EXE service, which executes the Stop command on your behalf, and the status of the Virtual Machine is then updated in the Cluster Configuration Database.

This may not work for other applications in the cluster. Take Exchange Server as an example. Operations on Exchange Server resources that occur outside the cluster are not reflected in the Cluster Configuration Database. In this case, the IsAlive query may not function correctly: the value supplied by the Resource Monitor indicates that the resources are running, so IsAlive takes no action against the stopped cluster resources. The value is updated in the Cluster Configuration Database only when LooksAlive is executed, because LooksAlive performs a thorough check of the resources, including the Exchange services.

How does Hyper-V Virtual Machine Resource DLL help in the failover process?

The status messages shown in Figure 1.1 above are generated through IsAlive calls. When the IsAlive interval expires, the Resource Monitor executes the cluster IsAlive calls. The Hyper-V Cluster Resource DLL in turn executes VM IsAlive against all Virtual Machine Resources. Each call returns one of the following messages:

  • Online/Offline
  • Online/Offline Pending
  • Failed

The above status messages are passed back to the Resource Monitor, which in turn reports to the Cluster Service whether any action is needed.

As shown in Figure 1.1, the Resource Monitor sits between the Hyper-V Resource DLL and the Cluster Service. Any call made to Hyper-V Virtual Machine resources has to go through VMCLUSRES.DLL first. For example, if the Cluster Service needs to check the availability of Hyper-V Virtual Machine resources, it makes a call to the Resource Monitor, which in turn asks VMCLUSRES.DLL to check the status of the Hyper-V Virtual Machine resources and report back. If the Resource Monitor doesn’t receive any response from VMCLUSRES.DLL, or it cannot detect the Virtual Machine availability, it passes that status back to the Cluster Service. The Cluster Service then passes this status message to the related Managers, as shown in the figure above. The Managers take action according to the status passed up by the lower-layer components. The status message could indicate a failure of Virtual Machine resources or could be a simple status message. These messages and cluster actions are discussed later in this article with an example.

In addition, if functions executed by the Resource Monitor do not exist in the Resource DLL, the request is simply discarded and no operation is carried out.
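The call path and the “discard if unsupported” behaviour can be sketched in a few lines of Python. Everything here is hypothetical: the class, the `vm_` naming convention used to locate a handler and the returned strings are illustration only, not how VMCLUSRES.DLL is implemented.

```python
class VmClusterResource:
    """Toy stand-in for a resource DLL; method names are illustrative only."""

    def __init__(self):
        self.state = "Offline"

    def vm_online(self):
        self.state = "Online"
        return self.state

    def vm_offline(self):
        self.state = "Offline"
        return self.state

    def vm_isalive(self):
        # Report the current state rather than changing it.
        return self.state


def resource_monitor_call(resource, cluster_function):
    """Forward a cluster function to the resource, or discard it if unsupported."""
    handler = getattr(resource, "vm_" + cluster_function.lower(), None)
    if handler is None:
        return None  # The request is discarded and no operation is carried out.
    return handler()


vm = VmClusterResource()
print(resource_monitor_call(vm, "Online"))
print(resource_monitor_call(vm, "IsAlive"))
print(resource_monitor_call(vm, "Pause"))
```

Looking the handler up at call time, rather than from a fixed table, keeps the sketch in line with the earlier remark that no static mappings are defined.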

Hyper-V Server doesn’t really use its own mechanism to fail over the Virtual Machines to the surviving node. Instead, Resource DLLs are written to “support” the failover process. The following figure shows a simple failover process:

 

FIGURE 1.3 – VMCLUSRES.DLL and Status Messages in Hyper-V Virtual Machines Failover Process

  • After IsAlive interval expires (by default every 5 seconds), Cluster Service asks the Resource Monitor to report the status of Virtual Machines.
  • The Resource Monitor checks the status of the Virtual Machine Resources in the Cluster configuration database (HKLM\Cluster). It provides VMCLUSRES.DLL with the GUIDs of the Virtual Machine Resources and their current status (PersistentState).
  • VMCLUSRES.DLL executes its own function (VM IsAlive) after it receives a signal from the Resource Monitor to perform a check on the Virtual Machines. It checks and reports back the status to Resource Monitor. VMCLUSRES.DLL will report the following status messages:

    Online/Offline

    Online/Offline Pending

    Failed/Stopped

  • After the Resource Monitor receives the status, it compares the status messages received from VMCLUSRES.DLL with the one stored in the Cluster configuration database. It then takes the action as per the status reported by the VMCLUSRES.DLL as listed below:

    1. If comparison is successful, no action is taken. For example, status message received in step 2 is “Online” and VM IsAlive query also reports the same status.
    2. If comparison is unsuccessful, the following actions are taken:

      If status message received in step 2 is “Online” and VM IsAlive query reports “Offline”, the Resource Monitor executes an “Online” function. VMCLUSRES.DLL receives this message and executes VM Online function to bring the Virtual Machine online. This status message is also reported to the VMMS.EXE process.

Tip: The Resource Monitor doesn’t take any action for Online/Offline status messages because an Administrator might have stopped the resource for maintenance purposes, but the same should also be reflected in the Cluster configuration database before IsAlive is called. The Resource Monitor only takes action when the comparison is not successful as stated above.

Furthermore, there shouldn’t be any inconsistencies in the Cluster configuration database. If there were any, these wouldn’t last longer than 5 seconds since IsAlive calls always update the status at the Cluster configuration database.

  • The mechanism isn’t quite that straightforward. VMCLUSRES.DLL could return one more message: “Failed”. In this case, the Resource Monitor sends a Restart message back to VMCLUSRES.DLL to restart the Virtual Machine resource in the cluster. VMCLUSRES.DLL in turn executes the “VM Online” function to bring the failed Virtual Machine online.

Tip: VMCLUSRES.DLL doesn’t actually implement a separate Restart function. Instead it always uses its own implemented VM Online function. If a resource doesn’t come online within the specified interval or after a few attempts, the resource is considered to be failed and then the Failover process starts. The same is notified to the VMMS.EXE as it needs to keep the status of all the Virtual Machines running in the Cluster.

  • After the Virtual Machine resource has failed, the message is passed back to the Resource Monitor. The Cluster Service receives this message from the Resource Monitor and starts the failover process with the help of the Failover Manager. The Failover Manager on the node where the Virtual Machine resource has failed communicates with the Failover Manager on another, selected cluster node to carry out the failover. Before it can do so, it needs the list of nodes available in the cluster. This is where the Node Manager comes into the picture. It supplies the list of nodes available in the cluster, and the first available node at the top of the list is selected for failover.
  • Once the list of nodes has been obtained by the source Failover Manager, it will talk to Failover Manager on the target node. The Failover Manager on the target node supplies the list of Virtual Machines Resources along with GUID and PersistentState to Resource Monitor. Since this is a failover process, the Resource Monitor knows what to do next. It lists all the Virtual Machines with its flag (Online or Offline) and instructs the Resource DLL of Hyper-V to execute the VM Online function from its library.
  • The Resource DLL, in turn, executes the VM Online function to bring the resources online on the target node. The same is updated to the VMMS.EXE process of Hyper-V.
  • If the Virtual Machine is started successfully within a few attempts, the failover process doesn’t occur.
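The decision logic in the steps above can be condensed into a short Python sketch. It is a simplified model under stated assumptions: the function and parameter names are hypothetical, the limit of three restart attempts stands in for “a few attempts”, and a stored status of Offline with a reported status of Online is left unhandled because the article does not describe that case.

```python
def handle_status(stored, reported, try_online, nodes, max_restarts=3):
    """Return the action taken for one check, following the steps above.

    stored      -- status from the cluster configuration database
    reported    -- status returned by the VM check
    try_online  -- callable that attempts to bring the VM online, returns bool
    nodes       -- available nodes, in the order supplied by the Node Manager
    """
    if reported == stored:
        return "no action"

    if stored == "Online" and reported in ("Offline", "Failed"):
        # Try to bring the VM back online locally before failing over.
        for _ in range(max_restarts):
            if try_online():
                return "brought online locally"
        # Local restarts did not help: pick the first available node.
        if nodes:
            return "failover to " + nodes[0]
        return "failed: no node available"

    return "not covered by the article"


print(handle_status("Online", "Online", lambda: True, ["node2"]))
print(handle_status("Online", "Offline", lambda: True, ["node2"]))
print(handle_status("Online", "Failed", lambda: False, ["node2", "node3"]))
```

Passing `try_online` as a callable keeps the sketch independent of how the Virtual Machine is actually started.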

Thus, if there were no Resource DLL for Hyper-V Virtual Machines, the failover process could take longer to move the resources from one node to a surviving node. Because the Hyper-V Resource DLL can handle the cluster functions executed by the Clustering Software, it doesn’t need to wait to decide which action to take. As stated above, the cluster functions are mapped to Hyper-V Resource DLL-specific functions, so the Hyper-V Resource DLL can execute them as soon as they are called by the Resource Monitor.

In Figure 1.3 you see VMMS.EXE and Hyper-V Manager. VMMS.EXE is notified of every function executed by VMCLUSRES.DLL and, in turn, refreshes the status of its VMs on the Hyper-V Server. This is required in order to know the exact status of a VM running on the Hyper-V Server. As an example, an Administrator could open the Hyper-V Manager to get the status of all the Virtual Machines on the Hyper-V Server. If a Virtual Machine has failed and this is not communicated to VMMS.EXE, there could be confusion, since the Failover Cluster Manager would report one status and the Hyper-V Manager would report a different one.

Tip: IsAlive is executed every 5 seconds for a Virtual Machine in the cluster. You could decrease this value to 1 or 2 seconds to speed up the failover process.

Conclusion

To summarize, Virtual Machines running on Virtual Server are not cluster-aware because they do not provide any Resource DLL. Virtual Machines running on Hyper-V are cluster-aware because Hyper-V ships with a cluster Resource DLL for them.

We saw how the Cluster Service doesn’t talk to VMCLUSRES.DLL directly. In fact, it uses its Resource Monitor. The status messages passed by the Hyper-V Resource DLL are received by the Resource Monitor to perform any appropriate action.

Finally we also saw how the Hyper-V Resource DLL plays an important role for its Virtual Machines in the cluster. Resource DLLs allow Hyper-V Virtual Machines to be fully cluster-aware VMs. The functions executed by the Resource Monitor on behalf of the Cluster Service are supported by the Hyper-V Resource DLL. This makes the failover process faster.



Author profile: Nirmal Sharma

Nirmal is an MCSEx3 and MCITP, and has received the Microsoft MVP award in Directory Services four times. He specializes in Directory Services, Microsoft Clustering, Hyper-V, SQL and Exchange. He has been involved with Microsoft technologies since 1994 and has followed the progression of Microsoft operating systems and software. In his spare time, he likes to help others and share his knowledge by writing tips and articles. He can be reached at nirmal_sharma@mvps.org
