Wednesday, July 04, 2012

W2K8 / E2K7 Cluster Comms issue

A few weeks ago, someone called me and asked me to help out with an E2K7 CCR cluster running on W2K8. Regardless of what they tried, the cluster would not achieve quorum. It had been working until a few weeks earlier, and no one had noticed when it stopped. The Failover Cluster Administrator suggested that the file share witness could not be contacted.

However, the CIFS shares on the FSW were accessible and it was ping-able. The Cluster.log had some interesting errors in it, but the cause was not immediately obvious. Here are some of the errors that were occurring when the cluster was trying to achieve quorum:

Network Name <Cluster Name>: Unable to Logon. winError 1326
Error 1326 from ResourceControl for resource Cluster Name.
ResourceControl(NETNAME_GET_VIRTUAL_SERVER_TOKEN) to Cluster Name returned 1326.
File Share Witness <File Share Witness (\\FSWSERVER\CLU-01-MNS)>: Failed to get virtual server token from core NetName resource, error 1326.
File Share Witness <File Share Witness (\\FSWSERVER\CLU-01-MNS)>: Failed to retrieve the virtual server token from the core netname resource with 1326. 
RhsCall::Perform_NativeEH: ERROR_LOGON_FAILURE(1326)' because of 'Resource File Share Witness (\\FSWSERVER\CLU-01-MNS): Open call failed.
rcm::RcmAgent::Online: ERROR_LOGON_FAILURE(1326)' because of 'There is a problem with the resource DLL.'
ERROR_LOGON_FAILURE(1326)' because of 'Failed to bring quorum resource e86bd5ca-7bab-4d1c-b9ac-94ef54acdb03 online, status 1326
Signaled NetftRemoteUnreachable  event, local address 10.1.5.210:003853 remote address  
Signaled NetftRemoteUnreachable  event, local address 10.1.5.210:003853 remote address 10.1.5.206:003853
Signaled NetftRemoteUnreachable  event, local address 10.1.5.210:003853 remote address 10.1.5.206:003853

Frankly, it looked like the problem was with the FSW until the logs suggested that it was the *other* node of the cluster (the IP 10.1.5.206) that was not reachable via port 3853.
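
In a large Cluster.log, those NetftRemoteUnreachable lines are easy to miss among the 1326 errors. Below is a minimal Python sketch that counts them per remote address. It is not something I ran at the time; the function name unreachable_peers and the regular expression are my own assumptions, based only on the log lines quoted above.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above:
#   Signaled NetftRemoteUnreachable  event, local address A:PORT remote address B:PORT
PATTERN = re.compile(
    r"NetftRemoteUnreachable\s+event,\s+local address\s+(\S+)\s+remote address\s*(\S*)"
)

def unreachable_peers(lines):
    """Count NetftRemoteUnreachable events per remote address."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        # Skip non-matching lines and entries whose remote address is blank.
        if match and match.group(2):
            counts[match.group(2)] += 1
    return counts

# Sample input: the three events quoted in this post.
sample = [
    "Signaled NetftRemoteUnreachable  event, local address 10.1.5.210:003853 remote address  ",
    "Signaled NetftRemoteUnreachable  event, local address 10.1.5.210:003853 remote address 10.1.5.206:003853",
    "Signaled NetftRemoteUnreachable  event, local address 10.1.5.210:003853 remote address 10.1.5.206:003853",
]

for peer, hits in unreachable_peers(sample).items():
    print(peer, hits)
```

To run it against a real log, pass an open file object instead of the sample list. A peer that shows up repeatedly here is a hint to look at node-to-node connectivity before blaming the witness.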

Upon further investigation (which should have been the first thing I looked for), I found my old enemy lurking in the shadows: Symantec Endpoint Protection with the Network Access protection features enabled. I checked the SEP firewall logs on the clustered nodes, but they were not showing any errors. However, once I disabled the Network Access protection component of SEP, the cluster immediately established quorum.

The Exchange support team was unaware the servers were even running SEP. Their IT security department had deployed SEP to upgrade from an older version of Symantec Antivirus and had not told anyone.



