Channel: High Availability (Clustering) forum
Viewing all 5648 articles

Cluster join failure - [Schannel] Server: auth failed on server with error: 80090326


Same issue on two separate clusters (both with 2x Server 2016 nodes) after Cluster-Aware Updating, which only patched the first node! Because CAU failed, the second node was never updated, and that is why the cluster still works!

Both have failed cluster:

Node '' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

Cluster node '' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls.


There is nothing wrong with the witness disk; it is accessible and shows in Disk Management as Reserved (exactly as it should).

There are known issues with this update, but none of them states that it would kill the cluster!

While testing on clusterA I evicted the down node and tried to re-add it, and cannot do it!

On clusterB it is now a one-node working cluster, and the second node cannot join. The cluster will work from either node (if I shut down both nodes and power them on in a different order).

I am able to ping the hostnames and FQDNs of both nodes and the DNS server from all servers.
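Not part of the original post, but for anyone hitting the same symptoms: the cluster log usually records exactly why the join fails (the 80090326 in the title came from such a log). A sketch, assuming an elevated prompt on the failing node and an example destination folder:

```powershell
# Generate the last 10 minutes of the cluster log on the local node
# (FailoverClusters module; C:\Temp is just an example destination)
Get-ClusterLog -Node $env:COMPUTERNAME -TimeSpan 10 -Destination C:\Temp

# Search it for Schannel/authentication failures like the one in the title
Select-String -Path C:\Temp\*.log -Pattern '80090326|Schannel|crypto container'
```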




WSFC broken, please help diagnose


I have a 2016 WSFC with the File Server role: 2 nodes in the cluster with shared storage. We lost power to Node2, which died; when bringing it back up it won't join the cluster (shows 'Down' in Failover Cluster Manager). If I shut down the entire cluster completely and start it on Node2 first, Node2 runs the cluster fine but Node1 now won't join the cluster (shows 'Down').

As far as I can tell all connectivity seems fine: I've turned off Windows Firewall, the network between the two servers is working, and there are no firewalls between the two nodes. Other clusters are running on the same infrastructure.

The only hint in Failover Cluster Manager is that the network connection for Node2 shows as offline (the network is up and working, has allow-cluster-traffic and management ticked, and I can ping, RDP, etc.).

When I shut down then restart the entire cluster, Node2 first, the roles become reversed: Node1 now shows its network as offline, and the information details and critical events for the network have no entries.

Critical events for Node2 itself, when in the down state, show: Error 1653 "Cluster node 'Node2' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls." However, I'm not convinced this is actually the issue, because of the error messages below:

The failover clustering log is as follows:

00000774.00001c4c::2018/05/15-16:48:50.659 INFO  [Schannel] Server: Negotiation is done, protocol: 10, security level: Sign
00000774.00001c4c::2018/05/15-16:48:50.663 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 161
00000774.00001c4c::2018/05/15-16:48:50.712 DBG   [Schannel] Server: ASC, sec: 90312, buf: 2059
00000774.00001c4c::2018/05/15-16:48:50.728 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 1992
00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: ASC, sec: 0, buf: 51
00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Synchronize, buf: 0
00000774.00001c4c::2018/05/15-16:48:50.730 INFO  [Schannel] Server: Security context exchanged for cluster
00000774.00001c4c::2018/05/15-16:48:50.735 DBG   [Schannel] Client: ISC, sec: 90312, buf: 178
00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 60
00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: ISC, sec: 90312, buf: 210
00000774.00001c4c::2018/05/15-16:48:50.749 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 2133
00000774.00001c4c::2018/05/15-16:48:50.752 DBG   [Schannel] Client: ISC, sec: 90364, buf: 58
00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90364, buf: 14
00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90312, buf: 61
00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 75
00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: ISC, sec: 0, buf: 0
00000774.00001c4c::2018/05/15-16:48:50.754 INFO  [Schannel] Client: Security context exchanged for netft
00000774.00001c4c::2018/05/15-16:48:50.756 WARN  [ClRtl] Cannot open crypto container (error 2148073494). Giving up.
00000774.00001c4c::2018/05/15-16:48:50.756 ERR   mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
00000774.00001c4c::2018/05/15-16:48:50.756 WARN  mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'
00000774.00001c4c::2018/05/15-16:48:50.756 DBG   [CHANNEL 172.23.1.15:~56287~] Close().

specifically:

Server: Negotiation is done (i.e. they talked to each other?)
[ClRtl] Cannot open crypto container (error 2148073494). Giving up.
mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'
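Side note (not from the original post): those three numbers are the same HRESULT written three ways. 2148073494 is the unsigned decimal form, -2146893802 the signed decimal form, and 0x80090016 the hex form of NTE_BAD_KEYSET ("Keyset does not exist") — which also matches the "Keyset does not exist" error when adding Node3 further down. A quick check in Python:

```python
# The three error numbers in the log are one and the same HRESULT:
# NTE_BAD_KEYSET (0x80090016), "Keyset does not exist".

def to_unsigned(hresult: int) -> int:
    """Map a signed 32-bit HRESULT to its unsigned representation."""
    return hresult & 0xFFFFFFFF

assert to_unsigned(-2146893802) == 2148073494        # signed vs unsigned decimal
assert hex(to_unsigned(-2146893802)) == "0x80090016" # the hex form from HrError(...)
print("All three spellings are NTE_BAD_KEYSET (0x80090016)")
```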

I can't find many (if any) articles dealing with these messages; the only ones I can find say to make sure permissions are correct on %ProgramData%\Microsoft\Crypto\RSA\MachineKeys

I did have to change some of the permissions on these files but still couldn't join the cluster. Other than that I'm struggling to find any actual issues (SMB access from Node1 to Node2 appears to be fine, SMB access from Node2 to Node1 appears to be fine, DNS appears to be working fine, the file share witness seems to be fine).
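For reference, and as an assumption on my part rather than anything from the post: the usual check on the MachineKeys store is that SYSTEM and Administrators hold Full Control on the affected key files. A sketch from an elevated prompt (the key file name is a placeholder):

```powershell
# Inspect current ACLs on the machine key store
icacls "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys"

# Grant SYSTEM and Administrators full control on a specific key file if missing
# (<keyfile> is a placeholder for the affected file)
icacls "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\<keyfile>" /grant "SYSTEM:(F)" "BUILTIN\Administrators:(F)"
```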

Finally, the cluster validation report shows the following as the only errors with the cluster:

Validate disk Arbitration: Failed to release SCSI reservation on Test Disk 0 from node Node2.domain: Element not found.

Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node1.domain to the share on node Node2.domain. The network path was not found.

Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node2.domain to the share on node Node1.domain. The network path was not found.

Other errors from the event logs:

ID5398 Cluster failed to start. The latest copy of cluster configuration data was not available within the set of nodes attempting to start the cluster. Changes to the cluster occurred while the set of nodes were not in membership and as a result were not able to receive configuration data updates.
Votes required to start cluster: 2. Votes available: 1. Nodes with votes: Node1 Node2.
Guidance: Attempt to start the cluster service on all nodes in the cluster so that nodes with the latest copy of the cluster configuration data can first form the cluster. The cluster will be able to start and the nodes will automatically obtain the updated cluster configuration data. If there are no nodes available with the latest copy of the cluster configuration data, run the 'Start-ClusterNode -FQ' Windows PowerShell cmdlet. Using the ForceQuorum (FQ) parameter will start the cluster service and mark this node's copy of the cluster configuration data to be authoritative. Forcing quorum on a node with an outdated copy of the cluster database may result in cluster configuration changes that occurred while the node was not participating in the cluster being lost.
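The guidance embedded in event 5398 can be followed from PowerShell; a hedged sketch, to be run only on the node believed to hold the latest copy of the cluster database (forcing quorum marks that copy authoritative):

```powershell
# Try a normal start on every node first
Start-ClusterNode

# Last resort: force quorum on ONE node with the most recent configuration
# (-FQ is the documented alias for -ForceQuorum)
Start-ClusterNode -ForceQuorum

# Remaining nodes can then be started explicitly in prevent-quorum mode
Start-ClusterNode -PreventQuorum
```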

ID4350 Cluster API call failed with error code: 0x80070046. Cluster API function: ClusterResourceTypeOpenEnum Arguments: hCluster: 4a398760 lpszResourceTypeName: Distributed Transaction Coordinator lpcchNodeName: 2

Lastly, I built another server, Node3, to see if I could join it to the cluster, but this fails:

* The server 'Node3.domain' could not be added to the cluster. An error occurred while adding node 'Node3.domain' to cluster 'CLUS1'. Keyset does not exist

I've done the steps here with no joy: http://chrishayward.co.uk/2015/07/02/windows-server-2012-r2-add-cluster-node-cluster-service-keyset-does-not-exist/







NAS File Share as File Share Witness for Windows 2012 R2 Stretched Cluster


I've been unable to find official documentation stating whether using a NAS file share (in my case, NetApp) as the File Share Witness is supported for a Windows 2012 R2 stretched cluster. The cluster will support SQL Server 2012 AlwaysOn AGs. I've found unofficial blog posts from third parties who have successfully configured a non-Windows file share as the FSW, but I need to know whether Microsoft supports this.

Thanks,

Denis McDowell

 

S2D 4-node cluster


Dear all,

I'm facing a serious issue and I'm stuck.

My deployment consists of 4 Windows Server 2019 nodes deployed in an S2D cluster. Validation is successful.

All 4 nodes are up, with the same vSwitch created on all nodes via Switch Embedded Teaming with RDMA enabled.

The problem is with one node: every VM on this specific node is not accessible via RDP from client PCs, and the application fails.

The thing is, if I move the VM to any of the other nodes it works and can be accessed via RDP. Moreover, the host with the issue is accessible via RDP; it's just that any guest VM on it is not. Ping works, there is no firewall, and everything else is working fine.

All users and servers are in the same subnet, and tracert works. I just don't know what the issue is.
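Not from the original post, but comparing the virtual switch and VLAN configuration between the bad node and a known-good one often exposes the difference when only one host's guests lose connectivity. A sketch (run on each node; names are whatever your environment uses):

```powershell
# Dump the SET switch and the VLAN settings of the VM network adapters
Get-VMSwitch | Format-List Name, EmbeddedTeamingEnabled, NetAdapterInterfaceDescriptions
Get-VMNetworkAdapterVlan -VMName * | Format-Table VMName, OperationMode, AccessVlanId

# Check the physical team members as seen by the host
Get-NetAdapter | Format-Table Name, Status, LinkSpeed, VlanID
```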

any help would be appreciated

best regards

Basic questions on failover clustering


We are running a 4-node file cluster on Windows Server 2012.

It runs 6 production roles, one per file server.

Question:

Say for role FS6 I have 6 storage volumes, and one volume goes offline.

With the logical operator AND defined across all the cluster storage volumes of file server FS6, the file server would go offline and the role FS6 would go into the Stopped state, as shown in the figure below:

Questions:

1) If one volume goes offline and stops the role as in the figure above, what is expected of the remaining 5 volumes?
   Will they be accessible while the role is in the Stopped state, given that the volumes show online?

2) A failover will be attempted as per the policies on the role and the resources.
   With the policies for the role and resources set as per the following screenshots:

   What is expected?
   - Will a failover be attempted immediately, or will it spend 15 minutes trying to restart the resources on the same node first?
   - Will the resources move from the current owner node to a new node regardless of whether the volume comes back online?
   - Will it attempt to fail over and start the resources, and stop trying after 3 failed attempts?

   - Will there be any disruption in accessing the available volumes during the failover?

I would appreciate precise answers for the scenario above.
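Not part of the original question, but the dependency expression and the policies being asked about can be read straight from PowerShell, which makes the behaviour easier to reason about. A sketch with placeholder resource/group names:

```powershell
# The dependency expression (the AND across the volumes) for the FS6 resource
Get-ClusterResourceDependency -Resource "FS6"

# Per-resource restart policies in the FS6 group
Get-ClusterResource | Where-Object OwnerGroup -eq "FS6" |
    Format-Table Name, RestartAction, RestartPeriod, RestartThreshold

# Group-level failover settings (attempts per period, failback)
Get-ClusterGroup "FS6" | Format-List FailoverThreshold, FailoverPeriod, AutoFailbackType
```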

Thanks,
Shailesh

2012 R2 RPC Errors When attempting to add to cluster


Hi,

Fresh build of 2012 R2 on a host, all updates applied, and Windows Firewall disabled completely. When attempting to add it to the cluster, or even manage another Hyper-V host, I get "RPC unavailable" errors. If I run tnc <IP> -Port 135 it's listening, but any WMI query results in the RPC unavailable error.

Can anyone help me :)
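Not from the original post: port 135 is only the endpoint mapper; the actual WMI/DCOM call lands on a dynamic high port. A sketch to separate the two (the server name is a placeholder):

```powershell
# Endpoint mapper reachable (this is what tnc -Port 135 already proved)
Test-NetConnection -ComputerName HV02 -Port 135

# An actual WMI query; failure here despite 135 working usually means the
# dynamic RPC range (default TCP 49152-65535) is blocked between the hosts
Get-WmiObject -Class Win32_OperatingSystem -ComputerName HV02 |
    Select-Object CSName, Caption
```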

NIC speeds changed after reformatting


I have a server running Windows Server 2012 R2 with 4 NICs:
NIC0 = 10G speed
NIC1 = 10G speed
NIC2 = 1G speed
NIC3 = 1G speed
When I reinstalled the server with Hyper-V Server 2019, all NICs came up at 1G speed. Why?
If that is normal with Hyper-V Server, what about Windows Server Core 2019?

Note: I did not touch any underlying hardware or cabling.


Windows Server 2019 error


I have a brand-new server, just installed, but it reboots with Kernel-Power event ID 41 and bugcheck 1001. Please help.

No communication on new "cluster only" network - how to troubleshoot?


Hi,

We've had a couple of Server 2016 VMs working as file servers 1 and 2. To improve on best practices, I decided to add new vNICs to both VMs on a separate VLAN from the "cluster and client" traffic. The VLAN is wide open, with no gateway or DNS (it's only for cluster traffic), so I picked 192.168.11.1 for FS1 and 192.168.11.2 for FS2, both with a 255.255.255.0 mask.

The Windows Firewall rule for inbound and outbound UDP 1812 traffic is in place, but it's allow-all, on any interface, within the network.

When I run cluster validation, I get:

Network interfaces FS1.ads.ssc.wisc.edu - Ethernet1 and FS2.ads.ssc.wisc.edu - Ethernet1 are on the same cluster network, yet address 192.168.11.2 is not reachable from 192.168.11.1 using UDP on port 3343.

I also notice that the nodes are unable to ping each other on this new network.

What sort of troubleshooting can I do to determine why communication isn't happening on the network?

I already ran netstat -rn and see (from FS1 in this case):

Destination       Netmask          Gateway    Interface       Metric
192.168.11.0      255.255.255.0    On-link    192.168.11.1    271

in the routing table, so I think that is right.

I've never done this before, so I'm not sure how to proceed with further testing. Our VM admin has set up two VMs quickly in vSphere that only use this new VLAN, and he confirmed they can communicate with each other. So the problem seems to be with FS1 and FS2.
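As a sanity check on the addressing itself (not from the original post, though the names and addresses are): the two interfaces really are on one /24, so the failure has to be in the path (VLAN tagging, port group, vNIC) rather than the IP configuration. A quick check in Python:

```python
import ipaddress

# Addresses from the post: FS1 and FS2 on the cluster-only VLAN
fs1 = ipaddress.ip_interface("192.168.11.1/24")
fs2 = ipaddress.ip_interface("192.168.11.2/24")

# Same subnet -> traffic is on-link, no gateway involved
# (matching the On-link route in the table shown above)
assert fs1.network == fs2.network
print(f"Both in {fs1.network}; cluster heartbeat runs over UDP 3343 between them")
```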

Disks from other cluster node visible when using get-disk on Windows Server 2016


Hello

For quite some time now, I have been trying to figure out why disks on other cluster nodes are listed when running Get-Disk (PowerShell) on a Windows Server 2016-based cluster (15-node Exchange 2016 DAG).

There are no shared disks in the cluster (no Fibre Channel or iSCSI). The server "hardware" is VMware ESX and each server has its own disks.

The "remote" disks don't have a disk number as the local ones do.

We have a similar Windows Server 2012 (R1) / Exchange 2013 based cluster where this doesn't happen. And even though we have twice the disk count there, Get-Disk and other disk-related cmdlets perform much faster.

I suspect that a new Windows Server 2016 feature is affecting performance badly, so I would like to disable it if possible.

Does anyone know if that is doable?

Could the UseClientAccessNetworksForSharedVolumes cluster setting have something to do with it? It's 0 on our Windows 2012 cluster and 2 on our Windows 2016 cluster.

Thanks :)


Change File Share Witness Location


Hi, 

Currently we have the FSW configured on a Windows Server 2008 server; as part of EOL we are planning to migrate it from 2008 to 2016.

My Question is - What is the procedure to change FSW path?

1. Downtime required?

2. Change the path to all three nodes?

3. Any precheck?

Appreciate your assistance on this. 
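Not from the original post, but for what it's worth: the witness path is changed through the quorum configuration, which is an online operation (no downtime), and it applies cluster-wide rather than per node. A sketch with a placeholder share:

```powershell
# Point the cluster at the new witness share (example path)
Set-ClusterQuorum -NodeAndFileShareMajority "\\NewServer\WitnessShare"

# Verify the new quorum configuration
Get-ClusterQuorum
```

Nothing needs to be copied from the old share; the cluster recreates the witness data. A sensible pre-check is confirming the cluster name object has full control on the new share and its folder.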

File Share Witness Path Changes


Hi 

We are planning to change the FSW path on an existing three-node cluster. Is downtime required? If not, what is the impact of changing the path, and do we need to copy any old config files to the new share?

Appreciate your assistance

NLB Manager does not show the other host


Hello

A few months back I installed and configured Windows NLB on a couple of Exchange CAS servers, and everything seemed to be working fine (it still is). However, the other day I opened NLB Manager on each server to find that one of the hosts is missing from the cluster: if I open NLB Manager on Server1, Server2 is not listed, and vice versa. I tried refreshing the screen to no avail. When I try to re-add the missing host to the cluster I get "The specified host is already part of this cluster". NLB is running in multicast mode and seems to be working fine; I just can't see both servers. Is there any way I can rectify this?

I am running Windows 2008 R2 SP1 (it could see the host fine in the past, just not now).

hakim

How can I add a 3rd host to my existing Hyper-V cluster?


The existing environment is a 2-node Hyper-V cluster.

I want to add my 3rd host to my cluster.

Please help with the step-by-step procedure.
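Not official step-by-step guidance, just a sketch of the usual flow under the assumption that the new host runs the same OS version as the existing nodes (names are placeholders): install the roles, validate, then add the node.

```powershell
# On the new host: install Hyper-V and Failover Clustering
Install-WindowsFeature Hyper-V, Failover-Clustering -IncludeManagementTools -Restart

# From an existing node: validate the configuration including the new host
Test-Cluster -Node Node1, Node2, Node3

# Add the node to the existing cluster
Add-ClusterNode -Cluster MyCluster -Name Node3
```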



Windows Server 2016 - Failover Cluster failed


Hi, 

I have two Windows Server 2016 VMs and installed the Failover Clustering feature on both. Both servers were fully patched and could ping each other. However, when I went to create a cluster on node A, it failed with an error:

https://imgur.com/a/M2KXipm

As soon as this error occurs, it instantly corrupts the network configuration on node B. I can ping from node B to node A, but can't ping from node A to node B. Something has gone horribly wrong. The issue I have is that these two VMs and the DC are hosted in Azure. The DC doesn't have DHCP installed; however, the Create Cluster wizard didn't give me the opportunity to assign a static IP to the cluster, instead stating that it will obtain one via DHCP (which doesn't exist). I'm sure this is the root of the problem:

https://imgur.com/a/z2Vc8BI

The only thing I didn't do on the nodes was enable WMI in Windows Firewall. Should I blow them away and start over with Windows Firewall disabled as a test, or can this situation be recovered?
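On the DHCP point (my suggestion, not from the post): PowerShell lets you specify the cluster address explicitly even when the wizard offers only DHCP. A sketch, with placeholder names and an address that would need to be a free IP in the vnet subnet (in Azure, typically fronted by an internal load balancer):

```powershell
# Create the cluster with an explicit static address instead of DHCP
New-Cluster -Name CLUS1 -Node NodeA, NodeB -StaticAddress 10.0.0.100 -NoStorage
```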

Thanks,



FCM sluggish, some VMs changing state rapidly


I have a Hyper-V Failover Cluster where Failover Cluster Manager (FCM) is behaving very strangely.

The FCM GUI shows several VMs changing state rapidly (Running -> Paused -> Resuming -> Running in a very rapid cycle). When I say rapidly, it's happening so fast the right-click menus are flickering. FCM responsiveness is also sluggish.

These same VMs are shown as running fine and not changing state in Hyper-V Manager.

Does anyone have any suggestions as to what to look for? I'm assuming it's some communication issue.

Error applying Replication Configuration Windows Server 2019 Hyper-V Replica Broker


Hello,

Recently we started replacing our Windows Server 2016 Hyper-V Clusters for Server 2019. On each cluster we have a Hyper-V Replica broker that allows replication from any authenticated server and stores the Replica Files to a default location of one of the Cluster Shared Volumes.

With WS2019 we run into an issue where we get an error applying the Replication Configuration settings. The error is as follows:
Error applying Replication Configuration changes. Unable to open specified location for replication storage. Failed to add authorization entry. Unable to open specified location to store Replica files 'C:\ClusterStorage\volume1\'. Error: 0x80070057 (One or more arguments are invalid).

When we target the default location at a CSV whose owner node is the same as the owner node of the Broker role, we don't get this error. However, I don't expect this to hold up in production (roles move to other nodes).

Did anyone run into the same issue, and what might be a solution? Did anything change between WS2016 and WS2019 that might cause this?
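A side note, not from the original post: 0x80070057 is a FACILITY_WIN32 HRESULT wrapping Win32 error 87 (ERROR_INVALID_PARAMETER), which matches the "(One or more arguments are invalid)" text. Decoding it:

```python
# Decode the HRESULT from the Hyper-V Replica error message
hr = 0x80070057

failed   = bool(hr >> 31)         # severity bit set -> failure
facility = (hr >> 16) & 0x1FFF    # 7 == FACILITY_WIN32
code     = hr & 0xFFFF            # the wrapped Win32 error code

assert failed and facility == 7 and code == 87  # 87 == ERROR_INVALID_PARAMETER
print("0x80070057 wraps Win32 error 87 (ERROR_INVALID_PARAMETER)")
```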

Kind regards,

Malcolm

Problem creating a Windows failover cluster on Windows Server 2016

Hi everyone, I have two servers joined to a domain and I need to install a SQL Server database failover cluster. Both servers already have the Failover Clustering feature installed, and the network cards are grouped with NIC Teaming in "Switch Independent" teaming mode with "Dynamic" load balancing. The problem is that when I create the cluster using the wizard, I can add the first node without problems; however, when adding the second node I get the following error: "The node cannot be contacted. Ensure that the node is powered on and is connected to the network." Am I missing something? Please, I need help with this.

Could two servers create a cluster for Hyper-V failover?


Hello,

I have already deployed two Windows Server 2016 Standard servers for Hyper-V and file sharing. Do I need to deploy a third server, or can I install the Failover Clustering service on the DC?

Another question: I currently use iSCSI to map the SAN storage. Should I deploy Hyper-V virtual Fibre Channel SANs instead?

I find this a little strange.
