This was a particularly odd issue that I had never come across before, so I thought it was worth blogging about.

During a normal MS Failover Cluster failover operation, the node claiming the cluster roles sends out a GARP (gratuitous ARP) request to notify the networking infrastructure of the MAC address change. The Layer 3 switch / router then updates the MAC address in its ARP table, and packets are routed to the node which claimed the cluster roles. Recently I found myself troubleshooting an MS Failover Cluster deployment which wasn’t behaving quite in this manner.

Some background info: For the sake of this blog post, let’s call the 2 nodes A and B. The nodes are running Server 2016, SQL 2012 and Microsoft Failover Cluster services. Each node has 2 NICs, one for the client and management network, and one for the heartbeat network. The cluster consists of 3 network resources: a cluster IP address and 2 SQL instance addresses, which float between the 2 nodes depending on which one is active. All 3 IP addresses are in the same VLAN. I was running continuous pings to all 3 IP addresses during the failover tests.

When the active node was A and I failed over the cluster roles to node B, the failover process completed successfully; however, the 3 cluster IP addresses would stop responding to pings and wouldn’t respond again for an hour or so. If I failed back to node A straight away, the pings would start responding again. Looking at the ARP table on the Layer 3 switch, I realized that the MAC address associated with the cluster IP addresses wasn’t changing to the MAC address of node B, which is what we would expect as a result of the failover operation.

How to enable “GARP reply” on Palo Alto
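To make the GARP announcement described in this post a bit more concrete, here is a minimal Python sketch of what such a frame looks like on the wire. It is only an illustration, not what the cluster service actually runs: the function name `build_garp_frame`, the MAC address and the IP address are all made up for the example, and the target hardware address values follow common convention rather than anything confirmed for this deployment. The key point is that the sender IP and target IP are the same address, so every device that hears the broadcast can update its ARP entry for that IP.

```python
import socket
import struct


def build_garp_frame(mac: str, ip: str, as_reply: bool = False) -> bytes:
    """Build a gratuitous ARP Ethernet frame announcing that `ip` is at `mac`.

    Hypothetical helper for illustration only; it builds the bytes but
    does not send anything.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth_header = broadcast + mac_bytes + struct.pack("!H", 0x0806)

    # ARP opcode: 1 = request, 2 = reply. GARP can use either form.
    opcode = 2 if as_reply else 1
    # Assumption: zeros for the request form, broadcast for the reply form.
    target_mac = broadcast if as_reply else b"\x00" * 6

    arp_payload = (
        struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode)  # Ethernet/IPv4, sizes, opcode
        + mac_bytes   # sender hardware address: the node now owning the IP
        + ip_bytes    # sender protocol address: the cluster IP
        + target_mac  # target hardware address
        + ip_bytes    # target protocol address: same IP, which makes it "gratuitous"
    )
    return eth_header + arp_payload


# Example values are made up: node B's NIC MAC and one of the cluster IPs.
frame = build_garp_frame("00:15:5d:01:02:0b", "10.0.0.50")
print(len(frame), frame.hex())
```

If the Layer 3 switch ignores or never receives a frame like this after the failover, its ARP entry keeps pointing at node A's MAC until the entry ages out, which matches the symptom of the pings only recovering much later.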