11-16-2021 06:08 AM - edited 11-16-2021 12:46 PM
Hi folks, I'm facing throughput issues with a site-to-site VPN between my on-prem site (VM-300) and Azure (VpnGw1).
- Windows cluster + SQL Always On Availability Groups (async commit)
- 2 nodes on premises (sql01 and sql02)
- 1 node in Azure (sql03)
- Link speed 150Mbps
- Latency between on prem and azure: 15ms
The IPsec tunnel is working. In some generic tests (iperf and SMB copies) the throughput reaches:
on-prem to Azure: 80 Mbps
Azure to on-prem: 150 Mbps
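For context on these numbers, here is a rough sketch of the bandwidth-delay arithmetic for a single TCP flow over the tunnel. It assumes the 15 ms figure is a round-trip time (the post does not say), and the 64 KiB window is only an illustrative value, not something measured on these hosts.

```python
# Rough bandwidth-delay product (BDP) arithmetic for the tunnel.
# Assumption: the 15 ms latency is treated as round-trip time (RTT).

def bdp_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_mbps * 1_000_000 / 8 * (rtt_ms / 1000)

def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Throughput ceiling of one TCP flow with a fixed window size."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000

if __name__ == "__main__":
    # 150 Mbps link, 15 ms RTT (assumed)
    print(f"BDP: {bdp_bytes(150, 15):,.0f} bytes")
    # Hypothetical 64 KiB window with no window scaling
    print(f"64 KiB window ceiling: {window_limited_mbps(65536, 15):.1f} Mbps")
```

Under these assumptions the link needs roughly 281 KB in flight to stay full, and even a 64 KiB window would allow about 35 Mbps, which is consistent with iperf reaching tens of Mbps.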
The issue appears when SQL tries to replicate.
sql01 is my primary, so it is the one that replicates data to the secondaries (sql02 and sql03).
Replication throughput from sql01 to sql02 is around 2.5 Mbps (LAN connection).
Replication throughput from sql01 to sql03 is around 1 Mbps (this goes through the VPN).
- Tunnel MTU set to 1400
- Anti-replay protection disabled
I took some packet captures and observed a high number of "TCP Out-Of-Order" and "TCP Previous segment not captured" events.
I hope someone can help me.
11-18-2021 09:12 AM
Then MTU resizing won't help. I would say set the PAN MTU size on the tunnel to whatever Azure has theirs set to. Sorry I couldn't be of more help.
11-29-2021 05:11 AM
Hello everyone, after some weeks of analysis and debugging we finally solved the problem.
Because the disk sector sizes differ (512 bytes on premises and 4K on the Azure VMs), we had to enable SQL trace flag 1800 on the on-premises VMs.
After that, SQL replication is working like a charm.
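To illustrate why the sector size mismatch matters, here is a small sketch of the alignment arithmetic. It is not SQL Server's actual code, and the offset and length are made-up values: it only shows that a write that is valid for 512-byte sectors can be misaligned on a 4K-sector disk.

```python
# Illustration only: 512-byte-aligned writes versus a 4K-sector disk.
# The offset and length below are hypothetical example values.

def is_aligned(offset: int, length: int, sector_size: int) -> bool:
    """True when a write starts on a sector boundary and spans whole sectors."""
    return offset % sector_size == 0 and length % sector_size == 0

def pad_to_sector(length: int, sector_size: int) -> int:
    """Round a write length up to the next multiple of the sector size."""
    return -(-length // sector_size) * sector_size

if __name__ == "__main__":
    offset, length = 512, 1536  # fine for 512-byte sectors
    print(is_aligned(offset, length, 512))   # aligned on the on-prem disk
    print(is_aligned(offset, length, 4096))  # misaligned on a 4K disk
    print(pad_to_sector(length, 4096))       # length once padded to 4K
```

As far as I understand it, trace flag 1800 makes the instance on the 512-byte disk write its log in a 4K-compatible layout, so the replica on the 4K disk no longer receives misaligned log writes. The flag can be turned on with `DBCC TRACEON (1800, -1)` or with the `-T1800` startup parameter so that it survives a restart.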
Find the KB about this issue below.