Asymmetric routing



Does anyone else have a multi-site network with asymmetric routing?  I'm having some issues getting from site to site.

Here's what's going on:

We have two datacenters -- one for the eastern US, the other for the western US.  Each datacenter has a PA-2020.  Our satellite offices use PA-500s, ASA 5505s, and ASA 5520s.  There is an IPSec tunnel from each site to both datacenters.

Our IP scheme is such that western sites use one subnet (including the DC), and the eastern sites use another subnet.  Each of those is broken down into smaller subnets for the individual sites.  Each site subnet is broken down further for production, DMZ, and local VPN pools.

The problem I'm having is that, for example, traffic from a site on the west is not able to reach resources on the east.  Every site can access either datacenter with no trouble at all.  Both datacenters can access any site without problems as well.  It's only site-to-site where we see problems.

The tunnel routing is such that, from the site, outbound traffic is routed to whichever datacenter is "home" to the destination site.  For instance, a site in California would send traffic through its tunnel to the east datacenter to hit a site in Florida.  This becomes an issue when the Florida site replies, as it will reply back to the western datacenter to get back to California.  As such, the datacenter firewalls will likely not see the entire session.

All of the routing is static right now, and fairly simple.  Each DC has routes to the other DC, and to every site's supernet via its local IPSec tunnel.  The sites' routing is east supernet to east DC, west supernet to west DC, and the default gateway for local Internet.
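
To make the asymmetry concrete, here is a minimal sketch (hypothetical prefixes, Python's ipaddress module) of the site-level routing just described.  It shows that the forward leg of a west-to-east flow rides the east DC tunnel while the reply rides the west DC tunnel, so neither datacenter firewall sees the whole session:

import ipaddress

# Hypothetical addressing: west supernet 10.1.0.0/16, east supernet 10.2.0.0/16,
# a California site in 10.1.5.0/24 and a Florida site in 10.2.7.0/24.
WEST_SUPERNET = ipaddress.ip_network("10.1.0.0/16")
EAST_SUPERNET = ipaddress.ip_network("10.2.0.0/16")

def next_hop_from_site(dst_ip):
    """Site routing as described: east supernet -> east DC tunnel,
    west supernet -> west DC tunnel, everything else -> local Internet."""
    dst = ipaddress.ip_address(dst_ip)
    if dst in EAST_SUPERNET:
        return "east-DC tunnel"
    if dst in WEST_SUPERNET:
        return "west-DC tunnel"
    return "local default gateway"

ca_host, fl_host = "10.1.5.10", "10.2.7.20"
print("CA -> FL goes out via:", next_hop_from_site(fl_host))    # east-DC tunnel
print("FL -> CA replies via:", next_hop_from_site(ca_host))     # west-DC tunnel
# Each datacenter firewall therefore sees only one direction of the session.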

I've followed the advice in this article (https://live.paloaltonetworks.com/docs/DOC-1260) to turn off TCP SYN checking.  The connection is initially established, but it will frequently time out while loading a web page.  ICMP seems to work fine.  SSH seems OK too.

To make things even more confusing, we have Blue Coat proxy appliances at each site, and at each datacenter.  They do content filtering at the site level, and deduplication between sites and DCs.  They're all configured as transparent in-line proxies.

At this point, there are so many variables that I just don't know where to start troubleshooting.  I talked to one support rep at PA, who told me that asymmetric routing will cause problems, plain and simple.  Then I found the article about the SYN flags, and that helped... but not completely.  So, I'm wondering if anyone else is doing anything similar, and if so, is it working for you?

5 REPLIES

L6 Presenter

Start by creating a drawing and publishing it here with IP ranges attached (you can use fake IPs, just so we have something to discuss around).

As I see it, you have three types of "sites":

* Client-site (x number of them)

* West-site (DC1)

* East-site (DC2)

Each client site has two IPsec tunnels, one to DC1 and one to DC2.

The range at each client site is CLIENT/24 (or whatever size you use), the range at the west site is DC1/16 and at the east site DC2/16 (the sizes are just examples).

Each client site then has the following routing:

DC1/16 nexthop tunnel1

DC2/16 nexthop tunnel2

0.0.0.0/0 nexthop tunnel1

0.0.0.0/0 nexthop tunnel2

(I'm not sure whether PA supports ECMP (Equal Cost Multipath); if not, you can use different metrics so that client sites in the west use DC1 as primary and DC2 as secondary, while client sites in the east use DC2 as primary and DC1 as secondary.)
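
If metrics are used instead of ECMP, route selection at a client site boils down to longest-prefix match first and lowest metric second.  A minimal Python sketch of that behaviour, using made-up prefixes for DC1/16 and DC2/16 and the tunnel names from the table above:

import ipaddress

# (prefix, next_hop, metric) -- the client-site table above, with metrics
# standing in for ECMP as suggested (prefixes are hypothetical).
ROUTES = [
    ("10.1.0.0/16", "tunnel1", 1),   # DC1/16
    ("10.2.0.0/16", "tunnel2", 1),   # DC2/16
    ("0.0.0.0/0",   "tunnel1", 10),  # primary default for a western site
    ("0.0.0.0/0",   "tunnel2", 20),  # secondary default
]

def lookup(dst_ip, down=frozenset()):
    """Longest-prefix match first, lowest metric second, skipping down tunnels."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [
        (ipaddress.ip_network(prefix), hop, metric)
        for prefix, hop, metric in ROUTES
        if dst in ipaddress.ip_network(prefix) and hop not in down
    ]
    best = max(candidates, key=lambda r: (r[0].prefixlen, -r[2]))
    return best[1]

print(lookup("10.2.9.4"))                        # tunnel2 (most specific prefix)
print(lookup("198.51.100.1"))                    # tunnel1 (lower-metric default)
print(lookup("198.51.100.1", down={"tunnel1"}))  # tunnel2 (fallback default)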

Now at DC1 you have the following routing setup (tunnel999 is the DC1-DC2 site-to-site tunnel):

CLIENT1/24 nexthop tunnel1 metric 1

CLIENT1/24 nexthop tunnel999 metric 2

CLIENT2/24 nexthop tunnel2 metric 1

CLIENT2/24 nexthop tunnel999 metric 2

...

DC2/16 nexthop tunnel999

0.0.0.0/0 nexthop INTERNETFW_DC1

and the opposite at DC2:

CLIENT1/24 nexthop tunnel1 metric 1

CLIENT1/24 nexthop tunnel999 metric 2

CLIENT2/24 nexthop tunnel2 metric 1

CLIENT2/24 nexthop tunnel999 metric 2

...

DC1/16 nexthop tunnel999

0.0.0.0/0 nexthop INTERNETFW_DC2

The downside here is that the default gateway toward the Internet is only available locally. For this to work, it would be best to set up your Blue Coat web proxy as a non-transparent proxy, meaning that anything that needs to reach the Internet must use the inside IP of your Blue Coat as its forward proxy.

If you have a load balancer such as an F5 or similar, you can set up a virtual IP that the clients/servers use as the forward proxy; the load balancer at each site then forwards the traffic to whichever Blue Coat is currently the best option.

The point here is that your core will then carry only RFC 1918 addresses, so you could set up an IDS to raise an alarm if a non-RFC 1918 IP address shows up (or, if your PA is the core, let the PA raise that alarm).
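
As a trivial illustration of that alarm condition, the check is just "does the destination fall inside the RFC 1918 ranges?" (a quick Python sketch):

import ipaddress

# The three RFC 1918 private ranges.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def belongs_in_core(dst_ip):
    """True if the destination is RFC 1918 space; in a proxy-only core,
    any other destination showing up is worth alarming on."""
    dst = ipaddress.ip_address(dst_ip)
    return any(dst in net for net in RFC1918)

print(belongs_in_core("10.20.30.40"))   # True  -- expected core traffic
print(belongs_in_core("203.0.113.5"))   # False -- raise the alarm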

If you do so, you won't need these routes at each client site:

0.0.0.0/0 nexthop tunnel1

0.0.0.0/0 nexthop tunnel2

This way, hosts at DC1 should be able to reach DC2 and vice versa, and clients can reach both DCs.

The above can be optimized by using dynamic routing between DC1 and DC2 to avoid routing loops (if CLIENT2/24 disconnects, packets for CLIENT2/24 would, with static routing, bounce between the sites until the TTL hits 0). On the other hand, you would then need to rely on the dynamic routing protocol always being functional. Another risk with dynamic routing is that you could accidentally pass all traffic through a poorly connected client site if the DC1-DC2 connection breaks.
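
To make the loop concrete, here is a toy sketch (hypothetical names) of what happens to a packet for CLIENT2/24 when both of CLIENT2's tunnels are down and only the static metric-2 backup routes remain:

# With CLIENT2's own tunnels down, DC1's backup route for CLIENT2/24 points at
# DC2 (tunnel999) and DC2's backup route points straight back at DC1.
NEXT_HOP_FOR_CLIENT2 = {"DC1": "DC2", "DC2": "DC1"}

def trace(start, ttl=8):
    node = start
    while ttl > 0:
        nxt = NEXT_HOP_FOR_CLIENT2[node]
        print(f"{node} -> {nxt} (ttl={ttl})")
        node, ttl = nxt, ttl - 1
    print("TTL exceeded -- packet finally dropped")

trace("DC1")   # bounces DC1 -> DC2 -> DC1 -> ... until the TTL hits 0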

You've pretty much nailed it so far.  The only correction I have to make is that each client site's Internet access is routed out the local firewall, not back to the datacenter.  The numbers you used are close enough to real, so let's go with that:

DC1/16 next-hop DC1 tunnel

DC2/16 next-hop DC2 tunnel

0.0.0.0/0 next-hop ISP

The routing at the DCs is similar, but we don't specify subnets for the other DC's clients, so it's like this (using DC1 as an example):

Site1/24 next-hop Site1 tunnel

Site2/24 next-hop Site2 tunnel

...

DC2/16 next-hop DC2 tunnel

0.0.0.0/0 next-hop ISP

When we first put the PA-2020s into production, I tried turning on OSPF, but I didn't have the areas set up quite right, so we did indeed have traffic routing through a client to get to the other DC.  Also, the ASA OSPF configuration is more of a headache than I care to deal with.  We decided to leave everything static for now and revisit OSPF implementation between just the PAN devices at a later time.  (The plan is to replace all the ASAs through attrition anyway.)

I'm not sure I follow your comments on the proxy.  We don't have any load balancing, and as I mentioned above, we don't funnel Internet traffic through the DCs (though we're considering it, since all the per-site Blue Coat content filtering licenses are expensive).  What makes an explicit proxy a better choice?  In the past we've chosen transparent mode to avoid any necessary configuration on the client side -- especially where it might be difficult or impossible to specify a proxy.

I'll sketch up a diagram -- I could stand to have one lying around anyway.  Any time I call in to support, I have to walk the poor guy through this mess.  It's not difficult, just big.

For redundancy, I would recommend using the DC1-DC2 tunnel as a backup route for your client sites (depending on how fat that pipe is, or can be, given your budget and the prices in your area). This way, if the CLIENT1-DC1 link fails but CLIENT1-DC2 still works, CLIENT1 can still reach servers at DC1 (this would of course need two metric-2 routes at each client site, which I missed earlier -- sketched just below).
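
In the notation used earlier, the two extra metric-2 routes at each client site would look something like this:

DC1/16 nexthop tunnel1 metric 1
DC1/16 nexthop tunnel2 metric 2
DC2/16 nexthop tunnel2 metric 1
DC2/16 nexthop tunnel1 metric 2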

You could enable QoS in your PAN to make sure that DC1-DC2 traffic is prioritized over CLIENT-DC1-DC2 traffic.

The proxy thing was just to avoid having public IPs flowing around in your core. If you use a non-transparent proxy, only RFC 1918 IPs (assuming you use private addresses such as 10., 172., or 192.) will flow through your core. If one (or many) clients get infected with a trojan or other malware, it is fairly likely to try to reach its command-and-control server, and if you are lucky an IDS in the core (or the PAN itself) can then alert when it sees destination IP addresses that shouldn't exist in your network.

The tricky part without a load balancer is how to use the two DC Blue Coats. One workaround is to use PAC (Proxy Auto-Config) files, which the clients load to find out how to reach the Internet; see http://en.wikipedia.org/wiki/Proxy_auto-config for more information.
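
A minimal PAC file for that setup could look like the one written out by this short Python sketch (the proxy hostnames and port are placeholders, not real names): internal RFC 1918 destinations go direct, and everything else is handed to the DC1 proxy with the DC2 proxy as fallback.

# Sketch only: the proxy hostnames and port are hypothetical placeholders.
PAC_FILE = """function FindProxyForURL(url, host) {
    // Keep internal (RFC 1918) traffic off the proxies.
    if (isInNet(host, "10.0.0.0", "255.0.0.0") ||
        isInNet(host, "172.16.0.0", "255.240.0.0") ||
        isInNet(host, "192.168.0.0", "255.255.0.0"))
        return "DIRECT";
    // Internet-bound traffic: DC1 Blue Coat first, DC2 as fallback.
    return "PROXY proxy-dc1.example.local:8080; PROXY proxy-dc2.example.local:8080";
}
"""

with open("proxy.pac", "w") as f:
    f.write(PAC_FILE)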

The point of using the DCs as gateways to the Internet is fewer licenses and easier configuration (fewer devices to configure for URL categories, etc.), but also consolidated logging. If you know the client sites can only reach the Internet through your Internet firewalls and proxies at the DCs, it is easier to collect those logs and to perform auditing and other operations on them. I also assume you already have people on 24/7 standby for the DCs, but not necessarily for each remote office (client site).

But sure, if you let each client site reach the Internet directly, they will be more autonomous if both of your DCs fail.

You could also use the content filtering in the PAN instead of the BC, use the BC just for the proxy duties (like only allowing pure HTTP and pure HTTPS), and let the PAN handle AD integration (User-ID), antivirus, SSL termination, IDP, URL categories, App-ID, etc.

OK, here's a basic diagram showing our network layout.

sites.png

Our DCs are actually based on a /8 netmask.  (I guess it's easier now to stick with real values rather than the arbitrary /16 from before.)  There are 24-bit subnets carved out for servers, DMZ, transient networks, etc.  The sites that call a given DC home are within that /8 as well, in their own 24-bit subnets.

So, the supernet "DC1 /8" encompasses the management, server, and DMZ subnets of that datacenter, AND all the sites that use that DC as home.  (This is indicated in the diagram by DC1-based sites shown in red, and DC2-based sites in blue.)

The red lines show the tunnel interface to DC1; the blue, to DC2.  Right now, these are set without overlapping routes, so from a site's perspective, all DC1 /8 goes through the DC1 tunnel, and all DC2 /8 goes through the DC2 tunnel.  This is why we're dealing with asymmetric routes when communicating between regions -- the return path is necessarily opposite.

Again, Internet is all local (for now), so there is really no public IP space in our core network, except to follow the DGW out to the local ISP.  Certainly not over any tunnel link.  We currently have no load balancing, as Internet traffic is bound to the site, and applications that are popular enough to need distribution have native means (Exchange front-end servers, SQL clustering, etc.) or manual distribution (Exchange datastores distributed by region).

It's important to realize, in our case, the datacenters are not mirrors of each other.  Some applications are hosted out of one or the other, and used by everyone.  Other services are duplicated between the two, but not mirrored.

That is, each DC has its own email, print, and file services, but they are unique and do not have a copy of the other's data or shares (except for DR purposes via backups).  All users go to DC1 for the subset of applications homed there, and likewise with DC2.  Currently, all VPN access is via DC2.

I don't relish the idea of removing the redundant site tunnels, as that means all western users will have to flow from DC1 to DC2 and back again to reach services on the other coast.  While the inter-DC links are large and do not have transfer caps, they can still be saturated.

On a whim, we tested MTU from coast to coast and found that we can get to either DC with 1500-byte packets, but not to a far site.  The maximum to a far site worked out to 1428.  I don't really understand why.  All traffic between sites and DCs is over IPsec links, which should be splitting up and reassembling encapsulated packets as necessary to traverse whatever backbone is available, right?

I didn't mean that you should remove the tunnels, but rather that you should make it possible for a site to reach, say, DC2 even if its own link to DC2 is currently unavailable. Meaning that if Site1A loses its connectivity to DC2, it can still reach servers at DC2 by going through DC1.

IPsec will, depending on whether you use UDP encapsulation or not, add a number of bytes of overhead (I think it's a few more bytes when using UDP encapsulation, since the extra UDP header is carried as well), making the effective MTU (or rather MSS) lower for the payload you wish to transmit over the links.

The tricky part is that network equipment such as firewalls and routers will then fragment the packets to let them pass through the tunnel, which costs some performance (instead of sending just one packet, it must now send two, and the second packet is often just a few bytes). The only device that normally doesn't do this is a proxy, which has one flow on the inside (toward the clients, for example) and another flow on the outside (toward the servers).
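
For what it's worth, the observed 1428 is roughly what ESP tunnel-mode overhead would predict. The exact figure depends on cipher, integrity algorithm, padding, and whether NAT-T (UDP encapsulation) is in use, so the constants below are assumptions for one plausible combination, not measured values:

# Rough ESP tunnel-mode overhead estimate; every constant here is an
# assumption that varies with cipher, integrity algorithm, and NAT-T.
outer_ip    = 20   # new outer IPv4 header added in tunnel mode
nat_t_udp   = 8    # UDP header if NAT traversal encapsulation is in use
esp_header  = 8    # SPI + sequence number
esp_iv      = 16   # e.g. AES-CBC initialization vector
esp_trailer = 2    # pad-length + next-header bytes (plus 0-15 bytes of padding)
esp_icv     = 12   # e.g. HMAC-SHA1-96 integrity check value

overhead = outer_ip + nat_t_udp + esp_header + esp_iv + esp_trailer + esp_icv
print("estimated overhead:", overhead, "bytes")   # 66 with these choices
print("effective inner MTU:", 1500 - overhead)    # ~1434; block padding and
# algorithm choices account for the remaining gap down to the observed 1428.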

Regarding handling asymmetric routes in the PAN, you can put both tunnels in the same zone; that way your security rules will not care whether the packet came in over the DC1 link or the DC2 link.
