Issues with Overlay Routing and AWS Gateway Load Balancer


Hey Folks,

I am having difficulty getting overlay routing working with AWS GWLB, and I was wondering whether I am doing something wrong or missing a configuration element.

Are any of you using AWS GWLB with overlay routing enabled?

In my test setup, when overlay routing is enabled the test VM is able to reach the internet through the PAN FW, so outbound is working fine.

But East-West traffic between VPCs and inbound traffic are not working. I can see traffic hitting the firewall, but the allow traffic log shows only bytes sent and no return traffic. A packet capture on the destination (for both East-West and inbound) doesn't show the traffic arriving, so it looks like once the firewall inspects the packet and sends it back to the GWLBe, it isn't forwarded in the correct direction.

If overlay routing is disabled, everything works: East-West, inbound and outbound.

I found some old discussions mentioning issues with overlay routing, but from what I understand those known issues were for version 10.0.x, while we have tested with 10.2.1 and 10.1.6.



Check the route table of the subnet where the GWLB endpoint is located, and make sure it has a route back to your internal resources (your VPC CIDRs).
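To make that check concrete, here is a minimal Python sketch, assuming you have transcribed the GWLBe subnet's route table by hand as (destination CIDR, target) pairs. The data format, the function name `resolve_targets`, and target IDs such as `tgw-EXAMPLE` are hypothetical. It applies longest-prefix matching (the most specific route wins, as in VPC route tables) to show which target each internal VPC CIDR would actually be sent to. A CIDR that only matches `0.0.0.0/0` suggests the route back is missing.

```python
import ipaddress


def resolve_targets(route_table, vpc_cidrs):
    """For each VPC CIDR, return the target of the most specific covering route.

    route_table: list of (destination_cidr, target) pairs (hypothetical format,
                 e.g. transcribed by hand from the AWS console).
    vpc_cidrs:   list of internal VPC CIDR strings to check.
    Returns a dict mapping each VPC CIDR to a target, or None if no route covers it.
    """
    routes = [(ipaddress.ip_network(dest), target) for dest, target in route_table]
    result = {}
    for cidr in vpc_cidrs:
        net = ipaddress.ip_network(cidr)
        # Keep only routes whose destination contains the whole VPC CIDR.
        covering = [(dest, target) for dest, target in routes if net.subnet_of(dest)]
        if not covering:
            result[cidr] = None
            continue
        # Longest-prefix match: the route with the largest prefix length wins.
        best = max(covering, key=lambda route: route[0].prefixlen)
        result[cidr] = best[1]
    return result


if __name__ == "__main__":
    # Hypothetical GWLBe subnet route table: 10.2.0.0/16 has no specific route back.
    table = [
        ("10.0.0.0/16", "local"),
        ("10.1.0.0/16", "tgw-EXAMPLE"),
        ("0.0.0.0/0", "igw-EXAMPLE"),
    ]
    print(resolve_targets(table, ["10.1.0.0/16", "10.2.0.0/16"]))
    # expected: {'10.1.0.0/16': 'tgw-EXAMPLE', '10.2.0.0/16': 'igw-EXAMPLE'}
```

In this example, return traffic for 10.2.0.0/16 would fall through to the default route toward the internet gateway instead of going back toward the VPC, which is the kind of gap the suggestion above is meant to catch.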

Hi @Mandanajan,

Thank you for the suggestion, but I doubt the problem is in the GWLBe route table. The reason is that with the exact same setup (same GWLBe, same route table), everything works for East-West traffic the moment we disable overlay routing. Also, outbound traffic works over the same GWLBe when overlay routing is enabled, and I believe it wouldn't work if I were missing a route for the VPC, right?

Hi @aleksandar.astardzhiev,

I have the same problem. If I disable overlay routing, my East/West traffic works fine but outbound does not; with overlay on, it's the reverse. I tried the 2.1.4, 2.1.6 and 2.1.7 plugins with no change. I am also running 10.1.6. I just downgraded to 10.1.5-h1 and now it all works, so maybe give that a shot.

Hi @justin.stone,

That is interesting. At least I know I am not insane...

I think I had tried 10.1.5, but I can't remember whether it was .5 or .5-h1. Thanks, I will give it a try.


Same here: we tried version 10.2.1 and downgraded to 10.1.6-h3 on AWS support's advice; neither works so far.

We have not disabled overlay routing yet, but will first try 10.1.5-h1 as suggested and take a look.

BW


This is due to an existing bug that the team is actively working on. 10.1.5-h5 does not have this issue.

Thanks,

Nidhi


With only one post to your name saying something is being fixed, and with all due respect: how do you know 'the team' (I assume you mean the Palo Alto dev team) is actively working on it?

Can you provide more detail please, @npandey?

We need an upgrade path to version 10.2 and beyond, and this known bug has not been fixed in any release beyond 10.1.5-h5.


I have come across this issue, which is why I got tagged on this query. I have already raised it with the product team, and that's how I know the dev team is looking into it.

If this is urgent and the customer is OK with using a NAT gateway, that could be a workaround; otherwise we may have to wait for the fix to be officially available. If the customer needs a more official statement, please raise a TAC case.

What was the outcome of downgrading to 10.1.5-h1? I too am running 10.1.6-h3, and I have been banging my head over this. I have a firewall pair in another AWS region running 10.0.7 where this works perfectly, but not on 10.1.6-h3 in the AWS region where I need it to work.

FWIW, I opened a TAC case a few weeks ago, and they confirmed that 10.1.6 has issues with GWLB, as does 10.2.2.

Hello @JHall15,

We went to 10.1.5-h5, and indeed this is the only PAN-OS revision that works. So basically, if this is in production, you have limited options, and they put your environment at risk due to out-of-date firmware.

I am opening a TAC case, as we need to add more weight to the issue to get it fixed.


Hello,

Updating this post: the fix for this issue is committed to version 10.1.7.

Thanks,

Nidhi


Just to say that this issue has impacted me too, after an upgrade to 10.2.2.

It would be good to have a bug ID associated with this, with some more information on the root cause.

So far TAC has only mentioned 10.1.7.

We have the fix for the issue integrated in 10.1.7. Do you still see the issue?
