Package manager upgrade failures to certain sites


L1 Bithead

Hello,

We have a case open for this that has been turned over to internal dev, but I thought I would post here to see if anyone else is experiencing this issue. This is on a 5260 running 10.2.7-h3.

 

We have been tracking down this issue for a long time. It started with Mac build servers that reach out to the Internet using brew to install package upgrades. These would fail periodically, but because it worked more often than not, it wasn't reported at first. The problem was eventually reported, and we had both our network team and platform team start looking at it. We conducted numerous packet captures and also worked with our network hardware vendor to make sure the problem wasn't the network itself.

When we started focusing on the Palo Alto, we noticed missing return packets in the captures taken on the box. Palo Alto thought that was because the traffic was being offloaded to hardware, so in an effort to find the missing packets we did a session where we turned off hardware offloading. Unexpectedly, the package upgrades with brew no longer failed. So it appears there is some kind of issue with the traffic being offloaded to hardware, but that is not conclusive yet. We did notice that we do not have the same issue on a 3260 running the same code train.

Since then we have seen other 443 issues with other package managers. We are also wondering if there are other 443 issues that are just not being reported because they work the majority of the time.
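Because failures like this only show up some of the time, it can help to measure a failure rate rather than rely on user reports. The following is a minimal sketch, not the method used in this case: `tls_probe` and `failure_rate` are hypothetical helper names, and the probe only checks that a TCP connection and TLS handshake to port 443 complete.

```python
import socket
import ssl
import time


def tls_probe(host, port=443, timeout=5.0):
    """Open one TCP connection and complete a TLS handshake.

    Returns True on success, False on any connection, timeout, or TLS error.
    Each call uses a new connection, so the OS picks a new source port.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # socket timeouts and ssl.SSLError are OSError subclasses
        return False


def failure_rate(probe, attempts, delay=0.0):
    """Call probe() `attempts` times and return the fraction that failed."""
    failures = 0
    for _ in range(attempts):
        if not probe():
            failures += 1
        if delay:
            time.sleep(delay)
    return failures / attempts
```

For example, `failure_rate(lambda: tls_probe("example.com"), attempts=50, delay=0.2)` would report the fraction of failed handshakes to a placeholder host; substitute a site that the package manager actually contacts.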

3 REPLIES

L1 Bithead

Interestingly, it looks like we solved our issue this evening. We wanted to upgrade the firewalls to 10.2.9-h11. Before upgrading, I wanted to reload the firewall and test again on the off chance a reload fixed it (we had failed the firewalls over before but never reloaded). After reloading the passive and then making it active, the problem no longer occurs. If we fail over to the one that hasn't been reloaded, the problem comes back. This seems to indicate there is some kind of memory leak or uptime issue with this code train.

L1 Bithead

Unfortunately, the solution was short-lived. Within a day the problem started happening again. Last night we moved forward with upgrading the firewalls to 10.2.9-h11. The problem still persists as of testing this morning.

L1 Bithead

Update:

The update to 10.2.9-h11 actually introduced a new problem where all TLS traffic stopped working after 6 days. This is a known issue and they are releasing a hotfix for it. We ended up rolling back to a previous version.

 

Solution:

PA engineers were able to give us a fix for the original issue we were experiencing. 

The solution was to change the LAG flow key type from tag to tuple:

 

set session lag-flow-key-type tuple

show session lag-flow-key-type

 

After setting it to tuple, initial testing shows we are no longer seeing the issue.   This appears to keep each unique session on a particular link in a LAG.
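To illustrate the general idea of keeping each session on one LAG member, here is a minimal sketch of tuple-based link selection. It is not PAN-OS's actual algorithm, which is not described in this thread; the `pick_link` function, the CRC32 hash, the addresses, and the 4-link LAG are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def pick_link(src_ip, src_port, dst_ip, dst_port, proto, num_links):
    """Map a session 5-tuple to a LAG member index with a deterministic hash.

    The same tuple always yields the same index, so every packet of a
    given session is sent over the same member link.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % num_links


# Hypothetical build server opening 1000 sessions to a hypothetical mirror,
# each from a different source port.
sessions = [("10.0.0.5", 50000 + i, "203.0.113.10", 443, "tcp") for i in range(1000)]

# Count how many sessions land on each of the 4 assumed member links.
usage = Counter(pick_link(*s, num_links=4) for s in sessions)
```

In this sketch, different sessions can land on different links, but any single session never moves between links, which matches the behavior described above.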
