NAT commit failure in AWS

TCS_Cloud · 09-20-2017

Hi,

We have three PA firewall ASGs running in our environment. Everything had been working fine until the 23rd of August. On that day two of the firewalls went down and, as part of the predefined auto scaling policy, two new PA ASG firewalls spun up with issues. The bootstrap process for both of them appears to have problems, as they have not been updated with 'NAT-commit-success' under tags in AWS. Can anyone confirm what the issue may be? How can we confirm whether the bootstrap was successful, and where can we troubleshoot the issue? P.S. We are able to log in to the console of both firewalls, but they are still out of service under the ELB, which is very strange. Please advise.

Cheers,

Omar

Warby · 09-20-2017

Hi Omar,

This might be caused by a change in the bootstrap config files, a change in S3 permissions, a change in bootstrap file locations, etc. If you can connect to the firewall management interface via HTTPS using the credentials in the bootstrap file, then bootstrapping has succeeded. If you cannot, bootstrap may have failed for one of the reasons above. Sometimes the AWS console screenshot will show bootstrapping success or failure.

This might be difficult to troubleshoot here in this forum. As this solution is a TAC-supported integration, I recommend you open a case to get troubleshooting assistance.

HTH,

Warby

TCS_Cloud · 09-20-2017

Thanks for the reply Warby!

I am able to HTTPS into the firewalls, and I can also see the 'AWS instance screenshot' showing that bootstrapping is successful.

However, these two PA EC2 instances are still out of service under the ELB. Having only one PA firewall serving traffic in prod is a huge risk!

The only difference I can see among the three of them is the EC2 instance tags. I can see the 'NAT-commit-success' tag updated for the working PA firewall EC2 instance, and the same tag is missing for the two out-of-service PA firewalls. I have even tried to deregister them and register them again with the ELB, but no luck! We have the same setup with the exact same configuration in preprod, but somehow the PA firewalls behave strangely and are flaky.
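For comparing the tags across all three instances in one go, a small Python sketch along these lines may help. It assumes the tags are in the list-of-dicts shape that EC2 returns ({'Key': ..., 'Value': ...}). Since it is not clear from this thread whether 'NAT-commit-success' is stored as a tag key or a tag value, it checks both. The instance IDs, names, and tag data below are hypothetical.

```python
from typing import Dict, List

MARKER = "NAT-commit-success"  # tag text mentioned in this thread


def has_marker(tags: List[Dict[str, str]], marker: str = MARKER) -> bool:
    """Return True if the marker appears as a tag key or a tag value."""
    return any(marker in (t.get("Key"), t.get("Value")) for t in tags)


def instances_missing_marker(instances: Dict[str, List[Dict[str, str]]]) -> List[str]:
    """Return the sorted instance IDs whose tags do not contain the marker."""
    return sorted(iid for iid, tags in instances.items() if not has_marker(tags))


# Hypothetical sample data; real tags would come from the EC2 console or API.
sample = {
    "i-aaaa": [{"Key": "Name", "Value": "pa-fw-1"},
               {"Key": "NAT-commit-success", "Value": ""}],
    "i-bbbb": [{"Key": "Name", "Value": "pa-fw-2"}],
    "i-cccc": [{"Key": "Name", "Value": "pa-fw-3"}],
}

print(instances_missing_marker(sample))
```

The instances it lists are the ones where the tag was never written, which should match the ones showing out of service under the ELB.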

I have also opened a case with support (Case#: 00746231), but it took them four hours to reply with this:

'We are trying to find a right resource to work on this case who is proficient in AWS deployment'

This is ridiculous as this is impacting our production system.

Any other troubleshooting tips, please?

Cheers,

Omar

TCS_Cloud · 09-20-2017

Also, I checked the CloudWatch logs for one of the faulty ASG Lambda configurations and found these errors:

[ERROR] 2017-09-21T03:03:14.743Z 6891dac9-9e79-11e7-a3a9-150d852f6372 [ERROR]: Got an error for the command: https://172.27.21.18/api/?type=op&cmd=<show><jobs><id>1</id></jobs></show>&key=LUFRPT14MW5xOEo1R09KV...

Are they related? Any suggestions?

Regards

niyengar · 09-20-2017

That command is looking for a specific job ID (ID 1), which is usually the auto-commit job.

Can you log into the firewall and check the status of all jobs?
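From the firewall CLI, `show jobs all` should list every job. If you would rather query it the same way the Lambda does, here is a rough Python sketch. It only builds the URL and parses a response body; it makes no network call. The response layout and the FIN/OK values are my assumption of what the API returns, the sample body is made up for illustration, and YOUR_API_KEY is a placeholder.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def op_url(host: str, cmd_xml: str, api_key: str) -> str:
    """Build an op-command URL in the same form as the one in the Lambda log."""
    query = urlencode({"type": "op", "cmd": cmd_xml, "key": api_key})
    return f"https://{host}/api/?{query}"


def unfinished_or_failed(response_xml: str):
    """Return (id, type, status, result) for jobs that are not FIN/OK."""
    root = ET.fromstring(response_xml)
    problems = []
    for job in root.iter("job"):
        row = tuple(job.findtext(tag, default="")
                    for tag in ("id", "type", "status", "result"))
        if row[2] != "FIN" or row[3] != "OK":
            problems.append(row)
    return problems


# Hypothetical response body, for illustration only.
sample = """<response status="success"><result>
  <job><id>1</id><type>AutoCom</type><status>FIN</status><result>OK</result></job>
  <job><id>2</id><type>Commit</type><status>FIN</status><result>FAIL</result></job>
</result></response>"""

url = op_url("172.27.21.18", "<show><jobs><all></all></jobs></show>", "YOUR_API_KEY")
print(url)
print(unfinished_or_failed(sample))
```

Any job that shows up in that list, especially job 1, is worth including in the TAC case.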

niyengar · 09-20-2017

Oh... and as Warby suggested, please do open a TAC ticket so you can get the proper and timely support you need.
