09-20-2017 06:41 PM
We have three PA firewall ASGs running in our environment. Everything was working fine until the 23rd of August. On that day two of the firewalls went down and, as part of the predefined auto scaling policy, two new PA firewalls were spun up, but with issues. The bootstrap process for both of them appears to have problems, as they have not been updated with the 'NAT-commit-success' tag in AWS. Can anyone confirm what the issue may be? How can we confirm whether the bootstrap was successful, and where can we troubleshoot the issue? P.S. We are able to log in to the console of both firewalls, but they are still out of service under the ELB, which is very strange. Please advise.
09-20-2017 07:01 PM
This might be caused by a change in the bootstrap config files, a change in S3 permissions, a change in the bootstrap file locations, etc. If you can connect to the firewall management interface via HTTPS using the credentials in the bootstrap file, then bootstrapping has succeeded. If you cannot, bootstrap may have failed for one of the reasons above. Sometimes the AWS console instance screenshot will show bootstrapping success or failure.
This might be difficult to troubleshoot here in this forum. As this solution is a TAC-supported integration, I recommend you open a case to get troubleshooting assistance.
09-20-2017 07:51 PM
Thanks for the reply Warby!
I am able to connect to the firewalls over HTTPS, and I can also see the 'AWS instance screenshot' showing that bootstrapping was successful.
However, these two PA EC2 instances are still out of service under the ELB. Having only one PA firewall serving traffic in prod is a huge risk!
The only difference I can see between the three of them is the EC2 instance tags. I can see the 'NAT-commit-success' tag on the working PA firewall EC2 instance, and the same tag is missing on the two out-of-service PA firewalls. I have even tried to deregister them and register them again with the ELB, but no luck! We have the same setup with the exact same configuration in preprod, but somehow the PA firewalls there behave strangely and are flaky.
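For anyone wanting to compare the tags across instances without clicking through the console, here is a minimal sketch. It assumes boto3 is installed and credentials are configured; the helper names `instances_missing_tag` and `fetch_reservations` are made up for illustration, and the only thing taken from this thread is the tag name. It does not say why the tag is missing, only which instances lack it.

```python
TAG_KEY = "NAT-commit-success"  # tag name as mentioned in this thread


def instances_missing_tag(reservations, tag_key=TAG_KEY):
    """Return the IDs of instances that do not carry `tag_key`.

    `reservations` is the "Reservations" list returned by the EC2
    DescribeInstances call.
    """
    missing = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            # An instance with no tags at all has no "Tags" entry.
            keys = {tag["Key"] for tag in instance.get("Tags", [])}
            if tag_key not in keys:
                missing.append(instance["InstanceId"])
    return missing


def fetch_reservations(instance_ids, region_name):
    """Hypothetical helper: fetch instance descriptions via boto3."""
    import boto3  # imported lazily so the check above works offline

    ec2 = boto3.client("ec2", region_name=region_name)
    return ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
```

Usage would be something like `instances_missing_tag(fetch_reservations([...], "<your-region>"))`, passing the instance IDs of the three firewalls.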
I have also opened a case with support (Case#: 00746231), but it took them four hours to reply with this:
'We are trying to find a right resource to work on this case who is proficient in AWS deployment'
This is ridiculous as this is impacting our production system.
Any other troubleshooting tips, please?
09-20-2017 08:09 PM
Also, I checked the CloudWatch logs for the Lambda function of one of the faulty ASGs and found these errors:
[ERROR] 2017-09-21T03:03:14.743Z 6891dac9-9e79-11e7-a3a9-150d852f6372 [ERROR]: Got an error for the command: https://172.27.21.18/api/?type=op&cmd=<show><jobs><id>1</id></jobs></show>&key=LUFRPT14MW5xOEo1R09KV...
Are they related? Any suggestions?
09-20-2017 10:42 PM - edited 09-20-2017 10:43 PM
That command is looking for a specific job id (id 1) and that is usually the auto commit job.
Can you log into the firewall and check the status of all jobs?
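If the web UI is inconvenient, the same check could be scripted against the API endpoint seen in the Lambda log. This is only a rough sketch: it reuses the URL pattern from that log line with `<show><jobs><all></all></jobs></show>` instead of a single job id, and the response layout it parses (`<job>` elements with `id`, `type`, `status` and `result` children) is an assumption that should be checked against real output from your firewall. The function names and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def show_jobs_url(host, api_key):
    """Build an op-command URL that asks for all jobs instead of job id 1."""
    cmd = "<show><jobs><all></all></jobs></show>"
    query = urlencode({"type": "op", "cmd": cmd, "key": api_key})
    return f"https://{host}/api/?{query}"


def summarize_jobs(xml_text):
    """Extract id/type/status/result per job (assumed response layout)."""
    root = ET.fromstring(xml_text)
    jobs = []
    for job in root.iter("job"):
        jobs.append(
            {
                "id": job.findtext("id"),
                "type": job.findtext("type"),
                "status": job.findtext("status"),
                "result": job.findtext("result"),
            }
        )
    return jobs


# Hypothetical response, for illustration only; not real firewall output.
SAMPLE = """<response status="success"><result>
<job><id>1</id><type>AutoCom</type><status>FIN</status><result>OK</result></job>
<job><id>2</id><type>Commit</type><status>FIN</status><result>FAIL</result></job>
</result></response>"""

if __name__ == "__main__":
    for job in summarize_jobs(SAMPLE):
        print(job)
```

Fetch the URL with whatever HTTP client you prefer and pass the response body to `summarize_jobs`; any job whose result is not what you expect is where I would start looking.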