Interfaces lost IPv4 IP


L3 Networker

Can anyone clarify whether the source of a device or network configuration (i.e. template/stack or local) could have a negative effect on the commit process?

My client has recently migrated to a pair of PA-1400 Series appliances running PAN-OS 11.0.1. A device group commit shortly after the migration led to a major outage in which the active appliance lost its IPv4 configuration and a failover did not occur; TAC have not been able to identify the root cause. The problem was fixed by another commit forcing template values (whether the force, or just another commit, restored service is not known). Technically the resulting state is not a valid configuration, as the routes rely on next hops within the interface subnets. No configuration was actually 'lost', in that the config committed successfully, but we have evidence in routed.log of the unreachable next hops.
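For reference, this is the sort of standard CLI we used to gather the evidence (annotations in parentheses are mine, not part of the CLI):

less mp-log routed.log              (route daemon log - source of the warnings pasted below)
show interface logical              (which interfaces currently hold IPv4 addresses)
show routing route type static      (do the static routes still resolve their next hops?)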

Prior to the migration, most of the device and network config on the old hardware had been provisioned locally, i.e. not from a Panorama template. Given that this process change is one of only three things that changed during the migration (the other two being new hardware and new software), my customer has asked whether there would be any benefit in removing the network config (at minimum the IPv4 addresses) from the Panorama template and adding it back into the local device configs.

My understanding of the commit process, and my experience, says no: once the template and local config/overrides are merged and the actual configuration commit starts, the resulting config committed to the device should be identical regardless of the source. That said, is there a technical explanation of the commit process, or any evidence we can provide, as reassurance that the location of the config has no bearing on the commit process and was therefore not a potential contributing factor to the outage?
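One piece of supporting evidence we can generate ourselves is that the firewall can display both the pushed and the effective config, so the merge result can be compared regardless of where a value was authored. A sketch using standard operational commands (I believe 'show config merged' is available here, but treat that one as an assumption on my part):

show config pushed-template         (the template/stack config as received from Panorama)
show config running                 (the effective config after the merge)
show config merged                  (local candidate config merged with the pushed config)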

Thanks!

Matt

3 REPLIES

L3 Networker

Said evidence, from routed.log:

2023-06-19 14:17:08.408 +1000 phase1 started
2023-06-19 14:17:08.411 +1000 auto_mac_detect not configured, set to false, auto_mac_detect=0
2023-06-19 14:17:08.411 +1000 b_auto_mac_detect is set to 0
2023-06-19 14:17:08.412 +1000 Warning:  pan_routed_parse_interface_ip(pan_routed_parse.c:882): interface 'ae1.55' ip was not found
2023-06-19 14:17:08.412 +1000 Warning:  pan_routed_parse_interface_ip(pan_routed_parse.c:882): interface 'ae2.241' ip was not found
2023-06-19 14:17:08.412 +1000 Warning:  pan_routed_parse_interface_ip(pan_routed_parse.c:882): interface 'ae2.246' ip was not found
2023-06-19 14:17:08.412 +1000 Warning:  pan_routed_parse_interface_ip(pan_routed_parse.c:882): interface 'ae2.199' ip was not found
2023-06-19 14:17:08.412 +1000 Warning:  pan_routed_parse_interface_ip(pan_routed_parse.c:882): interface 'ae3.811' ip was not found
2023-06-19 14:17:08.417 +1000 phase1 completed

Hi @mb_equate,

Based on your explanation, my first guess is that the IPs were probably configured locally on the firewall while the template was empty.

During the device group commit and push, someone clicked "Force Template Values", which tells the firewall to accept what is configured in the template.

I would start by comparing commit revisions - locally on the firewall and on Panorama - mainly the revisions before and after the device group commit. This will show the changes that were applied with that commit, and I would expect to see the removal of the IPs.

"...Technically this is not a valid configuration as routes rely on next hops in the interface subnets..."

This is not entirely true. I cannot test at the moment, but I am almost certain that you can configure a static route with a next-hop IP and an interface without assigning an IP to that interface. You will definitely see commit warnings - like those you have pasted above - but those are warnings, not errors.
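If you want to verify, something like this in a lab should do it (a sketch only - 'internal-vr' and 'ae1.55' are borrowed from this thread, the route name and prefix are made up, and 'validate full' in configure mode runs the commit validation without actually committing):

set network virtual-router internal-vr routing-table ip static-route lab-test destination 10.99.0.0/16 interface ae1.55 nexthop ip-address 172.29.0.1
validate full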

A little background: device groups and templates have different logic for resolving conflicting config. With a device group (just for contrast), you cannot create two rules with the exact same name - one local and one pushed by Panorama. You can override an address object locally, but not a rule. This is on purpose, to preserve the order of the rules pushed by Panorama.

Templates, on the other hand, will happily push different values for exactly the same config that is set locally. If you have device or network config that is both pushed by Panorama and configured locally on the FW, the FW will always prefer the local config. The tricky part is that pushing from Panorama doesn't indicate in any way whether the settings from the template are actually in use or a local value is overriding them. The only ways to apply the Panorama config are either to "Force Template Values" when pushing from Panorama, or to click "Revert" on each setting locally on the firewall (which will require a local commit).
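A quick way to check for lingering overrides from the firewall CLI (both commands are standard; the xpath filter is just an example subtree, assuming your PAN-OS version accepts it):

show config pushed-template
show config running xpath devices/entry/network/interface

If a value shows up in the running config but differs from the pushed-template output, a local override is in effect.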

PAN-OS 11.0.1 is fairly new, and who knows what weird bugs you may face, but before chasing bugs I would suggest eliminating the obvious causes by comparing config revisions around the time of the incident.

Thanks Aleksander, some good tips there.

This is a new deployment, and while we used local overrides on the new pair to test L3 with temporary IPs, those overrides were intentionally removed by forcing template values (which included the production IPs), so prior to the incident all config except management and HA came from the new templates.

The device and network config on the previous devices was managed locally despite templates being in use, and as we all know, forcing template values at that point would not have been wise. The previous deployment was not ideal, as some of the local configs/overrides relied on values from the templates; I believe the devices were not onboarded properly by previous admins. I've seen it all, and I align customers to best practice at every opportunity, including banners/headers/footers reminding admins to use Panorama.

We were able to build the new templates by merging the config from both the old templates and the local device XML, in an order that replicates the effective config on the old devices, but now entirely in templates (except management and HA). This is the key point the customer wants answered: could it have contributed to the issue, as it is a change from the prior state where the IP addresses were defined on the devices themselves (from memory they were also in templates, but the values were never forced)?
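In case it helps others doing the same, a merge like that can be scripted with 'load config partial' - the below is only a sketch, with an illustrative export file name and template name, and abbreviated xpaths (the real xpaths on Panorama are longer):

load config partial mode merge from-xpath /config/devices/entry/network/interface to-xpath /config/devices/entry/template/entry[@name='NEW-TEMPLATE']/config/devices/entry/network/interface from old-fw-export.xml

This pulls the interface subtree from the exported device XML into the new template's candidate config.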

We checked the config revisions and audit logs; the only change prior to the outage was to a custom URL category. We can't tell whether template values were forced during that commit, but it should not have affected anything, as we can trust those values. Also, service was restored by committing and forcing template values. Again, we don't know whether forcing template values made any difference, but in the situation it was the only thing that made sense.

We also tried to replicate the issue by removing the IP addresses on the same devices in a lab (both from the template and as a local override); it's definitely not a valid config, as the commit fails:

can't find interface in 'internal-vr' for next hop 172.29.0.1(Module: routed)
client routed phase 1 failure
Commit failed
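For anyone wanting to replicate the failure above, the local-override half can be removed with something like the below (a sketch - the interface name is taken from the logs earlier in the thread, and the template-side IP also has to be deleted in Panorama before the push):

delete network interface aggregate-ethernet ae1 layer3 units ae1.55 ip
commit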

With all that in mind, I don't see any way that the configuration, or its location (i.e. template vs local), could have contributed to the incident, which was more than likely the result of a bug - this being early code. If there's a technical document that describes the Panorama commit process from the perspective of the device (i.e. how the configs are merged, and what validations are done where, before the reconfiguration starts), we could remove all doubt and rule this out. If we can't rule it out, the customer wants to move the IP addresses onto the new devices via local overrides, which slowly erodes best practice.
