We're still experiencing the occasional OSPF adjacency drop, although it's much improved since our changes over the summer.
However, the log entries in the System log are anything but useful:
OSPF adjacency with neighbor has gone down. interface ae2.211, neighbor router ID 10.200.11.96, neighbor IP address 10.200.11.96.
Is there any way to get more detailed logs as to why the adjacency has gone down?
The best place to start would be the routed.log, combined with the logs of the peer device terminating the OSPF connection.
> less mp-log routed.log
I've seen bugs in the past that cause OSPF hello packets to be dropped, resulting in flapping. What PAN-OS version are you on?
The core firewall that's set as the Designated Router (via priority settings) is an HA pair of PA3020s running 7.1.14.
The school firewalls are a mix of PA200s, PA500s, and PA3020s, running 7.1.19. The school in question has a PA200.
The routed.log gives a bit more information, although the error codes are a little cryptic 🙂 and don't show up in Google searches.
**** AUDIT 0x3e01 - 91 (0000) **** I:1a6ba9ec F:00000002 qodmnmi.c 215 :at 08:51:18, 22 October 2018 (1499305380 ms) OSPF 5 An adjacency with a neighbor has gone down. Resources associated with database exchange for this neighbor will be freed. Neighbor router ID 10.200.2.70 Neighbor IP address 10.200.2.70 Interface category network interface Interface neighbor IP addr 10.200.2.70 i/f idx 0X00000000
**** AUDIT 0x3e01 - 210 (0000) **** I:1a6ba9ec F:00000002 qoamnfsa.c 754 :at 08:51:18, 22 October 2018 (1499305380 ms) OSPF 5 i/f idx 0X0000010A rtr ID 10.200.2.70 IP addr 10.200.2.70 neighbor FSM state has deteriorated. Interface address = IP addr 10.200.2.1 OSPF link category = 1 Is neighbor virtual? = 0 FSM input = QOAM_NBR_INACTIVITY_TMR (13) Old FSM state = AMB_OSPF_NBR_FULL (8) New FSM state = AMB_OSPF_NBR_DOWN (1) FSM action = I (9) Neighbor friend status = 1 Number of neighbor events = 6 Number of database exchange timeouts = 0
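For anyone wanting to tally these events rather than read them by eye, here is a minimal Python sketch that pulls the FSM input and the old/new neighbor states out of routed.log text. It assumes the log has been exported from the firewall and that entries follow the same field layout as the one quoted above. The function name `parse_fsm_transitions` and the `SAMPLE` string are illustrative only, not part of any PAN-OS tooling.

```python
import re

# Abbreviated copy of the FSM fields from the routed.log entry quoted above.
SAMPLE = (
    "FSM input = QOAM_NBR_INACTIVITY_TMR (13) "
    "Old FSM state = AMB_OSPF_NBR_FULL (8) "
    "New FSM state = AMB_OSPF_NBR_DOWN (1) "
    "FSM action = I (9)"
)

# Assumption: the three fields always appear in this order, separated by
# whitespace (spaces or newlines), each followed by a numeric code in parens.
FSM_PATTERN = re.compile(
    r"FSM input = (?P<event>\w+) \(\d+\)\s+"
    r"Old FSM state = (?P<old>\w+) \(\d+\)\s+"
    r"New FSM state = (?P<new>\w+) \(\d+\)"
)


def parse_fsm_transitions(text):
    """Return one dict per FSM transition found in the given log text."""
    return [m.groupdict() for m in FSM_PATTERN.finditer(text)]


if __name__ == "__main__":
    for transition in parse_fsm_transitions(SAMPLE):
        print(transition["event"], transition["old"], "->", transition["new"])
```

Counting how often `QOAM_NBR_INACTIVITY_TMR` shows up per neighbor would give a rough picture of which sites are losing hellos most often.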
I'm guessing QOAM_NBR_INACTIVITY_TMR means the dead timer has expired (meaning it hasn't received any of the 4 HELLO packets that were sent 10 seconds apart)? If that's the case, then we'll need to consider changing the hello interval/dead count to compensate (tried that this morning by updating just the PA3020s, which knocked the entire district offline due to the timer mismatch. Ooops!). 🙂
The school board office, where the core PA3020s sit, is connected to most secondary schools via a private fibre network, and the elementary schools connect back to the secondary schools via Ubiquiti wireless links. It's the wireless sites that are having the occasional OSPF drop-off. It's a flat layer-2 bridged network currently.
That's great, and you're definitely right: "QOAM_NBR_INACTIVITY_TMR" seems to indicate that the firewall didn't receive the OSPF hello packets. So either the timer values need to be adjusted, or there is an issue with the OSPF packets reaching the firewall.
I don't see any OSPF-related bugs affecting the versions you're on, up to 7.1.20, so I don't think buggy behaviour is causing this. I would start by adjusting the timeout values and see how you get on.
Yeah, we're thinking we're going to adjust the dead timer intervals on all the firewalls on our fibre/wireless network over the Christmas break (as it will require taking all sites offline for the time it takes to configure each VR on each firewall).
We'll have to do some more reading on it, but we'll probably go with something along the lines of:
Hello Interval: 5
Dead Counts: 12
Retransmit Interval: (not sure)
Transit Delay: (not sure)
Graceful Restart: 15
That way, the link would have to be really bad for a minute before OSPF drops it from the routing table. Send more frequently, and wait longer before declaring it dead.
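To make the arithmetic explicit: the dead interval is the hello interval multiplied by the dead count, and OSPF neighbors only form an adjacency when their hello and dead intervals agree, which is why changing one side alone took the district down. The Python sketch below works through both settings. It assumes the current timers are 10 seconds and 4 dead counts (as described earlier in the thread), and the names `OspfTimers` and `timers_match` are illustrative, not a PAN-OS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OspfTimers:
    hello_interval: int  # seconds between hello packets
    dead_count: int      # hello intervals that may pass without a hello

    @property
    def dead_interval(self) -> int:
        """Seconds of silence before the neighbor is declared down."""
        return self.hello_interval * self.dead_count


def timers_match(a: OspfTimers, b: OspfTimers) -> bool:
    """Neighbors need identical hello and dead intervals to stay adjacent."""
    return (a.hello_interval == b.hello_interval
            and a.dead_interval == b.dead_interval)


current = OspfTimers(hello_interval=10, dead_count=4)   # assumed current values
proposed = OspfTimers(hello_interval=5, dead_count=12)  # values proposed above

if __name__ == "__main__":
    print("current dead interval:", current.dead_interval, "s")
    print("proposed dead interval:", proposed.dead_interval, "s")
    print("mixed deployment would match:", timers_match(current, proposed))
```

Because the two sets do not match, every firewall on a shared segment has to be changed in the same window, which fits the plan to do it over the break.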
We'll live with the occasional OSPF flap until then. It's much improved compared to last school year, with the OSPF changes we made over the summer.
Thanks for the pointers. We'll get these firewalls configured perfectly, just in time for the Ministry of Education to change everything next year. 😄