Physical port is taken out of aggregate ethernet interface run in LACP auto mode

Printer Friendly Page

PAN-OS 6.1+

 

Issue

The system log on the Palo Alto Networks firewall generated a message that says one of the physical ports assigned to a given Aggregate Ethernet (AE) interface was taken out of the AE group and then brought back after a minute.

 

2015/03/08 19:55:44 critical lacp    ethern nego-fa 0  LACP interface ethernet1/2 moved out of AE-group ae1. Selection state Selected
2015/03/08 19:55:45 critical lacp    ethern lacp-up 0  LACP interface ethernet1/2 moved into AE-group ae1.

 

Cause

The aggregate interface has auto LACP enabled, which means that LACPDU messages are exchanged with a peer to dynamically negotiate LACP parameters and establish and maintain the AE interface status. LACPDU messages are sent out of every physical interface member of any given AE group.

 

LACP feature has 3 main state machines: Selection, MUX, and RX machine.

 

The RX machine examines data in the received LACPDUs and updates the peer’s state. If no LACPDU messages have been received by the peer device during the past 3 intervals the RX state machine for the given interface goes from CURRENT (operational) to EXPIRED (non-operational) status. This activity appears in the System log as an interface taken out of the AE group.

 

The firewall has a dedicated daemon on MP plane for LACP protocol called “l2ctrld.” Logs generated are stored in l2ctrld.log file in the var/log/pan folder. In the var/log/pan/ l2ctrld.log file you can see the following entries:

 

2015-03-08 19:55:44.766 -0400 ethernet1/2 idx 17, current_while expired.
2015-03-08 19:55:44.766 -0400 ethernet1/2 idx 17, rx state change CURRENT=>EXPIRED

2015-03-08 19:55:44.767 -0400 ethernet1/2 idx 17, mux state change RX_TX=>ATTACHED
2015-03-08 19:55:44.767 -0400 post LACP event to DP: if_idx 17, up 0
2015-03-08 19:55:44.767 -0400 log ethernet1/2 idx 17 leaves lag. sel state Selected

2015-03-08 19:55:45.017 -0400 ethernet1/2 idx 17, mux state change ATTACHED=>RX_TX
2015-03-08 19:55:45.017 -0400 ethernet1/2 idx 17, mux state in RX_TX
2015-03-08 19:55:45.017 -0400 post LACP event to DP: if_idx 17, up 1
2015-03-08 19:55:45.017 -0400 log ethernet1/2 idx 17 join lag

Comments

Hi All,

 

How do I access l2ctrld.log file? Do I need to get the tech support file? or through API?

 

Regards,

Raghav

Hi Raghav

 

you can either extract a techsupport file which will retain directory structure, but may be a little 'overkill' to simply look at a log file

 

you can read log files directly from the CLI:

 

admin@myNGFW> less mp-log l2ctrld.log 

mp-log are management plane logfiles, dp-log are dataplane log files

Thanks Reaper, this helped :)

 

Raghav

Hi,  

 

I see that you mentioned the symptoms. But do you know what can cause this issue? 

If the FW cannot see the response msgs from its peer then it needs to take the interface out of AE group, and I assume this is the expected behavior?

Do you think the following Warning can be the issue?

"The ethernet interface is running with a non-qualified SFP"

 

I appreciate the help,

 

-Shahab 

Hi @smansouri

 

the cause is listed under the 'cause' header: this is an action taken at the software layer where an interface is removed because it is not getting expected informtion from it's remote peer (in this case, the remote interface is the cause of the issue

 

Your issue is regarding a local interface and could cause the remote end to experience the above issue, you should first fix the SFP issue 

Thanks alot for the help!

 

-Shahab

Our LACP is generally working OK, but sometimes we see that the interfaces are flapping, as described in above article.

The problem is intermittent, happens a few times a day. There are 3 AE groups configured for lacp, 2 of these with slow, one with fast transmission rate. Fast failover and pre-neg is disabled. The other end: Cisco switch.

 

In the Palo log we can see following (after debug turned on):

 

 

 

2017-10-16 09:16:24.284 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.766 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.766 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:626): lacp worker poll timer interval too long. now 553598288, last 553595807, diff 2481
2017-10-16 09:16:26.766 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598289, last 553598288, diff 1
2017-10-16 09:16:26.766 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598289, last 553598289, diff 0
2017-10-16 09:16:26.766 +0200 Warning:  pan_lacp_one_sec_timer(pan_lacp.c:1581): lacp one-second timer interval too long. now 553598289, last 553595806, diff 2483
2017-10-16 09:16:26.799 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598322, last 553598289, diff 33
2017-10-16 09:16:26.799 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.799 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.799 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 25 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.799 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598322, last 553598322, diff 0
2017-10-16 09:16:26.799 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598322, last 553598322, diff 0
2017-10-16 09:16:26.800 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598322, last 553598322, diff 0
2017-10-16 09:16:26.800 +0200 Warning:  pan_lacp_one_sec_timer(pan_lacp.c:1577): lacp one-second timer interval too short. now 553598322, last 553598289, diff 33
2017-10-16 09:16:26.800 +0200 ethernet1/3 idx 18, current_while expired.
2017-10-16 09:16:26.800 +0200 ethernet1/3 idx 18, rx state change CURRENT=>EXPIRED
2017-10-16 09:16:26.800 +0200 ethernet1/4 idx 19, current_while expired.
2017-10-16 09:16:26.800 +0200 ethernet1/4 idx 19, rx state change CURRENT=>EXPIRED
2017-10-16 09:16:26.800 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598322, last 553598322, diff 0
2017-10-16 09:16:26.800 +0200 ethernet1/3 idx 18 mux state change RX_TX=>ATTACHED, select_state Selected, partner state 0x37
2017-10-16 09:16:26.800 +0200 post LACP event to DP: if_idx 18, up 0
2017-10-16 09:16:26.800 +0200 log ethernet1/3 idx 18 leaves lag. sel state Selected
2017-10-16 09:16:26.800 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.800 +0200 ethernet1/4 idx 19 mux state change RX_TX=>ATTACHED, select_state Selected, partner state 0x37
2017-10-16 09:16:26.800 +0200 post LACP event to DP: if_idx 19, up 0
2017-10-16 09:16:26.800 +0200 log ethernet1/4 idx 19 leaves lag. sel state Selected
2017-10-16 09:16:26.800 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.800 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598323, last 553598322, diff 1
2017-10-16 09:16:26.800 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.800 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:26.800 +0200 Warning:  pan_lacp_worker_poll(pan_lacp_thread.c:622): lacp worker poll timer interval too short. now 553598323, last 553598323, diff 0
2017-10-16 09:16:27.284 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:27.284 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:28.284 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:28.284 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:29.285 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:29.285 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:29.534 +0200 ethernet1/3 idx 18, mux state change ATTACHED=>RX_TX
2017-10-16 09:16:29.534 +0200 ethernet1/3 idx 18, mux state in RX_TX
2017-10-16 09:16:29.534 +0200 post LACP event to DP: if_idx 18, up 1
2017-10-16 09:16:29.534 +0200 log ethernet1/3 idx 18 join lag
2017-10-16 09:16:29.534 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 18 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:29.534 +0200 ethernet1/4 idx 19, mux state change ATTACHED=>RX_TX
2017-10-16 09:16:29.534 +0200 ethernet1/4 idx 19, mux state in RX_TX
2017-10-16 09:16:29.534 +0200 post LACP event to DP: if_idx 19, up 1
2017-10-16 09:16:29.534 +0200 log ethernet1/4 idx 19 join lag
2017-10-16 09:16:29.534 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 19 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6
2017-10-16 09:16:30.284 +0200 debug: pan_l2ctrld_send_pkt(pan_l2ctrld.c:275): 20 Sent 132 (124 on wire) bytes of LACPDU pkt to DP on socket 6

 

What exactly is the problem? Where can I configure these poll timers?

 

 

... lacp worker poll timer interval too long ...

... lacp worker poll timer interval too short ...

... lacp one-second timer interval too long ...

 

Is the interval now too short or too long? It does not make much sense. Anyone has seen this before?

 

I have an issue with one port of the ae (disconnected), but I see the message every 5 minutes.

LACP interface ethernet1/7 moved out of AE-group ae3. Selection state Unselected(Link down

Why does the FW keep generating the message?

@ET 

 

I've seen this caused by a high DP CPU situation at the time when Interface went out of LACP group.

What PAN-OS version are you running ? If not a recommended version then I suggest upgrading first to a recommended release.

 

If this persist then I'd recommend getting the logs discussed in the article + TSF and reach out to TAC.

Hi,

 

Is there a fix for this?

 

Regards,

Raghav

hi @RbadigerCY

 

If you fix the peer (router/switch) the issue gets resolved: this is a remote issue

Hi Reaper,

 

What has to be fixed on peer(router/switch)?

 

Regards,

Raghav

hi @RbadigerCY

 

the example listed above is an action taken at the software layer where an interface is removed because it is not getting expected informtion from it's remote peer, so the remote peer (switch/router) will likely have an error message of it's own. First step is to compare logs on both ends and see what is going on (was the message received, did the peer have an issue, was the return sent but not received by the firewall,...)

Hi,

 

I have 2 cables connected directly to Palo Altos(in HA), no devices in middle. 

Interface is going out of ae group and coming back in.

 

Saw the following error on l2ctrld log.

 

 idx 84 mux state change RX_TX=>ATTACHED, select_state Unselected(Link down), partner state 0x3d

 

Regards,

Raghav

hi @RbadigerCY

 

That seems like a very unusual configuration? you're running LACP between two Palo Alto firewalls in the same cluster? I'm not sure I understand what you're hoping to achieve by this. I would recommend running LACP with routers or switches north and south to optimize the redundancy gained by employing LACP, between HA members I don't think you'll get much use for LACP

Hi,

 

I am using 2 interfaces for HA3 which are configured in aggregate group and LACP is enabled. If any one link goes down, HA3 will still remain up.

That is the reason I have put aggregate interface and enabled LACP.

 

Is there any way I can do it?

 

Regards,

Raghav

hi @RbadigerCY

simple aggregation (creating an AE interface and adding the physical interfaces) is sufficient for HA3, since there is no complex routing/bandwidth optimalization etc requirement. If one link goes down the other link will still function, load balancing etc is taken care of also

Simple AE interface is good. Interface do not flap.

 

Once I enable LACP, it starts flapping after 30-40 minutes. LACP is the cause for flapping?

 

Regards,

Raghav

I don't think LACP should be supported on HA interfaces as there's no added value (and possibly not physically supported)

 

you may want to reach out to support to have this formally reviewed