We are seeing a strange issue with our 4020s (currently running 4.1.8h2). As far as we can tell, partial commits sometimes trigger some kind of HA event, which in turn sometimes causes already-established RTP streams that the policy allows to flip from allowed to a discard state. This causes our Polycom videoconference systems to either freeze video or lose audio, depending on which stream is affected. TAC advised us to put in an application override for this, but that hasn't helped.

For example, in the system logs we see a partial commit:

2013/07/17 16:05:44 info general general 0 Config installed

In the HA logs right after this we see:

Jul 17 16:05:45 sysd notificatioon for object sw.mgmt.runtime.ncommits
Jul 17 16:05:45 Error: pan_if_name_decompose(pan_if.c:1092): unknown interface type dedicated
Jul 17 16:05:45 Error: pan_if_name_decompose(pan_if.c:1092): unknown interface type dedicated
Jul 17 16:05:45 Peer HA3 MAC is 00:00:00:00:00:00
Jul 17 16:05:45 Peer HA2 MAC is 00:1b:17:23:8e:06
Jul 17 16:05:45 default ha1 interface 5
Jul 17 16:05:45 default ha2 interface 6
Jul 17 16:05:45 Dataplane HA state transition: from 5 to 5

And in the traffic logs just after this (note that the traffic is matching our override App-ID; it would normally be RTP):

Jul 17 16:05:45 blah-fw1.obm.company.com 1,2013/07/17 16:05:45,0002C101442,TRAFFIC,deny,0,2013/07/17 16:05:44,10.10.20.31,10.10.31.50,0.0.0.0,0.0.0.0,client to dc allow,,,polycom-udp-custom,vsys2,clientedge,clientdc,vlan.20,vlan.20,logpol,2013/07/17 16:05:45,297856,1,3244,49166,0,0,0x0,udp,deny,1792,1792,0,28,2013/07/17 15:59:37,362,any,0,9157668142,0x0,10.0.0.0-10.255.255.255,10.0.0.0-10.255.255.255,0,28,0

The policy this is matching, "client to dc allow", is a permit any/any policy for the zones involved, so it should never deny traffic.
Another interesting bit in that traffic log is that it shows both the ingress and egress interfaces as vlan.20 (the side facing 10.10.20.31), which is incorrect. The working flows show one side on vlan.20 and the other on the routed interface into the data center.

Not every partial commit seems to cause the HA events, and not every HA event of this type seems to kill a VC session. However, every single VC lockup has matched the timing of these other events in the logs, to the second. Does this ring any bells for anyone?
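In case it helps anyone trying to reproduce the correlation, here is a rough Python sketch of how the matching could be automated rather than done by eye. It is not a vendor tool. The field positions are assumptions read off the single sample traffic line above (action at index 4, generated time at 6, rule at 11, interfaces at 18 and 19), and the function names and the 2-second window are made up for illustration.

```python
import csv
from datetime import datetime, timedelta

TS_FORMAT = "%Y/%m/%d %H:%M:%S"

# CSV payload of the traffic log line quoted in this post (syslog prefix removed).
SAMPLE = (
    "1,2013/07/17 16:05:45,0002C101442,TRAFFIC,deny,0,2013/07/17 16:05:44,"
    "10.10.20.31,10.10.31.50,0.0.0.0,0.0.0.0,client to dc allow,,,"
    "polycom-udp-custom,vsys2,clientedge,clientdc,vlan.20,vlan.20,logpol,"
    "2013/07/17 16:05:45,297856,1,3244,49166,0,0,0x0,udp,deny,1792,1792,0,28,"
    "2013/07/17 15:59:37,362,any,0,9157668142,0x0,"
    "10.0.0.0-10.255.255.255,10.0.0.0-10.255.255.255,0,28,0"
)


def parse_traffic_line(payload):
    """Pull out the fields of interest; indices are assumed from SAMPLE."""
    fields = next(csv.reader([payload]))
    return {
        "time": datetime.strptime(fields[6], TS_FORMAT),
        "action": fields[4],
        "src": fields[7],
        "dst": fields[8],
        "rule": fields[11],
        "app": fields[14],
        "in_if": fields[18],
        "out_if": fields[19],
    }


def suspicious_denies(traffic_payloads, commit_times, rule, window_s=2):
    """Return deny entries on `rule` logged within window_s of any commit."""
    window = timedelta(seconds=window_s)
    hits = []
    for payload in traffic_payloads:
        entry = parse_traffic_line(payload)
        if entry["action"] != "deny" or entry["rule"] != rule:
            continue
        if any(abs(entry["time"] - c) <= window for c in commit_times):
            # Flag the odd case where ingress and egress interface are the same.
            entry["same_interface"] = entry["in_if"] == entry["out_if"]
            hits.append(entry)
    return hits


if __name__ == "__main__":
    commits = [datetime.strptime("2013/07/17 16:05:44", TS_FORMAT)]
    for hit in suspicious_denies([SAMPLE], commits, "client to dc allow"):
        print(hit["time"], hit["src"], hit["dst"], hit["app"], hit["same_interface"])
```

Feeding it the commit timestamps from the system log and the traffic log export for the same period should list every deny on the permit rule that lines up with a commit, along with whether the interfaces were reported as identical.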