01-21-2013 02:36 AM
We are seeing some strange behavior with VoIP traffic.
S/W Version: 4.1.9
VoIP Provider: Foehn IP Telephone systems.
Latest Application version.
A new VoIP system has been deployed which has SIP traffic passing through the PA-2020.
Application override policies are set up for incoming and outgoing SIP traffic, with applications set to unknown-tcp and unknown-udp (as per a PAN document/recommendations). VoIP traffic passes through the PA-2020 and reaches the server on the external side for communication.
Every time there is an internet outage or power loss (which is happening fairly regularly), we lose connection to the VoIP server, as expected. After the internet connection is restored, however, I am unable to contact the VoIP server, so no telephone connections are established. No SIP traffic passes through. But at the very same time, if I bypass the Palo Alto, everything works absolutely fine.
Strangely, what I've noticed is that when the PA-2020 is rebooted immediately after the internet connection is restored, the SIP traffic passes through the Palo Alto absolutely fine. And if the PA is not rebooted, but traffic is left to bypass the PA for 30 minutes or so, SIP traffic also works fine once the PA is plugged back in-line.
Hence the issue is that every time we lose internet connection, we either have to reboot the PA-2020 or completely bypass it for the SIP traffic, as it is an educational environment and phones are critical. This is a very strange issue for me to troubleshoot.
Can anyone please shed some light on this or share your experiences with SIP traffic?
01-21-2013 03:45 AM
I wasn't aware that using an app-override to unknown-tcp or unknown-udp was recommended. Typically you create a custom app and app-override to it. Maybe there is a special reason this was recommended to you?
Maybe the session table is becoming stale when the connection drops? Have you tried clearing the session table after an outage? If you can find the offending sessions in the session table and clear only those, that would be even better. Maybe the clients are still just using their existing sessions and not trying to reconnect? Or if they are creating new sessions, maybe the old sessions stuck in the table are keeping the new ones from starting?
There are a lot of ways to go here without packet captures and logs. I'd start with the session table (session monitor) and also the drop filters on the CLI.
01-21-2013 08:59 AM
Just one question: if I clear the session table and the VoIP service is restored, do I have to clear the session table every time there is an outage?
01-21-2013 07:01 PM
I would tend to agree with KBrazil.... When the SIP server power is regained, the existing sessions need to be timed out.
Which is why it works when you reboot the FW, or if you let the traffic bypass the FW. Both accomplish the same thing.
So I would agree that you should clear the session table and narrow it down to those offending sessions.
01-21-2013 07:09 PM
If that works I would see it as a temporary troubleshooting step and workaround until the root cause is found and fixed. Your support provider should be able to assist here.
01-22-2013 11:51 PM
I'm fighting a similar problem to kalyanram.piratla.
I have a VoIP client on the Internet and my SIP gateway in the DMZ. I have an application override too.
But from time to time (I haven't had problems with the internet connection) the VoIP client can't establish an RTP session, although all the signaling is working OK. I mean that when I pick up the phone and call any number (e.g. my mobile), my mobile rings and I see the number that is calling, but after I answer the call I hear nothing. When this happens I need to clear the SIP session, and after that everything works perfectly for another few days.
show session all filter source x.y.z.q (IP of your VoIP Provider)
and once you have found the ID of the SIP session,
clear session id XXXXX
Maybe it will be useful for you.
01-28-2013 01:48 PM
I am in a similar situation, and I currently have a case open with PAN support. In my case I have a dual-ISP setup. Every time there is a failover from my primary to secondary or a failback from secondary to primary, the SIP sessions hang in limbo. They do not time out. This is because the VoIP phones are constantly sending UDP SIP registration packets and the PAN device resets the TTL for the session each time.
When a packet is received on an interface, the firewall first does a session lookup, and if a matching session exists, hands the packet off to that session without further processing. There is an official document that describes how packets are handled inside a PAN firewall if you are interested.
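To illustrate why the sessions never age out, here is a minimal toy model in Python. It is only a sketch of the behavior described above, not how PAN-OS is implemented: the function name, the `Session` class and the 15/30/10 second values are all made up for illustration. The assumption is simply that every packet matching a live session resets that session's idle timer.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One entry in a toy session table (illustrative, not PAN-OS internals)."""
    expires_at: float  # time in seconds at which the entry ages out


def stale_session_survives(keepalive_interval: float,
                           idle_timeout: float,
                           duration: float) -> bool:
    """Return True if a session created at t=0 is still in the table at `duration`.

    Assumed model: a packet arrives every `keepalive_interval` seconds. If a
    live session matches, its timer is reset to `idle_timeout`. Otherwise the
    entry has aged out and the packet would have to start a fresh session.
    """
    session = Session(expires_at=idle_timeout)
    t = keepalive_interval
    while t <= duration:
        if t >= session.expires_at:
            return False  # entry aged out before this packet arrived
        session.expires_at = t + idle_timeout  # match found: timer refreshed
        t += keepalive_interval
    return duration < session.expires_at


# Keepalive shorter than the timeout: the old session is refreshed forever.
print(stale_session_survives(15, 30, 3600))
# Timeout shorter than the keepalive: the old session ages out.
print(stale_session_survives(15, 10, 3600))
```

In this model the stale session only goes away when the idle timeout is shorter than the gap between packets, or when someone clears it by hand.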
One of the things you could do is to set the SIP or unknown-udp application timeout values to be less than the SIP registration interval that your phones use. You would assume this to be 30 seconds, but that is not always the case. If you have Wireshark running and filter the packets properly, you'll get a pretty good idea of the time difference between each SIP registration packet. Of course, this means that you are putting a lot more load on your firewall, because each registration then creates a new session, and this might increase with the number of VoIP phones you have in your infrastructure.
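If you export the packet times from a capture, working out the interval is simple arithmetic. The following is a rough Python sketch, assuming you already have a list of timestamps (in seconds) for one phone's registration packets. The function names and the sample values are hypothetical, not taken from a real capture.

```python
from statistics import median
from typing import List, Sequence


def _gaps(timestamps: Sequence[float]) -> List[float]:
    """Differences between consecutive timestamps, after sorting."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def registration_interval(timestamps: Sequence[float]) -> float:
    """Median gap, in seconds, between consecutive registration packets."""
    gaps = _gaps(timestamps)
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)


def timeout_expires_session(timeout: float, timestamps: Sequence[float]) -> bool:
    """True if `timeout` is shorter than every observed gap between packets."""
    gaps = _gaps(timestamps)
    return bool(gaps) and timeout < min(gaps)


# Hypothetical capture times in seconds for one phone, not real data.
sample = [0.0, 15.1, 30.0, 45.2, 60.1]
print(registration_interval(sample))
print(timeout_expires_session(10, sample))
print(timeout_expires_session(20, sample))
```

Comparing against the smallest gap rather than the median is the conservative choice here: the timeout is only counted as short enough if the session would age out between any two packets seen in the capture.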
I am waiting on support to come back with a solution, but this might not be as simple a fix as it may seem.
01-30-2013 04:57 AM
Hey, anything back from your support with a solution? My concern is that I can't go back to the customer asking them to clear the session table every time there is an internet or power outage. It would raise a few questions about an otherwise very good product.
Secondly, I cannot do any testing as there hasn't been any outage since I opened this thread (touch wood)... but I am still looking for a fix to this, if possible without needing to clear the session tables or change the timeout values, since that kind of impacts the firewall performance.
Please correct me if I have mistaken anything.
01-30-2013 05:15 AM
My support case (id 00101310) was opened on 10/22/2012. I am still waiting for any progress...
In my opinion you can try resetting the SIP session. It doesn't impact firewall performance.
01-30-2013 05:48 AM
While I do agree that the session tables need to be cleared, there are really only two ways I can see to accomplish this.
Manually resetting the session table is certainly not advisable, as the customer would definitely see this as a detriment.
I am curious how much impact you think changing the default timeout to a different value would have on the performance of the FW.
Let's put this into perspective. App-ID and Content-ID processing is far more CPU intensive (hence the specialized CPUs), so the extra cost of reducing a timeout value (the session expires, the FW sees the next UDP packet and starts over again) should be negligible by comparison (I could be wrong).
01-30-2013 12:38 PM
I still have the case open with support and they are currently researching the issue. I sent them over my techsupport files and they were able to reproduce the problem on their lab setup.
Currently, this is what I have suggested to my colleagues:
1. Every time they see an ISP failover, log in to a CLI session and issue the following:
a. show session all filter application sip
b. show session all filter application unknown-udp
Now if all your phones are already registered with your provider, they should all show up in the output of these two commands. It would be nice to have the firewall accurately detect the SIP traffic instead of classifying it as unknown-udp. Now to clear the sessions, all you have to do is issue:
a. clear session all filter application sip
b. clear session all filter application unknown-udp
With regard to performance: this will depend heavily on the number of phones you have. Simply put, the firewall would have to process approximately 'N' new sessions every 'X' seconds, where 'N' is the total number of phones you have and 'X' is the SIP registration interval. I wouldn't do this either, as again it is not a scalable solution.
I am trying to tackle this from a different perspective. Notice that not all customers who have VoIP phones and use PAN as their firewall are having this issue (if everyone did, there would be an uproar). So it must be something very specific tied to your VoIP provider or your phone manufacturer.
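To put a number on that, the 'N' sessions every 'X' seconds estimate can be written as a one-line calculation. This is only a back-of-the-envelope sketch in Python. The function name and the figure of 300 phones are made up for illustration, and it assumes every registration packet has to create a new session because the previous one has already timed out.

```python
def new_sessions_per_second(phones: int, registration_interval_s: float) -> float:
    """Approximate new-session rate if every registration starts a new session.

    Assumes the session timeout is shorter than the registration interval,
    so each of the `phones` devices creates one new session per interval.
    """
    if phones < 0:
        raise ValueError("phones must not be negative")
    if registration_interval_s <= 0:
        raise ValueError("registration interval must be positive")
    return phones / registration_interval_s


# Hypothetical site: 300 phones registering every 15 seconds.
print(new_sessions_per_second(300, 15))
```

For 300 phones at a 15-second interval this works out to 20 new sessions per second, on top of whatever other traffic the firewall is already handling.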
For example, we use Polycom phones with Vocalocity as our provider. Using Wireshark I observed that, when the phone is not currently on a call, a SIP registration packet is sent out approximately every 15 seconds. I also monitored this phone's session on the firewall itself and confirmed that the TTL is reset every 15 seconds approximately. Now, I logged in to the phone, and I have an option called NAT keep-alive under network settings that is set to 15 seconds. So, I am currently working with my VoIP provider to see if we can make changes to our phone configuration packages.
However, at the end of the day, PAN should clear the sessions if there is a failover. I cannot think of a reason why the firewall would not want to do that. I hope we will see a solution to this in the next release scheduled in February.
06-26-2013 05:16 AM
I have some news (bad or good, it depends).
My problem was finally recognized. In short:
There is a certain counter 'ctd_tdb_changed' that can be triggered during content / AV upgrade which will cause long lived SIP sessions to switch from 'layer7 processing : enabled' to 'layer7 processing : completed'. This can be viewed in 'show session id x' output for the sip session.
Once the SIP session is 'completed' then ALG/predict session will not function properly.
BUT it may be fixed in PAN 6.x (according to current information)!
Please ask your local Sales SE to push for this fix to be made available in 5.0.x.