We are seeing some strange issues with VoIP traffic.
S/W Version: 4.1.9
VoIP Provider: Foehn IP Telephone systems.
Latest Application version.
A new VoIP system has been deployed, and its SIP traffic passes through the PA-2020.
Application override policies have been set up for incoming and outgoing SIP traffic, with the applications set to unknown-tcp and unknown-udp (as per a PAN document/recommendation). VoIP traffic passes through the PA-2020 and reaches the server on the external side.
Every time there is an internet outage or power loss (which happens fairly regularly), we lose the connection to the VoIP server, which is expected. But once the internet connection is restored, I still cannot reach the VoIP server, so no telephone calls can be established: no SIP traffic passes through. At the very same time, if I bypass the Palo Alto, everything works absolutely fine.
Strangely, I've noticed that if the PA-2020 is rebooted immediately after the internet connection is restored, SIP traffic passes through it absolutely fine. Likewise, if the PA is not rebooted but traffic bypasses it for 30 minutes or so, SIP traffic works fine once the PA is plugged back in-line.
So every time we lose the internet connection, we either have to reboot the PA-2020 or bypass it completely for SIP traffic, because this is an educational environment and the phones are critical. This is proving a very strange issue to troubleshoot.
Can anyone shed some light on this or share their experiences with SIP traffic?
I wasn't aware that using an app-override to unknown-tcp or unknown-udp was recommended. Typically you create a custom app and app-override to it. Maybe there is a special reason this was recommended to you?
Maybe the session table is becoming stale when the connection drops? Have you tried clearing the session table after an outage? Even better, find the offending sessions in the session table and clear only those. Maybe the clients are still using their existing sessions and not trying to reconnect? Or, if they are creating new sessions, maybe the old sessions stuck in the table are keeping the new ones from starting?
There are a lot of ways to go here without packet captures and logs. I'd start with the session table (session monitor) and also the drop filters on the CLI.
I would tend to agree with KBrazil: when the SIP server power is regained, the existing sessions need to be timed out.
Which is why it works when you reboot the FW, or if you let the traffic bypass the FW. Both accomplish the same thing.
So I would agree that you should clear the session table and narrow it down to those offending sessions.
If that works, I would treat it as a temporary troubleshooting step and workaround until the root cause is found and fixed. Your support provider should be able to assist here.
I'm fighting a similar problem to kalyanram.piratla's.
I have a VoIP client on the Internet and my SIP gateway in the DMZ. I have an application override too.
But from time to time (I don't have problems with my internet connection) the VoIP client can't establish an RTP session, even though all the signaling works OK. When I pick up the phone and call any number (e.g. my mobile), my mobile rings and shows the calling number, but after I answer I hear nothing. When this happens I need to clear the SIP session, and after that everything works perfectly for another few days.
First find the SIP session:
show session all filter source x.y.z.q (IP of your VoIP provider)
Once you have found the ID of the SIP session, clear it:
clear session id XXXXX
Maybe this will be useful for you.
I am in a similar situation, and I currently have a case open with PAN support. In my case I have a dual-ISP setup. Every time there is a failover from my primary to my secondary ISP, or a failback from secondary to primary, the SIP sessions hang in limbo. They do not time out, because the VoIP phones constantly send UDP SIP registration packets and the PAN device resets the session's idle timer (TTL) on each one.
When a packet is received on an interface, the firewall first does a session lookup, and if a session exists, hands the packet off to that session without further processing. There is an official document that describes how packets are handled inside a PAN firewall, if you are interested.
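To illustrate why the stale sessions never age out, here is a toy model in Python of the behaviour described above: a session table keyed on the flow, where every matching packet takes the fast path and refreshes the idle timer. It is not how PAN-OS is implemented; the flow tuple, the IP addresses, the "isp1"/"isp2" path labels and the timeout values are all hypothetical. With registrations every 30 seconds and a 60-second idle timeout, the session created before the failover keeps using the old path indefinitely; with a 20-second timeout it expires between registrations and the next packet builds a fresh session on the new path.

```python
from dataclasses import dataclass


@dataclass
class Session:
    key: tuple        # hypothetical flow key: (src_ip, dst_ip, dst_port, proto)
    last_seen: int    # time (seconds) of the last packet matching this session
    egress: str       # path chosen when the session was first created


class SessionTable:
    """Toy session table: existing sessions are reused and their idle timer refreshed."""

    def __init__(self, idle_timeout: int):
        self.idle_timeout = idle_timeout
        self.sessions: dict = {}

    def _age_out(self, now: int) -> None:
        # Drop sessions that have been idle for longer than the timeout.
        stale = [k for k, s in self.sessions.items()
                 if now - s.last_seen > self.idle_timeout]
        for k in stale:
            del self.sessions[k]

    def packet(self, key: tuple, now: int, current_egress: str) -> str:
        self._age_out(now)
        session = self.sessions.get(key)
        if session is None:
            # Slow path: a new session picks up the *current* path.
            session = Session(key, now, current_egress)
            self.sessions[key] = session
        else:
            # Fast path: reuse the existing session and reset its idle timer.
            session.last_seen = now
        return session.egress


def simulate(idle_timeout: int, reg_interval: int, failover_at: int, duration: int):
    """Send one REGISTER every reg_interval seconds; switch paths at failover_at."""
    table = SessionTable(idle_timeout)
    key = ("10.0.0.50", "203.0.113.10", 5060, "udp")  # hypothetical phone -> provider flow
    results = []
    for now in range(0, duration + 1, reg_interval):
        current = "isp2" if now >= failover_at else "isp1"
        results.append((now, table.packet(key, now, current)))
    return results


if __name__ == "__main__":
    print(simulate(60, 30, 100, 300)[-1])  # (300, 'isp1'): stale session never expires
    print(simulate(20, 30, 100, 300)[-1])  # (300, 'isp2'): fresh session on the new path
```

The only difference between the two runs is whether the idle timeout is longer or shorter than the gap between registrations, which is the point made in the next reply.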
One thing you could do is set the SIP or unknown-udp application timeout to a value lower than the SIP registration interval your phones use. You might assume this is 30 seconds, but that is not always the case. If you run Wireshark and filter the packets properly, you'll get a pretty good idea of the time between consecutive SIP registration packets. Of course, this means you are putting a lot of load on your firewall, and that load might increase with the number of VoIP phones in your infrastructure.
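As a rough sketch of the measurement step: given the capture timestamps of one phone's REGISTER packets (for example exported from Wireshark using a display filter such as sip.Method == "REGISTER"), the snippet below computes the gaps between consecutive registrations and suggests a timeout safely below the shortest one. The 0.8 safety margin, rounding down to whole seconds, and the sample timestamps are assumptions for illustration, not PAN recommendations.

```python
def registration_intervals(timestamps):
    """Gaps (seconds) between consecutive REGISTER packets from one phone."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def suggest_timeout(timestamps, margin=0.8):
    """Suggest an idle timeout below the shortest observed registration gap.

    margin is a hypothetical safety factor; the result is rounded down to whole seconds.
    """
    intervals = registration_intervals(timestamps)
    if not intervals:
        raise ValueError("need at least two REGISTER timestamps")
    return max(1, int(min(intervals) * margin))


if __name__ == "__main__":
    # Hypothetical relative capture times (seconds) of one phone's REGISTERs.
    sample = [0.0, 30.1, 60.0, 90.2, 120.1]
    print([round(g, 1) for g in registration_intervals(sample)])  # [30.1, 29.9, 30.2, 29.9]
    print(suggest_timeout(sample))                                # 23
```

Using the shortest observed gap rather than the average matters here: a single early re-registration that arrives before the timeout fires would refresh the old session again.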
I am waiting on support to come back with a solution, but this might not be as simple a fix as it may seem.
Hey, anything back from your support with a solution? My concern is that I can't go back to the customer and ask them to clear the session table every time there is an internet or power outage. It would raise a few questions about an otherwise very good product.
Secondly, I can't do any testing, as there hasn't been an outage since I opened this thread (touch wood). But I am still looking for a fix, if possible without having to clear the session table or change the timeout values, since that affects firewall performance.
Please correct me if I have misunderstood anything. :)
My support case (ID 00101310) was opened on 10/22/2012. I am still waiting for any progress.
In my opinion, you can try resetting just the SIP session; it doesn't impact firewall performance.
While I do agree that the session tables need to be cleared, there are really only two ways I can see to accomplish this: clearing them manually or lowering the timeout.
Manually clearing the session table is certainly not advisable, as the customer would definitely see this as a detriment.
I am curious how much impact you feel changing the default timeout to a different value would have on firewall performance.
Let's put this into perspective. App-ID and Content-ID processing is far more CPU-intensive (hence the specialized processors) than reducing a timeout value, which just means the session expires and is re-created when the firewall sees the next UDP packet. That extra cost should be negligible by comparison (I could be wrong).
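To put rough numbers on the extra work: if the timeout is lower than the registration interval, each phone's REGISTER arrives after its previous session has aged out, so each phone costs roughly one new session setup per registration interval. The phone counts and the 30-second interval below are hypothetical; the sketch only does that arithmetic and says nothing about the PA-2020's actual capacity.

```python
def extra_setups_per_second(phones: int, reg_interval_s: float) -> float:
    """New SIP session setups per second, assuming every REGISTER creates a new session.

    This is the worst case discussed in the thread: the idle timeout is shorter
    than the registration interval, so no registration reuses an existing session.
    """
    if reg_interval_s <= 0:
        raise ValueError("registration interval must be positive")
    return phones / reg_interval_s


if __name__ == "__main__":
    for phones in (50, 200, 1000):  # hypothetical deployment sizes
        rate = extra_setups_per_second(phones, 30)
        print(f"{phones} phones @ 30 s -> {rate:.1f} new sessions/s")
```

Whether a few dozen extra setups per second is negligible depends on the platform's session setup rate, which is worth checking against the model's datasheet before changing the timeout.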