We have been experiencing an interesting issue where users connected via GlobalProtect suddenly lose access to internal resources roughly one to two hours after initially connecting. External resources keep working fine during this period. The only way to fix it is to disconnect GP or refresh the connection. Otherwise, after anywhere from 1 to 10 minutes in this state, the GP pop-up displays on its own, the user is reconnected, and internal access is restored. If a user is SSH'ed into a device when the drop occurs, the existing SSH session stays up, but the user cannot open a new SSH session to the same IP. This happens on both Windows and Mac machines. We are running a pair of 3020s in active/passive on PAN-OS 9.1.13H3.
During this period, the firewall logs show that the dropped traffic has no associated username or egress interface.
All of our internal resources have a HIP requirement that verifies a supported AV and OS. We created an exclusion rule for one specific internal node, and when a user drops, they CAN still reach that node. That points to HIP, but it doesn't explain the missing username and egress interface, nor the absence of any logs on the Palo Alto or client side about missed or failed HIP checks.
We have sent countless logs to Palo Alto and held multiple meetings, with case escalations to different engineers, but have not received any useful feedback. They have only suggested updating the GP client, which we had one user do (5.2.12-26), and the same issue persists. Palo Alto also suggested that we disable HIP checks completely, which we would prefer not to do, but we are running out of options here.
One other thing of interest: if a user keeps a constant ping running to an internal resource, internal access never seems to drop.
The client logs haven't pointed us in the right direction either, but here are some entries of interest from around the time users dropped:
(P6320-T6324)Info ( 366): 07/14/22 02:01:20:600 Received sleep event
(P6320-T25896)Debug(6627): 07/14/22 02:01:21:858 NetworkConnectionMonitorThread: route change detected. Wait for 3 seconds.
(P6320-T25896)Debug(5890): 07/14/22 02:01:21:858 No need to check gateway route since no tunnel.
(P15604-T15608)Dump (1320): 07/21/22 10:45:00:401 WM_CALL_BALLOON message ---> Disconnected
(P15604-T15608)Debug(1281): 07/21/22 10:45:01:623 CSystemTray::WindowProc - system power events received. wParam = 7
(P15604-T15608)Debug(1288): 07/21/22 10:45:01:623 received WM_POWERBROADCAST message = 0x7
(P15604-T15608)Dump ( 110): 07/21/22 10:45:01:623 new command added to the queue at the back.
(P15604-T15608)Debug(1312): 07/21/22 10:45:01:623 Wake up and send non-portal message.
(P15604-T15608)Debug(1281): 07/21/22 10:45:01:822 CSystemTray::WindowProc - system power events received. wParam = 18
(P15604-T15804)Debug( 648): 07/21/22 10:45:01:904 Send command to Pan Service
(P15604-T15804)Debug( 676): 07/21/22 10:45:01:904 Command = <request><type>sleep</type></request>
(P15604-T15804)Debug( 728): 07/21/22 10:45:01:910 PanClient sent successful with 64 bytes
(P15604-T15608)Dump ( 76): 07/21/22 10:45:01:913 OnReceive error=0
(P6320-T6324)Info ( 366): 07/13/22 19:16:53:052 Received sleep event
(P6320-T12192)Debug(6627): 07/13/22 19:16:55:071 NetworkConnectionMonitorThread: route change detected. Wait for 3 seconds.
(P6320-T12192)Info (5178): 07/13/22 19:16:55:071 Virtual IP route entry is removed
(P6320-T12192)Debug(6633): 07/13/22 19:16:55:071 Set retry network check event due to removed gateway route
(P6320-T24076)Debug( 610): 07/13/22 19:16:55:071 Network changed, break from ProcMonitor
(P6320-T24076)Debug( 646): 07/13/22 19:16:55:071 Tunnel downtime is 47 miliseconds
(P6320-T24076)Debug(5587): 07/13/22 19:16:55:071 Show Gateway **VPN NAME**: Checking network availability and restoring VPN connection when network is available.
(P6320-T24076)Debug(7095): 07/13/22 19:16:55:071 --Set state to Restoring VPN Connection
(P6320-T12192)Debug( 198): 07/13/22 19:16:55:071 Now is 92548531. CheckHipTimeoutAfterSleep: 4920000 ms
(P6320-T24076)Debug( 692): 07/13/22 19:16:55:071 Stop ProcDrv before disconnect
(P6320-T7788)Info (1005): 07/13/22 19:16:55:071 ProDrv: VPN disconnect event, get out of ProcDrv
(P6320-T7788)Info (1034): 07/13/22 19:16:55:071 ProcDrv thread dies
(P6320-T24076)Info ( 968): 07/13/22 19:16:55:071 ProcDrv quit
(P6320-T24076)Debug( 697): 07/13/22 19:16:55:071 Stop ProcTunnel before disconnect
(P6320-T13300)Info (1085): 07/13/22 19:16:55:071 ProTunnel: VPN disconnect event, get out of ProcTunnel
(P6320-T13300)Info (1101): 07/13/22 19:16:55:071 ProcTunnel thread dies
(P6320-T24076)Info (1049): 07/13/22 19:16:55:071 ProcTunnel quit
(P6320-T24076)Debug( 229): 07/13/22 19:16:55:071 IPSec anti-replay statistics: outside window count 0, replay count 0
(P6320-T24076)Debug( 231): 07/13/22 19:16:55:071 Disconnect udp socket
(P6320-T24076)Debug( 612): 07/13/22 19:16:55:071 unset network
Honestly, the log file isn't all that useful since we don't know exactly when the user experienced the issue. There are some sleep events, but nothing in what you provided really stands out. I'm assuming you've already verified that you aren't dropping any traffic to the gateway that would affect the HIP updates between the client and the gateway?
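Because the excerpts above mix several days and two client processes, one way to make them useful is to parse each line and keep only the entries around a drop time the user actually reported. This is a minimal Python sketch, assuming the line layout shown in the excerpts (process/thread IDs, level, source line number, then an MM/DD/YY HH:MM:SS:mmm timestamp). `parse_line`, `entries_near`, the 5-minute window, and the log file name in the usage comment are hypothetical choices, not part of GlobalProtect.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional

# Expected layout (taken from the excerpts in this thread):
# (P6320-T24076)Debug( 610): 07/13/22 19:16:55:071 Network changed, break from ProcMonitor
LINE_RE = re.compile(
    r"^\(P(?P<pid>\d+)-T(?P<tid>\d+)\)\s*(?P<level>\w+)\s*\(\s*(?P<src>\d+)\):\s+"
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}):(?P<ms>\d{3})\s+(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    pid: int
    tid: int
    level: str
    when: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one client log line; return None if it doesn't match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The last timestamp field is milliseconds, separated by a colon.
    when = datetime.strptime(m["ts"], "%m/%d/%y %H:%M:%S")
    when += timedelta(milliseconds=int(m["ms"]))
    return LogEntry(int(m["pid"]), int(m["tid"]), m["level"], when, m["msg"])


def entries_near(lines: Iterable[str], drop_time: datetime,
                 window: timedelta = timedelta(minutes=5)) -> List[LogEntry]:
    """Return parsed entries whose timestamp is within +/- window of drop_time."""
    result = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and abs(entry.when - drop_time) <= window:
            result.append(entry)
    return result


# Usage (hypothetical file name and drop time):
# with open("gp_client.log", encoding="utf-8", errors="replace") as f:
#     for e in entries_near(f, datetime(2022, 7, 13, 19, 17)):
#         print(e.when, e.pid, e.level, e.message)
```

Lines that don't fit the expected layout, such as a line truncated at the start, are skipped rather than guessed at.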
The logs you're showing make perfect sense to me. If the HIP check updates from the client aren't getting through, you're going to lose the source user information as well. The action on these is policy-deny, so there is no egress interface: the traffic was never allowed, so it only ingresses and never egresses.
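To make that reasoning concrete, here is a toy first-match policy model in Python. It is a simplified illustration under stated assumptions, not how PAN-OS actually evaluates policy. The rule names, the `require_user`/`require_hip` flags, the addresses, and the interface name are all hypothetical. It shows why a session that has lost its user and HIP match falls through to a deny with no egress interface, while a rule with no HIP requirement (like the exclusion rule for the one internal node) keeps matching.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    destination: str            # "any" or one address
    require_user: bool          # only matches when the session has a known user
    require_hip: bool           # only matches when the HIP profile matched
    action: str                 # "allow" or "deny"
    egress: Optional[str] = None


@dataclass
class Session:
    destination: str
    user: Optional[str]         # None once the user mapping is lost
    hip_ok: bool


def evaluate(rules: List[Rule], s: Session) -> Tuple[str, str, Optional[str]]:
    """First-match lookup returning (rule name, action, egress interface)."""
    for rule in rules:
        if rule.destination not in ("any", s.destination):
            continue
        if rule.require_user and s.user is None:
            continue
        if rule.require_hip and not s.hip_ok:
            continue
        # Denied traffic is never forwarded, so no egress interface is recorded.
        return rule.name, rule.action, rule.egress if rule.action == "allow" else None
    # Nothing matched: fall through to a catch-all deny (hypothetical name).
    return "default-deny", "deny", None


rules = [
    Rule("hip-exempt-node", "10.0.0.50", False, False, "allow", "ethernet1/2"),
    Rule("internal-with-hip", "any", True, True, "allow", "ethernet1/2"),
]
print(evaluate(rules, Session("10.0.0.20", user=None, hip_ok=False)))
# → ('default-deny', 'deny', None)
print(evaluate(rules, Session("10.0.0.50", user=None, hip_ok=False)))
# → ('hip-exempt-node', 'allow', 'ethernet1/2')
```

In this model, the same session allowed by `internal-with-hip` while the user and HIP match are present becomes a deny with an empty egress field once both are lost. That mirrors the log entries described in the original post.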
The default HIP check interval is 1 hour, so if some clients drop at the 2-hour mark and others at the 1-hour mark, that would point to at least some checks working properly. Do you know for sure that drops have happened at 2 hours, or is it actually only ever 1 hour? It would make more sense if this were 1 hour for everyone rather than only working some of the time.
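To settle the 1-hour versus 2-hour question with data rather than impressions, you could record each affected user's connect time and drop time, then bucket each drop by how many HIP intervals had elapsed. This is a minimal Python sketch, assuming the 1-hour default interval quoted above and an arbitrary 10-minute tolerance. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# 1-hour default HIP check interval, as quoted in this thread; change it if your gateway differs.
HIP_INTERVAL = timedelta(hours=1)


def intervals_elapsed(connected_at: datetime, dropped_at: datetime,
                      interval: timedelta = HIP_INTERVAL,
                      tolerance: timedelta = timedelta(minutes=10)) -> Optional[int]:
    """Return n if the drop happened within `tolerance` of n whole intervals (n >= 1)
    after connecting, otherwise None."""
    elapsed = dropped_at - connected_at
    n = round(elapsed / interval)  # timedelta / timedelta gives a float
    if n >= 1 and abs(elapsed - n * interval) <= tolerance:
        return n
    return None


def tally(drops: Iterable[Tuple[datetime, datetime]]) -> Counter:
    """Count drops per bucket: 1 = ~1 h, 2 = ~2 h, None = not near a multiple."""
    return Counter(intervals_elapsed(start, end) for start, end in drops)
```

If nearly every drop lands in bucket 1, that would point to the first recheck failing consistently. A mix of 1 and 2 would suggest that some checks succeed.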
One thing you can check on the firewall directly: if you enable URL alert logging on this connection, you should see clients accessing Portal_IP/ssl-vpn/hipreportcheck.esp in your URL logs. These requests come from the client's public IP, not the GlobalProtect pool IP it is assigned, because the HIP check traffic originates from the client itself rather than through the GlobalProtect connection.