Episode Transcript:
John:
Hello PANCasters and welcome back. Today Olivier is back to talk to us about Monitoring. Welcome back Olivier.
Olivier:
Thanks John. Good to be back on PANCast™.
Olivier Zheng, PCNSE, is a Staff Support Engineer at Palo Alto Networks. As SME for Management/Logging Reporting at the Technical Assistance Centre in Singapore, he supports customers and participates in multiple knowledge sharing initiatives by writing content in the Knowledge Base and delivering training to internal engineers. He is responsible for 1 issued patent. Olivier holds a Master of Science in Mobile and High Speed Telecom Networks from Oxford Brookes University, UK and a Master of Science in Computer Science and Information Technology from ESI SUPINFO Paris, France.
John:
So Olivier, today is about monitoring. What are we going to be talking about specifically?
Olivier:
Well, monitoring in general is pretty important to make sure any environment is operating correctly, and for Palo Alto Networks products it is no different. There are a lot of different options when it comes to monitoring our products, so we don’t have time to cover them all in detail in one episode.
This is more of an overview of the options we have.
John:
Great Olivier, so where do we start?
Olivier:
OK, before we get into the different options, let’s discuss some high level topics.
Firstly, monitoring can be either on the device itself such as checking local logs or running CLI commands, or it can be external. For instance you have a syslog server that is receiving logs and you can use this server to do log analysis.
The second thing is we can broadly categorize monitoring into fault monitoring and performance monitoring.
Fault monitoring, as the name suggests, is about detecting that there is an issue with a device: an event or log is generated to inform the required teams that something needs to be investigated. Performance monitoring is more related to the health of a device and covers things like throughput, CPU utilization and memory utilization. With performance monitoring the idea is to take measurements at regular intervals so that the device is monitored constantly. It is also possible to put alerts in place by setting thresholds, so for instance if the dataplane CPU goes above 80%, you can generate an alert.
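As a rough illustration of the threshold alert described above, here is a minimal sketch in Python. It is not a Palo Alto Networks feature or API: the function name `find_threshold_breaches`, the sample readings, and the optional "N samples in a row" rule are all assumptions made for the example.

```python
from typing import List, Tuple

def find_threshold_breaches(
    samples: List[Tuple[int, float]],
    threshold: float = 80.0,
    min_consecutive: int = 1,
) -> List[int]:
    """Return the timestamps at which an alert would fire.

    samples: (timestamp_seconds, cpu_percent) pairs taken at a regular interval.
    An alert fires once when the value has been above the threshold for
    min_consecutive samples in a row, and re-arms after it drops back.
    """
    alerts = []
    run = 0
    for ts, value in samples:
        if value > threshold:
            run += 1
            if run == min_consecutive:
                alerts.append(ts)
        else:
            run = 0
    return alerts

# Hypothetical dataplane CPU readings taken every 60 seconds.
readings = [(0, 35.0), (60, 82.0), (120, 91.0), (180, 40.0), (240, 85.0)]
print(find_threshold_breaches(readings))                     # single-sample rule
print(find_threshold_breaches(readings, min_consecutive=2))  # needs two in a row
```

Requiring more than one consecutive sample is one common way to avoid alerting on a momentary spike; whether that trade-off is right depends on the polling interval in use.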
John:
OK, so what options do we have then with Palo Alto Networks devices?
Olivier:
OK, let's start with fault monitoring. Panorama and Palo Alto Networks firewalls have logging enabled and will log various system events. This is OK if you log into a device and have a look at the logs, but really you want to be notified of critical events so you can take action. The most common way is to use SNMP traps. With SNMP traps, you can send events to an SNMP receiver which could then also be used to automatically generate an incident for the operations teams to look at. This can be based on the severity of the alert, so maybe for critical and high events an incident is automatically raised, while low and informational events are still received but can be reviewed later rather than in real time.
With Palo Alto Networks logging, we actually have a number of methods for forwarding events to external systems. On top of SNMP, which is the most common one, we also support HTTP, email and syslog. Syslog is also very commonly used, but more in situations where you want to send things like traffic logs for long term storage.
While we can look for events on the firewalls, or Panorama, SNMP is still the most widely used setup for alerting on events.
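The severity-based handling described above can be sketched as a small routing function. This is an illustrative example only, assuming events arrive as simple dictionaries; the field names, the `route_events` function, and the choice to send every other severity (including medium) to the review queue are assumptions, not the behavior of any particular SNMP receiver.

```python
from typing import Dict, List

# Severities that should open an incident immediately (an assumption that
# mirrors the example in the discussion; adjust to your own policy).
INCIDENT_SEVERITIES = {"critical", "high"}

def route_events(events: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Split received events into 'incident' and 'review' queues by severity."""
    queues: Dict[str, List[Dict[str, str]]] = {"incident": [], "review": []}
    for event in events:
        severity = event.get("severity", "").lower()
        target = "incident" if severity in INCIDENT_SEVERITIES else "review"
        queues[target].append(event)
    return queues

# Hypothetical events as a trap receiver might hand them over.
received = [
    {"severity": "critical", "description": "example event A"},
    {"severity": "informational", "description": "example event B"},
    {"severity": "high", "description": "example event C"},
    {"severity": "low", "description": "example event D"},
]
queues = route_events(received)
print(len(queues["incident"]), "incident(s),", len(queues["review"]), "for later review")
```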
John:
Thanks Olivier, what about performance monitoring?
Olivier:
Well, SNMP is also probably the most widely used. Where SNMP traps cover events, SNMP polling allows you to have real time monitoring on devices. This can be set up so you poll data on the devices over time. And then you will get historical data of, let's say, dataplane CPU, throughput, packets per second, and so on. The data you can collect is based on MIBs: some are generic MIBs and some are vendor specific MIBs. We have our Palo Alto Networks MIBs which allow you to poll for specific data on Palo Alto Networks devices. A lot of detailed data on our firewalls is in what are called global counters, and while you can view these via CLI if working on an active issue, you can also graph most of them via SNMP.
Now while the preference would be to use a dedicated SNMP server for polling, it may not be possible in all cases so for customers using Panorama to manage their firewalls, we do have a device monitoring section on Panorama. This will graph the common data points like CPU and throughput over time, so you can monitor the firewalls. It also baselines the data and can alert if current trends don’t match the baseline.
John:
Thanks Olivier. So it sounds like SNMP is the most common monitoring system used. Anything we should be aware of with SNMP?
Olivier:
Good question. On the topic of SNMP MIBs, you need to make sure the MIBs you are using are loaded on the SNMP polling server so all the values polled have a correct definition. Also keep in mind that SNMP polling is interval based, so depending on the polling interval configured on your SNMP server, short-lived peaks may not be shown accurately. The last point I want to discuss is the version of SNMP. The two current ones are version 2 and version 3. Version 2 is still very widely used, however I advise you to consider moving to SNMPv3. It doesn't require much configuration and the main benefit is that it improves security by adding authentication and encryption for SNMP.
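To make the interval point concrete: throughput graphs are typically derived from a byte counter read at two points in time, so each plotted value is an average over the polling interval. The sketch below is a minimal Python illustration under stated assumptions: a 64-bit octet counter (such as the standard `ifHCInOctets`), hypothetical sample values, and counter readings supplied by whatever poller you use. No SNMP library is called here.

```python
COUNTER64_MODULUS = 2 ** 64  # assumes a 64-bit counter such as ifHCInOctets

def average_bits_per_second(prev_octets: int, curr_octets: int, interval_seconds: float) -> float:
    """Average rate between two polls of a monotonically increasing octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modulo arithmetic handles a single counter wrap between the two polls.
    delta = (curr_octets - prev_octets) % COUNTER64_MODULUS
    return delta * 8 / interval_seconds

# Hypothetical: 750,000,000 bytes were transferred, all within a 10 second burst.
burst_octets = 750_000_000
print(average_bits_per_second(0, burst_octets, 10))   # rate seen with 10 s polling
print(average_bits_per_second(0, burst_octets, 300))  # rate seen with 5 min polling
```

The same burst appears thirty times smaller when averaged over a five minute interval, which is why a graph can look healthy while users experienced a short saturation. Note that this sketch does not detect a counter reset, for example after a reboot.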
John:
Got it. I know we also have the ACC on the firewalls. Does this fall into monitoring as well?
Olivier:
Yes, the ACC is also another tool which is on our firewalls and on Panorama and it is mostly used for data on traffic, session, threats, etc. It is built on traffic and threat logs and can be great for a high level view. It does also have stats like bytes but there are a couple of things to be aware of with the ACC.
Firstly, the ACC data is all based on the logs so the important thing here is that if you have policies that do not have logging enabled, that traffic will not be in the ACC.
The second gotcha is the byte statistics. Again, because the data is based on the logs and logs on Palo Alto Networks devices are generated at the session end, it can sometimes be misleading. As an example, let's say you have a data transfer between two servers that keeps the session open indefinitely and transfers a lot of data. For some reason the session terminates and only then is the log generated. That log has a generated time, which is what the ACC uses, so all of the data transferred is shown against that time and the value looks much bigger than it really was. In reality the bytes transferred would be split over the time of the session but that info is not currently tracked.
The ACC has some great info but just a couple of things to be aware of.
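The byte statistics gotcha can be illustrated with a purely conceptual Python sketch. It compares attributing all bytes to the hour of the session-end log with spreading them evenly over the session. The even spread is an assumption for illustration only (as noted above, the real distribution is not tracked), and the function names and session values are hypothetical.

```python
from collections import defaultdict
from typing import Dict

HOUR = 3600

def bytes_at_log_time(start: int, end: int, total_bytes: int) -> Dict[int, float]:
    """Attribute all bytes to the hour in which the session-end log is generated."""
    return {end // HOUR: float(total_bytes)}

def bytes_spread_over_session(start: int, end: int, total_bytes: int) -> Dict[int, float]:
    """Spread bytes evenly over the session lifetime, bucketed per hour."""
    duration = end - start
    if duration <= 0:
        return {end // HOUR: float(total_bytes)}
    buckets: Dict[int, float] = defaultdict(float)
    t = start
    while t < end:
        # Advance to the next hour boundary or the session end, whichever is first.
        next_boundary = min((t // HOUR + 1) * HOUR, end)
        buckets[t // HOUR] += total_bytes * (next_boundary - t) / duration
        t = next_boundary
    return dict(buckets)

# Hypothetical session: starts at hour 0, ends four hours later, 4 GB transferred.
start, end, total = 0, 4 * HOUR, 4_000_000_000
print(bytes_at_log_time(start, end, total))          # one large value at hour 4
print(bytes_spread_over_session(start, end, total))  # 1 GB in each of hours 0-3
```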
John:
Great, any other monitoring protocols we support?
Olivier:
Yes, there are a few more. I won’t go into too much detail but just so our listeners are aware, these are available with Palo Alto Networks.
First is OpenConfig. OpenConfig is a set of vendor neutral models that allow monitoring via gNMI. This allows telemetry streaming to monitor devices.
Netflow may also be something you have heard of before and this is also supported on Palo Alto Networks devices. Netflow is more for traffic passing through the firewall and can send session and stream information to a Netflow collector for additional analysis.
And finally there is the XML API. Although this is more commonly used to configure devices, there are some limited use cases where you can use the API for monitoring information.
John:
You also mentioned checking logs locally and CLI commands. When would we use this?
Olivier:
They would be helpful when you are looking at an issue. Hopefully one of the other monitoring systems would have picked up the issue and if you are then troubleshooting why, you would then perhaps look at the local logs including debug logs and also CLI commands to troubleshoot. We did have an earlier episode that focused a bit on the logging part so our listeners can check out that episode.
John:
Thanks Olivier. Finally, can I ask, what do we do if monitoring is not working?
Olivier:
Well, that really depends on which part.
If we are talking about the events or the logs on the device then the first thing would be to check the configuration. If that all looks OK and you suspect an issue with the device then it may need deeper analysis and it is probably best to raise a TAC case. For external services like SNMP and syslog, again the configuration is the first place to start. If that all looks OK, remember that these services require network connectivity, so start with some basic network connectivity checks. Can you ping the SNMP server from the firewall? Are there any logs showing connection errors? You could also take a packet capture on the firewall to see whether traps are being sent or polling requests are coming in.
John:
Great info! Thanks again Olivier.
Olivier:
You’re welcome John. Hope to be back soon.
John:
Would love to have you back. PANCasters, as always the transcript to the episode will be on live.paloaltonetworks.com and remember to subscribe and like.