Panorama Sizing and Design Guide

by cstancill on ‎02-09-2016 10:49 AM - edited 2 weeks ago (65,326 Views)

Panorama Management and Logging Overview

The Panorama solution comprises two overall functions: device management and log collection/reporting. A brief overview of these two main functions follows:

 

Device Management: This includes activities such as configuration management and deployment, and the deployment of PAN-OS software and content updates.

Log Collection: This includes collecting logs from one or multiple firewalls, either to a single Panorama or to a distributed log collection infrastructure. In addition to collecting logs from deployed firewalls, reports can be generated from that log data whether it resides locally to Panorama (e.g., a single M-Series or VM appliance) or on a distributed logging infrastructure.

 

The Panorama solution allows for flexibility in design by assigning these functions to different physical pieces of the management infrastructure. For example: Device management may be performed from a VM Panorama, while the firewalls forward their logs to colocated dedicated log collectors:

 

Graphic1.png


In the example above, the device management and reporting functions are performed on a VM Panorama appliance. There are three log collector groups. Group A contains two log collectors and receives logs from three standalone firewalls. Group B consists of a single collector and receives logs from a pair of firewalls in an Active/Passive high availability (HA) configuration. Group C also contains two log collectors and receives logs from two HA pairs of firewalls. The number of log collectors in any given location depends on a number of factors; the design considerations are covered below. Note: any platform can be a dedicated manager, but only the M-Series can be a dedicated log collector.


Log Collection

 

Managed Devices

 

While all current Panorama platforms have an upper limit of 1,000 devices for management purposes, it is important for Panorama sizing to understand what the incoming log rate will be from all managed devices. To start, take an inventory of all the firewall appliances that Panorama will manage.

 

Use the following spreadsheet to take an inventory of your devices that need to store logs:

Model      PAN-OS (Major Branch #)   Location               Measured Average Log Rate
Ex: 5060   Ex: 6.1.0                 Ex: Main Data Center   Ex: 2,500 logs/s

 

  

Logging Requirements

 

This section will cover the information needed to properly size and deploy Panorama logging infrastructure to support customer requirements. There are three main factors when determining the amount of total storage required and how to allocate that storage via Distributed Log Collectors. These factors are:

  • Log Ingestion Requirements: This is the total number of logs that will be sent per second to the Panorama infrastructure.
  • Log Storage Requirements: This is the timeframe for which the customer needs to retain logs on the management platform. There are different driving factors for this including both policy based and regulatory compliance motivators.
  • Device Location: The physical location of the firewalls can drive the decision to place DLC appliances at remote locations based on WAN bandwidth etc.

 

Each of these factors is discussed in the sections below:

 

Log Ingestion Requirements

 

The aggregate log forwarding rate for managed devices needs to be understood in order to avoid a design where more logs are regularly being sent to Panorama than it can receive, process, and write to disk. The table below outlines the maximum number of logs per second that each hardware platform can forward to Panorama and can be used when designing a solution to calculate the maximum number of logs that can be forwarded to Panorama in the customer environment.

 

         Device Log Forwarding

Platform         Supported Logs per Second (LPS)
PA-200           250
PA-220           2,400
PA-500           1,250
PA-800           1,250
PA-2000 series   1,250
PA-3000 series   5,000
PA-4000 series   5,000
PA-5050/60       20,000
PA-5220          60,000
PA-5250          120,000
PA-5260          240,000
PA-7050          120,000
PA-7080          120,000
VM-100/200       5,000
VM-300/1000-HV   8,000
VM-500           8,000
VM-700           17,500


The log ingestion rate on Panorama is influenced by the platform and mode in use (mixed mode versus logger mode). The table below shows the ingestion rates for Panorama on the different available platforms and modes of operation.

       Panorama Log Ingestion

Platform   Mixed    Dedicated
VM         10,000   N/A
M-100      10,000   18,000
M-500      15,000   30,000

 

The above numbers are all maximum values. In live deployments, the actual log rate is generally some fraction of the supported maximum. Determining actual log rate is heavily dependent on the customer's traffic mix and isn't necessarily tied to throughput. For example, a single offloaded SMB session will show high throughput but only generate one traffic log. Conversely, you can have a smaller throughput comprised of thousands of UDP DNS queries that each generate a separate traffic log. For sizing, a rough correlation can be drawn between connections per second and logs per second.
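As a rough sizing sanity check, the sum of the measured per-device log rates can be compared against the ingestion table above. A minimal sketch follows; the per-device rates are hypothetical measured values, not platform maximums:

```python
# Hypothetical measured average log rates for a small deployment (logs/s).
measured_lps = {
    "datacenter PA-5220": 9000,
    "branch PA-3020": 1200,
    "branch PA-820": 400,
}

# Maximum ingestion rates from the "Panorama Log Ingestion" table above.
ingestion_max = {
    ("M-100", "mixed"): 10_000,
    ("M-100", "dedicated"): 18_000,
    ("M-500", "mixed"): 15_000,
    ("M-500", "dedicated"): 30_000,
}

total = sum(measured_lps.values())  # 10,600 logs/s aggregate
for (platform, mode), cap in sorted(ingestion_max.items()):
    verdict = "fits" if total <= cap else "exceeds"
    print(f"{platform} {mode}: aggregate {total} lps {verdict} {cap} lps")
```

In this sketch, the 10,600 logs/s aggregate exceeds an M-100 or M-500 in mixed mode but fits either platform in dedicated log collector mode.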

Methods for Determining Log Rate

New Customer:

  • Leverage information from existing customer sources. Many customers have a third-party logging solution in place such as Splunk, ArcSight, QRadar, etc. The number of logs sent from their existing firewall solution can be pulled from those systems. When using this method, get a log count from the third-party solution for a full day and divide by 86,400 (the number of seconds in a day). Do this for several days to get an average. Be sure to include both business and non-business days, as there is usually a large variance in log rate between the two.
  • Use data from an evaluation device. This information can provide a very useful starting point for sizing purposes and, with input from the customer, the data can be extrapolated for other sites in the same design. This method has the advantage of yielding an average over several days. A script (with instructions) to assist with calculating this information is attached to this document. To use it, download the file named "ts_lps.zip", unpack the zip file, and reference the README.txt for instructions.
  • If no information is available, use the Device Log Forwarding table above as a reference point. This will be the least accurate method for any particular customer.
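The divide-by-86,400 method above can be sketched as follows; the daily counts are hypothetical values pulled from a third-party SIEM:

```python
SECONDS_PER_DAY = 86_400

# Hypothetical daily log counts: three business days, then two weekend days.
daily_counts = [130_000_000, 128_500_000, 131_200_000, 41_000_000, 39_800_000]

# Per-day average rates, then the overall average across the sample window.
rates = [count / SECONDS_PER_DAY for count in daily_counts]
avg_lps = sum(rates) / len(rates)
print(f"average: {avg_lps:.0f} logs/s, busiest day: {max(rates):.0f} logs/s")
```

Including weekend days pulls the average well below the business-day peak, which is why the text recommends sampling several days of both types.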

Existing Customer:

    For existing customers, we can leverage data gathered from their existing firewalls and log collectors:

  • To check the log rate of a single firewall, download the attached file named "Device.zip", unpack the zip file, and reference the README.txt file for instructions. This package will query a single firewall over a specified period of time (you can choose how many samples) and give an average number of logs per second for that period. If the customer does not have a log collector, this process will need to be run against each firewall in the environment.
  • If the customer has one or more log collectors, download the attached file named "lc_lps.zip", unpack the zip file, and reference the README.txt file for instructions. This package will query the log collector MIB to take a sample of the incoming log rate over a specified period.

 

Log Storage Requirements

 

Factors Affecting Log Storage Requirements

There are several factors that drive log storage requirements. Most of these requirements are regulatory in nature. Customers may need to meet compliance requirements for HIPAA, PCI, or Sarbanes-Oxley.

 
There are other governmental and industry standards that may need to be considered. Additionally, some companies have internal requirements, for example, that a certain number of days' worth of logs be maintained on the original management platform. Ensure that all of these requirements are addressed with the customer when designing a log storage solution.

    Note that some companies have maximum retention policies as well. 

 

Calculating Required Storage

Calculating required storage space based on a given customer's requirements is a fairly straightforward process. The following chart shows the size of each log type on disk. This number accounts for both the logs themselves and the associated indices. The Threat database is the data source for Threat logs as well as URL, WildFire Submissions, and Data Filtering logs; the number below reflects an aggregate average.

    Note that Panorama may not be the logging solution for long-term archival. In these cases, suggest syslog forwarding for archival purposes.

 

      Log Size On Disk

Log Type  Size on Disk (in Bytes) 
Traffic 300
Threat 500

 

 

The equation to determine the storage requirements for a particular log type is:

Storage Requirement Calculation

 

Example: Customer wants to be able to keep 30 days worth of traffic logs with a log rate of 1500 logs per second:

 

 

Log Storage Calculation Example
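The worked example above can be sketched as follows; the function name is illustrative, and the log sizes come from the "Log Size On Disk" table:

```python
# Storage required = log rate (logs/s) x log size (bytes) x 86,400 s/day
# x retention (days). Sizes are from the "Log Size On Disk" table above.
LOG_SIZE_BYTES = {"traffic": 300, "threat": 500}

def storage_bytes(log_rate_lps, log_type, retention_days):
    return log_rate_lps * LOG_SIZE_BYTES[log_type] * 86_400 * retention_days

# 30 days of traffic logs at 1,500 logs/s:
traffic = storage_bytes(1500, "traffic", 30)
print(f"{traffic / 1e12:.2f} TB")  # ~1.17 TB, in line with the ~1.1 TB figure
```

Repeat the same calculation per log type and sum the results to get total storage.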

 

Total log storage is the sum of the results of the equation above for each log type. Traffic and Threat logs will generally have a much higher log rate than either Config or System logs and will consequently account for a majority of the storage requirements in any design.

 

There is a quota system on Panorama that must also be accounted for when sizing the log collection solution. In the example above, we calculated only the size required for 30 days of traffic logs. The default quota for traffic logs on a log collector is 25%. Therefore, we would need 1.1 TB of log space for traffic logs alone, and by default that is 25% of the total storage size. Customers can adjust quotas based on desired retention periods. The attached worksheet takes the default quotas on Panorama into account and provides the total amount of storage required.
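That quota arithmetic can be checked with a quick back-calculation, using the values from the example above:

```python
# If ~1.17 TB of traffic logs must fit inside the default 25% traffic
# quota, the collector needs roughly 4.7 TB of total log storage.
traffic_tb = 1.17      # traffic-log storage from the example above
traffic_quota = 0.25   # default traffic quota on a log collector
total_tb = traffic_tb / traffic_quota
print(f"total storage required: {total_tb:.1f} TB")
```

Raising the traffic quota above 25% would reduce the total disk needed for the same retention, at the cost of space for other log types.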

 

The other factor to consider is the ratio of traffic to threat logs. We have measured the average ratio to be approximately 70% traffic and 30% threat on our testbed when using URL Filtering. If you don't know the ratio, or want to be on the safe side, use 400 bytes per log in the above calculations; that blended size accounts for both traffic and threat logs. For the Logging Service, all logs (Traffic/Threat/Infrastructure) are 1500 bytes in size.
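The 400-byte recommendation can be sanity-checked against the measured mix:

```python
# Blended per-log size from the measured ~70% traffic / 30% threat mix,
# using the per-log sizes from the "Log Size On Disk" table.
blended = 0.70 * 300 + 0.30 * 500  # bytes per log
print(f"{blended:.0f} bytes")  # prints "360 bytes"
```

The measured mix works out to roughly 360 bytes per log, so 400 bytes is a comfortable round-up when the exact ratio is unknown.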

 
Calculating Required Storage For Logging Service

 

There are three different cases for sizing log collection using the Logging Service. For in-depth sizing guidance, refer to Sizing Storage For The Logging Service.

 

  1. Log collection for Palo Alto Networks Next Generation Firewalls
  2. Log collection for GlobalProtect Cloud Service Mobile User
  3. Log collection for GlobalProtect Cloud Service Remote Office

 

 

Log Collection for Palo Alto Networks Next Generation Firewalls

The log sizing methodology for firewalls logging to the Logging Service is the same as when sizing for on-premises log collectors. The only difference is the size of the log on disk: in the Logging Service, both threat and traffic logs can be calculated using a size of 1500 bytes.

 

Log Collection for GlobalProtect Cloud Service Mobile User

Per-user log generation depends heavily on both the type of user and the workloads being executed in that environment. On average, 1 TB of storage in the Logging Service will provide 30 days of retention for 5,000 users. An advantage of the Logging Service is that adding storage is much simpler than in a traditional on-premises distributed collection environment. This means that if your environment is significantly busier than the average, it is a simple matter to add whatever storage is necessary to meet your retention requirements.

 

Log Collection for GlobalProtect Cloud Service Remote Office

GlobalProtect Cloud Service (GPCS) for remote offices is sold based on bandwidth. While log rate is largely driven by connection rate and traffic mix, in sample enterprise environments log generation occurs at a rate of approximately 1.5 logs per second per megabit of throughput. The attached sizing worksheet uses this rate and takes busy/off hours into account in order to provide an estimated average log rate.
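Using the ~1.5 logs/s per Mbps rule of thumb above, an estimated average log rate can be derived from the purchased bandwidth. The sketch below uses illustrative bandwidth figures, and the results are estimates, not guarantees:

```python
LPS_PER_MBPS = 1.5  # approximate rate observed in sample enterprise environments

def estimated_lps(bandwidth_mbps):
    """Estimated average log rate for a GPCS remote office."""
    return bandwidth_mbps * LPS_PER_MBPS

for mbps in (20, 50, 100):
    print(f"{mbps} Mbps -> ~{estimated_lps(mbps):.0f} logs/s")
```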

 
LogDB Storage Quotas

After determining the total storage required for the solution, consider how that storage is allocated. In order to meet the customer's requirements, the design may call for an adjustment of the quota settings. The default storage quotas for the M-Series log collectors are:

 

                 PAN-OS 7.1 Default Quotas

Log Type                 Quota (%)
Traffic                  25
Threat                   25
Config                   8
System                   8
HIP Match                3
Extended Threat PCAPs    1
Traffic Summary          3
Threat Summary           3
URL Summary              3
Hourly Traffic Summary   1
Hourly Threat Summary    1
Hourly URL Summary       1
Daily Traffic Summary    1
Daily Threat Summary     1
Daily URL Summary        1
Weekly Traffic Summary   1
Weekly Threat Summary    1
Weekly URL Summary       1
TOTAL                    88

 

In the table above, it can be seen that even when the total storage requirement is met, it may be necessary to adjust the quota configuration to meet the specification of the solution. When designing log collector groups, available storage for any given log type is the aggregate of the storage quotas allocated on each member of the group.
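For example, the available traffic-log storage across a two-member collector group can be computed as follows; the member disk sizes are hypothetical, and 25% is the default traffic quota from the table above:

```python
# Hypothetical usable log storage per collector group member, in TB.
members_tb = {"collector-1": 8.0, "collector-2": 8.0}
traffic_quota = 0.25  # default traffic quota (PAN-OS 7.1 table above)

# Available storage for a log type is the sum of that type's quota
# across every member of the group.
available_tb = sum(tb * traffic_quota for tb in members_tb.values())
print(f"traffic-log storage across the group: {available_tb} TB")  # 4.0 TB
```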

 

Storage quotas were simplified starting in PAN-OS 8.0. Detail and summary logs each have their own quota, regardless of type (traffic/threat):

 

Log Type                           Quota (%)
Detailed Firewall Logs             60
Summary Firewall Logs              30
Infrastructure and Audit Logs      5
Palo Alto Networks Platform Logs   0.1
3rd Party External Logs            0.1
TOTAL                              95.2

Device Location

The last design consideration for logging infrastructure is the location of the firewalls relative to the Panorama platform they log to. If a device is separated from Panorama by a low-speed network segment (e.g., T1/E1), it is recommended to place a Dedicated Log Collector (DLC) on site with the firewall. This confines log forwarding to the higher-speed LAN segment while allowing Panorama to query the log collector when needed. For reference, the following tables show bandwidth usage for log forwarding at different log rates. This includes both the logs sent to Panorama and the acknowledgements from Panorama to the firewall. Note that for both the 7000 Series and 5200 Series, logs are compressed during transmission.

 

        Log Forwarding Bandwidth

Log Rate (LPS)   Bandwidth Used
1,300            8 Mbps
8,000            56 Mbps
10,000           64 Mbps
16,000           52.8 - 140.8 Mbps (96.8 avg)

 

 

Log Forwarding Bandwidth - 7000 and 5200 Series

Log Rate (LPS)   Bandwidth Used
1,300            0.6 Mbps
8,000            4 Mbps
10,000           4.5 Mbps
16,000           5 - 10 Mbps

 
Device Management

There are several factors to consider when choosing a platform for a Panorama deployment. Initial factors include:

  • How many concurrent administrators need to be supported?
  • Does the customer have VMware virtualization infrastructure that the security team has access to?
  • Does the customer require dual power supplies?
  • What is the estimated configuration size?
  • Will the device handle log collection as well?

 

Panorama Virtual Appliance

This platform operates as a virtual M-100 and shares the same log ingestion rate. Adding resources allows the Panorama virtual appliance to scale both its ingestion rate and its management capabilities. The minimum requirements for a Panorama virtual appliance running PAN-OS 8.0 are 8 vCPUs and 16 GB vRAM.

When to choose the Virtual Appliance?

  • The customer has a large VMware infrastructure that the security team has access to
  • The customer is using dedicated log collectors and is not running in mixed mode

When not to choose the Virtual Appliance?

  • The server team and security team are separate and do not want to share infrastructure
  • The customer has no virtual infrastructure

 

M-100 Hardware Platform

This platform has dedicated hardware and can handle up to 15 concurrent administrators. When in mixed mode, it is capable of ingesting 10,000 - 15,000 logs per second.

When to choose M-100?

  • The customer needs a dedicated platform, but is very price sensitive
  • The customer is using dedicated log collectors, is not in mixed mode, but does not have VM infrastructure

When not to choose M-100?

  • Dual power supplies are required
  • Mixed mode with more than 10k logs/s, or more than 8 TB required for log retention
  • More than 15 concurrent administrators are needed

 

M-500 Hardware Platform

This platform has the highest log ingestion rate, even when in mixed mode. The higher resource availability can accommodate larger configurations and more concurrent administrators (15-30). It offers dual power supplies and has a strong growth roadmap.

When to choose M-500?

  • The customer needs a dedicated platform and has a large or growing deployment
  • The customer is using mixed mode with more than 10k logs/s
  • The customer wants to future-proof their investment
  • The customer needs a dedicated appliance but has more than 15 concurrent admins
  • Dual power supplies are required

When not to choose M-500?

  • The customer has a VM-first environment and does not need more than 48 TB of log storage
  • The customer is very price sensitive

 

High Availability

This section will address design considerations when planning for a high availability deployment. Panorama high availability is Active/Passive only and both appliances need to be fully licensed. There are two aspects to high availability when deploying the Panorama solution. These aspects are Device Management and Logging. The two aspects are closely related, but each has specific design and configuration requirements.

 

Device Management HA: The ability to retain device management capabilities upon the loss of a Panorama device (either an M-series or virtual appliance).

Logging HA or Log Redundancy: The ability to retain firewall logs upon the loss of a Panorama device (M-series only).

 

Device Management HA

When deploying the Panorama solution in a high availability design, many customers choose to place HA peers in separate physical locations. From a design perspective, there are two factors to consider when deploying a pair of Panorama appliances in a High Availability configuration. These concerns are network latency and throughput.

 

Network Latency

The latency of intervening network segments affects the control traffic between the HA members. HA-related timers can be adjusted to the needs of the customer deployment. The maximum recommended latency is 1000 ms.

  • Preemption Hold Time: If the Preemptive option is enabled, the Preemption Hold Time is the amount of time the passive device will wait before taking the active role. In this case, both devices are up, and the timer applies to the device with the "Primary" priority.
  • Promotion Hold Time: The promotion hold timer specifies the interval that the Secondary device will wait before assuming the active role. In this case, there has been a failure of the Primary device and this timer applies to the Secondary device.
  • Hello Interval: This timer defines the number of milliseconds between Hello packets sent to the peer device. Hello packets are used to verify that the peer device is operational.
  • Heartbeat Interval: This timer defines the number of milliseconds between ICMP messages sent to the peer. Heartbeat packets are used to verify that the peer device is reachable.

Relation between network latency and Heartbeat interval

Because the heartbeat is used to determine reachability of the HA peer, the Heartbeat interval should be set higher than the latency of the link between the HA members.

 

HA Timer Presets

While customers can set their HA timers specifically to suit their environment, Panorama also has two sets of preconfigured timers that the customer can use. These presets cover the majority of customer deployments.

 

Recommended:

Timer                            Setting
Preemption Hold Time             1
Hello Interval                   8000
Heartbeat Interval               2000
Monitor Fail Hold Up Time        0
Additional Master Hold Up Time   7000

 

Aggressive:

Timer                            Setting
Preemption Hold Time             500
Hello Interval                   8000
Heartbeat Interval               1000
Monitor Fail Hold Up Time        0
Additional Master Hold Up Time   5000

 

 

Configuration Sync

 

 

                                                                        HA Sync Process

HA Config Sync

 

 

The HA sync process occurs on Panorama when a change is made to the configuration on one of the members in the HA pair. When a change is made and committed on the Active-Primary, it sends a message to the Active-Secondary that the configuration needs to be synchronized. The Active-Secondary sends back an acknowledgement that it is ready. The Active-Primary then sends the configuration to the Active-Secondary, which merges the configuration and enqueues a job to commit the changes. This process must complete within three minutes of the HA-Sync message being sent from the Active-Primary Panorama. The main concerns are the size of the configuration being sent and the effective throughput of the network segment(s) that separate the HA members.

 

 

Log Availability

The other piece of the Panorama High Availability solution is providing availability of logs in the event of a hardware failure. There are two methods for achieving this when using a log collector infrastructure (either dedicated or in mixed mode).

 

Log Redundancy

PAN-OS 7.0 and later include an explicit option to write each log to two log collectors in the log collector group. When this option is enabled, a device sends its logs to its primary log collector, which then replicates each log to another collector in the same group:

 

 

Log Redundancy

Log duplication ensures that there are two copies of any given log in the log collector group. This is a good option for customers who need to guarantee log availability at all times. Things to consider:

 

1. The replication only takes place within a log collector group.

2. The overall available storage space is halved (because each log is written twice).

3. Overall Log ingestion rate will be reduced by up to 50%.

  

Log Buffering

Firewalls require an acknowledgement from the Panorama platform to which they forward logs. This means that if a firewall's primary log collector becomes unavailable, the logs are buffered locally and sent when the collector comes back online. There are two methods to buffer logs. The first method is to configure separate log collector groups for each log collector:

 

Log Buffering

 
In this situation, if Log Collector 1 goes down, Firewall A & Firewall B will each store their logs on their own local log partition until the collector is brought back up. The local log partition for the currently sold models are:

 

Model            Log Partition Size (GB)
PA-200           2.4
PA-500           125
PA-3000 Series   90
PA-5000 Series   88
M-Series         10

 

The second method is to place multiple log collectors into a group. In this scenario, the firewall can be configured with a priority list so if the primary log collector goes down, the second collector on the list will buffer the logs until the primary collector is returned to service. The primary disadvantage of this approach is that only 10 GB of space is allocated on a log collector to hold buffered logs from all forwarding devices.

 

In the architecture shown below, Firewall A & Firewall B are configured to send their logs to Log Collector 1 primarily, with Log Collector 2 as a backup. If Log Collector 1 becomes unreachable, the devices will send their logs to Log Collector 2 for buffering. Only the first 10GB of logs received by Log Collector 2 will be buffered. If Firewall A uses all 10 GB, Firewall B will not have any logs buffered.
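A rough estimate of how long that 10 GB buffer lasts at a given aggregate log rate can be sketched as follows; the 400-byte blended log size is the safe figure suggested earlier, and the function name is illustrative:

```python
BUFFER_BYTES = 10e9  # 10 GB buffering allocation on the backup collector

def buffer_hours(aggregate_lps, avg_log_bytes=400):
    """Hours of buffering before the 10 GB allocation fills."""
    return BUFFER_BYTES / (aggregate_lps * avg_log_bytes) / 3600

print(f"{buffer_hours(1500):.1f} hours at 1,500 logs/s")  # ~4.6 hours
```

Note that the aggregate rate is for all firewalls sharing the backup collector, so a busy site can exhaust the buffer in a few hours.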

 

 

Collector Group - No Log Redundancy

 Using The Sizing Worksheet

 

 

 The information that you will need includes desired retention period, average log rate, and log mix (ratio of traffic to threat logs). If you don't know the ratio of traffic to threat logs, a split of 65% traffic to 35% threat is a good representation of an internet gateway deployment.

 

 

Example Use Cases

 

Use Case 1


Use Case 2

 

Use Case 3

 

Use Case 4

 

 

Comments
by jpeters
on ‎03-01-2016 09:23 PM

This a great doc. Thank you!

by JennyGarner
on ‎04-18-2016 09:37 AM

They went through this at Ignite and the presenter mentioned you can see how much traffic you're getting in Panorama by adding its serial number in the Managed Collectors section.  I tried that and I'm not seeing any data.  Has anyone else tried this?  I'm running Panorama as a virtual machine and need to check how much logging we're doing before I add any more.

by Sai_Srivastava_Tumuluri
on ‎04-22-2016 08:37 AM

Would like to request that information on mixed mode behavior be added to this document

by ibaxter
on ‎05-13-2016 04:42 AM

Excellent resource, thanks.

by murphyj
on ‎09-16-2016 07:34 AM
In the document it talks about log collectors. Which types of systems/software count as log collectors that Panorama can read from? Is it only Panorama and other PA logging devices, or would it be able to query the data from a Splunk log source?
by
on ‎09-16-2016 01:41 PM

@murphyj, when we refer to a "Log Collector" we mean a Palo Alto Networks Log Collector: either a VM, M-100, or M-500 running Panorama software and set up as a Palo Alto Networks Log Collector.

 

I hope this clarifies a little.

-Joe

by jchitsaz
on ‎02-14-2017 10:49 AM

Looks like the log retention note needs to be changed on VM since 8.0:

"Mixed mode with more than 10k log/s or more than 8TB required for log retention"

 

I believe it increased to 24TB in 8.0.
