What to do with Large Logfiles

L4 Transporter

I have two PA4060s and Panorama on our internet border. I need to retain logs of all outbound traffic for at least three months. After watching log retention on Panorama for a couple of weeks, I made two separate estimates of the daily log volume: one calculated from the output of the debug log-receiver statistics command, and one from the amount of data stored relative to the size of my disk. Together they suggest between 50GB and 80GB per day. Multiplied up for three months (about 90 days), that means the database will grow to between roughly 4.5TB and 7.2TB.
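As a quick check of that multiplication, here is a minimal sketch. It assumes "three months" means 90 days and uses decimal units (1 TB = 1000 GB); both are assumptions, not figures from Panorama.

```python
# Rough retention sizing from a daily log-volume estimate.
DAYS_RETAINED = 90  # assumption: "three months" taken as 90 days


def retention_tb(gb_per_day: float, days: int = DAYS_RETAINED) -> float:
    """Return the total storage needed in TB (decimal, 1 TB = 1000 GB)."""
    return gb_per_day * days / 1000


low = retention_tb(50)   # lower daily estimate
high = retention_tb(80)  # upper daily estimate
print(f"{low:.1f} TB to {high:.1f} TB")
```

Any headroom for growth or indexing overhead would come on top of these raw figures.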

My question: does anyone have experience with log databases of that size? Will they perform OK, or do they get too slow beyond a certain size?

I am also concerned that I have to keep three months' data. Relying on a set of estimates while the database automatically overwrites its oldest entries doesn't seem a good legal defence, so how can I be 100% sure I have three months?

I have had a few ideas, but they are all rather complicated and unwieldy, such as exporting a week's worth of data every week to another server, storing it in batches and, if I need to, interrogating the relevant week's database using another copy of Panorama.

Any ideas?

4 REPLIES

L7 Applicator

I don't actually have numbers on that large of a database, but can give some guidance on how to achieve your legal requirement.

I'd recommend that, in addition to sending your logs to Panorama, you also send them to a syslog server next to the border firewall. Those logs can be kept for as long as your syslog server has space, and this doesn't affect the logs being uploaded to Panorama. Even if the Panorama logs wrap sooner than expected, you can refer to the raw syslogs if needed. Here's a doc on setting it up:

Hope this helps!

Greg

Thanks Greg. We presently do this with our old Juniper firewalls. As a security measure, I had considered just exporting the logs to CSV on another server each night, because it does the same thing but puts less load on the network and the servers. The problem with the CSV files is that I lose the ability to run a User Activity report and all the other funky features I bought the PAs for.

So yes, as a legal security measure I think that's probably my favourite solution, because it also means I don't rely on PA software to interpret the data - it's human readable.

Personally I would set it up something like:

1) Get one M100 device with 8x1TB drives (4TB effective after RAID) to which you push the logs through Panorama. This way each PA box probably holds just a few days or weeks of logs, while the Panorama box retains several months.

1.5) File a feature request with your SE asking that PA use 4TB drives instead of 1TB to maximize the amount of storage (especially since 4TB drives are fairly cheap nowadays).

2) At the same time, push syslog (CONFIG, SYSTEM, THREAT, TRAFFIC and HIPMATCH) from each PA box to a syslog server of your choice. If you have the money, perhaps Splunk or even ArcSight. Otherwise (since this is just for archiving) set up an Ubuntu Minimal install running syslog-ng or rsyslog (whichever you prefer), which will rotate the logs every hour, or every 5 minutes or so if you have plenty of logs (one log per logging host). During rotation the rotated log should be compressed with "gzip -9". Preferably use 4TB drives in RAID1 (or, if you have plenty of load, 4TB drives in RAID10). SATA drives are cheap nowadays, at least compared to what a single PA box costs 😃

3) Get a NAS to handle the archiving. Since the rotated logs are compressed with "gzip -9", you won't need to extract them in order to search through them. Just use zcat, zgrep etc. Compression will also make the logs take less space.

Which NAS you choose is a matter of taste, but Synology seems to make decent hardware (and software) in this area, for example a Rackstation RS2212RP+, which can take 10x4TB drives. In a RAID6 configuration that means 32TB of effective storage.

4) Bonus: Perhaps add some kind of offline backup as well, or set up a second NAS like the one above at a different geographical location.
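The compression step in item 2 would normally be handled by the rotation tool itself. Purely as an illustration, a minimal Python sketch of the same idea follows; the function name and the ".gz" naming are assumptions, and the only point is that compresslevel=9 corresponds to "gzip -9".

```python
import gzip
import shutil
from pathlib import Path


def compress_rotated_log(path: Path) -> Path:
    """Compress one rotated log at maximum level (like `gzip -9`) and remove the original."""
    target = path.with_name(path.name + ".gz")
    # Stream the file through gzip so large logs are never held in memory at once.
    with path.open("rb") as src, gzip.open(target, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # delete the uncompressed copy only after the archive is written
    return target
```

The resulting files are ordinary gzip archives, so zcat and zgrep work on them unchanged.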

This way the PA boxes and the Panorama appliance will be your first level of log digging and reporting, while the syslog server will be your second level. Because you log to syslog, you won't suffer if PA changes its db format in the future. If you used Juniper NSM back in the day, you probably know what I'm thinking of: you had plenty of archives, but the day you needed to read them back into NSM, it turned out that an NSM update performed a few months earlier had changed the db schema, so the old archives could not be read. Now you need new hardware to install an old NSM on, Juniper demands another license fee for that second NSM box, and so on. Syslog is just cleartext, in this case compressed text files, which can easily be searched and displayed with the zcat and zgrep tools.
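If you would rather script the search than call zgrep by hand, a rough Python equivalent might look like this. It is a sketch that assumes the archives are gzip files ending in ".gz" somewhere under one directory; the function name and the example path are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterator


def search_archives(archive_dir: Path, needle: str) -> Iterator[tuple[str, str]]:
    """Yield (file name, line) for every archived line that contains `needle`."""
    for gz_path in sorted(archive_dir.rglob("*.gz")):
        # Read each archive as a text stream; nothing is extracted to disk.
        with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    yield gz_path.name, line.rstrip("\n")


# Example use (hypothetical path and search string):
# for name, line in search_archives(Path("/archive/syslog"), "10.1.2.3"):
#     print(name, line)
```

This does a plain substring match, like zgrep -F; a regular expression could be swapped in if needed.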

The above means that the M100 and the syslog server will be the hot sources (seconds to minutes to retrieve data), while the NAS will be the cold source (minutes to hours). The backup, preferably at another geographical location, will be an offline source (hours to days).

L4 Transporter

We log about 15GB per day. We forward the log traffic off to a syslog server and rotate the logs hourly. We then compress them into one folder per day. The file compression allows us to keep 6 months of logs easily on a 1TB drive. We use Panorama (2TB) to carry 3-4 months of data for operational purposes.
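With one folder per day, it becomes possible to check the retention requirement directly rather than estimating it. The sketch below assumes the day folders are named by ISO date (for example 2024-03-15) and that "three months" means the last 90 full days; both the naming scheme and the function are assumptions for illustration.

```python
from datetime import date, timedelta


def missing_days(day_folders: set[str], today: date, days: int = 90) -> list[date]:
    """Return the dates in the last `days` full days that have no archive folder."""
    wanted = [today - timedelta(days=n) for n in range(1, days + 1)]
    # Report gaps oldest first so the earliest missing day is easy to spot.
    return [d for d in reversed(wanted) if d.isoformat() not in day_folders]
```

Run daily, an empty result is evidence that every day in the window is present; it does not show that each day's logs are complete.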
