Overloading 5220 with 9.0.x

L4 Transporter

Hi

 

I updated my firmware from 8.1.10 to 9.0.5.

 

Now I can bring my 5220 to its knees with my mailing list run.

Each email consists of a PDF attachment of approximately 3 MB, and about 4K emails go out at around the same time.

 

This wasn't a problem on 8.1.10, but on 9.0.5 the CPU hits 100% and my latency through the box goes from <1 ms to 2-3 s+, which makes things crash 😞

 

I have put in a rule so that traffic from my mailing list server is no longer content checked, but I don't want to allow that for all email. I wouldn't mind rate limiting it from the PA side of things; otherwise somebody could crash my network by sending me lots of email with large attachments!

 

Can I rate limit one app, or how can I get back to the same behaviour I had under 8.1.10?

 

A

 

 

NOTE - sorry, I originally put in 5020 - fat-finger mistake - it is a 5220.

22 REPLIES

Cyber Elite

@Alex_Samad,

Whatever actually sends your email should be able to ratelimit the send rate no? If for some reason you need to do this on the firewall itself, you could essentially rate limit the connection by setting up a DoS Protection rule for the traffic and simply allow that to act as a rate-limiter. Whatever is sending the emails should continue to send the messages until they are all sent. 

Well, no. You shouldn't have to ask your users to protect you; the PA should be able to protect itself.

 

Plus I am not sure what level to rate limit to. Is it the size of the attachment or something else? So basically, if you do SMTP content checking you are leaving yourself open to DDoS. I noticed that when I hit 100%, all of my BGP and OSPF connections went down - why BFD

Latency went from <1ms to 3s+

 

I will spell out the situation a bit better (I have to anyway, so I can send it to my SE and support).

 

The event 

Symptoms

- 100% cpu usage

- BFD failure

- Latency > 3000ms

 

I believe the latter two are a result of the first symptom.

 

The environment: 2x 5020 in an A/P cluster with 9.0.5. Note that 8.1.10 handled this fine, so I am not sure what change in 9.0 is causing this! I use 3x 40G LACP as the main trunk, with 2x10 LACP for ISP connectivity - on 1G lines.

 

What causes this. 

SMTP traffic. 4000+ emails sent as fast as possible.

The emails are plain text with a 3 MB PDF attached - the exact same PDF every time.
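To put a number on that burst, here is a back-of-the-envelope sketch. It assumes things the post does not state: that "3M" means 3 MB with 1 MB = 10**6 bytes, and that the attachment is base64-encoded on the wire (4 bytes out for every 3 in, ignoring line breaks and headers).

```python
# Rough volume of attachment data in one mailing list run.
# Assumptions: 1 MB = 10**6 bytes; base64 encoding expands data by 4/3.

EMAILS = 4000                   # "4000+ emails"
ATTACHMENT_BYTES = 3 * 10**6    # "3M PDF", read here as 3 MB


def burst_bytes(emails: int, attachment_bytes: int, base64: bool = True) -> int:
    """Total attachment bytes the firewall sees for the whole burst."""
    per_mail = attachment_bytes * 4 // 3 if base64 else attachment_bytes
    return emails * per_mail


raw = burst_bytes(EMAILS, ATTACHMENT_BYTES, base64=False)
encoded = burst_bytes(EMAILS, ATTACHMENT_BYTES)
print(f"raw: {raw / 10**9:.1f} GB, base64 on the wire: {encoded / 10**9:.1f} GB")
```

So under those assumptions the run is on the order of 12 GB of PDF data, or about 16 GB once encoded, all presented for inspection in a short window.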

 

What I think is happening is the PA is looking at each PDF and checking it.

I am guessing this is the extra content / App-ID stuff they are talking about.

 

But it isn't very clear, and rechecking the same PDF 4k+ times seems a bit silly.

 

How do you protect yourself from this? It seems the protection / content checking is not limited. If I could say "only use 70% for this" and just slow down that traffic, that would be awesome.

 

Even if I could rate limit the SMTP traffic, what do I rate limit it to? How do you work out how long the virus etc. checking is going to take?

 

For now I have turned off content checking - we will see this arvo if that is the issue.

 

 

 

 

 

 

 

@Alex_Samad,

So nothing on-site (Exchange/PostFix) is actually sending the emails, you are sending to an external service? If that's the case then you're right, you can't rate limit the send-out outside of notifying users that they can't send that many messages at once. If you were hosting your own email server rate-limiting on the server side is a common and dead-simple process.

 

As for your assumption that Threat Prevention is actively scanning the email and attached PDF; yes, that's exactly how it's supposed to work. Unlike WildFire, Threat Prevention doesn't care that the file hash has already been checked because it doesn't take hash values into account at all. Threat Prevention scans each of the PDFs through its signatures and ensures that the content doesn't violate anything each time. To the best of my knowledge there isn't a way to dynamically disable threat-prevention because your CPU is pegged.
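The difference described here (every copy scanned versus a verdict remembered per file hash) can be illustrated with a toy model. This is only a sketch of the two ideas, not how PAN-OS implements either feature; `scan_with_signatures` and `HashVerdictCache` are hypothetical names invented for the example.

```python
import hashlib


def scan_with_signatures(payload: bytes) -> bool:
    """Stand-in for an expensive signature scan; True means 'clean'."""
    return b"EICAR" not in payload


class HashVerdictCache:
    """Remembers verdicts by SHA-256 so identical files are scanned once."""

    def __init__(self) -> None:
        self.verdicts: dict[str, bool] = {}
        self.scans = 0

    def check(self, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self.verdicts:
            self.scans += 1  # only unseen content costs a real scan
            self.verdicts[digest] = scan_with_signatures(payload)
        return self.verdicts[digest]


pdf = b"%PDF-1.4 same newsletter " * 1000  # the identical attachment
cache = HashVerdictCache()
stream_scans = 0
for _ in range(4000):
    scan_with_signatures(pdf)  # signature-style: every copy is scanned
    stream_scans += 1
    cache.check(pdf)           # hash-style: only the first copy is scanned

print(stream_scans, cache.scans)
```

In this model the per-copy approach does 4000 scans for the mailing run and the hash-keyed approach does one, which is the cost gap the original poster is pointing at.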

 

 

 

Hi

 

No.. that's an assumption 🙂

 

For this event, my prod servers in my network sent the emails. But this is my edge SMTP server, so internet connections potentially come to it. I don't believe asking the dev guys to rate limit is the right response; I'd have to trust somebody else to protect the firewall .. no no.

 

That's not good. You can bring a PA to its knees by sending this through.

 

Do you know what is different between 8.1.x and 9.0.x? I didn't have this issue before with 8.1.

 

A

 

 

What would you recommend as a good protection profile?

 

I have 

antivirus

anti-spyware

vulnerability protection

URL filtering

WildFire

 

@Alex_Samad,

Sending this many messages is something you rate limit on your email servers so you don't exceed the capacity of your network (specifically the firewall, in this case). Regardless of the fact that 9.0 exposed the issue, you now know that sending that much data at once limits the performance of the rest of your network. That means you find some way to limit that traffic to an amount your network can actually handle; that could be on your prod servers or on your edge servers. That's pretty much the only way you can fix it currently, short of disabling content inspection on that traffic, which is something you've already stated you don't want to do. A DoS policy on the firewall can act as a rate limit on your prod email servers sending messages to the edge server, but since you have control over the servers themselves that isn't necessary.
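As a rough illustration of what limiting the send rate means for a run like this, here is a minimal token-bucket sketch. The rate of 20 messages per second and the burst of 20 are made-up numbers, not a recommendation, and `TokenBucket` and `drain_time` are hypothetical helpers. On a real Postfix edge server the equivalent knobs would be per-client limits such as `smtpd_client_message_rate_limit`.

```python
class TokenBucket:
    """Allow `rate` messages per second on average, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def drain_time(messages: int, bucket: TokenBucket, tick: float = 1.0) -> float:
    """Simulated seconds needed to push `messages` through the bucket."""
    now, sent = 0.0, 0
    while sent < messages:
        if bucket.allow(now):
            sent += 1
        else:
            now += tick  # out of tokens: wait for the next refill
    return now


seconds = drain_time(4000, TokenBucket(rate=20, burst=20))
print(f"4000 messages at 20 msg/s take about {seconds:.0f} s")
```

The point of the sketch is the trade-off: the run still completes, just spread over minutes instead of arriving as one spike.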

Keep in mind that you may be sending 4k emails at the same time, but depending on how the security rules between your zones are configured the firewall is analyzing each message twice. Once when it transfers from your prod servers to the edge SMTP server, and again when your edge SMTP server sends the message outside. Potentially you could limit the amount of content inspection by only inspecting the traffic once? 

 

As far as receiving this many messages and it causing stress through your firewall, this is exactly what Denial of Service profiles and Denial of Service Prevention rules are designed to protect against; you should have any public service you have exposed already configured with one (Many people don't use this feature, that's a big mistake). SMTP is one of the best services in regards to DoS, because if you drop the connection the other end will simply attempt to send the message again. 

 

"Do you know what is different in 8.1.x and 9.0.x because i didn't have this issue before with 8.1"

This actually could be a lot of things. 9.0 itself adjusts some of the content inspection process, but you would also get access to a bunch of new signatures that otherwise wouldn't be getting analyzed because they wouldn't have been active until you upgraded to 9.0. You'll need to work with TAC so that they have access to all of the appropriate logs and PCAPs to see what's actually happening here.

Hi

 

So again - I think you are fixing symptoms, when it's the problem that should be fixed. The fact that an unknown amount of email or threat detection can cause the CPU to hit 100% is bad, especially as you can't cap how much CPU threat protection can utilise. << This by itself is very bad design, I think.

 

I would presume there is a pipeline to process this - there should be a way to limit it, and this would in effect limit SMTP traffic throughput.

 

You talk about a DoS profile; what am I going to profile? SMTP can carry multiple emails per TCP connection, so I can't do it by SYN packets. Can you do it by size? No, you can't tell whether one TCP connection is one big email or lots of little emails. Maybe you can expand on this - happy to do something, but I am not sure.

 

I'm happy that it's only getting checked once - did a quick check, fairly happy 🙂

I have tried TAC; it's been a bit slow there, so I thought I would raise it here as well (also recommended by my SE).

 

 

So 

* Don't want to rate limit my prod mailing list .. I have added a rule to allow it through without checking (tick)

* My outbound SMTP (me to the internet) - I have turned checking off. I don't want to check it, and I can't allow the prod mail through while checking the rest with this test, as it all goes out via the main SMTP servers. I can live with this for now.

* Rest of my internal network to SMTP -> I want to check this - stop viruses etc. - but if somebody starts to spam the PA it will bring it to its knees <<< this I need help on (I can probably rate limit at the Postfix level).

* Internet inbound to me .. I'm okay with this; I only allow specific servers to reach me, so I'm happy, and I can rate limit it at the Postfix level

 

TO DO is the SMTP checking of email from internal source addresses that are not the prod servers.

How to do that on the PA - how to self-protect?

Zone protection .. based on SYN packets and connection rate .. that's not going to work, as pointed out above. What values?

 

QoS .. nope, my main trunk is an LACP trunk with all of the VLANs on it, and from memory that doesn't work the way I want it to. Plus I only want to rate limit my SMTP .. I could create a new interface just for my DMZ - maybe I should, and apply rate limiting there ..

Looking at the interface, though, it's by class of traffic .....

 

DoS protection profile .. again SYN packets ... if the 4k come over 10 long-lived TCP connections, it's not going to help.

 

 

There is not much I can do to self-protect.

 

On my Arista switches I can rate limit events that go to the CPU .. so to translate, I could say: threat prevention is allowed to have a max of 80% of CPU << this would be a nice feature - this I see as self-protection.

 

I will look at Postfix - if you can help me with the DoS protection profile in case I have missed anything!

 

thanks

 

 

 

 

 

 

 

 

 

@Alex_Samad,

I agree the problem should be fixed, and to do so you need insight into what exactly is going on. To enable that level of logging, you need to work with TAC and get past level 1 so they actually know what debug logging needs to be enabled. I should have made it clearer in my earlier posts: I am only looking to deal with the symptoms of the problem; the only people who can get the logs needed to fix the core issue are at TAC.

 

"You talk about a DoS profile; what am I going to profile? SMTP can carry multiple emails per TCP connection, so I can't do it by SYN packets. Can you do it by size? No, you can't tell whether one TCP connection is one big email or lots of little emails. Maybe you can expand on this - happy to do something, but I am not sure."

You can both limit tcp-syn and set a session max-concurrent-limit, and rate-limit connections to an appropriate level. I think you're thinking too much about what someone could do to bypass your limits rather than what the vast majority of attacks will do. If I knew your host I could overflow your systems if I wanted to, end of story; there is no way you can defend against it.

 

"QoS .. nope, my main trunk is an LACP trunk with all of the VLANs on it, and from memory that doesn't work the way I want it to. Plus I only want to rate limit my SMTP .. I could create a new interface just for my DMZ - maybe I should, and apply rate limiting there ..

Looking at the interface, though, it's by class of traffic ....."

Take a more detailed look at QoS configuration on the firewall, you can leave everything assigned to 3 (default) and assign just SMTP traffic a different class and set max as necessary.
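To get a feel for what a maximum on the SMTP class would cost, here is a small arithmetic sketch. The burst size of roughly 16 GB is an assumption (4000 mails with a 3 MB attachment each, base64-encoded), and the caps of 100, 200 and 500 Mbit/s are hypothetical values picked for illustration, not suggested settings.

```python
# How long would the mailing run take if SMTP were capped at a given rate?
# Assumed burst: 4000 mails x 3 MB, base64-encoded, about 16 GB in total.

BURST_BYTES = 16 * 10**9


def seconds_to_drain(total_bytes: int, egress_max_mbps: int) -> float:
    """Time to move total_bytes through a class capped at egress_max_mbps."""
    return total_bytes * 8 / (egress_max_mbps * 10**6)


for cap in (100, 200, 500):  # hypothetical class maximums in Mbit/s
    print(f"{cap} Mbit/s cap -> {seconds_to_drain(BURST_BYTES, cap):.0f} s")
```

Whether a given cap actually keeps the dataplane CPU below 100% is something only testing on the box can show; the sketch only shows how much longer the run takes.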

 

 

Hi

 

Yes, something that probably needs to be addressed somehow.

TAC - yep waiting on them ... 3 days so far.

 

"

You can both limit tcp-syn and set a session max-concurrent-limit, and rate-limit connections to an appropriate level. I think you're thinking too much about what someone could do to bypass your limits rather than what the vast majority of attacks will do. If I knew your host I could overflow your systems if I wanted to, end of story; there is no way you can defend against it.

"

I have to disagree, because you don't know the characteristics of what is going to set off the 100% CPU usage. You can't count SYNs / connections / sessions, as SMTP allows multiple messages to be sent over one TCP session.
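The multiple-messages-per-session point is easy to see from the client side. Below is a minimal sketch using Python's standard `smtplib` interface: one session is opened and every message is sent over it. The addresses and host name are hypothetical, and `FakeSMTP` is a test double so the example runs without a mail server.

```python
import smtplib  # only needed for the real-use variant shown at the bottom
from email.message import EmailMessage


def build(n: int) -> list[EmailMessage]:
    """Create n small messages (hypothetical addresses)."""
    messages = []
    for i in range(n):
        msg = EmailMessage()
        msg["From"] = "list@example.com"
        msg["To"] = f"user{i}@example.com"
        msg["Subject"] = "Newsletter"
        msg.set_content("See attachment.")
        messages.append(msg)
    return messages


def send_batch(smtp, messages) -> int:
    """Send every message over the one already-open SMTP session."""
    for msg in messages:
        smtp.send_message(msg)
    return len(messages)


class FakeSMTP:
    """Test double standing in for an open smtplib.SMTP session."""

    def __init__(self) -> None:
        self.connections = 1  # one TCP session, opened once (one SYN)
        self.sent = 0

    def send_message(self, msg) -> None:
        self.sent += 1


fake = FakeSMTP()
send_batch(fake, build(4000))
print(f"{fake.sent} messages over {fake.connections} connection")

# Real use would look like this (needs a reachable server):
# with smtplib.SMTP("mail.example.com", 25) as smtp:
#     send_batch(smtp, build(4000))
```

A limit that counts SYNs or new sessions sees a single event here, however many messages follow it.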

 

Size .. how can you tell if it's 4k of 1k mails or 1k of 4k emails with attachments .. so if you do it by size then you are going to kill valid connections.

So let me correct / expand on this. I could set it to say limit to 5 connections (concurrent) and 1.2 * max attachment size. This would / should make sure I don't run into CPU issues. But then I am not getting the most out of my system.

 

Thanks

 

May I ask how you managed to update a PA 5020 to PAN-OS 9.0?

We were told PAN-OS 9.0 is not supported on PA 5000 series.

apologies 5220

@Alex_Samad don't get me wrong because of this question but as this somehow seems like a bug of 9.0, what about a downgrade to 8.1? Or did you upgrade to 9.0 because of a new feature that you need?

I agree with you that the firewall should not reach 100% CPU usage because of such a connection, but even a 5220 has its limit, and when this is reached - I know this is not the situation you have - you have to enable some protections - as described by @BPry - on the surrounding systems.

 

And of course this does not help you either, but do not upgrade to a new major version prior to x.y.7 (or better x.y.8).

Hi

 

I am not sure if it's a bug.

 

Why the move to 9.0? That's the PA recommendation.

Why 9.0.5? That's the recommended release.

 

I would hope that PA wouldn't recommend things that don't work !

 

The protection should be, in my opinion, that you can apply caps to CPU limits for type or process on the PA.

 

Threat prevention shouldn't take up 100% over and above packet routing .. which is what happened.

 

Yes you can try and limit traffic / data to / through the PA but that is fraught with danger - again don't fix the symptoms fix the issue.

 

 

Anyway, I am happy - sort of - with my change - I have removed content checking ... I would like it on but I can live without it for now.

 

 

 

L4 Transporter

So I ran into this problem again, but on my PA-850 - a much smaller PA.

 

Users were connecting to a file server.

 

Normally the PA IDs the traffic as ms-ds-smbv3, no problem; it doesn't get checked, so no issue.

But the last one that went through was ID'ed as rss. Not sure how that worked, and it fits under my catch-all.

 

It was a 27G session.

 

So it tanked my CPU - stopped the OSPF daemon replying to hellos.

 

Basically killed the firewall and brought down the network in the office.

 

To me this is a bug - a rather big one ... not being able to limit the amount of CPU time that is allocated to threat protection.

 

 
