One day in production with 4.1.3

Alright, this is basically a rant...

I installed a PA-500 at a site this last weekend, replacing a Cisco ASA 5520.  We are using this site as our test bed for the 4.1 codebase before rolling it out company-wide.  To be honest, I'm in no hurry to move from 3.1, but we ran into a problem where 3.1.10 would not properly translate H.323 traffic through NAT, and I (the network administrator) can't log in remotely with the SSL VPN because it fails to assign a private IP address on both my Windows 7 laptop and XP desktop at home.  (I suspect this is because, by sheer coincidence, my subnet is the same as the company's -- 10.0.0.0/8.  Yes, it's atrocious overkill for a home network, but now I think of it as a stress test for VPN software.)

4.1.3 fixed the NAT problem, and I'm hoping to have more luck with the new GlobalProtect client.  But I traded those two known evils for a host of new ones...

For one -- OK, this isn't specific to 4.1 actually -- there is no support for DHCP options in the DHCP server.  I was able to fall back to using a Juniper switch to serve DHCP, but I'm surprised to see such a glaring omission.  VoIP isn't new anymore.  Or even booting via PXE.  The onus is on me for not noticing this earlier, but it never occurred to me that it wouldn't be supported.  I searched other menus thinking I had to be missing something obvious, but nope.  It's just not there.
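
For context, DHCP options are just code/length/value triplets appended to the lease reply. Here is a minimal Python sketch of the ones phones and PXE clients commonly rely on (66, 67, and Cisco's 150). The address and boot file name are made-up examples, and this is not anything PAN-OS exposes; it only illustrates what a built-in server would need to send.

```python
import ipaddress


def encode_option(code: int, value: bytes) -> bytes:
    """Encode one DHCP option as code, length, value (RFC 2132 layout)."""
    if not 0 < code < 255:
        raise ValueError("option code must be in 1..254")
    if len(value) > 255:
        raise ValueError("option value longer than 255 bytes")
    return bytes([code, len(value)]) + value


def voip_pxe_options(tftp_ip: str, bootfile: str) -> bytes:
    """Build an options blob for a phone/PXE VLAN (hypothetical helper)."""
    tftp = ipaddress.IPv4Address(tftp_ip)
    return b"".join([
        encode_option(66, str(tftp).encode("ascii")),  # TFTP server name
        encode_option(67, bootfile.encode("ascii")),   # bootfile name
        encode_option(150, tftp.packed),               # TFTP server address (RFC 5859)
        b"\xff",                                       # end-of-options marker
    ])


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(voip_pxe_options("10.1.2.3", "pxelinux.0").hex())
```

On a server with real option support, this is the kind of per-scope data you would expect to be able to enter in the GUI.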

Then, more trouble with commits.  It's still hit and miss whether a commit will succeed.  This is completely unacceptable in a production device.  I would expect this in a beta, but after so many 4.0 releases, then 4.1, and now 4.1.3, why isn't this ironed out yet?  Commits shouldn't fail.  Ever.  Rejected, sure.  But not fail.  I've been told 4.1.4 should fix this, but if you look at the release history, commit failures are fixed in every release, so we'll see.

On top of that, I managed to crash the management engine.  On Monday morning, I realized I had forgotten to turn off the DHCP server on the VoIP VLAN -- the phone switch usually handles this, but I had it enabled on the PAN for testing.  I make this one simple change.  Sure enough, the commit hangs at 70% via the web GUI.  I try the CLI and, after several minutes, I'm told a commit is in progress.  Finally, I'm able to try again.  The commit shows progress (.....50%...70%...), then I get a line and a half full of dots, and "commit failed: timed out".  The status light on the front is lit yellow, and the next attempt closes my SSH window as if I had logged out.

So, 30 minutes into its first business day in the rack and I'm sending a notification to the entire office that the network will be down for a reboot.  I'm not impressed.  I've been petitioning management to buy PA-200s for the satellite offices...  I'm wondering how convincing an argument I have now.

I've been running PA-5050s for over a year and am still waiting for a "STABLE" production release of code.

L4 Transporter

On the DHCP options configuration, this is something we have heard from other customers and something we should add to the product. As you have obviously recognized, it isn't there today. We do have a feature request in our system for this and I added you to the list of requesting customers. Can't communicate specific timelines at this point.

On the other issues, it looks like we will need to take a look at the logs and see if we can figure out what is going on. To continue my role as Captain Obvious, you shouldn't be running into these issues. It looks like there may be something particular to your config that is triggering a bug. Can you open a support case with a tech support file that we can analyze?

Mike

Thanks Mike, I appreciate your reply.  Actually, I want to go on record and say the support team is, and has been, fantastic.  My frustration is targeted elsewhere.

I feel like the development team just bit off more than they could chew here.  I was told the official fix for the NAT issue is to upgrade to 4.1, which strikes me as a dead-wrong approach.  If 3.1 is supported, any deficiencies in that line should be fixed in that line.  I also don't think 4.x was ready to see the light of day.  It's very ambitious, and it's getting better.  The new features are nice, but new features are second to stability.  I imagine the resources of Palo Alto aren't quite at the Cisco or even Juniper level, and that's fine.  Maybe developing two simultaneous code trains is just not the way to go right now.  I hope the decision makers learn this, and soon, as Palo Alto is very close to having an ideal product.

One last thing, and I think I've said all that needs to be said:  End users nowadays expect networks to be up at all times.  I've never gone in on a weekend and been the only person there.  I don't think a time exists when I can sneak in and deal with problems without someone noticing.  Because of that, I'm taken to task when something fails (outside of established maintenance windows), and I expect the same from our appliance vendors.  The PA-500s and 2020s that we have are priced at the level where this is a reasonable expectation.

The hardware is capable of it... I know, because in over a year, the two problems I noted above are about all the trouble I had while running 3.x.  Unfortunately, the IT director is chomping at the bit to kill off our datacenter ASAs, which currently do nothing more than serve as secondary user VPN endpoints.  He wants 4.x rolled out now so we can try our hand at the GlobalProtect client, but I'm not about to bet my network on that when we have a solid, proven solution already deployed.  But that's power, support agreements, and rack space that we're hanging on to.

Could you add me to the watchlist for DHCP options as a feature request as well?

If I'm not mistaken, I was whining about this 2-3 years ago, along with a per-VLAN DHCP server. The idea with the latter is to terminate VLANs on the PAN so that, depending on which VLAN the client sits on, it gets a different IP address, default gateway, etc. However, handling DHCP options would solve this on its own, for example with option 82 from DHCP snooping in the access switches.

Two simultaneous code trains? Make that three, now that PAN-OS 5.0 seems to be scheduled for release in Q2 2012...

Your observations also make me wonder... what does PA use themselves? I'm pretty sure these bugs must have shown up on PA's internal network as well (given that PAN devices are being used there and that your bugs are not RMA-related). Or does PA internally use some other code path, or even other hardware?

Some time ago there were some laughs about the Microsoft ISA team (Microsoft's "firewall") not trusting their own product: they used FreeBSD and its firewall to protect their own dev networks instead of using ISA.

mikand: Done.

nwallette:

H.323 support for NAT was a new feature added in PAN-OS 4.1. It wasn't a fix; it was a new ALG added to deal with H.323 decoding and translation. BTW, SIP support for NAT was added in PAN-OS 3.1.

As far as quality and stability, your comments are spot on. We do believe we have an ability to innovate in ways that other security vendors cannot, but we need to be careful about balancing that with quality and stability. We take this challenge seriously and believe we have implemented processes that will ensure we deserve the trust we have earned. Hopefully the result is a company that can continue to deliver more innovation with better quality and stability than anything else on the market.

Mike

We absolutely use our own firewalls in all areas of our network. We have a mix of hardware models and software versions running in our production environments.

Mike

@Mikand:

Unless I misunderstand your configuration, you can set up DHCP servers assigned to VLAN interfaces.  I had done it this way myself until I ran into the DHCP options thing.  Maybe not in a Layer-2 deployment, if that's what you're thinking of, but certainly Layer-3.

I totally agree with you on ISA.  I seriously question the sanity of anyone who thought it would be a good idea to build a firewall on the OS that stands to gain the most from being behind one.  I wouldn't put that on my perimeter either if I were them.

For what it's worth, the network engine (packet forwarding, routing, whatever) has been rock-solid.  I have no qualms whatsoever with that.  In fact, there's nothing I know of that I'd rather have protecting my network edge.  In all honesty, I'm a big fan.  That's why I'm here complaining about these problems.  There's no malicious intent, and I have no desire to start a riot.  I just want to call attention to these issues, have them taken seriously, and see them fixed so there is an affordable, quality product on the market.  We all need that.

@Mike:

Thanks again for your reply.  I understand there's probably no one that will or can talk about why H.323 can't be implemented in 3.1.  However, the ALG framework seems to exist, and is extensible enough, that functionally similar protocols (specifically SIP, but also FTP for example) can be inspected and have their payloads re-written as necessary.  I have trouble comprehending how the platform can be designed such that the application can be identified, and threats targeted, and be continually updated with new and more specific rulesets, but a video call setup packet is over the line.

I don't expect this to be solved, I'm just disappointed and skeptical that it wasn't a decision from within that "4.0 is our new standard, this is where we're focusing our efforts from now on."  I suspect the resources are just spread too thin.  At least we're not asked to purchase the upgrade, and be "sold" this new feature -- provided we're current on support.

I'm also personally glad SIP is being taken care of, because we're architecting an enterprise-wide VoIP roll-out and it would be quite the fiasco to have it fail because of incompatibility with our firewalls.

I know a real DHCP server can solve this. However, the point here was to avoid using external devices for this particular network and to use the built-in DHCP server of the PAN device instead (basically to save some money, since this was a best-effort network and not really prioritized).

In this particular case, not all our networks had switches capable of DHCP snooping, so we couldn't rely on option 82. However, the switches that could do it would have DHCP snooping enabled, and a side effect of using the PAN for DHCP would be getting the option 82 logs in the same view as srcip.

This way, to track down a particular client you would only have to go to the ACC of the PAN and voila: everything is there (srcip, option 82 (meaning the physical interface in your access network), etc.).

When you use a real DHCP server, you need to match the logs between the PAN and the DHCP server. Not to mention that this DHCP server also costs money: even if you use ISC DHCP on Linux, the hardware still needs to be acquired.
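
For reference, option 82 (relay agent information, RFC 3046) is itself a small list of sub-options: 1 is the circuit ID (typically the switch port) and 2 is the remote ID. Below is a minimal Python sketch of pulling those apart so they could sit next to srcip in a log view. The sample bytes are invented, the function name is hypothetical, and real switches format the circuit ID in vendor-specific ways.

```python
def parse_option82(payload: bytes) -> dict:
    """Split an option 82 payload into its sub-options (RFC 3046 layout)."""
    names = {1: "circuit_id", 2: "remote_id"}
    result = {}
    i = 0
    while i + 2 <= len(payload):
        sub_code, length = payload[i], payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option")
        # Unknown sub-options are kept under a generic key.
        result[names.get(sub_code, f"sub_{sub_code}")] = value
        i += 2 + length
    if i != len(payload):
        raise ValueError("trailing bytes after last sub-option")
    return result


if __name__ == "__main__":
    # Invented sample: circuit ID "ge-0", remote ID "sw1".
    sample = bytes([1, 4]) + b"ge-0" + bytes([2, 3]) + b"sw1"
    print(parse_option82(sample))
```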

L2 Linker

Whilst I can't help with the majority of your problems, I can tell you that the GlobalProtect client appears to be able to work out when the address it allocates conflicts with the local addressing (as in the case where both the local and remote nets use 10.x.x.x) and will try an alternative address, provided you have allocated another range in the gateway. So I have our gateways set up with a 10.x.x.x address range and a 172.x.x.x address range, which seems to work. We are increasingly seeing 10.x.x.x addresses being allocated by hotspots and wireless carriers in the UK, so it's definitely something to test!
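
To illustrate the fallback described above, here is a rough Python sketch: given the client's local networks and the pools configured on the gateway, pick the first pool that does not overlap. This is only a reading of the observed behavior, not the actual GlobalProtect logic, and the function name and address ranges are hypothetical.

```python
import ipaddress


def pick_pool(local_nets, gateway_pools):
    """Return the first gateway pool that overlaps none of the client's
    local networks, or None if every pool conflicts."""
    locals_ = [ipaddress.ip_network(n) for n in local_nets]
    for pool in gateway_pools:
        candidate = ipaddress.ip_network(pool)
        if not any(candidate.overlaps(n) for n in locals_):
            return candidate
    return None


if __name__ == "__main__":
    pools = ["10.200.0.0/16", "172.20.0.0/16"]  # hypothetical gateway pools
    # A home network on 10.0.0.0/8 conflicts with the first pool.
    print(pick_pool(["10.0.0.0/8"], pools))      # 172.20.0.0/16
    # A typical 192.168.1.0/24 home network conflicts with neither.
    print(pick_pool(["192.168.1.0/24"], pools))  # 10.200.0.0/16
```

With only a 10.x.x.x pool configured, the same check returns nothing for a client that is itself on 10.0.0.0/8, which matches the failure to get an address.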

I appreciate the idea, but that wouldn't work in our network.

We have two datacenters that act like a kind of hub-and-spoke topology.  Many, many sites connect in and have routed networks that need to communicate with the two DCs.  Our entire network is in 10.0.0.0/8, with subnets carved out for the various locations.  We use route summarization a lot to simplify routing tables, since we have a mixed network that won't support OSPF end-to-end, and thus, for now, use static routes.  They would be much more complicated if we couldn't rely on each site address pool being a supernet-able block.
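
As an illustration of why supernet-able blocks matter, here is a small Python sketch using the standard ipaddress module: contiguous, aligned site subnets collapse into a single static route, while a gap leaves one route per subnet. The 10.20.x.x ranges are invented examples, not real allocations.

```python
import ipaddress


def summarize(subnets):
    """Collapse a site's subnets into the fewest covering routes."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    return list(ipaddress.collapse_addresses(nets))


if __name__ == "__main__":
    # Four contiguous /24s aligned on a /22 boundary: one route.
    site = ["10.20.0.0/24", "10.20.1.0/24", "10.20.2.0/24", "10.20.3.0/24"]
    print(summarize(site))        # [IPv4Network('10.20.0.0/22')]

    # A gap in the block means two static routes instead of one.
    scattered = ["10.20.0.0/24", "10.20.2.0/24"]
    print(len(summarize(scattered)))  # 2
```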

Beyond that, we already use other private IP spaces for various things -- like 192.168 for site guest networks that can't route back to the DCs; and 172.16 networks to create VPNs to partner sites when a contract requires it, etc.  So, using other spaces that need to be routable throughout the network would really throw a wrench into the works.

Foremost, I've never had a problem with the Cisco IPsec client assigning an address on a client with overlapping networks, so it's obviously not a technical impossibility, it's just something the Palo Alto client software wasn't designed to handle.
