CloudStack Palo Alto Plugin Enhancement Opportunity

TyMiller · ‎10-02-2014

Hi All,

We are looking for developers involved in the CloudStack Palo Alto Plugin to add some required functionality. Assuming the price is right, we are happy to engage one of the developers to implement this for us to speed up the features that we need.

Palo Alto support instructed us to post this query here as this forum is monitored by developers at multiple companies that worked on the plugin. Apologies for the large post, but figured that you may as well have all of the information up front. Thanks in advance for your constructive responses.

Background:

We have built a data centre with PAN as the firewalls and CloudStack (well, CloudPlatform to be precise) as the provisioning mechanism to allow departments within the organisation to launch machines in various VLANs and self-manage their own firewall rules (as a strategic business decision). We want to use the CloudStack Palo Alto plugin (https://cwiki.apache.org/confluence/display/CLOUDSTACK/Palo+Alto+Firewall+Integration) so that users can self-manage the Palo Alto firewall rules.

The existing CloudStack Palo Alto plugin is limited to creating firewall rules to allow traffic from the internet to the VM instance (i.e. untrust-to-trust) for a specific VLAN. This is referred to in CloudStack world as "basic networking".

The CS PAN plugin functionality currently does not support creating a firewall rule to allow a VM instance in one VLAN talk to a VM instance in another VLAN (i.e. trust-to-trust), which is what we need working. This means that we can't offer self-service firewall rule management for a multi-tiered architecture, and the VMs can't connect to shared services such as Active Directory, which are key aspects of our architecture moving forward. This is referred to in CloudStack world as "advanced networking", which from what I can see, is currently stated to be outside of the scope for the plugin development.

Summary:

We would like one of the following options to be developed.

Option 1 (Cleanest and simplest option. Proven to work by manually altering the resulting Palo Alto security policies.):

Default Deny Policy:

- The firewall requires a default deny rule at the end between trust-to-trust zones, or the implicit action by the firewall should be a deny between trust-to-trust zones. No action required for this item if a default deny for trust-to-trust is assumed to be configured in the firewall, which is fine for us.

When creating a new VLAN:

- Currently the "isolate" security policy that is created by CloudStack denies trust-to-trust communications, except to the gateway. Now that the default trust-to-trust deny action is in place, the explicit deny is not required via the isolate security policy, and instead this security policy would change to be an explicit allow from trust to the gateway.

When creating an ACL for an instance:

- A new trust-to-trust security policy needs to be created with the destination IP address being the private IP address of the instance.

Pros and Cons:

This option avoids the need to alter the NAT policies since private routable IP addresses can be used for internal communications, and the order of security policies and NAT policies do not need altering. This option appears to be much simpler to implement, and has been tested to work manually on a Palo Alto by altering the rules created by CloudStack.

This option has the downside where hosts within the trust zone can't talk to the public IP address of an internal system in the untrust zone due to missing NAT policies, as detailed below.

Option 2 (Secondary option. Proven to work by manually altering the resulting Palo Alto security policies.):

When creating a new VLAN:

- To ensure that the instance static NAT policies are hit first, the VLAN NAT policies should be grouped at the bottom of the NAT policy set.

When an instance acquires a public IP address:

- Instance static NAT policies should be set with a source zone of "any" so that it is triggered when either trust or untrust systems hit the public IP of an instance.

- To ensure instance static NAT policies are hit first, the instance static NAT policies should be grouped at the top of the NAT policy set.

When creating an ACL for an instance:

- Instance security policies should be set with a source zone of "any" so that it is triggered when either trust or untrust systems hit the public IP of an instance.

- To ensure instance security policies are hit first, the instance security policies should be grouped at the top of the security policy set.

Pros and Cons:

This option requires all internal systems to communicate with the public IP address of an instance, rather than the internal IP address. This option also requires altering security policies and NAT policies, as well as the order of both the security policies and NAT policies. This option disables some of the zone security restrictions, but are enforced by the security policies by IP and port. On the plus side, for environments using DNS to access hosts (rather than IP addresses) then DNS entries for hosts that resolve to public IP addresses will work.

Option 3:

A combination of Option 1 and Option 2 so that both internal trust-to-trust communications is supported, as well as having the benefit of being able to access the public IP address of an instance from the trust zone if required. This option is likely to be the most complex to implement.

Detailed Breakdown:

The following is the current implementation flow of how CloudStack implements the security policies and NAT policies in the Palo Alto:

(1) Create the first vlan in CS

When creating our first vlan, the CS PAN plugin creates a security policy (Name: isolate_230) that explicitly denies inter-vlan (trust-to-trust) communications, except for traffic going to the gateway.

It also creates a default allow security policy (Name: policy_0_230) below this rule for outbound (trust-to-untrust) communications to allow the systems in this vlan to communicate to the internet:

Resulting security policies:

At this time, it also creates a source nat so that traffic leaving the vlan will be NATd to the public IP address allocated to that vlan:

Resulting NAT policies:

(2) Allocate an IP address to an instance (VM) in the first VLAN

We launch an instance into the first vlan. This does not make any changes to the PAN rules.

However, when we allocate an IP address to this instance, it adds a new static NAT policy below the existing vlan NAT policies so that inbound connections to the public IP address are NATd to the private IP address of the instance.

Resulting NAT policies:

(3) Add an ACL to the instance (VM) in the first VLAN

As a PoC, we now want to allow SSH connections from anywhere to this instance, so we add an ACL in the CS PAN plugin. This adds a new security policy (policy_22) below the existing vlan security policies that allows connections to port 22/tcp from 0.0.0.0/0 (anywhere), but restricts the connections so they must traverse the PAN zones via untrust-to-trust. This means that systems on the internet can SSH to the instance, but any instances in other vlans within CS will be denied SSH access because they traverse the PAN zones via trust-to-trust despite the entire internet being allowed to access it.

(4) Create a new vlan, start an instance, allocate a public IP, and create an ACL for the instance

So, if we repeat the above steps to setup a new CS vlan, launch an instance, allocate a public IP, and create an ACL to allow SSH from anywhere, CS will add new security policies and NAT policies below the existing security and NAT policies. We therefore end up with the following cumulative security policies and NAT policies:

Resulting Security Policies:

Resulting NAT Policies:

Issues and Solution Options:

As detailed previously, there are four basic issues that are created in the above workflow that prevent inter-vlan communications. The potential solutions have been grouped together in the options detailed above.

(1) Security Policy Zone Restrictions

When creating an ACL for an instance, the resulting PAN security policy restricts traffic via untrust-to-trust. This prevents trust-to-trust communications, which stops the inter-vlan traffic.

We can see two solutions for this:

a) A second duplicate security policy be created that instead specifies zones to be trust-to-trust. Ideally this rule would use the internal IP of the instance (which may avoid changing NAT rules below), or else access to the instance would be via its public IP address.

b) The resulting security policy source zone should be set to "any" so that both internal traffic and external traffic is permitted. This still allows the source IP to be set in a rule that enables users to limit access to specific systems either internally or externally. In this case, access to the instance would be via its public IP address.

(2) Security Policy Order

There are two sets of security policies created by CS; "vlan security policies" and "instance security policies".

CS currently adds security policies to the end of the security policy rule set. This isn't a problem with basic networking, but when the "Security Policy Zone Restrictions" solution above is implemented, the "isolate" security policy that denies inter-vlan communications conflicts with the "instance" security policy that specifically allows inter-vlan communications to the instance.

In order to fix this, we see two options:

a) the "isolate" security policy is modified to allow access from trust to the gateway. A default deny rule in the firewall will prevent other trust-to-trust communication from happening except for when it is explicitly allowed via a CS ACL / PAN security policy.

b) vlan security policies should be created at the bottom of the rule set, and instance security policies to be created at the top of the CS rule set. This ensures that more granular instance rules are hit first, before more general vlan rules are enforced.

(3) NAT Policy Zone Restrictions

The static NAT policy for the instance (i.e. the inbound NAT for the instance to map it to its private IP)

When allocating a public IP address for an instance, the resulting NAT policy restricts traffic via untrust-to-untrust. This is because CS currently assumes the connection is originating from the internet, rather than internally. This prevents the NAT policy from being applied for trust-to-untrust communications (i.e. instance 1 internal IP connecting to instance 2 public IP).

We can see two solutions for this:

a) A second duplicate NAT policy be created that instead specifies zones to be trust-to-untrust. In the "Security Policy Zone Restrictions" solution above, we stated that this change may be avoided if the trust-to-trust instance security policy used the internal IP of the instance, and hence avoided the need for a NAT.

b) The resulting NAT policy source zone should be set to "any" so that traffic originating internally or externally has the NAT applied. In this case, access to the instance would be via its public IP address.

(4) NAT Policy Order

There are two sets of security policies created by CS; "vlan NAT policies" and "instance NAT policies".

CS currently adds NAT policies to the end of the NAT policy rule set. This isn't a problem with basic networking, but when the "NAT Policy Zone Restrictions" solution above is implemented, the "vlan source NAT" policy that NATs outbound communications conflicts with the "instance static NAT" policy that maps the public IP address to the private IP of the instance.

In order to fix this, we see two options:

a) As detailed in section "NAT Policy Zone Restrictions" above, if the trust-to-trust instance security policy used the internal IP of the instance then there would be no need for a NAT, which means no NAT policy changes are required.

b) vlan NAT policies should be created at the bottom of the rule set, and instance NAT policies to be created at the top of the CS rule set. This ensures that more granular instance NAT policies are hit first, before more general vlan NAT policies are enforced.

Thanks

Thanks for taking the time to read this. I hope to hear from some interested parties very soon as we are keen to get this underway. If you would like to contact me offline, then please feel free to contact me at [email protected]

Regards,

Ty Miller

wstevens · ‎10-03-2014

Hey TyMiller

Thanks for reaching out via email and showing interest in this implementation. I am the developer who implemented this functionality. Thank you for the detailed use cases and the potential solutions.

Just to get started, I will try to clarify some additional details.

- Everything we configure needs to be possible through the CS UI (or API). This means we need to understand what the expected flow is in CS for the desired functionality.

- What you are referring to as the 'Default Deny Policy' is the 'Default Egress Policy' in CS. This default policy can be either a default of 'allow' or 'deny' depending on how the network is setup, so we have to be able to handle both cases.

- CS distinguishes between Egress (leaving the network) and Ingress (entering the network) ACL policies, so we need to consider this when defining a solution.

- CS does not support the ordering of ACL policies (PAN does). There are a couple ACL policies which will always be ordered in my implementation. The 'isolation' policy will always be the first policy for a specific network. The default egress policy will always be the last egress policy for a network.

- Ordering of policies on the PAN is VERY hard with this implementation because CS does not respect ordering, so it can add policies in any order once things are initially setup, so you can never guarantee the order of policies on the PAN. The way I am guaranteeing the order of those two policies is by making sure that the 'isolation' policy is ALWAYS the first policy added (and every additional policy is added after by default). The way I am ensuring that the default egress policy is always the last egress policy is by watching for new egress policies. If a new egress policy comes in, I check if there is a default egress policy, if there is I save it into memory and delete it from the PAN. Then I add the new egress policy and then re-add the default egress policy after it. This works because it is a single policy so every time I want to add a new egress policy I can make sure the default egress policy is deleted and re-added after it. More complex ordering will be almost impossible...

- When you acquire a new IP address for VM, it does not always have to be implemented as a static NAT rule. A VM can get a public IP address through either a static NAT rule or a destination NAT rule (also called a Port Forwarding rule). It can also get a public IP through a Load Balancing rule, but that is not implemented on the PAN.

Ok, on to some of the finer points...

- My current implementation assumes that an Egress policy will go from the trust-to-untrust. It is probably a good idea for me to modify this behavior to go from trust-to-any so that an egress policy could have a private destination as well as a public destination.

- Likewise, my current implementation assumes that an Ingress policy will go from untrust-to-trust. It is probably a good idea for me to modify this behavior so it can do any-to-trust so that an ingress policy can origin from a private source as well as a public source.

- The 'isolation' policy is required because every private vlan is implemented on the same physical interface. Without that policy then all the different vlans could just talk to each other unhindered because they are all theoretically physically connected.

- My understanding is that two vlans can currently talk to each other, but the traffic has to go through the public network on both sides. So it has to go from trust-to-untrust and then from untrust-to-trust. So you would have to setup static NAT on both sides and then connect through the public static NAT IP addresses. I could be wrong, but I thought I had tested that (it was a long time ago)...

Here is my initial thoughts on a potential solution (very theoretical at this point, tbd if it is possible)...

I think it is VERY important that inter-vlan functionality is approved by the owners of both vlans because it would be a huge security risk otherwise. If one vlan can setup access to another vlan and get access without the other side having to agree to this connection this would be very bad. Consider that in most CS use cases these different vlans are owned by different accounts/companies, so it is usually important that they CAN NOT talk to each other. Because of this policies would have to be created on both sides for the connection to be able to work.

- Modify the implementation of Egress policies to go from trust-to-any

- Modify the implementation of Ingress policies to go from any-to-trust

- Setup an Egress policy on vlan1 to allow traffic from a portion of its CIDR out on specific ports. This assumes a 'deny' default. If setup with an 'allow' default I don't think this is required.

- Setup an Ingress policy on vlan1 to allow traffic from a portion (or all) of vlan2's CIDR on some specific ports.

- Setup an Egress policy on vlan2 to allow traffic from a portion of its CIDR out on specific ports. This assumes a 'deny' default. If setup with an 'allow' default I don't think this is required.

- Setup an Ingress policy on vlan2 to allow traffic from a portion (or all) of vlan1's CIDR on some specific ports.

- When the Ingress policies are made, I will need to check if the source CIDR is public or private. If it is private, I will have to update the 'isolation' rule to allow that vlan to talk to its own gateway AS WELL as the private range specified by the Ingress rule. This will be done on both vlans because the Ingress rules would be created in both vlans.

I believe this would extend the functionality to cover your use case. Yes???

If you have a lab setup and are able to test this scenario to see if the networking actually does work how I expect it to that would be awesome.

I will still need to verify that CS will let us to specify an ingress rule with a different vlan's private cidr, but if it doesn't I might be able to extend the functionality to add this support as well.

Let me know what you think of this proposed solution. If you want to discuss more you can get in touch with me over Linked In (I accepted your invitation earlier today).

Cheers,

Will

TyMiller · ‎10-07-2014

Hi Will,

Thanks so much for the detailed response.

I have passed your solution onto our team to manually test out. As soon as they have provided me with the analysis, I will let you know any tweaks that need making.

Thanks again,
Ty

Unlock your full community experience!

CloudStack Palo Alto Plugin Enhancement Opportunity

CloudStack Palo Alto Plugin Enhancement Opportunity

Show your appreciation!