Automating Malware Scanning for Files Uploaded to Cloud Storage With WildFire API

salsop · ‎04-28-2022

With more organizations embracing digital systems to process information from their customers, partners, and employees, the ability to upload files via websites and online applications is commonplace. Because the number of malicious files and file types keeps increasing, uploaded files must be reviewed and confirmed not to be malicious before the application stores or uses them, to prevent security incidents caused by processing malicious files.

This is one use case for the WildFire API, which gives organizations a scalable, automated way to scan files and confirm whether they are malicious or benign.

This tutorial shows how to implement an event-driven pipeline for automated malicious file detection of files uploaded to Google Cloud Storage using Palo Alto Networks' WildFire API.

What is Palo Alto Networks WildFire?

WildFire is at the forefront of security with native integrations to Palo Alto Networks products, such as the Next-Generation Firewalls, Cortex XDR, and other Palo Alto Networks solutions. With the WildFire API, security teams can now extend the advanced analysis and protections of WildFire to a growing number of use cases.

WildFire is the industry’s most advanced analysis and prevention engine for highly evasive zero-day exploits and malware. The service employs a unique multi-technique approach combining dynamic and static analysis, and innovative machine learning techniques, to detect and prevent even the most evasive threats.

The WildFire RESTful API enables organizations to extend malware detection capabilities beyond the typical control points. The API includes the ability to submit files and URL links for analysis, and to query for known and new verdicts. Many organizations have already adopted this API to automate file submissions, extending these protections to cloud storage and B2C portals.
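As a rough illustration of the verdict query mentioned above, the following Python sketch builds (without sending) the HTTP request for a hash lookup. The endpoint path `/publicapi/get/verdict` and the form fields `apikey` and `hash` follow the public WildFire API documentation and should be treated as assumptions here; `build_verdict_request` is a hypothetical helper name, and this is not the code used by the demonstration repository.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for the WildFire public cloud; regional clouds use other hosts.
WILDFIRE_BASE_URL = "https://wildfire.paloaltonetworks.com/publicapi"


def build_verdict_request(api_key: str, file_hash: str) -> Request:
    """Build, but do not send, a POST request asking for the verdict of a hash."""
    form = urlencode({"apikey": api_key, "hash": file_hash}).encode("ascii")
    return Request(
        f"{WILDFIRE_BASE_URL}/get/verdict",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key and hash, for illustration only.
    req = build_verdict_request("YOUR_API_KEY", "d41d8cd98f00b204e9800998ecf8427e")
    print(req.get_method(), req.full_url)
    # Sending it would look like: urllib.request.urlopen(req, timeout=30)
```

Keeping request construction separate from sending makes the logic easy to check without network access or a real API key.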

Customers who adopt the WildFire API will benefit from the research of Unit 42, the Palo Alto Networks threat research teams, and WildFire's growing database of more than 16 billion malicious samples. WildFire is the largest cloud-based file analysis solution in the industry, analyzing submissions from more than 80,000 global customers. The analysis results are updated in real time and often include detections for novel malware campaigns ahead of other cloud-based analysis solutions.

Obtaining a WildFire API Key

To use the Palo Alto Networks WildFire API, you must have a WildFire API key.

Palo Alto Networks now offers a subscription service enabling access to the advanced file analysis capabilities of the WildFire cloud for customers operating SOAR tools, custom security applications, and other threat assessment software through a RESTful, XML-based API. This standalone WildFire API subscription offering allows you to make queries to the WildFire cloud threat database for information about potentially malicious content, and submit files for analysis using the advanced threat analysis capabilities of WildFire, based on your organization’s specific requirements.
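Because the API is XML-based, a client has to extract the verdict from an XML response. The sketch below shows one way to do that in Python. The element names (`get-verdict-info`, `verdict`) and the numeric verdict codes are assumptions based on the public WildFire API documentation, the table is a subset, and `parse_verdict` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed subset of WildFire verdict codes; check the API reference for the full list.
VERDICT_LABELS = {
    0: "benign",
    1: "malware",
    2: "grayware",
    4: "phishing",
    -100: "pending",
    -101: "error",
    -102: "unknown",
    -103: "invalid hash",
}


def parse_verdict(xml_text: str) -> str:
    """Return a readable label for the <verdict> code in a verdict response."""
    root = ET.fromstring(xml_text)
    node = root.find(".//verdict")
    if node is None or node.text is None:
        raise ValueError("response has no <verdict> element")
    code = int(node.text.strip())
    return VERDICT_LABELS.get(code, f"unrecognized ({code})")


if __name__ == "__main__":
    # Hand-written sample shaped like a verdict response, not captured from the service.
    sample = (
        "<wildfire><get-verdict-info>"
        "<md5>d41d8cd98f00b204e9800998ecf8427e</md5>"
        "<verdict>-102</verdict>"
        "</get-verdict-info></wildfire>"
    )
    print(parse_verdict(sample))
```

Codes that are not in the table are reported as unrecognized rather than guessed, so a caller can treat them conservatively.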

If you do not currently have a WildFire service or any relationship with Palo Alto Networks, please contact our Sales team to discuss how to obtain an evaluation of this service. You can contact the sales team by following this link.

If you are an existing Palo Alto Networks customer using the WildFire service, follow these steps to get your API key for the WildFire public cloud:

1. Log in to the WildFire portal.
   - Visit https://wildfire.paloaltonetworks.com/
2. Select Account on the navigation bar at the top of the page.
3. Your API key or keys appear under My WildFire API Keys.

Your account may have more than one WildFire API key. Choose one that is valid and has an Expiration that is in the future.

Architecture

This file scanning pipeline is built from the following components:

Google Cloud Storage
Google Cloud Functions (Go)
Google Cloud Secret Manager
Palo Alto Networks WildFire API

The following steps outline the architectural pipeline:

1. The user uploads a file to Google Cloud Storage.
2. When the upload completes or the object is updated, the “WildFire Verdict Checker” Google Cloud Function is triggered.
3. The “WildFire Verdict Checker” Cloud Function takes the MD5 hash of the file, which Google Cloud Storage has already computed, and uses a WildFire API request to check whether a verdict already exists for the file.
4. If the verdict returned for the MD5 hash is neither “Benign” nor “Malware” (for example, because WildFire has not analyzed the file before), the file is uploaded to the WildFire file analysis system via the API. After the upload, the function checks for an updated verdict every 60 seconds.
5. If the verdict is “Benign”, the file is moved to the “Scanned Files” Google Cloud Storage Bucket. If the verdict is not “Benign” (i.e. Malicious), the file is moved to the “Quarantined Files” Google Cloud Storage Bucket.
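The decision logic in the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the repository's Go implementation: the function names, the verdict labels (`"benign"`, `"malware"`, `"pending"`), and the `max_polls` cap are all hypothetical. One detail is not an assumption: Cloud Storage reports an object's MD5 as a base64 string, so it has to be converted to hexadecimal before it can be used as a hash in a lookup.

```python
import base64
import hashlib
import time
from typing import Callable


def gcs_md5_to_hex(md5_b64: str) -> str:
    """Convert the base64 MD5 from Cloud Storage metadata to a hex digest."""
    return base64.b64decode(md5_b64).hex()


def needs_submission(verdict: str) -> bool:
    """Anything other than a benign or malware verdict triggers an upload."""
    return verdict not in ("benign", "malware")


def wait_for_verdict(
    fetch: Callable[[], str],
    interval_s: float = 60.0,
    max_polls: int = 30,  # hypothetical cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait, then re-check the verdict, until it is no longer 'pending' (assumed label)."""
    for _ in range(max_polls):
        sleep(interval_s)
        verdict = fetch()
        if verdict != "pending":
            return verdict
    return "pending"


def destination(verdict: str) -> str:
    """Only a benign verdict reaches the scanned bucket; everything else is quarantined."""
    return "scanned" if verdict == "benign" else "quarantined"


if __name__ == "__main__":
    # Simulate the metadata value Cloud Storage would report for some content.
    md5_b64 = base64.b64encode(hashlib.md5(b"example").digest()).decode()
    print("lookup hash:", gcs_md5_to_hex(md5_b64))

    first = "unknown"  # pretend the initial lookup found no verdict
    if needs_submission(first):
        answers = iter(["pending", "benign"])  # canned replies instead of API calls
        final = wait_for_verdict(lambda: next(answers), sleep=lambda s: None)
    else:
        final = first
    print("move to:", destination(final))
```

Passing `fetch` and `sleep` in as parameters keeps the polling loop testable without waiting or calling the API. If the poll limit is reached while the verdict is still pending, this sketch quarantines the file, which is the conservative reading of the "not Benign" rule.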

Once the files are moved to the “Scanned Files” Cloud Storage Bucket the rest of the event-driven application logic can proceed on these files with the knowledge that they have been analyzed and have been found to be benign by the Palo Alto Networks WildFire service.

The security team can investigate the quarantined files as needed to understand the types of attacks that are being attempted on the organization, and better understand the threat landscape.

Setting up the environment

Follow these steps to create and deploy the demonstration environment for the WildFire API in Google Cloud:

1. Create a new Google Cloud Project for the deployment, and ensure it has a billing account associated with it, as some of the services used in this deployment are billable resources.
   - Ensure that you understand the relevant charges associated with this deployment before proceeding. To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

     When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see the “Clean up” section below.
2. Log on to the Google Cloud Platform Console, select the target project for the deployment, and launch Cloud Shell.
3. Ensure that the gcloud CLI is configured to use the project that you created in Step 1 by running the following command, substituting “replace-this-project-id-with-yours” with your project ID.
   - gcloud config set project replace-this-project-id-with-yours
4. Enable the relevant service APIs by running the following commands:
   - gcloud services enable cloudfunctions.googleapis.com

     gcloud services enable cloudbuild.googleapis.com

     gcloud services enable secretmanager.googleapis.com
5. Clone the demonstration GitHub repository by running the following command:
   - git clone https://github.com/paloaltonetworks/gcp-wildfire-api
6. Change into the downloaded repository folder.
   - cd gcp-wildfire-api
7. Ensure you have the WildFire API key ready for use.
8. Initialize Terraform.
   - terraform init
9. Check the Terraform plan to confirm you are happy with the changes that Terraform intends to make to your Google Cloud Project.
   - terraform plan
10. When prompted, enter your Project ID (from Step 1) and WildFire API Key (from Step 7).
11. Apply the changes that Terraform needs to make to deploy the demonstration environment.
   - terraform apply
12. When prompted, enter your Project ID (from Step 1) and WildFire API Key (from Step 7).
13. Terraform then shows the changes it will make. Review them and enter “yes” to allow Terraform to proceed if you are happy with the suggested changes.
14. On completion, the Terraform output shows the Google Cloud Storage Bucket to upload files to.
   - Apply complete! Resources: 15 added, 0 changed, 0 destroyed.

     Outputs:

     gcs_bucket_for_upload = "gs://sponge-upload"
15. From here you can copy a file into the Google Cloud Storage Bucket using gsutil or, if you prefer, the Google Cloud Console. Here is an example of the CLI command; replace `gs://sponge-upload` with the value from your Terraform output.
   - gsutil cp upload_file.pdf gs://sponge-upload
16. Now you can check the Cloud Function logs and the Google Cloud Storage Buckets. If the file is scanned and found to be “Benign”, it is moved into the Google Cloud Storage Bucket ending in “-scanned”. If it is found to be “Malicious”, it is moved into the Google Cloud Storage Bucket ending in “-quarantined”.

Clean up

Once you have completed your testing, please delete the deployment that has been created. To do this, run the following command from Cloud Shell, in the repository folder:

terraform destroy

You will be prompted to review the proposed changes and accept them before proceeding. On completion, you will see this message:

Destroy complete! Resources: 15 destroyed.

Additional Information and Reading:

The following articles and information are useful for understanding more about the WildFire service:

A WildFire API key allows up to 2,500 sample uploads per day and up to 17,500 report queries per day. The daily limit resets at 23:59:00 UTC.
WildFire Privacy Datasheet - The purpose of this document is to provide customers of Palo Alto Networks with the information needed to assess the impact of this service on their overall privacy posture by detailing how personal information may be captured, processed and stored by and within the service.
WildFire API Resources - Documentation on the WildFire regional clouds, and API resources available.
Supported File Types - A table detailing the currently supported file types for analysis.
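A pipeline that processes many uploads may want to track the daily API limits listed above before calling the service. The following Python sketch shows one possible client-side counter. It assumes the limits and the 23:59:00 UTC reset time quoted above; `DailyQuota` and `quota_window` are hypothetical names, and the counter lives in memory only, so a real deployment with several function instances would need shared state instead.

```python
from datetime import datetime, timedelta, timezone

UPLOAD_LIMIT = 2_500   # sample uploads per day, as quoted above
QUERY_LIMIT = 17_500   # report queries per day, as quoted above


def quota_window(now: datetime) -> str:
    """Name the quota day for a timezone-aware time; days roll over at 23:59:00 UTC."""
    shifted = now.astimezone(timezone.utc) + timedelta(minutes=1)
    return shifted.date().isoformat()


class DailyQuota:
    """Count requests against a daily limit that resets with the quota window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.window = ""
        self.used = 0

    def try_consume(self, now: datetime) -> bool:
        """Record one request and return True, or return False if the limit is reached."""
        window = quota_window(now)
        if window != self.window:
            # A new quota day has started, so the counter starts again.
            self.window, self.used = window, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


if __name__ == "__main__":
    uploads = DailyQuota(UPLOAD_LIMIT)
    if uploads.try_consume(datetime.now(timezone.utc)):
        print("ok to submit a sample")
    else:
        print("daily upload limit reached; defer the submission")
```

Shifting the timestamp by one minute before taking the UTC date is what moves the day boundary from midnight to 23:59:00.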

Deenadhayal · ‎08-04-2022

Can we use this solution to scan data files for malware before ingesting them into a data warehouse to create BI insights?
What is the maximum file size this solution supports? We are looking for an alternative to ClamAV, since it only supports files up to 4GB.
Does this solution support auto scaling to handle any number of files based on demand?
