Best Practices for scripting RQL queries

Showing results for 
Show  only  | Search instead for 
Did you mean: 
L1 Bithead
No ratings

By Paul Burega, CSE Team Lead


Adam Hamilton-Sutherland, Customer Success Engineer




Prisma Cloud collects data about cloud resources in your cloud accounts and allows extracting information about those cloud resources such that answers to common security questions can be answered, such as show me ec2 volumes that are not encrypted.   These queries are written in Resource Query Language (RQL), and can be debugged and run on the Investigate page in Prisma Cloud. 


  1. Allows interactively running a query and manually examining or downloading the results. 

  2. The user interface currently shows the first 100 results, a “load more” button will retrieve the next 100 results.  

  3.  The download option is limited to a maximum of 100,000 results. 


While a great environment for debugging and validating queries with a small number of results, this environment does not easily lend itself to batch operation and integration with other security tools in the enterprise. A larger tenant with many cloud accounts and cloud resources may have tens of thousands of results or more and may require more than 100,000 results.


This LIVEcommunity article demonstrates the use of several Python libraries that are provided by Prisma Cloud to overcome the above two limitations.  This requires the use of a command line interpreter in your environment (Linux, MacOS, or Windows with Windows Subsystem for Linux).


The first library is designed to create a CSV format file just like the download results, but allowing more results to be downloaded. The second library is designed to give you back the JSON formatted data for the query results, allowing you to pass this information on to other applications. 

1.0 The PCPI Python Package 


Palo Alto Networks provides a GitHub library: Prisma Cloud Python Integration - PCPI, which contains our Python3 toolkit for Prisma Cloud APIs and returns a CSV formatted file. 


1.1 Python Best Practices: Create Virtual Environments


As part of Python's best practices, we create a virtual environment to install all our libraries. This allows us to keep installed libraries separate for each project we are working on. Within your command prompt, enter the command: python3 -m venv env


This command creates a virtual environment named “env”. We now need to enter that virtual environment. To do this, enter the command: source env/bin/activate 


1.2 Install Python Libraries


To install PCPI for the first time: 

$ python3 -m pip install pcpi

Collecting pcpi

  Downloading pcpi-0.1.0-py3-none-any.whl (15 kB)

Installing collected packages: pyyaml, charset-normalizer, certifi, idna, urllib3, requests, pcpi

Successfully installed certifi-2022.9.24 charset-normalizer-2.1.1 idna-3.4 pcpi-0.1.0 pyyaml-5.4.1 requests-2.28.1 urllib3-1.26.12


To update to the latest release of PCPI: 

$ python3 -m pip install pcpi —-upgrade


1.3 Obtain a Prisma Cloud Access Key


Before using the Prisma Cloud User Interface, you must log in with your userid and password.  Your organization may require you to use a single sign-on such as Okta or Azure AD. 


In order for your script to authenticate against the Prisma Cloud API, you will need to obtain an Access Key ID and a Secret Access Key from the Prisma Cloud console. You cannot use single sign-on with a script and the API.   


You will need to get authorization from your Prisma Cloud Administrator to have the ability to create an Access Key.  Please follow the best practices for security listed in: 

Scripting and Automation Best Practices with Prisma Cloud CSPM JWT  


Security best practices from your organization should be followed with password expiration of your access key.  Do not create Access Keys without expiry dates, as these are a security exposure. You are allowed two access keys per permission level at any time: active key and next active key.  Please ensure that access keys are rotated frequently.


2.0 Generate a Prisma Cloud CSV file from RQL


For performing an RQL query where you are looking for a CSV file identical to the one you can download from the Prisma Cloud User Interface, we have the library:  PaloAltoNetworks / pc-rql-search. This script contains a README that details how to install the script's dependencies and how to set up the configuration file. You can download the contents of the script from github by selecting the green ‘Code’ button and ‘Download ZIP’ entry’.



Figure 1_palo-alto-networks 

This library uses a configuration file, config,yml in the config directory:



  username: "01234567-0123-4567-89ab-9876543210"

  password: "my-secret-key-password"

  customer_name: "MyCompany"

  api_base: ""

  filename: "Xout.csv"

  rql: "config from cloud.resource where resource.status = Active and = 'aws-ec2-describe-instances' "

  utc: TRUE


The following variables are used:

username: the Access Key ID

password:  the Secret Key

customer_name:  a name used in the output file

api_base: specify the Prisma Cloud Stack that your account is on such as app2 or

filename:  name of the CSV output file

rql:  provide the RQL from the investigate page that you want to run and generate a CSV file

utc: TRUE provides the Date field in UTC .z time, FALSE gives time in Timezone format. The time zone will be based on the timezone on the machine running the script. 


Once you have filled out the configuration file, the script will be ready to run. This script is designed to give you results that mimic the CSV download from the Prisma Cloud user interface, while not having a limit to the number of records returned.

To run the script the first time, ensure that section 1.0 was followed to install PCPI. Open a command prompt and ensure you are in the same folder/directory that the script from section 2.0 is in. Create config.yml before your first run. Then run the following command: python3



Figure 2_palo-alto-networks

The script will collect the results of the RQL query and create a CSV file containing the default fields. You can add additional fields by using the ‘addcolumn’ function in RQL. You do not have to make changes to the script as it is capable of processing extra fields being returned.  


Once the script is done, a file with the name you specified in the configuration will be created. This CSV file contains the complete results of your RQL query. 


2.1 100,000 DB Record Limitation 


This script uses the paginated API, which queries up to 10 cloud accounts and returns the results from those cloud accounts.  The paginated API is designed to work with a large number of Database records spread across a large number of cloud accounts, as opposed to a small number of cloud accounts with a large number of records in each. 


Figure 2_palo-alto-networks


Note that there still is a limit of 100,000 records retrieved per 10 cloud accounts.  If you use a json.rule, this will filter from the maximum of 100,000 records retrieved from the 10 cloud accounts. This means you cannot retrieve 100,000+ records per 10 accounts and then return 100,000 records passing the json.rule through the API. 


If you have 35 cloud accounts, then the maximum number of Database records that can be retrieved is 35/10  * 100,000 =400,000 records.  The backend searches a random 10 accounts at a time, and each time you run the query, the random 10 accounts will be different

Cloud Accounts 1-10

Up to 100,000 results

Cloud Accounts 11-20

Up to 100,000 results

Cloud Accounts 21-30

Up to 100,000 results

Cloud Accounts 31-35

Up to 100,000 results


Total for Cloud Accounts 1-35

Up to 400,000 results

Figure 3_palo-alto-networks


2.2 Large Cloud Accounts


A large account is one in which a particular api you wish to query has >9500 values in it. 


Breaking a single query into two queries can help when you have several large accounts.  The first query would run against 3-6 large accounts, and the second query would run against the remaining accounts. 


This can be accomplished by using the “cloud.account IN (x,y,z)” followed by the second run with “cloud.account NOT IN (x,y,z)”. 


2.3 Fixing Errors


The file configs.yml has a bad format if you get:


  File "/Users/abc/Downloads/PCPI/pc-rql-search-main/lib/", line 7, in __init__

    config = self.read_yml('configs')



Ensure you have valid input data, the quote marks are correct and the last line in configs.yml should be:

  utc: TRUE

3.0 Python Script to return JSON data


There are times where the limited data fields of the above CSV file are not sufficient for what is required of the data.  Cloud objects can contain a lot of nested data, and this is best presented using JSON. 


For a use case that requires the complete API JSON data, this section presents another Prisma Cloud script that will return the raw results for each object in JSON format.  A user program can then examine the JSON data to find the exact information about the resulting cloud objects. .  


An example use case would be to correlate the Prisma Cloud data with an external data source and enhance the data with additional fields that your organization could use to determine ownership of the actions on the data.


As an example, you may have an internal database that contains project and owner information, and you need to retrieve Prisma Cloud data filtered or sorted by that external data source.


Please note that the 100,000 record limit described in 2.1 above applies in this script as well.  In this script, you have the capability to run 2 or more queries and collect all the results in one run should you choose to handle some really large accounts. 


3.1 Conventions Used in Python Script


Comments are lines starting with “#”

#This is a comment to tell us what the script is doing here


This prints the raw json returned by the API.  It is not very readable in most cases, but is very compact:



This will print all the json return by the API, but will format the print, and indent the json:

print(json.dumps(res.json(), indent=4))


This will print only the first json object returned by the API, formatting and indenting the json:

print(json.dumps(res.json()[0], indent=4))


For purposes of the demo script, we are writing the JSON data to a file that can be viewed at a later date and will persist after you close your terminal. This is done at the end of the script where we open a file, then use the json.dump() function to output our JSON on data to a file.


3.2 Sample Script which runs RQL and returns a file containing JSON results


Using your favorite code editor, such as VS Code, create a file named (the “.py” extension is what makes this a script, “.py.docx” is not an executable script) with the following Python content:


from pcpi import session_loader

import json

#Author: Adam Hamilton-Sutherland - Palo Alto Networks

#Title: Prisma Cloud RQL Search

#Description: Runs a Prisma Cloud RQL search via API calls

# and collects all results through pagination. 

#When the script is done, results are saved to a json file.

#The contents of the script are stored in memeory before being written to a

# file. If your RQL query is exceptionally large, your machine may run out of

# memeory and the script could crash before the output file is created.

# You can also run out of disk space for the output file.



session_manager = session_loader.load_config('prisma_creds.json')[0]

session = session_manager.create_cspm_session()


#Create API Payload for the config search endpoint

# 2000 objects per API call is considered the best practice for this REST API.

#There is no limit to how many total results this API will return. 

#Eg, in a large Tenant, you may receive millions of results.

limit = 2000

rql = "config from cloud.resource where resource.status = Active \ AND = 'aws-ec2-describe-instances'" 

payload = {

    "query": rql,

    "limit": limit,

    "timeRange": {

        "relativeTimeType": "BACKWARD",

        "type": "relative",

        "value": {

            "amount": 24,

            "unit": "hour"



    "withResourceJson": True,

    "heuristicSearch": True



#Make initial API request to the search endpoint to kick off the RQL

#  query

response = session.request('POST', f'/search/config', json=payload)


#To get all the results from the RQL query, you must paginate 

# through the results.

# A few variables are needed for the pagination logic and for 

# processing the results 

total_rows = 0

complete_res_list = []

complete_res_dict = response.json()


#res_data var used for while loop

res_data = response.json()['data']


#Add the first API call's response to the complete list of responses



#Update variables used for pagination

retrievedRows = res_data.get('totalRows')

total_rows += retrievedRows


counter = 0

while 'nextPageToken' in res_data:

    #update payload

    payload.update({'pageToken': res_data.get('nextPageToken')})


    #call page endpoint

    response = session.request('POST', 



    counter += 1


    #update res_data with the paginated response

    res_data = response.json()


    #Add results from each page API call



    retrievedRows = res_data.get('totalRows')

    total_rows+= retrievedRows


#Update response dict to be the some format as the 

# 'data' object in a typical RQL res


    {'totalRows':total_rows, 'items': complete_res_list}



with open('rql_json_response.json', 'w') as outfile:

    json.dump(complete_res_dict,outfile, indent=4)


3.3 Running the Python Script


When the script is run for the first time, answer the following prompts:


  1. Prisma Cloud URL 
  2. your Access Key
  3. your Secret key 
  4. name of your Prisma Cloud tenant 
  5. enable/disable certificate verification. 


The last two questions are both optional and can be left blank. Simply hit the enter key to skip the question.

If you have closed your command prompt after installing PCPI, you may need to re-enter your virtual environment. The command is: source env/bin/activate

Open up your terminal and type the following command:


$ python3 

No credentials file found. Generating...

Enter credentials for the tenant

Enter tenant name or any preferred identifier:


Enter tenant url. (ex:😞


Adjusted URL:


Enter tenant access key:


Enter tenant secret key:



API - Validating credentials




After setup, the script will make API calls to the Prisma Cloud RQL Search endpoint. A number of APIs calls will be made as this endpoint is paginated, as per current Rest API best practices for return payload size and response time.  Having a paginated API allows the query to return a very large number of results, much larger than is supported in the Prisma Cloud UI.  (see RESTful web API design)


DEBUG:pcpi:API - Generating CSPM session token.


INFO:pcpi:SUCCESS - 0.779 seconds


INFO:pcpi:SUCCESS - 1.443 seconds


INFO:pcpi:SUCCESS - 0.457 seconds


INFO:pcpi:SUCCESS - 0.473 seconds


INFO:pcpi:SUCCESS - 0.367 seconds


INFO:pcpi:SUCCESS - 0.357 seconds



At this point, the json results of your RQL are stored in the file: rql_json_response.json. This file contains all data that was returned from the Prisma Cloud API. 


As the entire script is included, users may decide to process the data returned in the script and change which data is included in the output. 


4.0 Conclusion


You now have experience using the Python programming language to consume Prisma Cloud APIs and to process the resulting data. Literacy in scripting languages such as Python unlocks a whole new set of capabilities for automating workflows and generating customized reports for your most important use cases.




  1. Prisma Cloud API documentation
  2. Prisma Cloud Python Integration - PCPI
  3. Scripting and Automation Best Practices with Prisma Cloud CSPM JWT 


About the Authors


Paul Burega and Adam Hamilton-Sutherland are cloud security engineers specializing in Cloud Security Posture Management. Paul and Adam utilize collaborative approaches to break down complex problems into solutions for global enterprise customers and leverage their multi-industry knowledge to inspire success. 


Rate this article:
Register or Sign-in
Article Dashboard
Version history
Last Updated:
‎12-26-2023 10:09 AM
Updated by: