XSOAR with Generative AI and Retrieval Augmented Generation


Randy Uhrlaub, Cortex XSOAR Customer Success Architect

 


Introduction

 

Use of Generative AI (GenAI) and Retrieval Augmented Generation (RAG) with XSOAR is provided by the Anything LLM marketplace content pack. Anything LLM can be cloud-based or, to address privacy, compliance, and cost requirements, installed on customer infrastructure.  A large selection of LLM models and vector databases is available in Anything LLM, and custom LLM models can be imported.

 

Using GenAI with RAG in XSOAR allows a customer to incorporate their own data and offers several benefits:

 

  • Private customer data not used in training a commercial LLM model can be used.
  • Public data published after the LLM model's training cutoff date can be used.
  • Dynamic data from XSOAR (incidents, indicators, investigations, content, and documentation) can be incorporated.
  • Expensive re-training and fine-tuning of an LLM model is avoided.
  • RAG and context-centered conversations are more accurate and less prone to hallucinations.

 

Below are some example use cases in XSOAR that GenAI and RAG can facilitate:

 

  • XSOAR Integration Help
  • XSOAR Script Help
  • XSOAR Help
  • XSOAR Natural Language Command Interface
  • XSOAR Natural Language Search
  • XSOAR Investigation Summaries
  • Policy and Procedure Guidance
  • Threat Intel Blog Summaries
  • Security Advisory Summaries
  • CVE Summaries

 

Retrieval Augmented Generation

 

Retrieval Augmented Generation incorporates external data, improving the reliability of generated responses and reducing or eliminating hallucinations by the LLM.  External data is encoded and embedded as real-number vectors in a vector database.  An LLM prompt is first sent to the vector database, and any similar results are returned and added to the prompt’s conversation context before being sent to the LLM.  This conversation context provides the primary information for the LLM to generate a response, rather than relying on its training, which may return statistically probable results that are not consistently accurate.

 

In Anything LLM, the default RAG configuration with the LanceDB vector database uses the following approach when embedding documents (a conceptual sketch follows the list):

 

  • Documents are split into chunks of up to 1,000 characters.
  • Each chunk is converted into an array of 384 real numbers.
  • The real-number vectors for the document are added to the vector database.
  • Similarity search returns the top N (4-12) vectors based on cosine similarity of the embedded prompt to the vectors in the database.
  • The returned chunks of text are added to the prompt and conversation context and sent to the LLM.
  • The LLM returns the response to the prompt.
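
The retrieval step can be illustrated with a small, self-contained sketch. This is conceptual only: a toy hashing "embedding" stands in for the real 384-dimension embedder, and an in-memory list stands in for LanceDB and its similarity search.

import hashlib
import math

def embed(text, dims=8):
    """Toy embedding: hash character trigrams into a fixed-length, normalized vector."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # vectors are already normalized

def chunk(document, size=1000):
    """Split a document into chunks of up to `size` characters."""
    return [document[i:i + size] for i in range(0, len(document), size)]

# "Embed" a sample document into the in-memory vector store
document = "XSOAR playbooks automate incident response. " * 50
store = [(c, embed(c)) for c in chunk(document)]

# Retrieve the top N most similar chunks and prepend them to the prompt as context
prompt = "How do playbooks automate incident response?"
top_n = 4
qvec = embed(prompt)
top_chunks = sorted(store, key=lambda pair: cosine(qvec, pair[1]), reverse=True)[:top_n]
context = "\n\n".join(text for text, _ in top_chunks)
augmented_prompt = f"Context:\n{context}\n\nQuestion: {prompt}"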

 

Text Search Augmented Generation

 

While vector database similarity search returns only the few results most similar to the prompt, text search can augment the prompt when a broader set of information is needed than a similarity search provides.  For example, a list of MITRE ATT&CK Tactics and Techniques, or search results of XSOAR incidents and indicators, can be added to the conversation before requesting a response from the LLM.
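
Below is a minimal sketch of this pattern, assuming an XSOAR automation context. The search query and field list are illustrative; the resulting prompt would then be sent in chat mode with the anyllm-workspace-thread-chat command, as in the example scripts later in this article.

# Text-search augmentation: pull a broad set of results from XSOAR and place
# them directly in the prompt, rather than relying on vector similarity search.
incidents = execute_command("getIncidents", {
    "query": "type:Phishing and status:Active",       # illustrative query
    "populateFields": "id,name,severity,occurred",    # illustrative field list
})
augmented_prompt = (
    "Using only the following XSOAR search results, summarize the active "
    f"phishing incidents by severity:\n{json.dumps(incidents, default=str)}"
)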

 

Anything LLM XSOAR Content Pack

 

The Anything LLM content pack contains an integration, fields, an incident type, a layout, and scripts.  The AI Playground incident type and layout provide an environment for conducting the prompt and data engineering needed to implement XSOAR use cases.

 

External documents, search results from XSOAR, and web links can be uploaded and embedded into a vector database. Results from the vector database similar to the query, along with search results injected directly into the conversation context, are combined with the query and sent to the LLM model to generate a response.

 

 
Figure 01: XSOARandAnythingLLM_PaloAltoNetworks

 

Customer Infrastructure Hosted

 

Anything LLM can be installed on customer infrastructure and supports a range of LLM models and locally installed vector databases. Example LLM models:

 

  • Llama3
  • Codellama
  • Mistral
  • Gemma
  • Orca
  • Phi

 

Example vector databases:

 

  • LanceDB
  • Chroma
  • Milvus

 

Cloud Hosted

 

Anything LLM is also available as a cloud service and can be configured to use cloud-based LLM models and vector databases when data privacy is not a requirement.  Examples of cloud-based LLM providers:

 

  • OpenAI
  • Google Gemini
  • Anthropic
  • Cohere
  • Hugging Face
  • Perplexity

 

Examples of cloud-based vector database providers:

 

  • Pinecone
  • QDrant
  • Weaviate

 

Customer Infrastructure Setup

 

If a customer infrastructure installation of Anything LLM is preferred over use of the Anything LLM cloud service, install Anything LLM on a host running:

 

  • Linux
  • Windows
  • Mac

 

For testing and development, a typical desktop or server system with 16 GB of RAM can run the Llama 3.2 11-billion-parameter model and similar small models in 12-14 GB of RAM, but it is relatively slow at answering questions using only a CPU.
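
These sizes follow from a rough parameters-times-precision estimate. Below is a minimal sketch of that arithmetic; the 20% overhead factor is an assumption, and actual requirements vary with quantization, context length, and runtime.

# Back-of-envelope memory estimate: parameters x bytes-per-parameter, plus a
# rough 20% allowance for activations, KV cache, and runtime buffers.
def model_memory_gb(params_billions, bits_per_param, overhead=0.20):
    bytes_total = params_billions * 1e9 * (bits_per_param / 8)
    return bytes_total * (1 + overhead) / 1e9

print(f"11B @ 8-bit: ~{model_memory_gb(11, 8):.0f} GB")   # ~13 GB, in line with the 12-14 GB figure above
print(f"70B @ 4-bit: ~{model_memory_gb(70, 4):.0f} GB")   # ~42 GB
print(f"70B @ 8-bit: ~{model_memory_gb(70, 8):.0f} GB")   # ~84 GB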

 

For production use, a GPU-based system with sufficient VRAM for your LLM model is recommended.  For example, the Llama 3.3 70-billion-parameter model requires a GPU with 32-64 GB of VRAM.  Below are examples of cloud instances and server hardware that support 70-billion-parameter and larger models:

 

  • Google GCP: A2 instances with NVIDIA A100 GPUs with 40 or 80 GB VRAM
  • Amazon AWS: P4d or P4de instances with NVIDIA A100 GPUs with 40 or 80 GB VRAM
  • NVIDIA Project DIGITS: Small desktop with Grace Blackwell CPU/GPU and 128 GB unified RAM/VRAM (supports up to 200-billion-parameter models, ~$3,000, available Q2 2025)

 

Anything LLM and XSOAR Integration Instance Configuration

 

Once Anything LLM is available, create an instance of the AnythingLLM integration:

 

  • In Anything LLM:
    • Generate an API key for the XSOAR integration (Developer API menu option).
    • Activate selected LLM model (LLM menu option).
    • Activate selected vector database (Vector Database menu option).

 

 
Figure 02: AnythingLLMapiKey_PaloAltoNetworks

 

 
Figure 03: AnythingLLMproviders_PaloAltoNetworks

 

Below is an example of Ollama installed on customer infrastructure, with LLM models downloaded by Ollama.  The context window is configured to 32,768 tokens (roughly 64K characters) for the llama3.2-vision 11-billion-parameter model, which supports an 8K-128K token context window.  When using the AnythingLLM LLM provider and a Llama model provided there, only the 8K and 128K context sizes are available; using the Ollama LLM provider allows control of the context window size. 8K is typically too small for XSOAR-related data.

 

 
Figure 04: AnythingLLMwithOllama_PaloAltoNetworks

 

  • In XSOAR: 
    • Configure the XSOAR integration instance with the Anything LLM URL and API key.
    • If required, Cloudflare access can also be configured.

 

 
Figure 05: AnythingLLMconfig_PaloAltoNetworks

 

Use Case Development

 

The Anything LLM content pack provides an interactive environment for the data and prompt engineering needed to develop the steps to automate a use case.  Create a new incident of type AI Playground.  The layout provides two tabs, Workspace and Document Management and AI Playground, for uploading and embedding documents into a workspace and for developing the needed prompts and workspace settings (Mode, Temperature, Similarity, and TopN).  Some use cases may only require RAG, where a few similar pieces of text are retrieved from embedded documents, while other use cases may require additional text to be added to the context of an LLM conversation using text search capabilities.  A sequence of prompts may be required to generate the final response.

 

The general use case development process is:

 

  • Create an incident with a type of AI Playground.
  • Create a workspace and configure its settings (the anyllm-workspace-new command creates a workspace).
  • Upload and embed the needed documents in the workspace.
  • Develop and test prompts, with augmented data as required:
    • similarity search of embedded documents.
    • text search of documents or data in XSOAR.
  • Once the steps produce the desired results, build the playbook and any scripts needed to automate the use case.

 

For the most accurate results, query mode is recommended for most chats.  Query mode preloads the conversation context with results from the workspace's embedded documents that are similar to the initial query.  With a large document, query mode may not ensure a complete answer, depending on how many times the query topic is mentioned in the embedded documents and on the limits on the number of similar results returned and the amount of text search data included in the conversation.  When all data needed for the response is injected into the conversation and does not depend on embedded documents, use chat mode.
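
Below is a minimal sketch of the two modes as an XSOAR automation, using the integration's thread commands. The workspace name is hypothetical, and query mode assumes documents are already embedded in that workspace.

import uuid

workspace = "ai_playground"   # hypothetical workspace name
thread = str(uuid.uuid4())
execute_command("anyllm-workspace-thread-new", {"workspace": workspace, "thread": thread})

# Query mode: the conversation context is preloaded with chunks from embedded
# documents that are similar to the message.
answer = execute_command("anyllm-workspace-thread-chat", {
    "message": "Which workspace settings control similarity search?",
    "mode": "query",
    "workspace": workspace,
    "thread": thread,
})["textResponse"]

# Chat mode: nothing is preloaded; all data the model needs is injected
# directly into the message (here, the current incident as JSON).
summary = execute_command("anyllm-workspace-thread-chat", {
    "message": f"Summarize this incident: {json.dumps(demisto.incident(), default=str)}",
    "mode": "chat",
    "workspace": workspace,
    "thread": thread,
})["textResponse"]

execute_command("anyllm-workspace-thread-delete", {"workspace": workspace, "thread": thread})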

 

Text splitting and chunking can be adjusted from the defaults to better support a specific use case.  Adjusting the similarityThreshold and topN settings in a workspace is often beneficial to optimize the workspace for a use case.

 

Workspace and Document Management

 

Workspaces

 

The Workspace and Document Management tab of the incident layout enables management of workspaces and documents. The Workspaces section lists the available workspaces and allows configuration of their settings; the current workspace is selected by editing the table and using the Action option Current to set the active workspace.

 

Workspace Embeddings

 

For the current workspace, the list of embedded documents is displayed in the Workspace Embeddings section. The Action options there are Remove to remove the embedded document, Pin to pin the embedded document to the workspace, adding all of its content to the conversation context, and Unpin to unpin the embedded document from the conversation. Take care not to consume all of the context space by pinning a large document.

 

Similarity search of an embedded document returns only the top chunks based on cosine similarity between the embedded form of the query and the stored vectors, much like measuring the distance between two points in three-dimensional space.  In query mode, when using the anyllm-workspace-thread-chat command, if the similarity search of the vector database returns 0 results, the query is aborted with a “no relevant documentation” message. You may still get incorrect results if similarity search returns results and the model was trained on related information.  Prompt and data engineering, along with supplementing the conversation context with text search results, addresses this.  In chat mode, the prompt may include results from a similarity search but does not require it.

 

Documents

 

The Documents section displays all the documents uploaded into Anything LLM that are available for embedding in a workspace. The available Action options are Embed, to embed the document into the current workspace, and Delete, to delete the document from the catalog.

 

Documents with a Title prefixed by an XSOAR war room file entry ID were made text searchable by first uploading them to the war room, then using the Process War Room Text File Entry for Upload button to preprocess the document, followed by the Upload Processed Information as LLM Document button.

 

XSOAR search results from the AI Playground can be processed and uploaded using the Process Search Results for Upload button followed by the Upload Processed Information as LLM Document button.  External text can also be uploaded as a searchable LLM document using the same process with the Process Text for Upload button. In version 2.0 of the content pack, the Process Web Link for Upload button uploads the document to the catalog for embedding, but not as a searchable document.

 

Documents specific to an investigation are added to the investigation's war room and uploaded to the LLM. For documents that apply to multiple investigations, create a dedicated incident and upload the documents to that incident's war room to retain their searchability.  These incidents should be flagged for long-term retention, since the XSOAR file entry ID is associated with the investigation ID and stored in Anything LLM as the document’s title. For example, MITRE ATT&CK documentation is uploaded to an incident dedicated to retaining it as searchable and embeddable documents across many investigations.

 

 
Figure 06: WorkspaceAndDocumentManagement_PaloAltoNetworks

 

AI Playground

 

The AI Playground tab is used to develop prompts against a workspace and its embedded documents, with additional text from LLM documents or XSOAR retrieved using the Text Search... buttons.  Useful search results are added to the conversation context with the Add Search Results to Conversation button.  A valuable conversation can be saved to the war room with the Save Conversation to the War Room button.

 

 
Figure 07: AIplayground_PaloAltoNetworks

 

General Tips and Guidance

 

  • Clean uploaded documentation of extraneous text (e.g., HTML and PDF formatting, page headers/footers) before embedding a document, since data is returned in 1,000-character chunks. Extraneous text may cause chunks to be returned that do not contain the data needed.
  • In a workspace, only embed the documents needed for the use case.  It may be advantageous to create a workspace for an investigation, dynamically embed the needed documents, and then delete the workspace at incident closure.
  • Depending on the LLM model used, asking three precise questions about A, then B, then C may give better results than one question about A, B, and C. Once the three questions are asked and their results are in the conversation context, asking a final question drawing on all three intermediate results may be more effective.
  • Once a partial result is achieved and the full conversation context is no longer needed for subsequent questions, start a new conversation thread with no context. Keeping the context small and focused increases the speed and accuracy of responses.
  • An incorrect response in the conversation context pollutes subsequent results. Testing and tuning your approach prevents this.
  • Setting the workspace Temperature to the lowest value supported by your LLM model provides the most deterministic results.
  • If similarity search is not returning the correct results, review the number of chunks being returned. If too few chunks are returned, increase the TopN setting or reduce the Similarity setting.  If too many chunks are returned without the correct data, increase the Similarity setting.  This is where clean documents support providing the correct results.
  • Be aware of the context window size of your LLM model and how it relates to the data you are adding to the conversation context via similarity search, text search, or pinning an embedded document to a workspace. Filling the context window causes the prompt to fail or rolls data off from the beginning of the conversation.
  • Context windows are usually specified in tokens, and each token may be a character, part of a word, or a whole word. An 8K-token context window supports approximately 16K characters, though this varies with the tokens used.
  • Large context windows increase memory requirements and the time it takes to answer a question, and may reduce accuracy when large amounts of data are in the conversation context.
  • When searching structured text like YAML or JSON where you need only a small set of lines, a regex pattern such as  (?s)\d(?<=[\d\[\].])(.*?:TLS)  helps minimize the text added to the conversation context. This simple example pattern finds the lines starting with either a defanged IP or domain and finishing with a line containing ":TLS" (a small illustrative sketch follows this list).
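
Below is a minimal sketch of that filtering idea. The pattern here is a simpler illustrative one (not the example pattern above), and the sample text is made up.

import re

# Keep only the lines that start with a defanged IP or domain and contain ":TLS",
# so that just those lines are added to the conversation context.
text = """1.2.3[.]4 connection established :TLS
10.0.0[.]5 connection reset
evil[.]example handshake complete :TLS"""

pattern = re.compile(r"^[\w.\[\]-]+ .*:TLS.*$", re.MULTILINE)
context_snippet = "\n".join(pattern.findall(text))
print(context_snippet)   # prints the first and third lines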

 

Example Scripts Using the Anything LLM Integration

 

Summarize an Investigation

 

Below is an example script that summarizes an investigation by looking at the sequential order of completed tasks and summarizing each task executed.  It uses chat mode since all the data is provided dynamically from XSOAR rather than from an embedded document.  In addition to the investigation ID and workspace name arguments, the following question is passed as an argument:

 

Summarize the task in the following JSON. Please include name, start, and completed dates, description and script and script arguments for each task. If it is a condition task, only tell me what branch it took.

 

 
Figure 08: SummarizeInvestigation_PaloAltoNetworks

 

import collections
import uuid

# demisto, execute_command, CommandResults, return_results, return_error, json,
# and traceback are provided by the XSOAR script environment (CommonServerPython).


def main():
    thread_uuid = ""
    workspace = ""
    try:
        incid = demisto.args().get("id", "")
        workspace = demisto.args().get("workspace", "")
        question = demisto.args().get("question", "")
        if incid == "" or workspace == "" or question == "":
            return

        # Retrieve the playbook tasks for the investigation
        resp = execute_command("core-api-get", {"uri": f"/inv-playbook/{incid}"})
        tasks = {}

        # Keep only completed regular, condition, and playbook tasks
        for k, t in resp['response']['tasks'].items():
            if t['type'].lower() in ["regular", "condition", "playbook"] and t['state'].lower() == "completed":
                tasks[t['completedDate']] = t

        # Order the tasks by completion date
        sortedtasks = collections.OrderedDict(sorted(tasks.items()))
        results = ""

        for k, v in sortedtasks.items():
            # Use a fresh thread per task so each summary starts with a clean context
            thread_uuid = str(uuid.uuid4())
            execute_command("anyllm-workspace-thread-new", {'workspace': workspace, 'thread': thread_uuid})

            prompt = f"{question}: {json.dumps(v)}"
            response = execute_command('anyllm-workspace-thread-chat', {
                'message': prompt,
                'mode': 'chat',
                'workspace': workspace,
                'thread': thread_uuid
            })['textResponse']
            results += f"\n{response}\n"

            execute_command("anyllm-workspace-thread-delete", {'workspace': workspace, 'thread': thread_uuid})
            thread_uuid = ""

        return_results(CommandResults(readable_output=results))
    except Exception as ex:
        # Clean up the thread if the failure happened mid-loop
        if thread_uuid != "":
            execute_command("anyllm-workspace-thread-delete", {'workspace': workspace, 'thread': thread_uuid})
        demisto.error(traceback.format_exc())
        return_error(f'Failed to execute SummarizeInvestigation. Error: {ex}')


if __name__ in ('__main__', '__builtin__', 'builtins'):
    main()

 

XSOAR Natural Language Command Interface

 

This script uses a sequence of prompts to the LLM and an embedded implementation of the Anything LLM XSOAR integration code. The first two prompts use query mode to retrieve the appropriate command name and then a JSON template of the command's parameters.  The final chat mode prompt uses the parameter template and the parameters argument to populate a Python dictionary that is passed to execute_command().

 

 
Figure 09: NaturalLanguageCommandInterface_PaloAltoNetworks

 

 

import uuid


def main():
    thread_uuid = ""
    workspace = ""
    try:
        workspace = demisto.args().get("workspace", "")
        cmddesc = demisto.args().get("command", "")
        argdesc = demisto.args().get("parameters", "")
        if workspace == "" or cmddesc == "":
            return
        thread_uuid = str(uuid.uuid4())

        execute_command("anyllm-workspace-thread-new", {'workspace': workspace, 'thread': thread_uuid})

        # Query mode: retrieve the command name from the embedded integration documentation
        command = execute_command('anyllm-workspace-thread-chat', {
            'message': f"{cmddesc}. Return only the command name",
            'mode': 'query',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        # Query mode: retrieve a JSON template of the command's parameters
        arguments = execute_command('anyllm-workspace-thread-chat', {
            'message': f"How would I invoke {command} with the python function "
                       "execute_command(command, parameters)? Return only the python dictionary for the parameters",
            'mode': 'query',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        # Chat mode: populate the parameter template with the values provided in the parameters argument
        argjson = execute_command('anyllm-workspace-thread-chat', {
            'message': f"Use the following data: \"{argdesc}\" as values to create a python dictionary "
                       f"using these keys: {arguments}. Return only the python dictionary with the new "
                       "key values set as a JSON string",
            'mode': 'chat',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        argsdict = json.loads(argjson)
        argsdict['workspace'] = workspace
        argsdict['thread'] = thread_uuid
        results = execute_command(command, argsdict)['textResponse']

        execute_command("anyllm-workspace-thread-delete", {'workspace': workspace, 'thread': thread_uuid})
        return_results(CommandResults(readable_output=results))
    except Exception as ex:
        demisto.error(traceback.format_exc())
        # Clean up the thread if the failure happened after it was created
        if thread_uuid != "":
            execute_command("anyllm-workspace-thread-delete", {'workspace': workspace, 'thread': thread_uuid})
        return_error(f'Failed to execute IntegrationNLC. Error: {ex}')


if __name__ in ('__main__', '__builtin__', 'builtins'):
    main()

 

XSOAR Natural Language Search

 

This script illustrates how to perform a natural language search of XSOAR incidents.  It uses chat mode LLM queries since it does not require RAG; the parameters for getIncidents are included in the script as a Python dictionary.

 

 
Figure 10: NaturalLanguageSearch_PaloAltoNetworks

 

 

import uuid

# Argument descriptions for the getIncidents command, provided to the LLM as context
searchArgs = {
    'page': "Filter by the page number",
    'size': "Filter by the page size (per fetch)",
    'sort': "",
    'id': "Filter by the incident IDs",
    'name': "Filter by incident names",
    'status': "Filter by the status. Pending (0), Active (1), Done (2), Archive (3)",
    'notstatus': "Negate status (e.g. get only incidents that do not have the status of active)",
    'reason': "Filter by closure reason",
    'fromdate': "Filter by from date (e.g. 2006-01-02T15:04:05+07:00)",
    'todate': "Filter by to date (e.g. 2016-01-02T15:04:05+07:00)",
    'fromclosedate': "Filter by the incident close date, from",
    'toclosedate': "Filter by the incident close date, to",
    'fromduedate': "Filter by SLA due date, from",
    'toduedate': "Filter by SLA due date, to",
    'level': "Filter by Severity. Unknown (0), Informational (0.5), Low (1), Medium (2), High (3), Critical (4)",
    'investigation': "",
    'owner': "Filter by incident owners",
    'details': "Filter by incident details",
    'type': "Filter by incident type",
    'query': "Use free form query (use Lucene syntax) as filter. All other filters will be ignored when this filter is used",
    'searchInNotIndexed': "Also search for incidents that have not yet been indexed",
    'populateFields': "A comma-separated list of fields and custom fields in the object to populate"
}


def main():
    thread_uuid = ""
    workspace = ""
    try:
        workspace = demisto.args().get("workspace", "")
        argdesc = demisto.args().get("parameters", "")
        if workspace == "" or argdesc == "":
            return
        thread_uuid = str(uuid.uuid4())
        execute_command("anyllm-workspace-thread-new", {'workspace': workspace, 'thread': thread_uuid})

        # Chat mode: seed the conversation with the getIncidents argument descriptions
        execute_command('anyllm-workspace-thread-chat', {
            'message': f"Here are the \"getIncidents\" command's arguments as a python dictionary in JSON: "
                       f"{json.dumps(searchArgs)}.  Use this to create a python dictionary for the arguments "
                       "to \"getIncidents\" ",
            'mode': 'chat',
            'workspace': workspace,
            'thread': thread_uuid
        })

        # Chat mode: map the natural language description to the getIncidents arguments
        argjson = execute_command('anyllm-workspace-thread-chat', {
            'message': f"Use the following data: \"{argdesc}\" as values to create the python dictionary for "
                       "the \"getIncidents\" command's arguments. Leave out any missing values. Return only "
                       "the arguments python dictionary as a JSON string",
            'mode': 'chat',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        argsdict = json.loads(argjson)
        results = execute_command("getIncidents", argsdict)

        execute_command("anyllm-workspace-thread-delete", {'workspace': workspace, 'thread': thread_uuid})
        return_results(CommandResults(readable_output=results))
    except Exception as ex:
        demisto.error(traceback.format_exc())
        # Clean up the thread if the failure happened after it was created
        if thread_uuid != "":
            execute_command("anyllm-workspace-thread-delete", {'workspace': workspace, 'thread': thread_uuid})
        return_error(f'Failed to execute SearchIncidentsNLC. Error: {ex}')


if __name__ in ('__main__', '__builtin__', 'builtins'):
    main()

 
