Randy Uhrlaub, Cortex XSOAR Customer Success Architect
Use of Generative AI (GenAI) and Retrieval Augmented Generation (RAG) with XSOAR is provided by the Anything LLM marketplace content pack. Anything LLM can be cloud-based or, to address privacy, compliance, and cost requirements, installed on customer infrastructure. A large selection of LLM models and vector databases is available in Anything LLM, and custom LLM models can be imported.
Using GenAI with RAG in XSOAR allows customers to incorporate their own data:
Below are some example use cases in XSOAR that GenAI and RAG can facilitate:
Retrieval Augmented Generation incorporates external data, improving the reliability of generated responses and reducing or eliminating hallucinations by the LLM. External data is encoded and embedded as real-number vectors in a vector database. An LLM prompt is first sent to the vector database, and any similar results returned are added to the prompt's conversation context before the prompt is sent to the LLM. This conversation context provides the primary information the LLM uses to generate a response, rather than relying on its training, which may return statistically probable results that are not consistently accurate.
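A minimal sketch of this flow, assuming document chunks have already been embedded (the embedding model itself is omitted, and an in-memory list stands in for a vector database such as LanceDB):

import numpy as np

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vector, doc_vectors, doc_texts, top_n=4, threshold=0.25):
    # Score every embedded chunk against the query and keep the best matches.
    scored = sorted(
        ((cosine_similarity(query_vector, v), t) for v, t in zip(doc_vectors, doc_texts)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [text for score, text in scored[:top_n] if score >= threshold]

def build_prompt(question, context_chunks):
    # Retrieved chunks become the primary source material for the LLM's answer.
    context = "\n---\n".join(context_chunks)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")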
In Anything LLM, the default RAG configuration with the LanceDB vector database uses the following approach when embedding documents:
While vector database similarity search returns only the few results most similar to the prompt, text search can augment the prompt when a broader set of information is needed than similarity search provides, for example adding a list of MITRE ATT&CK Tactics and Techniques, or search results of XSOAR incidents and indicators, into the conversation before requesting a response from the LLM.
The Anything LLM content pack contains an integration, fields, an incident type, a layout, and scripts. The AI Playground incident type and layout provide an environment for the prompt and data engineering needed to implement XSOAR use cases.
External documents, search results from XSOAR, and web links can be uploaded and embedded into a vector database. Results from the vector database similar to the query, together with search results injected directly into the conversation context, are combined with the query and sent to the LLM model to generate a response.
Figure 01: XSOARandAnythingLLM_PaloAltoNetworks
Anything LLM can be installed on customer infrastructure and supports a range of LLM models and locally installed vector databases. Example LLM models:
Example vector databases:
Anything LLM is also available as a cloud service and can be configured to use cloud-based LLM models and vector databases when data privacy is not a requirement. Examples of cloud-based LLM providers:
Examples of cloud-based vector database providers:
If installing Anything LLM on customer infrastructure is preferred over the Anything LLM cloud service, install Anything LLM on a host running:
For testing and development, a typical desktop or server system with 16 GB of RAM can run the Llama 3.2 11-billion-parameter model and similar small models in 12-14 GB of RAM, but it is relatively slow at answering questions using only a CPU.
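Memory needs can be estimated from the parameter count and quantization level. A minimal sketch of that arithmetic (the 20% overhead factor for the KV cache and runtime buffers is a rough assumption for illustration):

def estimate_vram_gb(params_billion, bits_per_param, overhead=0.20):
    # Weights occupy parameters * bytes-per-parameter; the overhead factor
    # approximates the KV cache, activations, and runtime buffers.
    weights_gb = params_billion * bits_per_param / 8
    return weights_gb * (1 + overhead)

print(estimate_vram_gb(11, 8))    # ~13.2 GB: an 8-bit 11B model fits a 16 GB host
print(estimate_vram_gb(70, 4))    # ~42.0 GB: a 4-bit 70B model needs a large GPU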
For production use, a GPU-based system with sufficient VRAM for your LLM model is recommended. For example, the Llama 3.3 70-billion-parameter model requires a GPU with 32-64 GB of VRAM. Below are examples of cloud instances and server hardware that support 70-billion-parameter and larger models:
Once Anything LLM is available, create an instance of the AnythingLLM integration:
Figure 02 AnythingLLMapiKey_PaloAltoNetworks
Figure 03 AnythingLLMproviders_PaloAltoNetworks
Below is an example of Ollama installed on customer infrastructure, with LLM models downloaded by Ollama. The context window is configured to 32,768 tokens (roughly 64K characters) for the llama3.2-vision 11-billion-parameter model, which supports an 8K-128K token context window. When using the AnythingLLM LLM provider with one of its bundled Llama models, only the 8K or 128K context sizes are available; using the Ollama LLM provider allows control of the context window size. 8K is typically too small for XSOAR-related data. A sketch for verifying the configured context window directly against Ollama follows the figures below.
Figure 04 AnythingLLMwithOllama_PaloAltoNetworks
Figure 05 AnythingLLMconfig_PaloAltoNetworks
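To verify the configured context window outside of Anything LLM, Ollama's REST API accepts a num_ctx option per request. A minimal sketch, assuming Ollama is listening on its default port 11434 with the model already pulled:

import requests

# Request a completion with a 32K-token context window, matching the
# configuration described above.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2-vision",
        "prompt": "Summarize the MITRE ATT&CK tactic 'Lateral Movement'.",
        "stream": False,
        "options": {"num_ctx": 32768},   # overrides the model's default context size
    },
    timeout=300,
)
print(resp.json()["response"])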
The Anything LLM content pack provides an interactive environment for the data and prompt engineering needed to automate a use case. Create a new incident of type AI Playground. The layout provides two tabs, Workspace and Document Management and AI Playground, for uploading and embedding documents into a workspace and for developing the needed prompts and workspace settings (Mode, Temperature, Similarity, and TopN). Some use cases may only require RAG, where a few similar pieces of text are retrieved from embedded documents, while others may require additional text to be added to the LLM conversation context using text search capabilities. A sequence of prompts may be required to generate the final response.
The general use case development process is:
For the most accurate results, query mode is recommended for most chats. It preloads the conversation context, based on the initial query, with similar results from documents embedded in a workspace. With a large document, query mode may not produce a complete answer; this depends on how many times the query topic is mentioned in the embedded documents, the limits on the number of similar results returned, and whether text search data is included in the conversation. When all data needed for the response is injected into the conversation and does not depend on embedded documents, use chat mode.
Text splitting and chunking can be adjusted from the defaults to better support a specific use case. Adjusting the similarityThreshold and topN settings in a workspace is often beneficial to optimize the workspace for a use case.
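Those settings can also be changed programmatically. A sketch against the Anything LLM developer API, assuming the workspace update endpoint accepts the similarityThreshold and topN fields; the host, API key, and workspace slug are placeholders:

import requests

ANYLLM_URL = "http://localhost:3001/api/v1"   # placeholder: your Anything LLM host
API_KEY = "YOUR_API_KEY"                      # placeholder: developer API key

# Raise the similarity cutoff and return more chunks for this workspace.
resp = requests.post(
    f"{ANYLLM_URL}/workspace/my-workspace/update",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"similarityThreshold": 0.5, "topN": 8},
    timeout=60,
)
resp.raise_for_status()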
The Workspace and Document Management tab of the incident layout enables management of workspaces and documents. The Workspaces section lists the available workspaces and allows configuration of their settings; the current workspace is selected by editing the table and using the Current action to set the active workspace.
For the current workspace, the list of embedded documents is displayed in the Workspace Embeddings section. The available actions are Remove, to remove an embedded document; Pin, to pin an embedded document to the workspace, adding all of its content to the conversation context; and Unpin, to remove it from the conversation. Take care not to consume all the context space by pinning a large document.
Similarity search of an embedded document returns only the top chunks, ranked by distance (cosine similarity) between the embedded form of the query and each chunk, analogous to the distance between two points in 3-dimensional space. In query mode, when using the anyllm-workspace-thread-chat command, if the similarity search of the vector database returns 0 results, the query is aborted with a "no relevant documentation" message. You may still get incorrect results when similarity search returns results but the model also draws on related information from its training; prompt and data engineering addresses this, as does adding text search results to the conversation context as a supplement. In chat mode, the prompt may include results from a similarity search but does not require them.
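Conceptually, query mode reduces to a guard on the similarity results. A sketch reusing the retrieve() and build_prompt() helpers from the earlier RAG sketch, with llm_fn standing in for the actual model call:

def query_mode_chat(question, query_vector, doc_vectors, doc_texts, llm_fn,
                    top_n=4, threshold=0.25):
    chunks = retrieve(query_vector, doc_vectors, doc_texts, top_n, threshold)
    if not chunks:
        # Query mode aborts rather than letting the model answer from training alone.
        return "no relevant documentation"
    return llm_fn(build_prompt(question, chunks))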
The Documents section displays all the documents uploaded into Anything LLM that are available for embedding in a workspace. The available actions are Embed, to embed the document into the current workspace, or Delete, to delete the document from the catalog.
Documents whose Title is prefixed with an XSOAR war room file entry ID were made text searchable by first uploading them to the war room, then using the Process War Room Text File Entry for Upload button to preprocess the document, followed by the Upload Processed Information as LLM Document button.
XSOAR search results from the AI Playground can be processed and uploaded using the Process Search Results for Upload button, followed by the Upload Processed Information as LLM Document button. External text can also be uploaded as a searchable LLM document using the same process with the Process Text for Upload button. In version 2.0 of the content pack, the Process Web Link for Upload button uploads the document to the catalog for embedding, but not as a searchable document.
Documents specific to an investigation are added to the investigation's war room and uploaded to the LLM. For documents that apply to multiple investigations, create a dedicated incident and upload the documents to that incident's war room to retain their searchability. These incidents should be flagged for long-term retention, since each document's XSOAR file entry ID is associated with the investigation ID and stored in Anything LLM as the document's title. As an example, MITRE ATT&CK documentation is uploaded to a dedicated incident so it remains searchable and embeddable across many investigations.
Figure 06 WorkspaceAndDocumentManagement_PaloAltoNetworks
The AI Playground tab is used to develop prompts against a workspace and its embedded documents, with additional text drawn from LLM documents or XSOAR via the Text Search... buttons. Useful search results are added to the conversation context with the Add Search Results to Conversation button, and a valuable conversation can be saved to the war room with the Save Conversation to the War Room button.
Figure 07 AIplayground_PaloAltoNetworks
Below is an example script that summarizes an investigation by walking the completed tasks in sequential order and summarizing each executed task. It uses chat mode, since all the data is provided dynamically from XSOAR rather than from an embedded document. In addition to the investigation ID and workspace name arguments, the following question is passed as an argument:
Summarize the task in the following JSON. Please include name, start, and completed dates, description and script and script arguments for each task. If it is a condition task, only tell me what branch it took.
Figure 08 SummarizeInvestigation_PaloAltoNetworks
import collections
import json
import traceback
import uuid


def main():
    thread_uuid = ""
    workspace = ""
    try:
        incid = demisto.args().get("id", "")
        workspace = demisto.args().get("workspace", "")
        question = demisto.args().get("question", "")
        if incid == "" or workspace == "" or question == "":
            return

        # Fetch the investigation's playbook tasks via the core REST API.
        resp = execute_command("core-api-get", {"uri": f"/inv-playbook/{incid}"})

        # Keep only completed regular, condition, and playbook tasks, keyed by
        # completion date so they can be ordered chronologically.
        tasks = {}
        for t in resp['response']['tasks'].values():
            if (t['type'].lower() in ["regular", "condition", "playbook"]
                    and t['state'].lower() == "completed"):
                tasks[t['completedDate']] = t
        sortedtasks = collections.OrderedDict(sorted(tasks.items()))

        results = ""
        for v in sortedtasks.values():
            # Use a fresh thread per task so each summary starts with a clean context.
            thread_uuid = str(uuid.uuid4())
            execute_command("anyllm-workspace-thread-new", {
                'workspace': workspace,
                'thread': thread_uuid
            })
            prompt = f"{question}: {json.dumps(v)}"
            chat = execute_command('anyllm-workspace-thread-chat', {
                'message': prompt,
                'mode': 'chat',
                'workspace': workspace,
                'thread': thread_uuid
            })
            results += f"\n{chat['textResponse']}\n"
            execute_command("anyllm-workspace-thread-delete", {
                'workspace': workspace,
                'thread': thread_uuid
            })
            thread_uuid = ""
        return_results(CommandResults(readable_output=results))
    except Exception as ex:
        # Clean up any thread left behind by the failure.
        if thread_uuid != "":
            execute_command("anyllm-workspace-thread-delete", {
                'workspace': workspace,
                'thread': thread_uuid
            })
        demisto.error(traceback.format_exc())
        return_error(f'Failed to execute SummarizeInvestigation. Error: {ex}')


if __name__ in ('__main__', '__builtin__', 'builtins'):
    main()
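Assuming the script is saved as an automation named SummarizeInvestigation (the name used in its error message), it can be run from the war room CLI with placeholder values, for example:

!SummarizeInvestigation id=1234 workspace="soc" question="Summarize the task in the following JSON. ..."

The incident ID and workspace name here are hypothetical; the question argument is the prompt shown above.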
This script uses a sequence of prompts to the LLM against a workspace in which the Anything LLM XSOAR integration code has been embedded. The first two prompts use query mode to retrieve the appropriate command name and then a JSON template of the command's parameters. The final chat mode prompt uses the parameter template and the parameters argument to populate a Python dictionary that is passed to execute_command().
Figure 09 NaturalLanguageCommandInterface_PaloAltoNetworks
import json
import traceback
import uuid


def main():
    thread_uuid = ""
    workspace = ""
    try:
        workspace = demisto.args().get("workspace", "")
        cmddesc = demisto.args().get("command", "")
        argdesc = demisto.args().get("parameters", "")
        if workspace == "" or cmddesc == "":
            return

        thread_uuid = str(uuid.uuid4())
        execute_command("anyllm-workspace-thread-new", {
            'workspace': workspace,
            'thread': thread_uuid
        })

        # Query mode: resolve the natural language description to a command name
        # using the integration documentation embedded in the workspace.
        command = execute_command('anyllm-workspace-thread-chat', {
            'message': f"{cmddesc}. Return only the command name",
            'mode': 'query',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        # Query mode: retrieve a template of the command's parameters.
        arguments = execute_command('anyllm-workspace-thread-chat', {
            'message': f"How would I invoke {command} with the python function "
                       f"execute_command(command, parameters)? Return only the "
                       f"python dictionary for the parameters",
            'mode': 'query',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        # Chat mode: populate the parameter template with the user-supplied values.
        argjson = execute_command('anyllm-workspace-thread-chat', {
            'message': f"Use the following data: \"{argdesc}\" as values to create "
                       f"a python dictionary using these keys: {arguments}. Return "
                       f"only the python dictionary with the new key values set as "
                       f"a JSON string",
            'mode': 'chat',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        argsdict = json.loads(argjson)
        argsdict['workspace'] = workspace
        argsdict['thread'] = thread_uuid
        results = execute_command(command, argsdict)['textResponse']
        execute_command("anyllm-workspace-thread-delete", {
            'workspace': workspace,
            'thread': thread_uuid
        })
        return_results(CommandResults(readable_output=results))
    except Exception as ex:
        demisto.error(traceback.format_exc())
        if thread_uuid != "":
            execute_command("anyllm-workspace-thread-delete", {
                'workspace': workspace,
                'thread': thread_uuid
            })
        return_error(f'Failed to execute IntegrationNLC. Error: {ex}')


if __name__ in ('__main__', '__builtin__', 'builtins'):
    main()
This script illustrates how to perform natural language search of XSOAR incidents. It uses chat mode LLM queries since it does not require RAG; the parameters to getIncidents are included in the script as a Python dictionary.
Figure 10 NaturalLanguageSearch_PaloAltoNetworks
import json
import traceback
import uuid

# Argument descriptions for the built-in getIncidents command. The LLM uses these
# descriptions to map a natural language request onto the argument keys.
searchArgs = {
    'page': "Filter by the page number",
    'size': "Filter by the page size (per fetch)",
    'sort': "",
    'id': "Filter by the incident IDs",
    'name': "Filter by incident names",
    'status': "Filter by the status. Pending (0), Active (1), Done (2), Archive (3)",
    'notstatus': "Negate status (e.g. get only incidents that do not have the status of active)",
    'reason': "Filter by closure reason",
    'fromdate': "Filter by from date (e.g. 2006-01-02T15:04:05+07:00)",
    'todate': "Filter by to date (e.g. 2016-01-02T15:04:05+07:00)",
    'fromclosedate': "Filter by the incident close date, from",
    'toclosedate': "Filter by the incident close date, to",
    'fromduedate': "Filter by SLA due date, from",
    'toduedate': "Filter by SLA due date, to",
    'level': "Filter by Severity. Unknown (0), Informational (0.5), Low (1), Medium (2), High (3), Critical (4)",
    'investigation': "",
    'owner': "Filter by incident owners",
    'details': "Filter by incident details",
    'type': "Filter by incident type",
    'query': "Use free form query (use Lucene syntax) as filter. All other filters will be ignored when this filter is used",
    'searchInNotIndexed': "Also search for incidents that have not yet been indexed",
    'populateFields': "A comma-separated list of fields and custom fields in the object to populate"
}


def main():
    thread_uuid = ""
    workspace = ""
    try:
        workspace = demisto.args().get("workspace", "")
        argdesc = demisto.args().get("parameters", "")
        if workspace == "" or argdesc == "":
            return

        thread_uuid = str(uuid.uuid4())
        execute_command("anyllm-workspace-thread-new", {
            'workspace': workspace,
            'thread': thread_uuid
        })

        # Chat mode: seed the conversation with the getIncidents argument schema.
        execute_command('anyllm-workspace-thread-chat', {
            'message': f"Here are the \"getIncidents\" command's arguments as a "
                       f"python dictionary in JSON: {json.dumps(searchArgs)}. Use "
                       f"this to create a python dictionary for the arguments to "
                       f"\"getIncidents\"",
            'mode': 'chat',
            'workspace': workspace,
            'thread': thread_uuid
        })

        # Chat mode: populate the argument values from the natural language request.
        argjson = execute_command('anyllm-workspace-thread-chat', {
            'message': f"Use the following data: \"{argdesc}\" as values to create "
                       f"the python dictionary for the \"getIncidents\" command's "
                       f"arguments. Leave out any missing values. Return only the "
                       f"arguments python dictionary as a JSON string",
            'mode': 'chat',
            'workspace': workspace,
            'thread': thread_uuid
        })['textResponse']

        argsdict = json.loads(argjson)
        results = execute_command("getIncidents", argsdict)
        execute_command("anyllm-workspace-thread-delete", {
            'workspace': workspace,
            'thread': thread_uuid
        })
        # getIncidents returns structured data; render it as JSON for the war room.
        return_results(CommandResults(readable_output=json.dumps(results, indent=2)))
    except Exception as ex:
        demisto.error(traceback.format_exc())
        if thread_uuid != "":
            execute_command("anyllm-workspace-thread-delete", {
                'workspace': workspace,
                'thread': thread_uuid
            })
        return_error(f'Failed to execute SearchIncidentsNLC. Error: {ex}')


if __name__ in ('__main__', '__builtin__', 'builtins'):
    main()
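Assuming the script is saved as an automation named SearchIncidentsNLC (the name used in its error message), a hypothetical war room invocation looks like:

!SearchIncidentsNLC workspace="soc" parameters="critical phishing incidents owned by jdoe from the last 7 days"

The LLM maps the free-form description onto the getIncidents argument keys and omits any arguments that are not mentioned.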