on 07-07-2018 01:57 AM - edited on 12-14-2021 07:42 AM by jforsythe
Note: Palo Alto Networks made an end-of-life announcement about the MineMeld™ application in AutoFocus™ on August 1, 2021. Some of the below information may be outdated. Please read this article to learn about our recommended migration options.
Although MineMeld was conceived as a threat sharing platform, in practice many users take advantage of its open and flexible engine to extract dynamic data (not threat indicators) from generic APIs.
All these are examples of MineMeld being used to extract dynamic data from public APIs.
Depending on the source, a new class (Python code) may be needed to implement the client-side logic of the API we want to mine. In many cases, however, the ready-to-consume "generic classes" that already exist can be used instead. This way, users can "mine" their generic API without having to dive deep into contributing to the GitHub project.
There are basically three "generic classes" that can be reused in many applications:
The following rule of thumb will tell you whether the API you want to extract dynamic data from can be "mined" with MineMeld just by providing a prototype for one of these classes (without writing a single line of code!)
The following sections of this article will teach you how to use these generic classes to mine an example API that provides the real-time temperature of four MineMeld-related cities in the world:
| Format | API URL |
|---|---|
| CSV | https://test.minemeld.com/csv |
| HTML | https://test.minemeld.com/html |
| JSON | https://test.minemeld.com/json |
We will start with CSV because it is probably the easiest of the generic classes. The theory of operations is:
First, let's call the demo CSV API and analyze the result:
Request ->
GET /csv HTTP/1.1
Host: test.minemeld.com
Response Headers <-
HTTP/2.0 200 OK
content-type: text/csv
content-disposition: attachment; filename="minemeldtest.csv"
content-length: 432
Response Body <-
# Real-Time temperature of MineMeld-related cities in the world.
url,country,region,city,temperature
https://ajuntament.barcelona.cat/turisme/en/,ES,Catalunya,Barcelona,12.24
http://www.turismo.comune.parma.it/en,IT,Emilia-Romagna,Parma,16.03
http://santaclaraca.gov/visitors,US,California,Santa Clara,8.98
We're now ready to configure our prototype to mine this API with the CSVFT class.
We will use the prototype named "sslabusech.ipblacklist" as our starting point. Just navigate to the config panel, click on the lower right icon (the one with the three lines) to expose the prototype library and click on the sslabusech one.
Clicking on the sslabusech prototype reveals its configuration, as shown in the following picture.
The most important value in the prototype is the class it applies to: in this case, CSVFT, the one we want to leverage. Our mission is to create a new prototype and change its configuration to mine the demo CSV API. The following is the set of changes we will introduce:
Simply click on the NEW button and modify the prototype as shown in the following picture.
Take a closer look at the list of field names and notice that the first name in our prototype list is "indicator" (the CSV header suggested "url" for the first field instead). The CSV engine inside the CSVFT class extracts all comma-separated values from each line and uses the one in the column named "indicator" as the indicator value. Every other field name is extracted and attached to the indicator as an additional attribute.
Clicking OK stores this brand-new prototype in the library and sends the browser back to it. Change the search field to reveal our CSV prototype, then click on it.
Now it is time to clone this prototype as a working node in the MineMeld engine. Click on the CLONE button, give the new miner node a name, and commit the new configuration.
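To make the CSVFT behaviour described above more concrete, here is a minimal plain-Python sketch of the same idea using the standard csv module. It is not the CSVFT implementation: the helper parse_csv_feed, the ignore pattern ^(#|url,) (used here to drop the comment line and the original header row) and the field list are assumptions chosen for this example.

```python
import csv
import re

# Body of the demo CSV response shown earlier.
SAMPLE_CSV = """# Real-Time temperature of MineMeld-related cities in the world.
url,country,region,city,temperature
https://ajuntament.barcelona.cat/turisme/en/,ES,Catalunya,Barcelona,12.24
http://www.turismo.comune.parma.it/en,IT,Emilia-Romagna,Parma,16.03
http://santaclaraca.gov/visitors,US,California,Santa Clara,8.98
"""

# Assumed values (hypothetical): skip the comment line and the original header,
# then apply our own field names, with "indicator" in place of "url".
IGNORE_REGEX = re.compile(r"^(#|url,)")
FIELDNAMES = ["indicator", "country", "region", "city", "temperature"]


def parse_csv_feed(body, fieldnames=FIELDNAMES, ignore_regex=IGNORE_REGEX):
    """Yield (indicator, attributes) pairs from a CSV body (hypothetical helper)."""
    lines = [line for line in body.splitlines() if line and not ignore_regex.match(line)]
    for row in csv.DictReader(lines, fieldnames=fieldnames):
        indicator = row.pop("indicator")  # the "indicator" column becomes the indicator value
        yield indicator, row              # every other column becomes an attribute


for indicator, attributes in parse_csv_feed(SAMPLE_CSV):
    print(indicator, attributes)
```

Renaming the first column is what turns the URL into the indicator; the remaining columns simply ride along as attributes.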
Once the engine restarts you should see a new node in your MineMeld engine with 4 indicators in it. Click on it, then click on its LOG button and, finally, click on any log entry to reveal the indicator details.
As shown in the last picture, the extracted indicators are of URL type, and additional attributes like city, region, country and temperature are attached to each of them.
Other optional configuration parameters supported by the CSVFT class are:
This section walks through the steps needed to use the HTTPFT class to mine dynamic data exposed in the response to an HTTP request (typically text/plain or text/html). If you have not done so yet, review the complete process described in the section "Mining a CSV API" to understand concepts like "creating a new prototype" and "cloning a prototype as a working node".
To build a new HTTPFT prototype, we first need a base prototype that already uses this class. In this example we will use the prototype named dshield.block as the base.
Let's take a closer look at the HTML API response to figure out how to build a valid prototype for our mission.
Request ->
GET /html HTTP/1.1
Host: test.minemeld.com
Response Headers <-
HTTP/2.0 200 OK
content-type: text/html
content-length: 1626
Response Body <-
<!DOCTYPE html><html><head><link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/css/bootstrap.min.css" integrity="sha384-WskhaSGFgHYWDcbwN70/dfYBj47jz9qbsMId/iRN3ewGhXQFZCSftd1LZCfmhktB" crossorigin="anonymous"><title>Real-Time temperature of MineMeld-related cities in the world.</title></head><body><div class="container"><table class="table-striped">
<tr><td><code>city</code></td><td><code>country</code></td><td><code>region</code></td><td><code>temperature</code></td><td><code>url</code></td></tr>
<tr><td><code class="small">Barcelona</code></td><td><code class="small">ES</code></td><td><code class="small">Catalunya</code></td><td><code class="small">12.24</code></td><td><code class="small">https://ajuntament.barcelona.cat/turisme/en/</code></td></tr>
<tr><td><code class="small">Parma</code></td><td><code class="small">IT</code></td><td><code class="small">Emilia-Romagna</code></td><td><code class="small">16.03</code></td><td><code class="small">http://www.turismo.comune.parma.it/en</code></td></tr>
<tr><td><code class="small">Santa Clara</code></td><td><code class="small">US</code></td><td><code class="small">California</code></td><td><code class="small">8.98</code></td><td><code class="small">http://santaclaraca.gov/visitors</code></td></tr>
</table></div></body></html>
So, what do we have here? An HTML table with each row on its own line and each value in its own table cell.
First of all we have to get rid of all lines not belonging to table rows. We can achieve this with the ignore_regex class configuration parameter.
ignore_regex: ^(?!<tr><td>)
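The pattern ^(?!<tr><td>) is a negative lookahead: it matches (an empty string) at the start of every line that does not begin with <tr><td>, and those are the lines that get ignored. A quick check with Python's re module on a few lines of the sample response (the first one truncated), assuming a line is discarded whenever the pattern matches at its start:

```python
import re

# Negative lookahead: matches at the start of any line NOT beginning with "<tr><td>".
IGNORE_REGEX = re.compile(r"^(?!<tr><td>)")

# Lines from the demo HTML response (the first one truncated for brevity).
lines = [
    '<!DOCTYPE html><html>',
    '<tr><td><code>city</code></td><td><code>country</code></td><td><code>region</code></td><td><code>temperature</code></td><td><code>url</code></td></tr>',
    '<tr><td><code class="small">Barcelona</code></td><td><code class="small">ES</code></td><td><code class="small">Catalunya</code></td><td><code class="small">12.24</code></td><td><code class="small">https://ajuntament.barcelona.cat/turisme/en/</code></td></tr>',
    '</table></div></body></html>',
]

# A match object (even an empty one) means "ignore this line".
kept = [line for line in lines if not IGNORE_REGEX.match(line)]
for line in kept:
    print(line[:40])
```

Note that the header row also survives this filter; it is the value-extraction regex that later fails to match it, because its cells lack class="small".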
Next, we need a regex pattern to extract and transform the values on each line. The HTTPFT class leverages Python's re module and accepts configuration parameters both for the indicator itself and for any additional attribute. Any regular expression strategy that captures the values will work; we will use the following one in this example:
(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)
It is a large expression with 10 capturing groups. The first capturing group (\1) extracts the first cell and the second capturing group (\2) the value inside that given cell. Group 3 extracts cell number 2 and group 4 the value inside that second cell. And so on.
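To see the group numbering in action, the sketch below applies the same expression (built by repeating the single-cell pattern five times) to one data row with Python's re module and prints the even-numbered groups, which hold the cell values. It uses search() because the line starts with <tr>; whether HTTPFT itself uses search() or match() is an assumption not covered in this article.

```python
import re

# The five-cell expression from above, built from one repeated cell pattern.
CELL = r'(<td><code class="small">([^<]+)<\/code><\/td>)'
ROW_REGEX = re.compile(CELL * 5)

ROW = ('<tr><td><code class="small">Barcelona</code></td>'
       '<td><code class="small">ES</code></td>'
       '<td><code class="small">Catalunya</code></td>'
       '<td><code class="small">12.24</code></td>'
       '<td><code class="small">https://ajuntament.barcelona.cat/turisme/en/</code></td></tr>')

match = ROW_REGEX.search(ROW)  # search(), because the line starts with "<tr>"
for group in range(2, 11, 2):  # even groups hold the cell values
    print(group, match.group(group))
```

In the sample rows the cells appear in the order city, country, region, temperature, url, which is what decides the group number of each value.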
As the indicator (the URL) is in the fifth and last cell, its value is captured by group 10, so the corresponding configuration must be:
indicator:
  regex: '(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)'
  transform: \10
For the remaining attributes we can leverage the same regular expression with different transformations, following the column order of the table (city, country, region, temperature, url).
fields:
  country:
    regex: '(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)'
    transform: \4
  region:
    regex: '(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)'
    transform: \6
  city:
    regex: '(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)'
    transform: \2
  temperature:
    regex: '(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)(<td><code class="small">([^<]+)<\/code><\/td>)'
    transform: \8
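The transform values are backslash group references. Python's Match.expand() interprets them the same way re.sub() does, so it offers a quick way to check a set of transforms before committing the prototype. This sketch assumes HTTPFT expands transforms in a similar way and skips lines where the regex does not match; extract() is a hypothetical helper. In the sample rows the cells are ordered city, country, region, temperature, url, so group 10 holds the URL.

```python
import re

CELL = r'(<td><code class="small">([^<]+)<\/code><\/td>)'
ROW_REGEX = re.compile(CELL * 5)

LINES = [
    '<tr><td><code>city</code></td><td><code>country</code></td><td><code>region</code></td><td><code>temperature</code></td><td><code>url</code></td></tr>',
    '<tr><td><code class="small">Parma</code></td><td><code class="small">IT</code></td><td><code class="small">Emilia-Romagna</code></td><td><code class="small">16.03</code></td><td><code class="small">http://www.turismo.comune.parma.it/en</code></td></tr>',
]


def extract(lines, indicator_transform, field_transforms):
    """Hypothetical helper: build (indicator, attributes) pairs with Match.expand()."""
    for line in lines:
        match = ROW_REGEX.search(line)
        if match is None:
            continue  # e.g. the header row, whose cells have no class="small"
        indicator = match.expand(indicator_transform)
        attributes = {name: match.expand(t) for name, t in field_transforms.items()}
        yield indicator, attributes


# r"\10" is read as a reference to group 10, not as group 1 followed by "0".
results = list(extract(LINES, r"\10",
                       {"city": r"\2", "country": r"\4", "region": r"\6", "temperature": r"\8"}))
print(results)
```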
Combine all these configuration parameters into our desired HTTPFT class prototype as shown in the following picture.
And, finally, clone this prototype as a working (miner) node into the MineMeld engine to verify it is working as expected.
Other optional configuration parameters supported by the HTTPFT class are:
(Please review the complete process described in the section "Mining a CSV API" to understand concepts like "creating a new prototype", "cloning a prototype as a working node", etc.)
The SimpleJSON class features a JMESPath engine to process any JSON document returned by the API call. Take some time to visit http://jmespath.org/ and follow the tutorial so you understand the concepts we're going to use in this section.
The following is the theory of operations for JSON miners:
Let's take a closer look at the test JSON API we're going to mine:
Request ->
GET /json HTTP/1.1
Host: test.minemeld.com
Response Headers <-
HTTP/2.0 200 OK
content-type: application/json
content-length: 861
Response Body <-
{ "description": "Real-Time temperature of MineMeld-related cities in the world.", "result": [ { "url": "https://ajuntament.barcelona.cat/turisme/en/", "country": "ES", "region": "Catalunya", "city": "Barcelona", "temperature": 12.24 }, { "url": "http://www.turismo.comune.parma.it/en", "country": "IT", "region": "Emilia-Romagna", "city": "Parma", "temperature": 16.03 }, { "url": "http://santaclaraca.gov/visitors", "country": "US", "region": "California", "city": "Santa Clara", "temperature": 8.98 } ] }
The JSON document is quite simple, and its "result" element already provides the array of objects we need. So our JMESPath extractor will be:
extractor: result
You can test the expression on the JMESPath site to verify that it returns the following array of objects:
[ { "url": "https://ajuntament.barcelona.cat/turisme/en/", "country": "ES", "region": "Catalunya", "city": "Barcelona", "temperature": 12.24 },
{ "url": "http://www.turismo.comune.parma.it/en", "country": "IT", "region": "Emilia-Romagna", "city": "Parma", "temperature": 16.03 },
{ "url": "http://santaclaraca.gov/visitors", "country": "US", "region": "California", "city": "Santa Clara", "temperature": 8.98 } ]
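If you prefer to check the extractor locally rather than on the JMESPath site, the sketch below uses only the standard json module. For a plain key like result the expression reduces to a dictionary lookup; with the jmespath Python package installed, jmespath.search("result", document) should return the same list.

```python
import json

# Response body of the demo JSON API shown above.
BODY = """{ "description": "Real-Time temperature of MineMeld-related cities in the world.",
  "result": [
    { "url": "https://ajuntament.barcelona.cat/turisme/en/", "country": "ES", "region": "Catalunya", "city": "Barcelona", "temperature": 12.24 },
    { "url": "http://www.turismo.comune.parma.it/en", "country": "IT", "region": "Emilia-Romagna", "city": "Parma", "temperature": 16.03 },
    { "url": "http://santaclaraca.gov/visitors", "country": "US", "region": "California", "city": "Santa Clara", "temperature": 8.98 }
  ] }"""

document = json.loads(BODY)

# The JMESPath expression "result" selects the top-level "result" key,
# so in plain Python it is equivalent to a dictionary lookup.
items = document["result"]
for item in items:
    print(item["city"], item["temperature"])
```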
At this point we just need to identify the object attributes that contain 1) the indicator itself and 2) any additional attribute we want to attach to the indicator. In our case, the configuration is:
indicator: url
fields:
- country
- region
- city
- temperature
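The indicator/fields pair can be read as a simple mapping over the extracted objects. Below is a hypothetical plain-Python analogue (to_indicators is not part of MineMeld); the "json_" naming of attached attributes is an assumption meant to mirror the "json" prefix mentioned in the next paragraph, so check the real attribute names in your node's log.

```python
# Two objects taken from the "result" array of the demo JSON response.
ITEMS = [
    {"url": "https://ajuntament.barcelona.cat/turisme/en/", "country": "ES",
     "region": "Catalunya", "city": "Barcelona", "temperature": 12.24},
    {"url": "http://www.turismo.comune.parma.it/en", "country": "IT",
     "region": "Emilia-Romagna", "city": "Parma", "temperature": 16.03},
]


def to_indicators(items, indicator="url",
                  fields=("country", "region", "city", "temperature"), prefix="json"):
    """Hypothetical helper mirroring the indicator/fields settings above."""
    for item in items:
        value = item.get(indicator)
        if value is None:
            continue  # objects without the indicator attribute are skipped
        # Assumed naming scheme for attached attributes: "<prefix>_<field>".
        attributes = {f"{prefix}_{name}": item[name] for name in fields if name in item}
        yield value, attributes


for value, attributes in to_indicators(ITEMS):
    print(value, attributes)
```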
It is time to put all these configuration statements into a SimpleJSON class prototype. We can use, for example, the aws.AMAZON standard library prototype as the base.
Did you notice the "json" prefix in all extracted additional attributes? You can control that, and a few other behaviors of the class, with the following optional configuration elements:
Wondering why anyone would extract additional attributes from the feed and not just the indicator value?
Let's imagine we want to provide two feeds with the URLs of the cities:
We can achieve that with the input and output filtering capabilities of the MineMeld engine nodes. Let me share with you a couple of screenshots of the prototypes that will do this job:
Clone each one of these two prototypes as working output nodes and connect their inputs to the JSON miner you created. That should build a graph like the one shown in the picture.
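In MineMeld this split is done declaratively with the input and output filters of the output node prototypes shown in the screenshots; their exact filter syntax is not reproduced here. The plain-Python analogue below only illustrates the logic, assuming a 30°C threshold (the one used in this example) and an unprefixed temperature attribute; split_by_temperature is a hypothetical name.

```python
# (indicator, attributes) pairs as produced by the JSON miner in this example.
INDICATORS = [
    ("https://ajuntament.barcelona.cat/turisme/en/", {"city": "Barcelona", "temperature": 12.24}),
    ("http://www.turismo.comune.parma.it/en", {"city": "Parma", "temperature": 16.03}),
    ("http://santaclaraca.gov/visitors", {"city": "Santa Clara", "temperature": 8.98}),
]


def split_by_temperature(indicators, threshold=30.0):
    """Return (time_to_beach, no_beach_time_yet) lists of URLs."""
    time_to_beach, no_beach_time_yet = [], []
    for url, attributes in indicators:
        # Each indicator lands in exactly one of the two feeds.
        if float(attributes["temperature"]) > threshold:
            time_to_beach.append(url)
        else:
            no_beach_time_yet.append(url)
    return time_to_beach, no_beach_time_yet


print(split_by_temperature(INDICATORS))
```

Because the temperature travels with each indicator as an attribute, the same miner can feed both outputs without a second API call.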
At the time of writing this article, only one of the four cities in the feed is over 30°C.
GET /feeds/no-beach-time-yet HTTP/1.1
...
http://www.turismo.comune.parma.it/en
https://ajuntament.barcelona.cat/turisme/en/
----
GET /feeds/time-to-beach HTTP/1.1
...
http://santaclaraca.gov/visitors
Hey!
Just a quick question that maybe I didn't quite understand. If the HTML external list I'm going to use for my prototype is protected by a simple user/password combination, how do I tell MineMeld to authenticate before extracting the info?
https://test.minemeld.com/json ->
{"message": "Internal server error"}
I don't want to open a support ticket, but I'd appreciate it if anyone monitoring this thread could investigate the test.minemeld.com instance.
Hi @Michael_D ,
thanks for letting us know the example was not working anymore. Yahoo discontinued its weather API, so I just moved the example to another provider.
It took me a while to figure this out, but @lachlanjholmes on the PA community Slack group had the answer, and I want to credit him.
I'm posting it here to help others who may have a similar issue.
Parsing a JSON feed that contains an array of IP addresses, like the Datadog feeds:
{
"version": 40,
"modified": "2021-04-02-17-00-00",
"agents": {
"prefixes_ipv4": [
"3.228.26.0/25",
"3.228.26.128/25",
"3.228.27.0/25",
"3.233.144.0/20"
],
"prefixes_ipv6": [
"2600:1f18:24e6:b900::/56",
"2600:1f18:63f7:b900::/56"
]
}
}
you need to use the extractor:
extractor: agents.prefixes_ipv4[].{ip:@}
The extractor gets the prefixes_ipv4[] array, and {ip:@} then turns each element into an object, producing an array like the following:
[ { "ip": "3.228.26.0/25" }, { "ip": "3.228.26.128/25" }, { "ip": "3.228.27.0/25" }, { "ip": "3.233.144.0/20" } ]
Then it's simply a matter of using
indicator: ip
to extract each IP into the list.
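Mirroring the extractor in plain Python shows why it works: the indicator setting names an attribute of an object, and the projection builds those objects out of a bare list of strings. This is only an illustration of the JMESPath semantics, not how MineMeld evaluates the expression; with the jmespath package installed, jmespath.search("agents.prefixes_ipv4[].{ip:@}", document) should produce the same list of objects.

```python
import json

# Sample Datadog-style document from the post above.
DOCUMENT = json.loads("""{
  "version": 40,
  "modified": "2021-04-02-17-00-00",
  "agents": {
    "prefixes_ipv4": ["3.228.26.0/25", "3.228.26.128/25", "3.228.27.0/25", "3.233.144.0/20"],
    "prefixes_ipv6": ["2600:1f18:24e6:b900::/56", "2600:1f18:63f7:b900::/56"]
  }
}""")

# agents.prefixes_ipv4[] projects over the array; {ip:@} wraps each element
# (@ is the current element) in an object with a single "ip" key.
objects = [{"ip": prefix} for prefix in DOCUMENT["agents"]["prefixes_ipv4"]]

# "indicator: ip" then reads the "ip" attribute of every object.
indicators = [obj["ip"] for obj in objects]
print(indicators)
```

The same pattern with prefixes_ipv6 would produce a separate IPv6 feed.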