05-10-2021 02:23 PM
This is less a question needing an answer and more a "so you don't have to go through my pain" type of post.
I was having a problem with SpeedTest.net where the suggestion of a server for testing was taking over a minute to appear after the rest of the page had completely loaded. With some poking about, it turned out the issue was that our firewall's app definitions had changed and no longer detected some of the URLs as part of the SpeedTest application. Since this load time was causing users to claim "the network is slow" (regardless of the speed test results showing anything but), I had need to get those URLs unblocked from the firewall.
My first thought was to submit an updated application identification rule. But 1) I'm not great at writing those, and 2) it seems part of the problem with the current fingerprinting is that Ookla is rather notorious for modifying the content of their SpeedTest.net handshakes and payloads. So I moved on to "well, let's whitelist the necessary domains & URLs" and "hey, I bet MineMeld can handle the list collection and distribution for me." I was right, but there weren't any good prototypes out there, and I had to use trial and error to make my own.
All of my miners are based on the minemeld.ft.http.HttpFT class. There are 2 domain miners (we'll get to why in a bit) and 2 URL miners (though I think most people won't need either of them as long as they have the domain ones set up). The miners all use SpeedTest.net's static server list from http://c.speedtest.net/speedtest-servers-static.php .
It is important to note that this PHP page lists the 100 "nearest" testing hosts to the requestor, which will be the MineMeld server and not necessarily any clients that are consuming your feed. You should keep this in mind if you are providing intelligence feeds to clients across a geographical diversity. As all of my clients are within 300 miles of my MineMeld server's PoP, it is good enough for me.
First up is the main domain miner. You'll notice I went a bit capture-crazy and got all of the fields. That allows you to use filters later on if you want (I don't, but I like options). Just use any existing minemeld.ft.http.HttpFT miner as a source (the DShield Blocklist was my favorite) and make sure to set the Indicator Type to domain.
And use this for your config:
age_out:
    default: null
    interval: last_seen+7200
    sudden_death: true
attributes:
    confidence: 100
    share_level: green
    type: domain
fields:
    city:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \4
    countryCode:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \6
    countryName:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \5
    hostname:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \9
    id:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \8
    lat:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \2
    long:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \3
    sponsor:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \7
    url:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \1
ignore_regex: settings|servers
indicator:
    regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: \9
interval: 3600
source_name: speedtest.hosts
url: http://c.speedtest.net/speedtest-servers-static.php
The indicator comes from the last element listed by SpeedTest, the "host" value (just with the port number stripped off).
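To sanity-check what the indicator regex captures, here is a minimal sketch using Python's `re` module. MineMeld is written in Python, but treating its matching as equivalent to `re.search` plus `Match.expand` is an assumption here, and the sample `<server>` line is made up; it only mimics the attribute order the regex expects. One thing worth checking on your own install: under Python's greedy matching the trailing `(?::\d{0,5})?` group is allowed to match nothing, so `\9` can keep the port. The `HOST_NO_PORT` pattern is a suggested alternative, not part of the original prototype.

```python
import re

# Indicator regex copied from the prototype config above.
ORIGINAL = re.compile(
    r'url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"'
    r'\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"'
)

# Hypothetical variant: exclude ':' and '"' from the host capture so the
# optional port group has something left to consume.
HOST_NO_PORT = re.compile(r'host="([^":]*)(?::\d{0,5})?"')

# Made-up sample line in the attribute order the regex expects.
SAMPLE = (
    '<server url="http://speedtest.example.net:8080/speedtest/upload.php" '
    'lat="40.0" lon="-75.0" name="Springfield" country="United States" '
    'cc="US" sponsor="Example ISP" id="12345" '
    'host="speedtest.example.net:8080" />'
)

original_match = ORIGINAL.search(SAMPLE)
original_host = original_match.expand(r"\9")  # equivalent of "transform: \9"
city = original_match.expand(r"\4")           # the "city" field

stripped_host = HOST_NO_PORT.search(SAMPLE).group(1)

print(original_host)  # speedtest.example.net:8080
print(city)           # Springfield
print(stripped_host)  # speedtest.example.net
```

If your feed output shows ports on the domain indicators, swapping the final `(.*)` for `([^":]*)` in the prototype is one way to drop them.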
The second domain miner I refer to as the SpeedTest_Domain-Reference_DNS miner. That is because, in addition to the list of hostnames generated by the above, Ookla also uses a hostname kept underneath their own domain (I suppose in order to keep the test host's DNS server from being an issue; I get it, but it'd also be nice if they actually listed this in the static servers listing). So what you want is a list of the hostnames with ".prod.hosts.ooklaserver.net" appended. To do that, we will make a rather minor (but unbelievably important) alteration to the Indicator section of the config.
Open the SpeedTest_Domain prototype you just created, and use it to create another new one. Then modify the Indicator section of the config so that the transform attribute matches the one below (append the Ookla DNS tree to the existing name).
indicator:
    regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: \9.prod.hosts.ooklaserver.net
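The effect of the modified transform can be previewed with a few lines of Python. This sketch assumes the transform behaves like Python's `Match.expand`; the regex is shortened to the `host` attribute only (so the hostname is group 1 rather than group 9), and the sample value is invented.

```python
import re

# Shortened stand-in for the prototype regex: only the host attribute is
# matched, so the hostname is group 1 here rather than group 9.
HOST = re.compile(r'host="(.*)(?::\d{0,5})?"')

# Invented host value without a port.
match = HOST.search('id="12345" host="speedtest.example.net"')

# Equivalent of "transform: \9.prod.hosts.ooklaserver.net" for this regex.
reference_name = match.expand(r"\1.prod.hosts.ooklaserver.net")

print(reference_name)  # speedtest.example.net.prod.hosts.ooklaserver.net
```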
With those two prototypes, you'll have everything you need for 99% of the detection systems out there.
You'd create the miners, a domain aggregator, and a quick & easy output for consumption into your whitelist system. Import-append the following into your config (after you've created the two prototypes above):
nodes:
    wl_SpeedTest_Domain:
        inputs: []
        output: true
        prototype: minemeldlocal.SpeedTest_Domain
    SpeedTest_Domain-miner:
        inputs: []
        output: true
        prototype: minemeldlocal.SpeedTest_Domain
    SpeedTest_Domain-Reference_DNS-miner:
        inputs: []
        output: true
        prototype: minemeldlocal.SpeedTest_Domain-Reference_DNS
    SpeedTest_Domain-aggregator:
        inputs:
            - SpeedTest_Domain-miner
            - SpeedTest_Domain-Reference_DNS-miner
        output: true
        prototype: stdlib.aggregatorDomain
    SpeedTest_Domains:
        inputs:
            - SpeedTest_Domain-aggregator
        output: false
        prototype: minemeldlocal.feed-all
This will create 3 miners, 1 processor, and 1 output.
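Before importing, the wiring can be checked offline. The sketch below transcribes the node graph from the config above into a plain Python dict (prototype names omitted) and guesses each node's role from its wiring alone. That heuristic is an assumption made for illustration; MineMeld itself derives the node type from the prototype.

```python
from collections import Counter

# Node wiring transcribed from the config above (prototype names omitted).
NODES = {
    "wl_SpeedTest_Domain": {"inputs": [], "output": True},
    "SpeedTest_Domain-miner": {"inputs": [], "output": True},
    "SpeedTest_Domain-Reference_DNS-miner": {"inputs": [], "output": True},
    "SpeedTest_Domain-aggregator": {
        "inputs": [
            "SpeedTest_Domain-miner",
            "SpeedTest_Domain-Reference_DNS-miner",
        ],
        "output": True,
    },
    "SpeedTest_Domains": {
        "inputs": ["SpeedTest_Domain-aggregator"],
        "output": False,
    },
}


def classify(node):
    """Heuristic role guess from wiring alone (not how MineMeld decides)."""
    if not node["inputs"]:
        return "miner"
    return "processor" if node["output"] else "output"


def dangling_inputs(nodes):
    """Return (node, input) pairs whose input is not a defined node."""
    return [
        (name, src)
        for name, cfg in nodes.items()
        for src in cfg["inputs"]
        if src not in nodes
    ]


def unconsumed_miners(nodes):
    """Return miners that no other node lists as an input."""
    consumed = {src for cfg in nodes.values() for src in cfg["inputs"]}
    return [
        name
        for name, cfg in nodes.items()
        if classify(cfg) == "miner" and name not in consumed
    ]


role_counts = dict(Counter(classify(cfg) for cfg in NODES.values()))
print(role_counts)               # {'miner': 3, 'processor': 1, 'output': 1}
print(dangling_inputs(NODES))    # []
print(unconsumed_miners(NODES))  # ['wl_SpeedTest_Domain']
```

In the config as posted, wl_SpeedTest_Domain is not listed as an input to the aggregator, so its indicators would not reach the SpeedTest_Domains feed unless you wire it in yourself.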
In your firewall/blocking system, you'd want to set up a DNS feed (mine only updates every 24 hours, but these miners do a check every hour) that points to https://<your MineMeld server>/feeds/SpeedTest_Domains and then put that into your DNS whitelist.
You'd also want to create a URL-typed feed (this is the tricky part, and where I found that most of us only need the domain miners) that uses the PAN-OS output transforms on the SpeedTest_Domains output; this is done with the v=panosurl query string parameter (https://<your MineMeld server>/feeds/SpeedTest_Domains?v=panosurl). This will let you use the domains as a base URL for your URL filter. As this works for most edge-protection/filter devices, it helps keep your MineMeld node count down while still giving the desired outcome.
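If you script your firewall or filter configuration, the two feed URLs can be built from one helper so they stay in sync. This is only a sketch: `minemeld.example.com` is a placeholder hostname and `feed_url` is a hypothetical helper name, while the `/feeds/<name>` path and the `v=panosurl` parameter come from the post.

```python
from urllib.parse import urlencode, urlunsplit


def feed_url(server, feed_name, view=None):
    """Build a MineMeld feed URL, optionally with the v= output transform."""
    query = urlencode({"v": view}) if view else ""
    return urlunsplit(("https", server, f"/feeds/{feed_name}", query, ""))


# Placeholder server name; substitute your own MineMeld host.
dns_feed = feed_url("minemeld.example.com", "SpeedTest_Domains")
url_feed = feed_url("minemeld.example.com", "SpeedTest_Domains", view="panosurl")

print(dns_feed)  # https://minemeld.example.com/feeds/SpeedTest_Domains
print(url_feed)  # https://minemeld.example.com/feeds/SpeedTest_Domains?v=panosurl
```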
If those outputs have your setup working the way you'd like, you can stop here. (This is where my firewall config stopped, as it was all I needed).
If you need the full URLs for your firewall/filter, then you'd need to make a few more miner prototypes (and miners). For each of these, I start by looking at one of my existing SpeedTest_Domain prototypes, and make a new one with it as the template. Make sure to change your indicator type to URL (both in the prototype drop-down box, and in the textual config) -- the rest of the changes are all in the indicator selection.
The full URL as given by SpeedTest.net (only showing the relevant parts of the prototype config):
attributes:
    type: URL
indicator:
    regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: \1
A variant of the same that converts the previous into an https URL. It seems that the URLs in the SpeedTest feed are all reported as non-SSL even though the servers do actually use https. This variant only selects plain http URLs and converts them to https, so it should be used in combination with the above in case they ever do start publishing https URLs in their feed:
attributes:
    type: URL
indicator:
    regex: url="http:\/\/(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: https://\1
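To see why the two full-URL miners are meant to run side by side, the sketch below applies both regexes to an http line and to an https line. It assumes MineMeld's matching behaves like Python's `re.search` with `Match.expand`; the sample lines are invented, and `indicators` is a hypothetical helper standing in for the two miners feeding one aggregator.

```python
import re

# Shared remainder of both regexes, copied from the prototype configs above.
TAIL = (
    r'"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"'
    r'\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"'
)
AS_PUBLISHED = re.compile(r'url="(.*)' + TAIL)         # transform: \1
HTTP_ONLY = re.compile(r'url="http:\/\/(.*)' + TAIL)   # transform: https://\1


def sample(url):
    """Build an invented <server> line around the given URL."""
    return (
        f'<server url="{url}" lat="40.0" lon="-75.0" name="Springfield" '
        'country="United States" cc="US" sponsor="Example ISP" id="12345" '
        'host="speedtest.example.net:8080" />'
    )


def indicators(line):
    """Collect what both miners together would emit for one line."""
    found = set()
    for pattern, transform in ((AS_PUBLISHED, r"\1"), (HTTP_ONLY, r"https://\1")):
        match = pattern.search(line)
        if match:
            found.add(match.expand(transform))
    return found


http_result = indicators(sample("http://speedtest.example.net:8080/speedtest/upload.php"))
https_result = indicators(sample("https://speedtest.example.net:8080/speedtest/upload.php"))

print(sorted(http_result))   # both the http and the https form
print(sorted(https_result))  # only the https form, from the as-published miner
```

An http line yields both forms, while a line already published as https is picked up only by the as-published miner, which is the gap the pairing is meant to cover.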
Another variant of the URL miner that only keeps the protocol, host, and port (some firewalls want only that and no paths):
attributes:
    type: URL
indicator:
    regex: url="(\w*:\/\/[a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: \1
And one last one that converts http to https in the protocol+host+port-only collection. Like the previous http-to-https converting miner, this one should be used with its native-form counterpart from directly above so that you have a complete picture:
attributes:
    type: URL
indicator:
    regex: url="http:\/\/([a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: https://\1
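The two protocol+host+port patterns can be tried in isolation. In this sketch the regexes are cut down to the `url` attribute (everything from `lat=` onward is dropped for brevity), the sample line is invented, and Python's `re` is assumed to behave the same way as MineMeld's matching.

```python
import re

# url-attribute portion of the two regexes above; the lat/lon/... remainder
# is omitted here because it does not affect group 1.
ORIGIN = re.compile(r'url="(\w*:\/\/[a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*"')
HTTPS_ORIGIN = re.compile(r'url="http:\/\/([a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*"')

# Invented sample line.
LINE = (
    '<server url="http://speedtest.example.net:8080/speedtest/upload.php" '
    'lat="40.0" lon="-75.0" host="speedtest.example.net:8080" />'
)

native_origin = ORIGIN.search(LINE).expand(r"\1")
https_origin = HTTPS_ORIGIN.search(LINE).expand(r"https://\1")

print(native_origin)  # http://speedtest.example.net:8080/
print(https_origin)   # https://speedtest.example.net:8080/
```

Both patterns require a `/` after the host and optional port, so a URL published without any path would not match either one.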
Again, most folks can actually get by with only the domain-based miners and associated outputs; just remember to have your pseudo-URL feed on your filtering device use the ?v=panosurl query parameter on the SpeedTest_Domains feed, and most filtering devices will accept it as a URL.
01-31-2024 07:11 AM
Awesome job with the explanation!
Unfortunately the PHP source that you used is not updated, but I've found someone who publishes a csv on github:
https://gist.github.com/ofou/654efe67e173a6bff5c64ba26c09d058
Did you by chance update your miners with a different source after you made the original post?
Thanks!
01-31-2024 11:14 AM - edited 01-31-2024 11:31 AM
EDIT: NVM not working properly yet, will add tomorrow
I did a much poorer job than you, using just two miners and outputs for IP and URL, which is what I use, but it does the job. Here's the config for the miners, should anyone need it:
age_out:
    default: null
    interval: last_seen+7200
    sudden_death: true
attributes:
    confidence: 100
    share_level: green
    type: IPv4
indicator:
    regex: (\b[\w\.]+\.[0-9]{1,}\b)
    transform: \1
interval: 86400
source_name: speedtest.hosts.ip
url: https://gist.githubusercontent.com/ofou/654efe67e173a6bff5c64ba26c09d058/raw/f1c659864e7c24c33233c3e7ebcfc605a053e900/servers.csv

age_out:
    default: null
    interval: last_seen+7200
    sudden_death: true
attributes:
    confidence: 100
    share_level: green
    type: URL
indicator:
    regex: (\b[\w\.]+\.[A-Za-z\-\_]{2,}\b)
    transform: \1
interval: 86400
source_name: speedtest.hosts.url
url: https://gist.githubusercontent.com/ofou/654efe67e173a6bff5c64ba26c09d058/raw/f1c659864e7c24c33233c3e7ebcfc605a053e900/servers.csv
I then use a custom URL category with *.prod.hosts.ooklaserver.net rather than a doubled miner with the transform, and add ?v=panosurl to the MineMeld URL list to add the / at the end for good measure.
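One thing that may be worth checking with the IPv4 miner's regex: it accepts any dotted token that ends in digits, not only four-octet addresses. The sketch below shows this with Python's `re` and filters the matches with the standard `ipaddress` module. The candidate strings are invented tokens, not rows taken from the CSV, and whether this is related to the "not working properly" note above is only a guess.

```python
import ipaddress
import re

# Indicator regex copied from the IPv4 miner config above.
IP_PATTERN = re.compile(r"(\b[\w\.]+\.[0-9]{1,}\b)")


def is_ipv4(text):
    """Return True only for a well-formed dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


# Invented tokens: an address, a decimal number, and a hostname.
candidates = ["203.0.113.10", "40.7128", "speedtest.example.net"]

matched = [c for c in candidates if IP_PATTERN.fullmatch(c)]
valid = [c for c in matched if is_ipv4(c)]

print(matched)  # ['203.0.113.10', '40.7128']
print(valid)    # ['203.0.113.10']
```

If the source has decimal columns such as coordinates, a stricter pattern like `\b(?:\d{1,3}\.){3}\d{1,3}\b` would be one way to avoid picking them up.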