Getting the SpeedTest.net servers with MineMeld

RuscalR · 05-10-2021

This is less a question needing an answer and more a "so you don't have to go through my pain" type of post.

I was having a problem with SpeedTest.net where the suggestion of a server for testing was taking over a minute to appear after the rest of the page had completely loaded. With some poking about, it turned out the issue was that our firewall's app definitions had changed and no longer detected some of the URLs as part of the SpeedTest application. Since this load time was causing users to claim "the network is slow" (regardless of the speed test results showing anything but), I needed to get those URLs unblocked at the firewall.

My first thought was to submit an updated application identification rule. But 1) I'm not great at writing those, and 2) part of the problem with the current fingerprinting seems to be that Ookla is rather notorious for modifying the content of their SpeedTest.net handshakes and payloads. So I moved on to "well, let's whitelist the necessary domains & URLs" and "hey, I bet MineMeld can handle the list collection and distribution for me." I was right, but there weren't any good prototypes out there, so I had to make my own by trial & error.

All of my miners are based on the minemeld.ft.http.HttpFT class. There are 2 domain miners (we'll get to why in a bit) and 2 URL miners (though I think most people won't need either of them as long as they have the domain ones set up). The miners all use SpeedTest.net's static server list: http://c.speedtest.net/speedtest-servers-static.php

It is important to note that this PHP page lists the 100 "nearest" testing hosts to the requestor, which will be the MineMeld server and not necessarily any clients that are consuming your feed. Keep this in mind if you are providing intelligence feeds to clients spread across a wide geographic area. As all of my clients are within 300 miles of my MineMeld server's PoP, it is good enough for me.

First up is the main domain miner. You'll notice I went a bit capture-crazy and got all of the fields. That allows you to use filters later on if you want (I don't, but I like options). Just use any existing minemeld.ft.http.HttpFT miner as a source (the DShield Blocklist was my favorite) and make sure to set the Indicator Type to domain.

And use this for your config:

age_out:
    default: null
    interval: last_seen+7200
    sudden_death: true
attributes:
    confidence: 100
    share_level: green
    type: domain
fields:
    city:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \4
    countryCode:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \6
    countryName:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \5
    hostname:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*?)(?::\d{0,5})?"
        transform: \9
    id:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \8
    lat:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \2
    long:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \3
    sponsor:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \7
    url:
        regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
        transform: \1
ignore_regex: settings|servers
indicator:
    regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*?)(?::\d{0,5})?"
    transform: \9
interval: 3600
source_name: speedtest.hosts
url: http://c.speedtest.net/speedtest-servers-static.php

The indicator comes from the last element listed by SpeedTest, the "host" value (just with the port number stripped off).
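To see what a regex/transform pair will actually emit before loading it into a prototype, it can be tried offline. The sketch below assumes MineMeld applies the pattern to each line with Python's re module and expands the transform against the match; the sample line is made up in the shape the regex expects, and extract() is just a helper name for this example. It also compares a greedy host capture with a lazy one, because in plain Python a greedy (.*) takes the :port suffix with it before the optional port group gets a chance to match.

```python
import re

# Hypothetical sample line in the shape the regex expects; not real server data.
SAMPLE = ('<server url="http://speedtest.example.net:8080/speedtest/upload.php" '
          'lat="40.0" lon="-75.0" name="Springfield" country="United States" '
          'cc="US" sponsor="Example ISP" id="1234" '
          'host="speedtest.example.net:8080" />')

# The first eight capture groups, shared by both variants below.
PREFIX = (r'url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"'
          r'\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*')

GREEDY = re.compile(PREFIX + r'host="(.*)(?::\d{0,5})?"')   # greedy host capture
LAZY = re.compile(PREFIX + r'host="(.*?)(?::\d{0,5})?"')    # lazy host capture


def extract(pattern, line, transform):
    """Search one line and expand the transform, or return None on no match."""
    match = pattern.search(line)
    return match.expand(transform) if match else None


greedy_host = extract(GREEDY, SAMPLE, r'\9')
lazy_host = extract(LAZY, SAMPLE, r'\9')
city = extract(LAZY, SAMPLE, r'\4')

print(greedy_host)  # the port stays attached
print(lazy_host)    # bare hostname
print(city)
```

If the greedy result is what shows up in your feed, switching the last capture to (.*?) is one way to get the bare hostname.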

I refer to the second domain miner as the SpeedTest_Domain-Reference_DNS miner. That is because, in addition to the list of hostnames generated by the above, Ookla also uses a hostname kept underneath their own domain (I suppose in order to rule out the test host's DNS server being an issue -- I get it, but it'd also be nice if they actually listed this in the static servers listing). So what you want is a list of the hostnames with ".prod.hosts.ooklaserver.net" appended. To do that, we will make a rather minor (but unbelievably important) alteration to the indicator section of the config.

Open the SpeedTest_Domain prototype you just created, and use it to create another new one. Then modify the Indicator section of the config so that the transform attribute matches the one below (append the Ookla DNS tree to the existing name).

indicator:
    regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*?)(?::\d{0,5})?"
    transform: \9.prod.hosts.ooklaserver.net
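As a rough illustration of what that transform does, here is a small Python sketch. It uses a simplified pattern that only looks at the host attribute (so the host is group 1 here rather than group 9), a lazy capture so the port is dropped, and a made-up sample line; reference_dns_name() is a hypothetical helper, not anything MineMeld provides.

```python
import re

# Simplified pattern: only the host attribute, with a lazy capture so the
# optional :port group can strip the port. The full prototype regex has
# eight more capture groups in front, so the host is \9 there and \1 here.
HOST_RE = re.compile(r'host="(.*?)(?::\d{0,5})?"')

OOKLA_SUFFIX = ".prod.hosts.ooklaserver.net"


def reference_dns_name(line):
    """Return the Ookla-side DNS name for the host on this line, or None."""
    match = HOST_RE.search(line)
    if match is None:
        return None
    # Same idea as 'transform: \9.prod.hosts.ooklaserver.net' in the prototype.
    return match.expand(r"\1" + OOKLA_SUFFIX)


# Hypothetical sample line, not real server data.
sample = '<server id="1234" host="speedtest.example.net:8080" />'
result = reference_dns_name(sample)
print(result)
```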

With those two prototypes, you'll have everything you need for 99% of the detection systems out there.

You'd create the miners, a domain aggregator, and a quick & easy output for consumption into your whitelist system. Import-append the following into your config (after you've created the two prototypes above):

nodes:
  wl_SpeedTest_Domain:
    inputs: []
    output: true
    prototype: minemeldlocal.SpeedTest_Domain
  SpeedTest_Domain-miner:
    inputs: []
    output: true
    prototype: minemeldlocal.SpeedTest_Domain
  SpeedTest_Domain-Reference_DNS-miner:
    inputs: []
    output: true
    prototype: minemeldlocal.SpeedTest_Domain-Reference_DNS
  SpeedTest_Domain-aggregator:
    inputs:
      - SpeedTest_Domain-miner
      - SpeedTest_Domain-Reference_DNS-miner
    output: true
    prototype: stdlib.aggregatorDomain
  SpeedTest_Domains:
    inputs:
      - SpeedTest_Domain-aggregator
    output: false
    prototype: minemeldlocal.feed-all

This will create 3 miners, 1 processor, and 1 output:

Miners
- wl_SpeedTest_Domain
  Optional
  Useful if you have other domain feeds that you are processing. You can pump this whitelist into your processor and it'll remove the SpeedTest.net domains from those feeds. That way you don't have a blocklist forcing a block on you even if you're using this to create a "whitelist" feed.
  EDIT -- Technically, you'd also want a wl_SpeedTest_Domain-Reference_DNS as well. My specific use didn't need it, but for completeness...
- SpeedTest_Domain-miner
  -- and --
- SpeedTest_Domain-Reference_DNS-miner
  These two miners, when combined, will list all of the hostnames used by SpeedTest clients within the "local" geography of the MineMeld server.
Processor
- SpeedTest_Domain-aggregator
  Aggregates the SpeedTest_Domain-miner and SpeedTest_Domain-Reference_DNS-miner inputs.
Output
- SpeedTest_Domains
  Your combined output list of all the active speedtest hostnames.
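Before importing a node config like this, the wiring can be sanity-checked with a few lines of Python. The sketch below copies the node names and inputs from the config into a plain dict (so no YAML library is needed); dangling_inputs() and upstream() are hypothetical helper names for this example only.

```python
# Node graph from the config above, written as a plain dict.
# Only the 'inputs' lists matter for this check.
NODES = {
    "wl_SpeedTest_Domain": [],
    "SpeedTest_Domain-miner": [],
    "SpeedTest_Domain-Reference_DNS-miner": [],
    "SpeedTest_Domain-aggregator": [
        "SpeedTest_Domain-miner",
        "SpeedTest_Domain-Reference_DNS-miner",
    ],
    "SpeedTest_Domains": ["SpeedTest_Domain-aggregator"],
}


def dangling_inputs(nodes):
    """Return (node, input) pairs where the input is not a defined node."""
    return [(name, src) for name, inputs in nodes.items()
            for src in inputs if src not in nodes]


def upstream(nodes, name):
    """Return every node that feeds 'name', directly or indirectly."""
    seen = set()
    stack = list(nodes[name])
    while stack:
        src = stack.pop()
        if src not in seen:
            seen.add(src)
            stack.extend(nodes.get(src, []))
    return seen


problems = dangling_inputs(NODES)
feeding_output = upstream(NODES, "SpeedTest_Domains")
print(problems)
print(sorted(feeding_output))
```

Note that the optional wl_SpeedTest_Domain miner does not appear upstream of SpeedTest_Domains; it is meant to be wired into whatever other processors you already run.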

In your firewall/blocking system, you'd want to set up a DNS feed (mine only updates every 24 hours, but these miners do a check every hour) pointing to https://<your MineMeld server>/feeds/SpeedTest_Domains and then put that into your DNS whitelist.

You'd also want to create a URL-typed feed (this is the tricky part, and where I found that most of us only need the domain miners) that uses the PAN-OS output transforms on the SpeedTest_Domains output; this is done with the v=panosurl query string parameter: https://<your MineMeld server>/feeds/SpeedTest_Domains?v=panosurl . This will let you use the domains as base URLs for your URL filter. As this works for most edge-protection/filter devices, it helps keep your MineMeld node count down while still giving the desired outcome.
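The two feed addresses differ only in the query string, which a short Python sketch makes explicit. The server name minemeld.example.com is a placeholder and feed_url() is a hypothetical helper; only the /feeds/<name> path and the v=panosurl parameter come from the setup described here.

```python
from urllib.parse import urlencode, urlunsplit


def feed_url(server, feed_name, variant=None):
    """Build the feed address, optionally adding the v=<variant> parameter."""
    query = urlencode({"v": variant}) if variant else ""
    return urlunsplit(("https", server, "/feeds/" + feed_name, query, ""))


# Placeholder server name; substitute your own MineMeld host.
dns_feed = feed_url("minemeld.example.com", "SpeedTest_Domains")
url_feed = feed_url("minemeld.example.com", "SpeedTest_Domains", "panosurl")

print(dns_feed)
print(url_feed)
```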

If those outputs have your setup working the way you'd like, you can stop here. (This is where my firewall config stopped, as it was all I needed).

If you need the full URLs for your firewall/filter, then you'd need to make a few more miner prototypes (and miners). For each of these, I start by looking at one of my existing SpeedTest_Domain prototypes, and make a new one with it as the template. Make sure to change your indicator type to URL (both in the prototype drop-down box, and in the textual config) -- the rest of the changes are all in the indicator selection.

The full URL as given by SpeedTest.net (only showing the relevant parts of the prototype config):

attributes:
    type: URL
indicator:
    regex: url="(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: \1

A variant of the same that converts the previous into an https URL. The feed from SpeedTest seems to report every URL as non-SSL even though the servers do actually use https. This variant only selects plain http URLs and converts them to https, so it should be used in combination with the above in case they ever do start publishing https URLs in their feed:

attributes:
    type: URL
indicator:
    regex: url="http:\/\/(.*)"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: https://\1

Another variant of the URL miner that keeps only the protocol, host, and port (some firewalls want only that and no paths):

attributes:
    type: URL
indicator:
    regex: url="(\w*:\/\/[a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: \1

And one last one that converts http to https in the protocol+host+port-only collection. Like the previous http-to-https converting miner, this one should be used with its native-form counterpart from directly above so that you have a complete picture:

attributes:
    type: URL
indicator:
    regex: url="http:\/\/([a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"
    transform: https://\1
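To compare what the four URL variants emit for the same input, they can be run side by side in Python. This is a sketch under the assumption that each pattern is searched against one line and the transform is expanded against the match; the sample lines are invented, and the names VARIANTS and emit() exist only in this example.

```python
import re

# Hypothetical sample line in the shape the regexes expect; not real server data.
SAMPLE = ('<server url="http://speedtest.example.net:8080/speedtest/upload.php" '
          'lat="40.0" lon="-75.0" name="Springfield" country="United States" '
          'cc="US" sponsor="Example ISP" id="1234" '
          'host="speedtest.example.net:8080" />')

# Everything after the url capture is identical in all four prototypes.
TAIL = (r'"\s*lat="(.*)"\s*lon="(.*)"\s*name="(.*)"\s*country="(.*)"'
        r'\s*cc="(.*)"\s*sponsor="(.*)"\s*id="(.*)"\s*host="(.*)(?::\d{0,5})?"')

# name -> (indicator regex, transform), copied from the four configs above.
VARIANTS = {
    "full": (r'url="(.*)' + TAIL, r'\1'),
    "full_https": (r'url="http:\/\/(.*)' + TAIL, r'https://\1'),
    "base": (r'url="(\w*:\/\/[a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*' + TAIL, r'\1'),
    "base_https": (r'url="http:\/\/([a-zA-Z0-9\-._~%]*(?::\d{0,5})?\/).*' + TAIL,
                   r'https://\1'),
}


def emit(name, line):
    """Return the indicator a variant would produce for one line, or None."""
    regex, transform = VARIANTS[name]
    match = re.search(regex, line)
    return match.expand(transform) if match else None


results = {name: emit(name, SAMPLE) for name in VARIANTS}
for name, value in results.items():
    print(name, value)

# A line that already carries an https URL is skipped by the two converters.
HTTPS_SAMPLE = SAMPLE.replace('url="http://', 'url="https://')
https_results = {name: emit(name, HTTPS_SAMPLE) for name in VARIANTS}
```

The second run is the reason for pairing each converter with its native-form counterpart: a line published with https is only picked up by the non-converting variants.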

Again, most folks can actually get by with only the domain-based miners and associated outputs; just remember to have your pseudo-URL feed on your filtering device use the ?v=panosurl query parameter on the SpeedTest_Domains feed, and most filtering devices will accept it as a URL.

SomeSuch · 01-31-2024

Awesome job with the explanation!

Unfortunately the PHP source that you used is no longer updated, but I've found someone who publishes a CSV on GitHub:

https://gist.github.com/ofou/654efe67e173a6bff5c64ba26c09d058

https://gist.githubusercontent.com/ofou/654efe67e173a6bff5c64ba26c09d058/raw/f1c659864e7c24c33233c3e...

Did you by chance update your miners with a different source after you made the original post?

Thanks!

SomeSuch · 01-31-2024

EDIT: NVM not working properly yet, will add tomorrow

I did a much poorer job than you, using just two miners and outputs for IP and URL, which is what I use, but it does the job. Here's the config for the miners, should anyone need it:

age_out:
    default: null
    interval: last_seen+7200
    sudden_death: true
attributes:
    confidence: 100
    share_level: green
    type: IPv4
indicator:
    regex: (\b[\w\.]+\.[0-9]{1,}\b)
    transform: \1
interval: 86400
source_name: speedtest.hosts.ip
url: https://gist.githubusercontent.com/ofou/654efe67e173a6bff5c64ba26c09d058/raw/f1c659864e7c24c33233c3e7ebcfc605a053e900/servers.csv

age_out:
    default: null
    interval: last_seen+7200
    sudden_death: true
attributes:
    confidence: 100
    share_level: green
    type: URL
indicator:
    regex: (\b[\w\.]+\.[A-Za-z\-\_]{2,}\b)
    transform: \1
interval: 86400
source_name: speedtest.hosts.url
url: https://gist.githubusercontent.com/ofou/654efe67e173a6bff5c64ba26c09d058/raw/f1c659864e7c24c33233c3e7ebcfc605a053e900/servers.csv
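One thing that might be worth checking with patterns this loose: the IPv4 regex accepts any dotted token that ends in digits, so decimal numbers such as coordinates match as well. The Python sketch below shows that on an invented CSV row (the real column layout of servers.csv may well differ) and filters the matches with the standard ipaddress module; ipv4_candidates() is a hypothetical helper, not part of MineMeld.

```python
import ipaddress
import re

# The IPv4 indicator regex from the first miner config above.
LOOSE_IP = re.compile(r'(\b[\w\.]+\.[0-9]{1,}\b)')


def ipv4_candidates(line):
    """Return only the regex matches that parse as IPv4 addresses."""
    found = []
    for token in LOOSE_IP.findall(line):
        try:
            ipaddress.IPv4Address(token)
        except ValueError:
            continue  # e.g. a latitude such as 40.7128
        found.append(token)
    return found


# Hypothetical CSV row; the real column layout of servers.csv may differ.
row = "1234,Example ISP,speedtest.example.net,192.0.2.10,40.7128,-74.0060"

loose = LOOSE_IP.findall(row)
strict = ipv4_candidates(row)
print(loose)
print(strict)
```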

I then use a custom URL category with *.prod.hosts.ooklaserver.net rather than a doubled miner with the transform, and add ?v=panosurl to the MineMeld URL list to add the / at the end for good measure.
