How does Expanse know about my website?

L0 Member

I have the following message in the User Agent field of my Apache log:

 

"Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com"

How does Expanse know to scan my website?

Here's why I'm asking:

  • The subdomain is brand new (automatically generated just a few hours ago)
  • The subdomain is not advertised anywhere
  • The only way to get to this Apache is by specifying the exact domain in the Host HTTP header
  • The site sends 'noindex', 'nofollow' and related headers for every request to prevent indexing
  • As far as we can tell, we don't have any malware on our computers.
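To illustrate why the Host header requirement and the 'noindex' headers do not keep a scanner out once it has learned the hostname, here is a minimal local sketch. The hostname `dev-abc123.example.com` and the handler are hypothetical stand-ins, not the actual setup; the sketch only shows that any client can send an arbitrary Host header, and that `X-Robots-Tag` is advice to crawlers rather than access control.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical hostname standing in for the unadvertised subdomain.
SECRET_HOST = "dev-abc123.example.com"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the site only when the Host header matches exactly.
        ok = self.headers.get("Host", "") == SECRET_HOST
        self.send_response(200 if ok else 404)
        # Advisory only: asks well-behaved crawlers not to index the page.
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def probe(port, host_header):
    """Send GET / to the local server with an explicit Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/", headers={"Host": host_header})
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("X-Robots-Tag"))
    resp.read()
    conn.close()
    return result


def demo():
    """Start a throwaway server and probe it with a wrong and a right Host."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        return probe(port, "127.0.0.1"), probe(port, SECRET_HOST)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A client that knows the name gets the page regardless of the indexing headers, so the open question is only how the name leaked.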

Just two people know about this subdomain.

So... which software spied on who and how?

The subdomain most likely only appeared in a Firefox browser for one of us before Expanse and a bunch of other scanners started accessing the site.

2 REPLIES

Cyber Elite

@SnappyRcx,

Two things

  1. As soon as you register a public DNS record, everyone can find it. There are services out there that aggregate newly registered domains and sell access to that data for a minimal amount of money; I have to imagine that PAN both runs its own service for this and likely buys access to this data through such aggregators as well.
  2. As soon as someone using a PAN firewall (potentially even you) visits an unknown URL, the firewall tells PAN that it needs to be analyzed because it's unknown. So if you have a PAN device in your network, it's quite possible that you were the one telling PAN that the site exists and needs to be analyzed.

L0 Member

@BPry

Option 1 doesn't apply, because the subdomain is resolved via a wildcard DNS record. It's not itself present in DNS, and it's a subdomain of another registered domain.

For Option 2 though, after some brief brainstorming with my colleague, I think I have a candidate shortlist: either Let's Encrypt, or Certificate Transparency logs. My website uses Let's Encrypt for its SSL. All other variables appear to be controlled for: it's my container, behind my reverse proxy, built on my Docker image, using my PHP code, and everything is, to the best of my knowledge, not blabbing to outside services. The only places where the subdomain is known are: Traefik, Let's Encrypt, CT logs. The container doesn't know its subdomain beforehand.
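One way to check the CT log hypothesis is to see whether the subdomain shows up in a public CT search. The sketch below assumes the crt.sh search service and its JSON output format (a list of records with a newline-separated `name_value` field); the function names and the `example.com` domain are illustrative, and the network call is kept separate from the parsing so the parsing can be tried offline.

```python
import json
import urllib.request
from urllib.parse import urlencode


def crtsh_url(domain):
    """Build a crt.sh query URL for all names under `domain`.

    Assumption: crt.sh accepts `%` as a wildcard and `output=json`.
    """
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def names_from_records(records, domain):
    """Collect the unique hostnames under `domain` from crt.sh-style records."""
    names = set()
    for rec in records:
        # Assumption: `name_value` may hold several names, one per line.
        for name in rec.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


def fetch_logged_names(domain):
    """Query crt.sh over the network and return the logged hostnames."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=30) as resp:
        return names_from_records(json.load(resp), domain)


if __name__ == "__main__":
    # Offline sample shaped like the assumed crt.sh output.
    sample = [
        {"name_value": "dev-abc123.example.com"},
        {"name_value": "example.com\nwww.example.com"},
    ]
    print(names_from_records(sample, "example.com"))
```

If the freshly generated subdomain appears in the result shortly after the certificate is issued, anyone watching the logs could have learned the name without touching DNS or the browser.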

I could validate this hypothesis at some point by also replacing the PHP code with some hello world HTML.

I was just surprised, because this site is only meant for our temporary software development purposes.
