Choosing the Right Metadata for Phishing and Email Incidents

In this post, I want to take a deeper look into the metadata provided by email. As you may already know, Cortex XSOAR gives you a great tool called Mapping to make sure that certain metadata is stored in the incident fields.

Let’s take a look at the email data.

To make sense of this metadata, we need to understand how email communication works.

Let's start with some very basic data points. The good thing is that the email “header” is quite transparent about those data points. Again, I am using Cortex XSOAR’s mapping functionality to show you the header information.

Data of an email as seen by XSOAR

In layman's terms, an email originates somewhere and is handed over between different systems until it reaches its final destination, just like a traditional letter.

The “Received” headers are the paper trail of this handover between the systems. Within each header, the server name and IP after “from” belong to the system handing the message over, and the server after “by” is the one that accepted it. Every server adds its header on top of the existing ones, so the top-most “by” is the final destination and should correspond with your email server.

In the example above, the message was received by smtp.gmail.com, which is expected as the email went into my Gmail account.
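To make the paper trail concrete, here is a minimal Python sketch that reads the “Received” headers of a raw email and lists each handover as a (from, by) pair. It uses only the standard library and is not how XSOAR does its mapping; the sample message, the host names and the `list_hops` helper are made up for illustration. Since each server puts its “Received” line on top of the existing ones, the sketch reverses the list to get the hops in chronological order.

```python
import re
from email import message_from_string

# Hypothetical sample message; hosts and IPs are documentation placeholders.
RAW_EMAIL = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
        by mx.example.net with ESMTPS id abc123
        for <user@example.net>; Mon, 05 Dec 2022 10:00:02 +0000
Received: from client.example.org (client.example.org [198.51.100.23])
        by mail.example.org with ESMTP id xyz789;
        Mon, 05 Dec 2022 10:00:00 +0000
From: sender@example.org
To: user@example.net
Subject: hello

body
"""

# Matches "from <host> ... by <host>" inside one Received header value.
HOP = re.compile(r"from\s+(?P<src>\S+).*?\bby\s+(?P<dst>\S+)", re.S)


def list_hops(raw):
    """Return (from_host, by_host) pairs, oldest hop first."""
    msg = message_from_string(raw)
    hops = []
    # Headers are stored newest first, so walk them in reverse.
    for value in reversed(msg.get_all("Received", [])):
        match = HOP.search(value)
        if match:
            hops.append((match.group("src"), match.group("dst")))
    return hops


for src, dst in list_hops(RAW_EMAIL):
    print(f"{src} -> {dst}")
```

Real-world “Received” lines vary a lot between mail systems, so a simple pattern like this one will miss some of them; treat it as a starting point rather than a complete parser.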

<BYAPR11MB349523A8E7B5DCC69CA7A61F8CF49@BYAPR11MB3495.namprd11.prod.outlook.com>

In another example, we can see that there were only two hops between the two email systems in question; had there been more, we would see those as well.

The next important thing to recognize is the “message-id”, a unique identifier of the email. It is assigned when the message is created, typically by the sender's email client or first mail server, which is why the one above carries an outlook.com host name. If someone sends an email to multiple people, we would expect this ID to be the same in every copy (including CCed or BCCed recipients). This can be a huge help if you need to delete many of the same emails from different inboxes or need to track them on different systems.
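As a rough illustration of that idea, the following Python sketch groups raw emails from several inboxes by their Message-ID, so you can see which mailboxes hold a copy of the same message. The mailbox names, the sample messages and the `group_by_message_id` helper are hypothetical; in practice the raw messages would come from your mail system or from the incident data.

```python
from collections import defaultdict
from email import message_from_string


def group_by_message_id(mailboxes):
    """Map each Message-ID to the list of mailboxes that contain it.

    mailboxes: dict of mailbox name -> list of raw message strings.
    """
    seen = defaultdict(list)
    for mailbox, raw_messages in mailboxes.items():
        for raw in raw_messages:
            msg = message_from_string(raw)
            # Header lookup is case-insensitive; skip messages without an ID.
            msg_id = (msg.get("Message-ID") or "").strip()
            if msg_id:
                seen[msg_id].append(mailbox)
    return dict(seen)


# Hypothetical sample data: the same campaign email landed in two inboxes.
SAMPLE_MAILBOXES = {
    "alice": [
        "Message-ID: <campaign-1@mail.example.org>\nSubject: Invoice\n\nbody\n",
    ],
    "bob": [
        "Message-ID: <campaign-1@mail.example.org>\nSubject: Invoice\n\nbody\n",
        "Message-ID: <other-2@mail.example.org>\nSubject: Lunch\n\nbody\n",
    ],
}

for msg_id, owners in group_by_message_id(SAMPLE_MAILBOXES).items():
    print(msg_id, owners)
```

Any ID that shows up in more than one mailbox is a candidate for a bulk search or delete across those inboxes.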

Another data point I like to look into is the format of the email (its Content-Type), as it lets you decide upfront what you can expect from an email. If you hop over to Microsoft (link) you will find a list of all the different Content-Types an email can have and what you can expect. As an example:

Content-Type: text
The text content type is used for message content that is primarily in human-readable text character format. The more complex text content types are defined and identified so that an appropriate tool can be used to display more complex body parts.
Content-Type: image
The image content type allows standardized image files to be included in messages.

Body types and counts for December 2022

Multipart messages, such as multipart/mixed, are basically a combination of more than one type plus additional header fields.

A text/plain email could be considered harmless, or at least less harmful than a text/html one, as it cannot hide a link's real destination behind different link text. Any URL has to appear in full in the body, although many email clients will still make it clickable.
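If you want to check the body types yourself, a small Python sketch like the one below walks through all parts of a message and counts their content types. It relies only on the standard `email` package; the sample message and the helper names `count_part_types` and `has_html_part` are made up for this post and have nothing to do with XSOAR's own fields.

```python
from collections import Counter
from email.message import EmailMessage


def count_part_types(msg):
    """Count the Content-Type of every part, including the container parts."""
    return Counter(part.get_content_type() for part in msg.walk())


def has_html_part(msg):
    """True if any part of the message is text/html."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


# Hypothetical sample: a plain-text body with an HTML alternative.
sample = EmailMessage()
sample["Subject"] = "Your invoice"
sample.set_content("Please see the invoice at https://example.org/invoice")
sample.add_alternative(
    '<p>Please see the <a href="https://example.net/x">invoice</a></p>',
    subtype="html",
)

print(count_part_types(sample))
print(has_html_part(sample))
```

The HTML part in the sample shows the kind of thing to watch for: the visible text says "invoice" while the link points somewhere the reader never sees.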

Before this post gets too long, let's have a look at some personal highlights and ways of getting them out of the data.

My first and favorite is what I call the "First Email Server IP". As discussed above, it indicates the origin: the first server that handled the email. Especially in a larger campaign, this is an IP we would expect to show up frequently, which makes it a good indicator for finding related incidents.
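One possible way to get this value out of the data is sketched below in Python. It assumes that the "First Email Server IP" is the bracketed IPv4 address in the oldest (bottom-most) “Received” header; that is my reading for this example, and the helper `first_server_ip` and the sample message are hypothetical. IPv6 addresses are not handled here. Keep in mind that everything below your own trusted servers is supplied by the sending side and can be forged.

```python
import ipaddress
import re
from email import message_from_string

# IPv4 address in square brackets, as commonly written in Received headers.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Hypothetical sample message; IPs are documentation placeholders.
RAW_EMAIL = """\
Received: from mail.example.org (mail.example.org [203.0.113.7])
        by mx.example.net with ESMTPS id abc123;
        Mon, 05 Dec 2022 10:00:02 +0000
Received: from client.example.org (client.example.org [198.51.100.23])
        by mail.example.org with ESMTP id xyz789;
        Mon, 05 Dec 2022 10:00:00 +0000
From: sender@example.org
Subject: hello

body
"""


def first_server_ip(raw):
    """Return the first valid IPv4 found in the oldest Received header, or None."""
    msg = message_from_string(raw)
    # The oldest hop is the last Received header, so start from the bottom.
    for value in reversed(msg.get_all("Received", [])):
        for candidate in BRACKETED_IPV4.findall(value):
            try:
                return str(ipaddress.ip_address(candidate))
            except ValueError:
                continue  # looked like an IP but is not valid, e.g. 999.1.1.1
    return None


print(first_server_ip(RAW_EMAIL))
```

Storing the result in its own incident field makes it easy to search for other incidents that share the same origin.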


The next two are closely related: the Spam Domain and the Top-Level Domain. To be honest, I mainly use those for dashboards and reporting, so I can see any uptick in certain domains.
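As a simple illustration, the Python sketch below pulls the domain and the top-level domain out of a "From" header and counts the TLDs, which is the kind of number a dashboard could show. It assumes the Spam Domain is taken from the sender address, which may differ from how you define it; the helper `sender_domain_and_tld` and the sample addresses are made up. It also takes the last label as the TLD, so suffixes like co.uk would need a public suffix list to be handled properly.

```python
from collections import Counter
from email.utils import parseaddr


def sender_domain_and_tld(from_header):
    """Return (domain, tld) from a From header value, or (None, None)."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return None, None
    domain = address.rpartition("@")[2].lower().rstrip(".")
    if not domain:
        return None, None
    # Naive TLD: the last dot-separated label of the domain.
    return domain, domain.rsplit(".", 1)[-1]


# Hypothetical From headers collected from a batch of incidents.
FROM_HEADERS = [
    '"Big Prize" <winner@promo.example.xyz>',
    "billing@mail.example.xyz",
    "Support <help@example.com>",
]

tld_counts = Counter(sender_domain_and_tld(h)[1] for h in FROM_HEADERS)
print(tld_counts.most_common())
```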

To the human eye, this email looks like spam through and through; to a machine and an automated response, not so much. So we need to make sure that we understand the metadata and structure a playbook and methods around it.

Thanks for reading until the end. Be excellent to each other.