Maximizing Email Security: Insights from a DNS Research Study on SPF, DKIM, and DMARC Records

Introduction

It is well known that today’s Internet runs mostly on IPv4 and IPv6. A source host communicates with a destination host using addresses like 172.217.22.14 or 2345:0425:2CA1:0000:0000:0567:5673:23b5. But since it is not practical for users (and developers) to remember these addresses, they depend heavily on DNS (Domain Name System).

According to a report by NETSCOUT, “DNS water-torture attacks accelerated into 2022 with a 46% increase primarily using UDP query floods”. Briefly, if a client or a server cannot resolve a domain name (e.g. payatu.com) to the corresponding IPv4/IPv6 address, it cannot request the required resources from another client or server.

Interestingly, DNS is useful not only to human users. To keep deployments flexible, administrators and developers also prefer domain names over IPv4/IPv6 addresses in application and network configurations. For instance, if an application on Server A communicates with another instance of it on Server B using IPv4 addresses hardcoded in the application, then a change in the IPv4 assignment of either server would require a change in the application itself. On the other hand, if the applications are configured with domain names like server1.xyz and server2.xyz and depend on DNS for IPv4 resolution, only a trivial update of the A records for those domain names would be required.

But mapping human-readable names to IPv4/IPv6 addresses is not the only thing that DNS does. Before proceeding further, it helps to understand the types of DNS records. Among the many types, a TXT record is simply a text record that can contain arbitrary text strings. Unlike records such as A, CNAME or MX, TXT records have no single defined purpose, which makes them extremely flexible. They are popularly used to verify domain ownership and, most relevant to this blog post, to secure email and prevent email spoofing.

Email Security

Technically, when an SMTP server sends out an email, it provides MAIL FROM: and RCPT TO: to the receiving SMTP server. In the real world, sending a letter or postcard requires the same information. But in a scenario where Alice and Bob are acquaintances, Charlie can pretend to be Alice (spoof her) by putting her name and address on a letter to Bob. The same applies to emails, because the way SMTP works allows email spoofing.

SMTP does not have a built-in method for authenticating email addresses. In fact, the email addresses of the sender and recipient can be found in two places within an email – the header and the SMTP envelope. The fields that the recipient sees are included in the email header. However, the SMTP envelope contains the information that servers use to deliver an email to the correct address. But, these fields do not have to match in order for an email to be sent successfully.

Email spoofing is relatively simple because SMTP never checks the envelope against the header, and the recipient cannot see the information in the envelope.
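The gap between the envelope and the header can be illustrated with a small Python sketch. The addresses and the helper names (domain_of, envelope_matches_header) are made up for illustration; the snippet only builds a message in memory and compares the two identities, it does not send anything.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return parseaddr(str(address))[1].rpartition("@")[2].lower()


def envelope_matches_header(envelope_from: str, message: EmailMessage) -> bool:
    """True when the MAIL FROM domain equals the From: header domain."""
    return domain_of(envelope_from) == domain_of(message["From"])


# The header that the recipient sees claims to be Alice.
msg = EmailMessage()
msg["From"] = "Alice <alice@domain.tld>"
msg["To"] = "bob@another.tld"
msg["Subject"] = "Hello"
msg.set_content("Hi Bob")

# The envelope sender handed to the SMTP server is a different identity.
# SMTP itself does not compare the two values.
envelope_from = "charlie@attacker.example"

print(envelope_matches_header(envelope_from, msg))
```

A mismatch like this is not an error as far as SMTP is concerned, which is why the DNS-based checks described next are needed.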

Fortunately, domain owners can prevent email spoofing by configuring some TXT records on their domain names. Modern SMTP servers check for the existence of these records and use them to decide on the authenticity and integrity of received emails. These TXT records are explained below:

SPF

A Sender Policy Framework (SPF) record lists the SMTP servers that are authorized to send emails from a particular domain. This way, if someone sends an email with a made-up address on that domain from their own server, that server will not be listed in the SPF record and the email will not pass authorization. A typical SPF record is configured as a TXT record on the root of domain.tld. An example of an SPF record is given below:

domain.tld

 v=spf1 ip4:22.23.24.25 include:another.tld -all

v=spf1 : Indicates that the record is an SPF record and states that version 1 of SPF is being implemented. There is no other version at this point.
ip4:22.23.24.25 : States the mechanism, here the IPv4 address of the SMTP server that is authorized to send emails for the domain. Other mechanisms are a, mx, ip6, etc. The ptr mechanism is deprecated as of RFC 7208 and thus should not be used.
include:another.tld : Mechanism that includes the SPF policy of the given domain (another.tld) while authorizing the sender SMTP server. RFC 7208 limits SPF evaluation to 10 terms that cause DNS queries (include, a, mx, ptr, exists and redirect combined), so the number of includes is effectively capped.
all : States the behaviour of the receiving server against all emails sent from an SMTP server that is not authorized. The qualifier in front of all selects that behaviour:
- -all stands for a hard fail, meaning the emails from an unauthorized SMTP server will simply be rejected.
- ~all stands for a soft fail, meaning the emails from an unauthorized SMTP server will be marked as possible spam.
- +all stands for a pass. Emails from all SMTP servers, irrespective of the SPF policy, will be delivered to the recipient.
- ?all stands for a neutral. Similar to +all, emails from all SMTP servers will be delivered to the recipient. It is just an explicit way of stating that no authorization is asserted.
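As a rough illustration of the structure described above, the following Python sketch splits an SPF record into its terms and reports the policy attached to all. The function names are hypothetical, and the parser is deliberately minimal: it does not evaluate the record, follow includes, or treat modifiers such as redirect= specially.

```python
# Maps each SPF qualifier character to the result it produces.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record: str) -> list:
    """Split an SPF record into (result, mechanism, value) tuples."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a qualifier defaults to pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        parsed.append((QUALIFIERS[qualifier], name.lower(), value))
    return parsed


def all_policy(record: str):
    """Return the result attached to the 'all' mechanism, or None."""
    for result, name, _ in parse_spf(record):
        if name == "all":
            return result
    return None


def has_weak_policy(record: str) -> bool:
    """True for +all and ?all, which authorize every sender."""
    return all_policy(record) in ("pass", "neutral")


print(parse_spf("v=spf1 ip4:22.23.24.25 include:another.tld -all"))
print(has_weak_policy("v=spf1 +all"))
```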

Security Misconfiguration

Weak Policy – +all and ?all provide no protection against domain spoofing.

DKIM

DomainKeys Identified Mail (DKIM) uses a pair of cryptographic keys for authentication: one that is public and one that is private. The public key is published in the DKIM record, and the private key is configured on the SMTP server, which digitally signs outgoing emails and adds the signature in a DKIM-Signature header. Spoofed emails from a domain with a DKIM record will not be signed with the correct cryptographic keys and will therefore fail authentication.

A DKIM lookup requires knowledge of a selector, which is a value chosen by the SMTP server (or mail provider) used by the domain. It is included in the DKIM header so that a receiving email server can perform the required DKIM lookup in the DNS. Since multiple SMTP servers might send emails for a single domain, each SMTP server generates its own dedicated key pair.

For multiple public keys to co-exist for the same domain.tld, individual DKIM records for each public key are configured on selector._domainkey.domain.tld where selector varies depending upon the SMTP server being authenticated. An example of a DKIM record is given below:

google._domainkey.domain.tld

v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAq8JxVBMLHZRj1WvIMSHApRY3DraE/EiFiR6IMAlDq9GAnrVy0tDQyBND1G8+1fy5RwssQ9DgfNe7rImwxabWfWxJ1LSmo/DzEdOHOJNQiP/nw7MdmGu+R9hEvBeGRQ" "Amn1jkO46KIw/p2lGvmPSe3+AVD+XyaXZ4vJGTZKFUCnoctAVUyHjSDT7KnEsaiND2rVsDvyisJUAH+EyRfmHSBwfJVHAdJ9oD8cn9NjIun/EHLSIwhCxXmLJlaJeNAFtcGeD2aRGbHaS7M6aTFP+qk4f2ucRx31cyCxbu50CDVfU+d4JkIDNBFDiV+MIpaDFXIf11bGoS08oBBQiyPXgX0wIDAQAB

v=DKIM1 : Indicates that the record is a DKIM record and states that version 1 of DKIM is being implemented. There is no other version at this point.
k=rsa : States the asymmetric cryptographic algorithm used for the key pair generation.
p= : Holds the Base64-encoded public key of the SMTP server that is authorized to sign emails for the domain.

Security Misconfiguration

Small Key Size – The public key configured under a DKIM record should at least be 1024 bits long. A smaller size of key is considered theoretically insecure.
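A key-size check of this kind can be sketched in Python using only the standard library. The sketch assumes that the p= tag carries a Base64-encoded, DER-encoded SubjectPublicKeyInfo structure holding an RSA key, which is the usual form for k=rsa. The helper names are hypothetical, and the DER reader handles only the few fields needed to reach the modulus.

```python
import base64


def parse_tags(record: str) -> dict:
    """Parse 'tag=value; tag=value' text into a dict."""
    tags = {}
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            # Whitespace inside a value is folding and carries no meaning.
            tags[name.strip().lower()] = "".join(value.split())
    return tags


def _read_tlv(buf: bytes, pos: int):
    """Read one DER tag-length-value; return (tag, value, next_position)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the count of length bytes
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length


def rsa_key_bits(p_value: str) -> int:
    """Return the RSA modulus size in bits for a DKIM p= value."""
    der = base64.b64decode(p_value)
    _, spki, _ = _read_tlv(der, 0)                 # outer SEQUENCE
    _, _, after_alg = _read_tlv(spki, 0)           # AlgorithmIdentifier
    _, bit_string, _ = _read_tlv(spki, after_alg)  # BIT STRING
    _, rsa_key, _ = _read_tlv(bit_string, 1)       # skip the unused-bits byte
    _, modulus, _ = _read_tlv(rsa_key, 0)          # INTEGER n
    return int.from_bytes(modulus, "big").bit_length()


def has_small_key(record: str, minimum: int = 1024) -> bool:
    """True when the record's RSA key is shorter than the minimum size."""
    tags = parse_tags(record)
    if not tags.get("p"):
        raise ValueError("no public key (empty p= tag)")
    return rsa_key_bits(tags["p"]) < minimum
```

Calling has_small_key on the TXT value of selector._domainkey.domain.tld then flags the misconfiguration described above. A production check would more likely rely on a maintained cryptography library than on a hand-written DER reader.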

DMARC

Domain-based Message Authentication, Reporting and Conformance (DMARC) records contain DMARC policies, which tell email servers what to do after checking SPF and DKIM records. Domain owners can set rules about whether to reject, quarantine, or deliver messages based on these checks. Because DMARC policies build on the results of the other authentication checks and allow domain owners to set more specific rules, these records add another layer of protection against email spoofing. A DMARC record is always configured as a TXT record on a fixed subdomain, i.e. _dmarc.domain.tld. An example of a DMARC record is given below:

_dmarc.domain.tld

v=DMARC1; p=quarantine; sp=reject; adkim=s; aspf=s;

v=DMARC1 : Indicates that the record is a DMARC record and states that version 1 of DMARC is being implemented. There is no other version at this point.
p= : Indicates how receiving SMTP servers should treat emails that fail the DKIM and SPF checks.
- quarantine treats emails that fail as possible spam.
- none allows emails that fail to still go through.
- reject instructs email servers to reject the emails that fail.
sp= : Similar to p but indicates the behavior for subdomains of the organizational domain.
adkim=/aspf= : States the alignment modes of DKIM/SPF checks.
- s means that DKIM/SPF checks are “strict”.
- r means that DKIM/SPF checks are “relaxed”.
pct= : States the percentage of suspicious messages that DMARC policy applies to. The default is 100.

Details about more DMARC tags are available here.

Security Misconfiguration

Low PCT Value : An explicit value less than 100 for the pct tag means that some potentially malicious emails will get past the DMARC policy.
Weak Policy : An explicit value of none for the p or sp tag means that no protection is enforced.
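The two misconfigurations above can be checked mechanically. The following Python sketch parses a DMARC record into its tags and lists the findings. The function names and the wording of the findings are hypothetical. The check looks only at explicitly set values, mirroring the definitions above; note that when sp is absent, receivers apply the p policy to subdomains as well.

```python
def parse_dmarc(record: str) -> dict:
    """Parse a DMARC record into a dict of tag names to values."""
    tags = {}
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC version 1 record")
    return tags


def dmarc_findings(record: str) -> list:
    """Return the misconfigurations found in a DMARC record."""
    tags = parse_dmarc(record)
    findings = []
    if tags.get("p", "").lower() == "none":
        findings.append("weak policy (p=none)")
    # Only an explicit sp=none is flagged here.
    if tags.get("sp", "").lower() == "none":
        findings.append("weak subdomain policy (sp=none)")
    # pct defaults to 100 when it is not defined.
    if int(tags.get("pct", "100")) < 100:
        findings.append("low pct value")
    return findings


print(dmarc_findings("v=DMARC1; p=quarantine; sp=reject; adkim=s; aspf=s;"))
print(dmarc_findings("v=DMARC1; p=none; pct=50"))
```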

Together, SPF, DKIM and DMARC function like a background check on email senders, to make sure they really are who they claim to be. Therefore, apart from avoiding all the security misconfigurations listed above, it is also important to ensure that all three records co-exist. For example, DMARC depends upon either DKIM or SPF records, and if both are missing or invalid, DMARC becomes ineffective.

Objectives

It is interesting to know about the security controls available to prevent email spoofing, but exactly how often and in what way are they being implemented across the Internet? Although some relevant statistics and studies addressing the same question can be found on the Internet, we wanted to do it the Bandit way, which motivated researchers at Payatu to dive deep and define the following objectives:

Prepare a list of as many domain names as possible.
Aggregate NS, MX, SPF, DKIM & DMARC records against a lengthy list of domains.
Store the aggregated records for the purpose of analysis.
Perform exhaustive analysis on the aggregated records to understand the current state of email security.

Approach

More than what was done, what makes a research project or a study interesting is how it was done. In the process of achieving the objectives listed above, the researchers at Payatu made a lot of mistakes and eventually worked around them, which took a considerable amount of time and effort. To make sure a motivated enthusiast or team gets the desired output from this blog post, along with some insight into avoiding common mistakes, a comprehensive narrative by the researchers is given below.

Step 1 – Get the Domains

We started off by looking for a public list of registered domain names somewhere on the Internet. Soon enough, we came across the Alexa Top 1 Million Sites list. We did use this list in our initial tests and benchmarks, but it was immediately clear that a million domains are not enough to get the insights we were looking for. ZoneFiles, Domains Monitor and Domains Index are a few of the public database providers we considered. We observed that the majority of paid databases had around 300 million root domains on average. Ultimately, we landed on what appeared to be the world’s single largest Internet domains dataset.

The usage instructions in the repository’s README.md didn’t work for us: the repository uses Git LFS, and pulling the LFS files returned the error “This repository is over its data quota. Purchase more data packs to restore access.” A simple workaround is to fork the repository, enable the Include Git LFS objects in archives option in the Settings, and download the forked repository as a zip archive. Still, 1.7 billion domains seemed “too good to be true”, and soon enough we realized that the list contained subdomains (and sub-subdomains) as well.

We went a step further and used the Public Suffix for Go project to extract the root domains from these 1.7 billion domains. This came out to just over 340 million domains, which is again close to what the paid database providers have.
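The idea behind this reduction can be sketched in Python. This is not the Public Suffix for Go project used in the study. It is a simplified stand-in with a tiny hard-coded suffix set (an assumption for illustration), and it ignores the wildcard and exception rules of the real Public Suffix List.

```python
# Tiny illustrative subset; the real Public Suffix List has thousands of entries.
SUFFIXES = {"com", "org", "uk", "co.uk"}


def registrable_domain(hostname: str, suffixes: set) -> str:
    """Return the public suffix plus one label, e.g. example.co.uk."""
    labels = hostname.lower().rstrip(".").split(".")
    # Candidates run from longest to shortest, so "co.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                raise ValueError("hostname is itself a public suffix")
            return ".".join(labels[i - 1:])
    raise ValueError("no known public suffix")


hosts = ["www.example.com", "mail.shop.example.co.uk", "example.com"]
roots = {registrable_domain(host, SUFFIXES) for host in hosts}
print(sorted(roots))
```

Collecting the results in a set is what collapses many subdomains into a single root domain.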

Step 2 – Get the Records

Getting DNS records for more than 340 million domain names required a careful balance of three key components:

Compute power.
Multiple Internet connections to avoid rate limiting.
A querying tool that makes use of the above two to their full potential.

We identified two possible options for the hardware and network side of infrastructure required for our use case.

Cloud – Sky is the limit with a cloud infrastructure and so is the billing. We chose AWS initially, splitting the number of domains and distributing them to multiple EC2 t2.micro instances via an S3 bucket. But AWS has a Data Transfer Cost and with the number of DNS queries we wanted to make, that would have accounted for insane bills.
On-Premise – Ultimately we chose to go with an on-premise server with precisely three internet connections on three interfaces.

For the third component, we decided to use ZDNS, which is written in Go, and after giving this a read we were convinced to stick to it. But the most significant decisions when using ZDNS are choosing:

1. Type of resolution – Recursion or Iteration. We started off using multiple public recursive resolvers but ended up being rate limited by a majority of them. This motivated us to use something like Unbound to set up our own recursive resolver. But that would only make sense if we could get our recursive resolver to populate its cache at an optimal rate over a period of time, which is equivalent to running ZDNS in iterative mode at an optimal rate. This is when Google’s DNS (8.8.8.8, 8.8.4.4) came to our rescue. According to Google’s documentation, a single IP can query its DNS resolvers up to 1500 times per second (1500 QPS). Multiple internet connections would multiply the number of queries we could run.

2. Number of internet connections – We managed to get 3 ISPs onboard with around 100 Mbps of upload and download speed each. This increased our threshold for DNS queries to 4500 QPS.

With a list of around 340 million domains, we wanted to fetch the NS, MX, SPF, DKIM and DMARC records for each of them. But contrary to how simple it sounds, there are some complications, the most significant being the DKIM record. The DKIM record is configured at a sub-subdomain consisting of a variable selector, a static ._domainkey. and the domain.tld (e.g. mail._domainkey.example.org). There is no reliable way to know the value of the selector, which means we were left with brute-forcing.

Based on our benchmarking of the Alexa Top 1 Million Sites list, we simply took the top 17 selectors. Using the list of selectors and the list of domains, we generated a final list with every combination of {selector}._domainkey.{domain.tld}, which means that roughly 340 million × 17, about 5.78 billion, DNS queries were required just to get the DKIM records.

Moreover, although ZDNS has built-in functionality to fetch DMARC records, it does not automatically prepend _dmarc. to a supplied root domain. This means that to get the DMARC record for domain.tld, ZDNS must explicitly be executed against _dmarc.domain.tld. In our case, we made 340 million DMARC queries before realizing that they looked for a DMARC record on the root domain instead of _dmarc.domain.tld. This still proved to be useful, because it gave us an insight into the number of domains where DMARC is configured at the wrong place.
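The query names for both record types can be generated with a few lines of Python. This is a sketch rather than the scripts used in the study: the function names are hypothetical, and the two selectors are picked only for illustration. The last helper gives a back-of-the-envelope duration for the DKIM brute force, under the idealized assumption that the full 4500 QPS could be sustained without pauses or retries.

```python
def dkim_query_names(domains, selectors):
    """Yield {selector}._domainkey.{domain} for every combination."""
    # Nested loops keep memory flat even for a very large domain iterator.
    for domain in domains:
        for selector in selectors:
            yield f"{selector}._domainkey.{domain}"


def dmarc_query_name(domain: str) -> str:
    """DMARC records live on the fixed _dmarc subdomain."""
    return f"_dmarc.{domain}"


def days_to_query(total_queries: int, qps: int) -> float:
    """Idealized duration in days at a constant query rate."""
    return total_queries / qps / 86_400


print(list(dkim_query_names(["example.org"], ["mail", "google"])))
print(dmarc_query_name("domain.tld"))
print(days_to_query(340_000_000 * 17, 4500))
```

Under these assumptions, the DKIM queries alone would take a little over two weeks of continuous querying.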

Step 3 – Store the Data

ZDNS can save all the resulting DNS records as JSON, which is trivial to import into a database server. In our case, we went straight to MongoDB because of its popularity and compatibility with the tools we used for analysis. Nonetheless, the JSON output from ZDNS requires a significant amount of jq operations to convert it into a simple structure ready for import into MongoDB. This resulted in a separate collection of records for each type – NS, MX, SPF, DMARC, DKIM. Even after creating the indexes, the aggregation function across all the collections was extremely slow, but thankfully it was not actually required for the analysis we wanted to perform.
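The kind of flattening described above was done with jq in the study. As an illustration only, here is a comparable step in Python. The input shape used here (a name field plus data.answers entries carrying type and answer) is an assumption about the ZDNS output and should be checked against the actual files; the function name and the output document layout are hypothetical.

```python
import json


def flatten_txt_answers(line: str, prefix: str):
    """Yield simple documents for TXT answers that start with the prefix."""
    result = json.loads(line)
    # Assumed shape: {"name": ..., "data": {"answers": [{"type", "answer"}]}}
    answers = (result.get("data") or {}).get("answers") or []
    for answer in answers:
        text = answer.get("answer", "")
        if answer.get("type") == "TXT" and text.lower().startswith(prefix.lower()):
            yield {"domain": result.get("name"), "record": text}


sample = json.dumps({
    "name": "domain.tld",
    "status": "NOERROR",
    "data": {"answers": [
        {"type": "TXT", "answer": "v=spf1 -all"},
        {"type": "TXT", "answer": "some-site-verification=abc"},
    ]},
})
print(list(flatten_txt_answers(sample, "v=spf1")))
```

Each yielded dictionary is a flat document that could be written to one collection per record type.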

Step 4 – Perform the Analysis

For the analysis, we went with Python and Jupyter Notebook. The only challenge we faced was that when data is queried from MongoDB into a pandas DataFrame, everything is held in memory. Considering the massive database we had, this proved to be problematic, so we had to figure out a way to build DataFrames from MongoDB in batches of 10000 records.
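One way to batch like this, shown here as a generic sketch and not as the code used in the study, is a small generator that slices any iterable into fixed-size lists. A database cursor is an iterable, so each batch could then be turned into its own DataFrame and reduced before the next batch is loaded. The name in_batches is hypothetical, and range() stands in for the cursor so the example runs without a database.

```python
from itertools import islice


def in_batches(records, size=10_000):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# range() stands in for a database cursor in this example.
for batch in in_batches(range(25), size=10):
    print(len(batch))
```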

The scripts, programs and files we used for the above steps are available on GitHub.

Results

All the DNS records were queried from public DNS resolvers and thus the following statistical data is subject to some minor errors. All the percentages are approximate, rounded off and are calculated against total number of domains.

Following are the questions that were answered as a part of this study:

Overview

How many root domains were analyzed?

340551922

SPF Specific

How many domains had an SPF record with a valid syntax?

78540985 (~23%)

How many domains had an SPF record using the deprecated PTR mechanism?

1617918 (<1%)

How many domains had an SPF record with a weak policy?

6249763 (~2%)

How many potentially Non-Email sending domains had a hardened SPF record?

14438386 (~4%)

How many domains had an overall secure SPF record?

59969724 (~18%)

DKIM Specific

How many domains had a DKIM record with a valid syntax?

18771756 (~6%)

How many domains had a DKIM record configured to use secure public keys (size >= 1024 bits)?

17124162 (~5%)

How many potentially Non-Email sending domains had a hardened DKIM record?

155881 (<1%)

How many domains had an overall secure DKIM record?

17378331 (~5%)

DMARC Specific

How many domains had a DMARC record with a valid syntax?

9784448 (~3%)

How many domains had a DMARC record with a low PCT value?

319315 (<1%)

How many domains had a DMARC record with a weak policy configured against the domain?

5460738 (~2%)

How many domains had a DMARC record with a weak policy configured against the subdomain?

1679561 (<1%)

How many potentially Non-Email sending domains had a hardened DMARC record?

213640 (<1%)

How many domains had an overall secure DMARC record?

3342711 (~1%)

How many domains wrongly configured DMARC records on the root domain instead of _dmarc.{domain.tld}?

163219 (<1%)

Correlations

How many domains had SPF, DKIM & DMARC records configured securely?

TLD distribution for securely configured domains intended for sending emails.

The list of these domains can be downloaded from here.

How many potentially Non-Email sending domains had hardened SPF, DKIM & DMARC records?

TLD distribution for securely configured domains not intended for sending emails.

The list of these domains can be downloaded from here.

For domains that had DKIM records configured, which selector was the most popular?

selector	count
default	13542317
s1	232195
google	232195
k1	212767
mail	972767
mandrill	814298
selector1	760510
smtpapi	209057
m1	218938
x	1305111
dkim	1302488
mailjet	209724
cm	208359

"default" selector accounted for 13542317 domains (~4%)

For domains with a valid MX record, which was the most popular mail service provider?

provider	count
secureserver.net	12292138
google.com/googlemail.com	9128048
registrar-servers.com	2459089
rzone.de	2152851
ovh.net	2140097
kundenserver.de	1647004

"secureserver.net" mail server accounted for 12292138 domains (~4%)

Conclusion

It is striking that only about half a million domains out of 340 million have securely implemented SPF, DKIM & DMARC. It can be argued that the rest of the domains might simply not be intended for sending out emails. Still, those domains must follow the best practices to avoid email spoofing.

For example, an organization owns domain-a.com and domain-b.com. They use domain-a.com for their email services and domain-b.com for an external web application. Now, domain-b.com can be spoofed to send out phishing emails since no SPF, DKIM & DMARC records are enforced on it. Unfortunately, only 38133 domains in this study followed these hardening guidelines. If you own a domain, irrespective of whether it is intended for sending out emails or not, make sure the SPF, DKIM & DMARC records are configured according to the best practices.

Best Practices

Email Sending Domains

SPF – Do not use +all or ?all in the policy.
DKIM – Ensure the public key length is greater than or equal to 1024 bits.
DMARC – Make sure neither p= nor sp= is set to none and pct= is either not defined or is set to 100.

Non-Email Sending Domains

It was observed that, of the potential non-email sending domains that had all three records hardened, the majority had their DNS zone hosted with Cloudflare. This is because Cloudflare natively provides an easy option to configure these records according to the best security practices for a domain that will not be used for sending out emails.

There is also an interesting read explaining the same here. But, in short:

SPF – Since there is no mechanism defined with -all, the SPF policy will always fail.
DKIM – Since there is no public key, the DKIM policy will always fail.
DMARC – Since SPF and DKIM will always fail, DMARC policies will always reject the email.
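The three points above do not spell out the records themselves. The following Python sketch shows one commonly used form for a domain that never sends email; the exact values are an assumption rather than something taken from this study, and the function name is hypothetical. The wildcard DKIM name is meant to cover every possible selector.

```python
def non_sending_domain_records(domain: str) -> dict:
    """Return TXT records (name -> value) that harden a non-sending domain."""
    return {
        # No mechanism before -all, so no server is ever authorized.
        domain: "v=spf1 -all",
        # An empty p= tag means there is no valid public key.
        f"*._domainkey.{domain}": "v=DKIM1; p=",
        # Reject everything that fails, for the domain and its subdomains.
        f"_dmarc.{domain}": "v=DMARC1; p=reject; sp=reject; adkim=s; aspf=s;",
    }


for name, value in non_sending_domain_records("domain-b.com").items():
    print(name, "TXT", value)
```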

The Road Ahead

Hope you found this post insightful and interesting. If you want us to answer more questions similar to what are already included in this study (Eg. “How many domains had a DKIM record with a secure key?”), feel free to mention @payatulabs on Twitter asking your questions. We will pick up the best ones and release another wave of insights.

Tags: Email Security
