Let's Encrypt and DNS over TLS Hell on Android
Issues with Android's Private DNS over TLS via Adguard or PiHole using Let's Encrypt TLS Certificates after the recent Root Certificate expiry.
I run a DNS over TLS server with Adguard Home to enable DNS-level ad-blocking on the go on my Android device via the Private DNS setting. I've had a peaceful 3 years of running the DNS server with zero issues, browsing the internet without fear of being spammed by disruptive ads. This is especially important for me as I have ADHD, and ads are a major source of distraction.
Goodbye halcyon days
However, that changed recently when it just stopped working, and all it showed me on my Android device was "Couldn't connect" when I entered my DNS server's hostname in the Private DNS provider hostname field. This coincided with the time I decided to wipe my cluster clean and start over, so I figured that it was probably a misconfiguration on my part.
Fast forward to yesterday, 30 days later, and I still hadn't managed to find out why my DNS over TLS was not working. All this time, I had many sleepless nights pondering why it didn't work, as I'm the kind of person who cannot rest without getting to the root cause of a problem.
Today, I decided to give it another go and deep dive into the problem, and it was then that I finally found the issue: the expired Root Certificate.
Some backstory
Before we dive into the depths of DNS over TLS hell, let us run through a little background on TLS Certificates, trust chains and what has happened in the 6 years since the advent of Let's Encrypt, to help us understand the problem better.
Note: I do not claim to be an expert in TLS, and am only sharing the limited knowledge I have on the topic, so if you're truly interested, do read up of your own accord.
What is TLS?
Transport Layer Security (abbreviated as TLS) is a protocol designed to provide cryptographic security for communications over a computer network, be it between machines in your local home network, or between your computer and a server on the internet.
Without delving into the specifics of how TLS works, and at the risk of oversimplification, TLS can be summed up as a protocol that ensures the confidentiality and integrity of data transmitted between two machines, the server and the client, through the use of asymmetric ciphers (public and private keys) and other cryptographic algorithms (symmetric ciphers and message digests).
In plain English:
TLS ensures that data transmitted between the server and the client reaches the destination without unauthorized modification (integrity) and that no one other than the intended recipient can read it (confidentiality).
TLS also provides identification of the server (more accurately, the public key of the server) through the use of TLS Certificates which are issued to servers by trusted providers that hold Certificate Authority (abbreviated as CA) Certificates. The possession and subsequent presentation of the TLS Certificate on initial communication uniquely identifies the server as the intended party (and not some other malicious server) when a domain name is accessed.
Much like domain name registrations, TLS Certificates have a limited validity period, after which they must be renewed. This ensures that certificates conform to the latest security standards and that the domain is still controlled by the owner of the server.
TLS Certificate Trust Chain
A TLS Certificate is trusted on the basis that it is issued by a trusted provider. During issuance, a new TLS Certificate is created and signed by the trusted provider's CA Certificate. A CA Certificate is in turn trusted because it is signed by a Root Certificate. This establishes the chain of trust of a TLS Certificate.
There are relatively few Root Certificates in the world (around 200), and the companies holding them undergo a lot of scrutiny to ensure that they are trustworthy and remain so over time. This is because Root Certificates are very powerful, so much so that the entire foundation of the security offered by TLS Certificates relies on their proper use.
Because Root Certificates are so powerful, their private keys are kept offline in a secure physical location, disconnected from the Internet. Throughout the lifetime of a Root Certificate, its key is used only a handful of times, mostly (if not always) to sign CA Certificates (intermediate certificates). These intermediate certificates are in turn used to sign TLS Certificates for issuance.
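The chain of trust can be pictured as a walk from the server's TLS Certificate up through its issuers until a self-signed Root Certificate is reached. The sketch below is a simplified model, assuming certificates are reduced to subject and issuer names in a hypothetical `ISSUED_BY` map; real validation also checks signatures, expiry and usage constraints, which are left out here.

```python
# Hypothetical model: each certificate is reduced to "subject -> issuer".
# A Root Certificate is self-signed, so it is its own issuer.
ISSUED_BY = {
    "mydnsserver.tld": "R3",         # TLS Certificate issued to the server
    "R3": "ISRG Root X1",            # intermediate CA Certificate
    "ISRG Root X1": "ISRG Root X1",  # self-signed Root Certificate
}


def chain_of_trust(subject: str, issued_by: dict[str, str]) -> list[str]:
    """Follow issuer links from `subject` until a self-signed root is reached."""
    path = [subject]
    while True:
        current = path[-1]
        if current not in issued_by:
            raise LookupError(f"no known issuer for {current!r}")
        issuer = issued_by[current]
        if issuer == current:
            return path  # reached a self-signed Root Certificate
        if issuer in path:
            raise ValueError(f"issuer loop at {issuer!r}")
        path.append(issuer)


print(chain_of_trust("mydnsserver.tld", ISSUED_BY))
# ['mydnsserver.tld', 'R3', 'ISRG Root X1']
```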
Establishing trust
To establish trust, a device must have its own copy of the Root Certificates it trusts, stored on the device itself in what is usually called a trust store; intermediate CA Certificates are normally sent by the server along with its TLS Certificate. This means that every device in the world (including the one you're using to read this) already holds such a list, and the only difference between devices is how outdated that list is.
Just as all TLS Certificates expire, Root and CA Certificates also have a validity period, albeit a longer one than typical certificates. The trust store must therefore be updated regularly on all devices, typically through OS updates, so that new Root Certificates are added. Expiry dates are stored in the certificate itself, so old certificates need not be removed, as they are considered invalid past expiry.
Unreachability of TLS Certificates
TLS Certificates, as good as they sound, have traditionally been out of reach of the masses due to their exorbitant cost. In the past, a basic (Domain Validated) TLS Certificate typically cost around US$100 per year. Though that has come down to around US$50 per year in recent times, it is still pretty pricey for use cases such as self-hosted blogs like mine.
Other tiers of TLS Certificates (Organization Validated and Extended Validation) cost more (upwards of US$200 per year) but are unnecessary for typical use and are more applicable for large organizations.
The situation changed when Let's Encrypt entered the game.
What is Let's Encrypt?
Let's Encrypt is a non-profit CA run by the Internet Security Research Group (ISRG) with the aim of securing all websites across the world by offering TLS Certificates free of charge. It is currently the largest Certificate Authority in the world, with over 2 billion certificates issued and used by over 265 million websites.
ISRG Root X1 and IdenTrust DST Root CA X3
In 2015, Let's Encrypt began operations and started issuing TLS Certificates for free. I was among the early adopters of Let's Encrypt and witnessed its meteoric rise to prominence.
To enable issuance of TLS Certificates, Let's Encrypt created a Root Certificate, ISRG Root X1. Subsequently, they created 4 intermediate CA Certificates, Let's Encrypt Authority X1 through X4, signed by their Root Certificate.
Because ISRG Root X1 is a Root Certificate that was only created on Jun 4, 2015, new TLS Certificates chaining only to it would not be recognized by devices whose trust stores received their final update before Jun 4, 2015. To mitigate this, Let's Encrypt partnered with IdenTrust to cross-sign its 4 intermediate CA Certificates with IdenTrust's Root Certificate, DST Root CA X3.
As DST Root CA X3 was created on Sep 30, 2000, Let's Encrypt-issued TLS Certificates could be verified by devices with trust stores dating back as far as 2000, enabling these devices to access servers and websites using the newly issued Let's Encrypt TLS Certificates.
DST Root CA X3 Expiration
The DST Root CA X3 Root Certificate expired on Sep 30 14:01:15 2021 GMT, according to certificate transparency logs provided by crt.sh. This means that any trust chain ending at DST Root CA X3 is no longer valid, regardless of whether the certificates lower in the chain have themselves expired.
Unfortunately, the expiration of the Root Certificate also means that devices with outdated versions of the CA Certificates will no longer be able to access websites that use Let's Encrypt TLS Certificates without special configuration that allows trust beyond certificate expiry.
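Checking whether a certificate has expired boils down to comparing the current time with its notAfter timestamp. A minimal sketch using Python's `ssl.cert_time_to_seconds`, which parses the same "Sep 30 14:01:15 2021 GMT" format quoted above; `is_expired` is a hypothetical helper name.

```python
import ssl
from datetime import datetime, timezone

DST_ROOT_CA_X3_NOT_AFTER = "Sep 30 14:01:15 2021 GMT"


def is_expired(not_after: str, at: datetime) -> bool:
    """True if a certificate with this notAfter string is expired at `at`.

    `at` must be timezone-aware; cert_time_to_seconds returns UTC epoch seconds.
    The notAfter instant itself still counts as valid.
    """
    return at.timestamp() > ssl.cert_time_to_seconds(not_after)


print(is_expired(DST_ROOT_CA_X3_NOT_AFTER, datetime(2021, 10, 30, tzinfo=timezone.utc)))
# True
```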
The solution for older Android devices
Let's Encrypt devised a trust chain that allows old Android devices to keep validating Let's Encrypt TLS Certificates past the expiration of the DST Root CA X3 Root Certificate, using a special cross-sign of ISRG Root X1 by IdenTrust's DST Root CA X3. This works because Android, unlike many other platforms, does not enforce the expiry date of a Root Certificate in its trust store. The special cross-sign lasts 3 years, allowing these older Android devices to continue accessing sites secured by Let's Encrypt TLS Certificates until 2024.
Putting the plan into action, certificates issued by Let's Encrypt after May 4, 2021 use the special cross-sign by default (in green). Alternatively, with some additional configuration, certificate requestors can ask for the alternate chain, which uses ISRG Root X1 as the final certificate (in blue).
Back to the problem at hand
TLS Certificates for my personal domain are issued by Let's Encrypt, and I had been relying on the default configuration since time immemorial. After all, as the saying goes:
Don't fix what's not broken
On the evening of Sep 30, 2021, after ensuring that backups were in place, I deleted my entire Kubernetes cluster to start afresh due to performance issues of late.
Notice that the date I decided to refresh my cluster coincides with the expiry date of the DST Root CA X3 Root Certificate. The expiry probably happened a few hours after the cluster was refreshed.
After installing Adguard Home, I realized that none of my Android devices could connect to my Private DNS server anymore, as described earlier.
Debugging DNS over TLS
Debugging DNS over TLS can be a huge pain. After 30 minutes of searching, I found a staggering grand total of 1 tool that can make a proper DNS over TLS query. That tool is kdig by Knot DNS.
On macOS, kdig can be installed with:
On Linux, kdig can be installed with:
With kdig, I made a DNS query to my DNS over TLS server. The output shown below has been redacted, with my DNS server's domain replaced with mydnsserver.tld.
To ensure that the query was indeed made with DNS over TLS, I verified that the port used was 853.
The output showed a successful DNS query for google.com, which left me even more confused as to why my Android device errored out with a very informative "Couldn't connect" message when I tried setting it as my Private DNS server.
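For readers without kdig, here is a minimal standard-library sketch of what a DNS over TLS query involves. Per RFC 7858, it is ordinary DNS over TCP framing (each message prefixed with a 2-byte length) carried inside a TLS session on port 853. The function names are hypothetical, and only A-record queries for ASCII names are handled. Note that, like kdig, this validates the chain with the local OpenSSL trust store, not Android's validator, so a success here says nothing about Android.

```python
import secrets
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (RFC 1035). qtype 1 is an A record."""
    labels = name.rstrip(".").split(".")
    if any(not 0 < len(label) <= 63 for label in labels):
        raise ValueError(f"invalid DNS name: {name!r}")
    # Header: random ID, flags with only RD (recursion desired) set,
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", secrets.randbits(16), 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full message arrived")
        buf += chunk
    return buf


def dot_query(server: str, name: str, port: int = 853, timeout: float = 5.0) -> bytes:
    """Send one query over DNS over TLS (RFC 7858) and return the raw DNS response."""
    query = build_query(name)
    # Default context: verifies the certificate chain against this machine's
    # trust store and checks that the certificate matches `server`.
    ctx = ssl.create_default_context()
    with socket.create_connection((server, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server) as tls:
            # As with DNS over TCP, every message carries a 2-byte length prefix.
            tls.sendall(struct.pack("!H", len(query)) + query)
            (length,) = struct.unpack("!H", _recv_exact(tls, 2))
            return _recv_exact(tls, length)
```

For example, `resp = dot_query("mydnsserver.tld", "google.com")` returns the raw response; `resp[3] & 0x0F` is the response code (0 means NOERROR) and `int.from_bytes(resp[6:8], "big")` is the number of answers. A certificate problem surfaces as an `ssl.SSLCertVerificationError`.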
Checking the TLS Certificates
The next logical step was to check the TLS Certificates. This can be done with a widely available tool known as openssl.
On macOS, openssl can be installed with:
On Linux, openssl can be installed with:
To retrieve TLS Certificate information, I ran the following command:
What caught my attention was this line:
i:O = Digital Signature Trust Co., CN = DST Root CA X3
It seems that DST Root CA X3 is still in the trust chain, but it is not shown in the DNS over TLS query for some reason. Since I was aware that this certificate had already expired, I had a feeling that this might be the issue.
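Spotting that issuer line can be automated. The sketch below parses the "Certificate chain" section printed by `openssl s_client` and reports which certificates were issued by a given name. It assumes the output was saved with a command along the lines of `openssl s_client -connect mydnsserver.tld:853 -servername mydnsserver.tld -showcerts < /dev/null > chain.txt` (an illustrative invocation, not necessarily the exact one used above) and that the output uses the ` N s:` / `   i:` line shape shown here; other lines, including PEM blocks and the extra per-certificate lines newer OpenSSL versions print, are ignored.

```python
import re

# Lines of the "Certificate chain" section of `openssl s_client -showcerts`, e.g.
#  0 s:CN = mydnsserver.tld
#    i:C = US, O = Let's Encrypt, CN = R3
SUBJECT_RE = re.compile(r"^\s*(\d+)\s+s:(.*)$")
ISSUER_RE = re.compile(r"^\s*i:(.*)$")


def parse_chain(text: str) -> list[tuple[int, str, str]]:
    """Return (depth, subject, issuer) for each certificate the server sent."""
    chain = []
    depth, subject = None, None
    for line in text.splitlines():
        if m := SUBJECT_RE.match(line):
            depth, subject = int(m.group(1)), m.group(2).strip()
        elif (m := ISSUER_RE.match(line)) and subject is not None:
            chain.append((depth, subject, m.group(1).strip()))
            depth, subject = None, None
        # Anything else (PEM blocks, summaries, other per-cert lines) is skipped.
    return chain


def issued_by(chain: list[tuple[int, str, str]], name: str) -> list[tuple[int, str, str]]:
    """Entries whose issuer mentions `name`, e.g. "DST Root CA X3"."""
    return [entry for entry in chain if name in entry[2]]
```

Usage: `issued_by(parse_chain(open("chain.txt").read()), "DST Root CA X3")` returns an empty list once the chain no longer references the expired root.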
The solution
After a quick search, it turns out that this issue is prevalent among users of Adguard Home running DNS over TLS with Let's Encrypt certificates requested using the default settings.
The solution was to request certificates that use the "ISRG Root X1" trust chain (shown in blue above) that does not contain the expired DST Root CA X3 certificate.
If you use certbot to renew your Let's Encrypt TLS Certificates, adding the parameter --preferred-chain="ISRG Root X1" should fix the issue.
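certbot's documentation describes --preferred-chain as follows: if the CA offers multiple certificate chains, prefer the chain whose topmost certificate was issued from the given Subject Common Name, and otherwise use the default chain. The toy sketch below models that rule to show why "ISRG Root X1" picks the chain without DST Root CA X3. It is not certbot's code; certificates are reduced to hypothetical subject/issuer names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject_cn: str
    issuer_cn: str


def select_chain(chains: list[list[Cert]], preferred_issuer_cn: str) -> list[Cert]:
    """Pick the first chain whose topmost certificate was issued by
    `preferred_issuer_cn`; otherwise fall back to the default (first) chain."""
    for chain in chains:
        if chain[-1].issuer_cn == preferred_issuer_cn:
            return chain
    return chains[0]


leaf = Cert("mydnsserver.tld", "R3")
r3 = Cert("R3", "ISRG Root X1")
isrg_cross_signed = Cert("ISRG Root X1", "DST Root CA X3")

default_chain = [leaf, r3, isrg_cross_signed]  # topmost cert issued by DST Root CA X3
alternate_chain = [leaf, r3]                   # topmost cert issued by ISRG Root X1

chosen = select_chain([default_chain, alternate_chain], "ISRG Root X1")
print([c.subject_cn for c in chosen])
# ['mydnsserver.tld', 'R3']
```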
In my case, I use cert-manager (if you're keen on using it as well, check out my other post), so I created a new ClusterIssuer resource for this alternate trust chain (with secret fields removed):
I then switched my Certificate's issuer field to this newly created issuer (again, with secret fields removed):
After deploying the changes and waiting for a couple of minutes, the new certificate was issued.
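For reference, the cert-manager fields that matter here are `spec.acme.preferredChain` on the issuer and `spec.issuerRef` on the Certificate. The sketch below builds both resources as Python dicts and prints them as a JSON `List`, which `kubectl apply -f -` accepts. Every name, the email address, the namespace and the secret names are hypothetical placeholders, and the ACME solver configuration is deliberately omitted; an existing issuer's solver block would need to be carried over.

```python
import json

# Hypothetical names: replace with your own.
ISSUER_NAME = "letsencrypt-isrg-root-x1"
DNS_NAME = "mydnsserver.tld"


def cluster_issuer(email: str) -> dict:
    """ClusterIssuer asking Let's Encrypt for the chain that ends at ISRG Root X1."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": ISSUER_NAME},
        "spec": {
            "acme": {
                "server": "https://acme-v02.api.letsencrypt.org/directory",
                "email": email,
                "preferredChain": "ISRG Root X1",
                "privateKeySecretRef": {"name": f"{ISSUER_NAME}-account-key"},
                # "solvers" omitted: copy the solver block from your existing issuer.
            }
        },
    }


def certificate(namespace: str) -> dict:
    """Certificate for the DNS server, pointed at the ClusterIssuer above."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": "mydnsserver-tls", "namespace": namespace},
        "spec": {
            "secretName": "mydnsserver-tls",
            "dnsNames": [DNS_NAME],
            "issuerRef": {"name": ISSUER_NAME, "kind": "ClusterIssuer"},
        },
    }


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [cluster_issuer("you@example.com"), certificate("adguard")],
    }
    print(json.dumps(manifest, indent=2))
```

Saved as, say, manifests.py, it could be applied with `python3 manifests.py | kubectl apply -f -` once the solvers are filled in.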
Verifying the trust chain
I verified that DST Root CA X3 is no longer in the trust chain by running the openssl command again.
Indeed, DST Root CA X3 is nowhere to be found in the output of the command, confirming that the new TLS Certificate is now using the newly configured trust chain.
Upon setting the Private DNS setting to my personal DNS Server, my phones were finally able to connect successfully without a hitch.
What I think the problem is
Strangely, at the time of writing, the information across the Internet touching on this problem provides only the solution but not the root cause.
In the process of debugging this issue, my findings are as follows:
- Websites seem to work fine with the default trust chain
- DNS over TLS clients such as kdig seem to work fine with the default trust chain
- DNS over TLS on Android breaks with the default trust chain
- DNS over TLS on Android only works with the trust chain that consists of only valid certificates
- Let's Encrypt was confident enough to set the trust chain with DST Root CA X3 as the default
Based on those findings, I'm guessing that the issue lies not with Let's Encrypt's default trust chain (since some 265 million websites are at stake) but with Android's DNS over TLS implementation, which somehow validates every certificate up the trust chain:
D=0 DNS server certificate
D=1 Let's Encrypt R3
D=2 ISRG Root X1
D=3 DST Root CA X3 <-- Expired
Typically, trust chain validation should stop on encountering the first valid Root Certificate in the device's trust store. In this case, however, the expired DST Root CA X3 Root Certificate, which sits one step deeper in the chain than the ISRG Root X1 Root Certificate, likely raised a fatal error when establishing a connection with the DNS over TLS server. As a result, the phone fails to connect to the DNS over TLS server.
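To make this guess concrete, here is a toy model contrasting the two behaviours on the D=0 to D=3 chain above. It illustrates the hypothesis only and is not Android's actual validator: certificates are reduced to names and expiry dates (the leaf's expiry is hypothetical; the others are rounded to the day, apart from the DST Root CA X3 time quoted earlier), and signatures are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

UTC = timezone.utc


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: datetime


# D=0..D=2 as sent by the server with the default chain.
PRESENTED = [
    Cert("mydnsserver.tld", "R3", datetime(2021, 12, 29, tzinfo=UTC)),  # hypothetical expiry
    Cert("R3", "ISRG Root X1", datetime(2025, 9, 15, tzinfo=UTC)),
    Cert("ISRG Root X1", "DST Root CA X3", datetime(2024, 9, 30, tzinfo=UTC)),  # cross-sign
]
# D=3 and the self-signed ISRG Root X1 live in the device's trust store.
TRUST_STORE = {
    "ISRG Root X1": Cert("ISRG Root X1", "ISRG Root X1", datetime(2035, 6, 4, tzinfo=UTC)),
    "DST Root CA X3": Cert("DST Root CA X3", "DST Root CA X3",
                           datetime(2021, 9, 30, 14, 1, 15, tzinfo=UTC)),
}


def validate(chain, roots, now, stop_at_first_trusted=True):
    """Toy validation; chain[0] is the server's certificate.

    stop_at_first_trusted=True: accept as soon as a certificate is issued by a
    valid trusted root (roughly what common TLS clients do).
    stop_at_first_trusted=False: the hypothesised Android DoT behaviour, where
    the whole presented chain must anchor to a valid root.
    """
    for cert in chain:
        if cert.not_after < now:
            return False  # an expired certificate in the presented chain
        anchor = roots.get(cert.issuer)
        if stop_at_first_trusted and anchor is not None and anchor.not_after >= now:
            return True
    anchor = roots.get(chain[-1].issuer)
    return anchor is not None and anchor.not_after >= now


now = datetime(2021, 10, 30, tzinfo=UTC)
print(validate(PRESENTED, TRUST_STORE, now))                              # True
print(validate(PRESENTED, TRUST_STORE, now, stop_at_first_trusted=False))  # False
```

Under this model, both behaviours accept the default chain before Sep 30, 2021, which would fit the 3 trouble-free years, and only the stricter one fails afterwards.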
Closing thoughts
Self-hosting is a mixed bag of fun and pain and this is a prime example of the pains that a self-hoster deals with.
Nonetheless, the joy is worth the pain, at least for me, and solving this problem is yet another victory in my self-hosting journey against the never-ending torrent of challenges that come with running servers at home.
To my self-hosting friends out there, keep up the good fight!