Auto-renew TLS Certificates with cert-manager
In my last few posts, I touched on why I chose Traefik as my Ingress Controller for Kubernetes, and how I deployed and configured it on my cluster to manage my Let's Encrypt TLS certificates.
In this post, I'll introduce how I offloaded TLS certificate management from Traefik to another component and why I did it.
Before we proceed, you may be happy to know that cert-manager is multi-arch, meaning it works on amd64, arm64 and armv7 machines alike. So yes, this works on your Raspberry Pi machines!
What is cert-manager?
cert-manager does exactly what its name says: it manages TLS certificates. In this case, it is built specifically to manage TLS certificates for applications deployed in a Kubernetes environment. It provides useful abstractions for TLS certificates in the form of Custom Resource Definitions (CRDs), a common interface that is significantly more intuitive for the user.
For instance, a TLS certificate can ultimately be reduced to two entities: the Issuer of the certificate and the Certificate itself.
cert-manager provides exactly these abstractions, so you only need to configure the Issuer and the Certificate, and it handles the rest for you.
Really neat, eh?
Why use cert-manager?
You may already be thinking:
Why not use Traefik to manage my TLS certificates as demonstrated in the previous posts?
The keen-eyed readers might have already caught on to this, but for the benefit of those who haven't, here's the main reason why I'm using cert-manager:
Offloading TLS certificate management from Traefik to cert-manager would allow for high-availability Traefik deployments
Enablement of HA Traefik
As outlined previously, the main obstacle preventing high-availability (HA) deployments of Traefik is that Traefik has to maintain the state of the Let's Encrypt ACME challenges.
When multiple Traefik replicas are started a few seconds apart, no instance knows about the others. Therefore, when a certificate renewal is due, each instance attempts to renew the certificate. As a result, they overwrite each other's certificates in the .json file and only the last certificate issued is used.
With cert-manager, however, Traefik can obey the single-responsibility principle and focus on its role as a reverse proxy, as it no longer has to manage TLS certificates. Traefik reads the certificates from the Kubernetes Secrets where cert-manager stores them and uses them as-is to encrypt traffic between the internet and your cluster. This is arguably the greatest benefit of using cert-manager over Traefik's built-in certificate management.
Deprecation of kube-lego
kube-lego, the framework used by Traefik's TLS certificate renewal feature, has been deprecated and has had no releases in the last couple of years, the last one being on Jun 27, 2018. For most backend frameworks this would not be a huge concern, but for a security framework that manages your website's identity, and thus its authenticity, that is an eternity.
Incidentally, kube-lego was developed by Jetstack, which also develops cert-manager. In fact, cert-manager was developed as a spiritual successor to kube-lego, so as an additional assurance, users can expect the same quality and reliability they got from kube-lego.
Compatibility with multiple providers
Traefik's built-in certificate management only supports obtaining certificates from Let's Encrypt via the ACME protocol. For most people, Let's Encrypt certificates may already be sufficient, but they may not be ideal when you are:
- Using a private CA in an air-gapped environment
- Using managed CAs in cloud providers such as AWS
- Using self-signed certificates
- Using other ACME providers
- Feeling paranoid about using free CAs like Let's Encrypt
Surprising as it may seem, the last point is a legitimate concern for enterprise users, for whom responsibility in the event of a data breach needs to be clearly defined.
i.e. people need to know who to blame other than themselves.
The most interesting design decision here is highlighted in point 4. cert-manager does not integrate specifically with Let's Encrypt, but with the ACME protocol in general. This means cert-manager works with any TLS certificate provider that implements the ACME protocol.
This lays the foundation for encouraging more TLS certificate providers to move towards automated certificate management. I find this point really inspiring, as it aligns with Let's Encrypt's vision of a more secure internet, with more sites moving towards short-lived, automatically managed TLS certificates.
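For the non-ACME scenarios listed above, cert-manager offers other issuer types. The following is a minimal sketch of a self-signed issuer and an issuer backed by a private CA. It is not taken from my own setup. It assumes the cert-manager.io/v1 API (cert-manager v1.0 and later), and the resource and Secret names are hypothetical.

```yaml
# Hypothetical self-signed issuer: each Certificate is signed by its own key.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# Hypothetical private CA issuer: signs Certificates with the CA key pair
# (tls.crt/tls.key) stored in the Secret below. For a ClusterIssuer, that
# Secret must live in cert-manager's cluster resource namespace
# (cert-manager by default).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: private-ca-issuer
spec:
  ca:
    secretName: private-ca-key-pair
```

Certificates reference these issuers the same way they reference an ACME issuer. That shared interface is what makes the abstraction portable across providers.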
Some caveats
Of course, all good things come with caveats. This is an important point to consider if you have been using Traefik's Let's Encrypt DNS-01 validation flow: cert-manager natively supports only a handful of DNS providers, namely:
- Akamai
- AzureDNS
- Cloudflare
- DigitalOcean
- Google CloudDNS
- Route53
Before you stop reading, there is a workaround if your DNS provider is not listed here.
Keen-eyed readers might have already noticed that the list consists only of major cloud and CDN providers. If you're wondering whether support for your domain registrar's DNS will be added in future, you might be disappointed to hear that the list of DNS providers has not changed in the 2 years or so that I've been monitoring the project.
Personally, I was using Namecheap DNS, but thankfully I found a way around the lack of Namecheap DNS support. Cloudflare has a free tier for personal sites that lets you use not just its DNS, but also the performance and security features of its CDN. (Disclaimer: I'm not sponsored by Cloudflare in any form.)
To use Cloudflare as my DNS provider, I created a Cloudflare account and, at my domain registrar, pointed my domain's nameservers (NS records) to Cloudflare's. From then on, your domain's DNS is managed in Cloudflare, which in my opinion has a superior UI/UX compared to most domain registrars.
Detailed instructions and a step-by-step tutorial will also be shown during the onboarding flow in Cloudflare should you be interested.
Deployment
Deploying cert-manager is as simple as running a single kubectl apply command against the manifest published with each cert-manager release.
This installs the latest version of cert-manager, its associated components (cert-manager, cainjector and webhook) and its Custom Resource Definitions into the cert-manager namespace of your cluster. You may inspect the YAML manifests by opening the manifest URL in your browser. Further details can be found in cert-manager's excellent documentation.
For multi-tenant clusters, the choice of namespace matters, especially if your cluster already has a cert-manager installation administered by the cluster owner or another tenant.
Configuration
Here comes the exciting part: how you can integrate cert-manager with Traefik and Let's Encrypt.
Choosing between ClusterIssuer and Issuer
cert-manager provides two Custom Resource Definitions as abstractions for TLS certificate issuers: ClusterIssuer and Issuer. They are virtually identical, with one key difference: namespace scoping.
- ClusterIssuer allows the configured certificate issuer to issue certificates to all namespaces in the cluster
- Issuer only allows the configured issuer to issue certificates to apps within the same namespace it is defined in
If your cluster is single-tenant, meaning you are not concerned about apps from multiple namespaces requesting certificates from the same CA, you should pick the ClusterIssuer.
If your cluster is multi-tenant, whereby the same cluster hosts apps that serve multiple domain names on separate accounts, you should pick the Issuer. You wouldn't want an unrelated app requesting a certificate for a subdomain that you own, overriding your certificate or, worse, impersonating your app to collect sensitive data from your users.
Create Let's Encrypt staging Issuer
To avoid spamming the production Let's Encrypt server with certificate requests during testing, and thereby hitting your rate limits prematurely, it is highly recommended that you set up the Let's Encrypt staging issuer first.
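Below is a minimal sketch of such a staging issuer using the Cloudflare DNS-01 solver. It assumes the cert-manager.io/v1 API and a ClusterIssuer. The issuer name, the Secret names, the api-token key and the email addresses are hypothetical placeholders to adapt.

```yaml
# Sketch of a Let's Encrypt staging issuer with Cloudflare DNS-01 validation.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer            # or Issuer (then also set metadata.namespace)
metadata:
  name: letsencrypt-staging    # hypothetical name
spec:
  acme:
    # Let's Encrypt staging endpoint: untrusted certificates, generous rate limits
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: you@example.com     # placeholder: your Let's Encrypt account email
    # Secret where cert-manager stores the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging-account-key   # hypothetical name
    solvers:
      - dns01:
          cloudflare:
            email: you@example.com            # placeholder: your Cloudflare login email
            # Secret holding the Cloudflare API token (hypothetical name/key)
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```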
In your Issuer manifest, replace the following values with your own:
- kind: Your choice between ClusterIssuer and Issuer.
- spec.acme.email: Your Let's Encrypt account email address. If you have requested certificates from Let's Encrypt before, this is the same email address you used to register a private key with them. If you're new to Let's Encrypt, you may fill in any email address that you own.
- spec.acme.solvers[0].dns01.cloudflare.email: The email address you use to log into Cloudflare.
Create Cloudflare API Token Secret
Next, create a Secret to store your Cloudflare API token.
The API token can be generated in Cloudflare under My Profile > API Tokens. For the new API token, assign the following Permissions on the respective Zone Resources:
Permissions:
- Zone - DNS - Edit
- Zone - Zone - Read
Zone Resources:
- Include - All Zones
If you are concerned about giving cert-manager access to all DNS Zones (i.e. domain names), you may restrict it to only the DNS Zones that you'd like cert-manager to manage TLS certificates for.
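Here is a minimal sketch of that Secret. The Secret name and the api-token key are hypothetical, but whatever you choose must match the apiTokenSecretRef in your issuer. Pay attention to the namespace. For a ClusterIssuer, cert-manager looks up the Secret in its cluster resource namespace (cert-manager by default). An Issuer reads it from its own namespace.

```yaml
# Sketch of the Secret referenced by the issuer's apiTokenSecretRef.
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token   # hypothetical; must match apiTokenSecretRef.name
  # ClusterIssuer: cert-manager's cluster resource namespace (default: cert-manager).
  # Issuer: use the Issuer's own namespace instead.
  namespace: cert-manager
type: Opaque
stringData:
  # Key must match apiTokenSecretRef.key; never commit a real token to git.
  api-token: "REPLACE_WITH_CLOUDFLARE_API_TOKEN"
```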
Create test Certificate with staging Issuer
In order not to consume precious rate limit quota from the Let's Encrypt production server, we should test our configuration to make sure that it is working by requesting a Certificate from the staging server.
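As a sketch, assuming the cert-manager.io/v1 API and a ClusterIssuer named letsencrypt-staging, a staging Certificate could look like this. The namespace, domain and Secret names are hypothetical.

```yaml
# Sketch of a staging Certificate (hypothetical names throughout).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com-staging
  namespace: my-app            # namespace of the app(s) using this certificate
spec:
  # Secret that cert-manager creates and keeps updated with tls.crt and tls.key
  secretName: tls-cert-example-com-staging
  dnsNames:
    - app.example.com          # subdomain the app should be reachable at
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer        # or Issuer, which must then be in this namespace
```

cert-manager stores the issued certificate and private key in the Secret named by spec.secretName and renews them automatically before they expire.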
In your Certificate manifest, replace the following values with your own:
- metadata.name: Your domain name with hyphens replacing dots. You may specify any other name for this Certificate resource, but it should indicate the domain(s) it covers and the nature of the issuer. For instance, I specified ikarus-sg-staging.
- metadata.namespace: Namespace of the app(s) using this certificate.
- spec.secretName: Name of the target Secret in which to store the requested certificate. Once again, this should indicate the domain(s) covered and the nature of the issuer. For instance, I specified tls-cert-ikarus-sg-staging.
- spec.dnsNames: DNS names to request the certificate for. Usually this is set to a single element specifying the subdomain that the app should be reachable at.
Explanations of all possible configuration parameters can be found in cert-manager's official documentation.
Use certificate in app Ingress or IngressRoute
Finally, configure the app Ingress or IngressRoute, depending on your Traefik configuration provider, to read the certificate from the target Secret.
Personally, I'm using IngressRoute. Fortunately, both resource kinds configure TLS under a tls key (a list of host/secret entries in an Ingress, a single object in an IngressRoute), so the same approach applies to either. Keep your existing manifest as it is and only change its tls key.
Set spec.tls.secretName (for an Ingress, the secretName of the relevant spec.tls entry) to the value of spec.secretName in the Certificate resource defined above. This tells Traefik to pick up the certificate from that Secret. At the risk of sounding like a broken record, the namespace of this Ingress/IngressRoute must be the same as that of the Certificate.
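To illustrate the difference in shape, here is a sketch of the tls key in both resource kinds. It assumes Traefik v2's traefik.containo.us/v1alpha1 IngressRoute API (newer Traefik releases use traefik.io/v1alpha1), an entry point named websecure, and a hypothetical my-app Service on port 80. Only the tls sections matter here.

```yaml
# IngressRoute (Traefik v2 CRD): tls is a single object.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: my-app
  namespace: my-app                  # same namespace as the Certificate
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.example.com`)
      kind: Rule
      services:
        - name: my-app
          port: 80
  tls:
    secretName: tls-cert-example-com-staging   # = Certificate spec.secretName
---
# Standard Ingress: tls is a list of host/Secret pairs.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-app
spec:
  tls:
    - hosts:
        - app.example.com
      secretName: tls-cert-example-com-staging
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```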
Test staging certificate
The certificate request process should have completed by now. If not, check its progress by inspecting the Certificate resource with kubectl and looking at its status field.
If the status reports the certificate as Ready, your certificate is ready to use. If it doesn't, double-check your configuration and make sure the parameters described above were changed appropriately.
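For reference, a Certificate that has been issued successfully typically reports a Ready condition along these lines. This is an illustrative excerpt. The exact fields, reasons and messages depend on your cert-manager version.

```yaml
# Illustrative excerpt of a successfully issued Certificate's status
status:
  conditions:
    - type: Ready
      status: "True"   # "False" while issuance is pending or has failed
```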
If all's well, open your browser and navigate to the subdomain you requested the certificate for. Expect a warning, because staging certificates are signed by an untrusted test root. Inspect the certificate by clicking on the icon to the left of the address bar, and verify that the issuer is Let's Encrypt Staging and not Traefik's default certificate.
Create Let's Encrypt production equivalents
Repeat the above steps to create the Let's Encrypt production equivalents of the ClusterIssuer/Issuer and Certificate, with the following differences:
- Replace spec.acme.server in the ClusterIssuer/Issuer with https://acme-v02.api.letsencrypt.org/directory
- Remove any -staging suffixes in the identifiers, including the issuer name in the Certificate's issuerRef, to avoid overwriting the staging equivalents.
Switch to newly created Certificate
In your existing app Ingress/IngressRoute, set spec.tls.secretName to the value of spec.secretName in the production Certificate resource. Follow the above steps to check on the progress of the certificate request and to verify the certificate, which your browser should now trust.
Profit!
If all has gone well so far, congratulations! You now have a working deployment of cert-manager, managing your TLS certificates.
You may now delete the staging Certificate, or even its Issuer/ClusterIssuer.
Also, don't forget to remove the TLS-related configuration from Traefik so that it no longer attempts to renew, and thus override, certificates requested by cert-manager.
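Suppose certificates were previously obtained through a Traefik v2 file-based static configuration with an ACME resolver named le. Both the file-based setup and the resolver name are assumptions; your setup may use CLI flags or Helm values instead. In that case, the pieces to remove are the resolver definition and any certResolver references on your routes:

```yaml
# Traefik v2 static configuration (e.g. traefik.yml); resolver name "le" is hypothetical.
# Removing this block stops Traefik from requesting and renewing certificates itself.
certificatesResolvers:
  le:
    acme:
      email: you@example.com
      storage: /data/acme.json
      dnsChallenge:
        provider: namecheap
---
# IngressRoute fragment: drop certResolver and keep only the cert-manager Secret.
spec:
  tls:
    # certResolver: le                  # remove this line
    secretName: tls-cert-example-com    # hypothetical Secret created by cert-manager
```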
What's next
I'm planning to write two follow-up posts branching off this thread, which might prove interesting for infra-minimalists like myself:
- Managing wildcard TLS certificates for multiple namespaces with cert-manager.
- Configuring Traefik v2 for high-availability.
Stay tuned for more interesting content in the upcoming weeks!