Auto-renew TLS Certificates with cert-manager

In my last few posts, I touched on why I chose Traefik as my Ingress Controller for Kubernetes, and how I deployed and configured Traefik on my cluster to manage my Let's Encrypt TLS certificates.

In this post, I'll explain how and why I offloaded TLS certificate management from Traefik to another component.

Before we proceed, you may be happy to know that cert-manager is multi-arch, meaning it works on amd64, arm64 and armv7 machines. So yes, this works on your Raspberry Pis!

What is cert-manager?

cert-manager does exactly what its name says: it manages TLS certificates, but it is built specifically for applications deployed in a Kubernetes environment. It exposes TLS certificates and their issuers as Custom Resource Definitions (CRDs), providing a common interface that is significantly more intuitive for the user.

For instance, a TLS certificate can ultimately be reduced to 2 entities: the Issuer of the certificate, and the Certificate itself.

cert-manager provides exactly these abstractions, so you only need to configure the Issuer and the Certificate, and it handles the rest for you.

Really neat eh?

Why use cert-manager?

You may already be thinking:

Why not use Traefik to manage my TLS certificates as demonstrated in the previous posts?

Keen-eyed readers might have already caught on to this, but for the benefit of those who haven't, here's the main reason why I'm using cert-manager:

Offloading TLS certificate management from Traefik to cert-manager would allow for high-availability Traefik deployments

Enablement of HA Traefik

As outlined previously, the main obstacle to deploying Traefik for high availability is that Traefik has to maintain the state of the Let's Encrypt ACME challenges.

When multiple replicas of Traefik start a few seconds apart from each other, no Traefik instance is aware of the others. Therefore, when a certificate renewal is required, every instance attempts to renew the certificate. As a result, they overwrite each other's certificates in the .json file, and only the last certificate issued is used.

With cert-manager, however, Traefik can obey the single-responsibility principle of software engineering and focus on its role as a reverse proxy, as it no longer has to manage TLS certificates. Traefik reads the TLS certificates from the Kubernetes Secrets where cert-manager stores them and uses them as-is to encrypt traffic between the internet and your cluster. This is arguably the greatest benefit of using cert-manager over Traefik's built-in certificate management.

Deprecation of kube-lego

kube-lego, the framework used by Traefik's TLS certificate renewal feature, has been deprecated and has had no releases in the last couple of years, its last release being on June 27, 2018. This is normally not a huge concern with backend frameworks, but for a security framework that manages your website's identity, and thus its authenticity, an eternity has already come and gone.

Incidentally, kube-lego was developed by Jetstack, which is also the developer of cert-manager. In fact, cert-manager was developed as a spiritual successor to kube-lego, so users can expect the same quality and reliability as they did with kube-lego.

Compatibility with multiple providers

Traefik's built-in certificate management only supports obtaining certificates from Let's Encrypt via the ACME protocol. For most people, Let's Encrypt certificates may well be sufficient, but this may not be ideal when you are:

  1. Using a private CA in an air-gapped environment
  2. Using managed CAs in cloud providers such as AWS
  3. Using self-signed certificates
  4. Using other ACME providers
  5. Feeling paranoid about using free CAs like Let's Encrypt

Surprising as it may be, the last point is a legitimate concern for enterprise users, where responsibility in the event of a data breach needs to be clearly defined.

i.e. people need to know who to blame other than themselves.

The most interesting design decision is highlighted in point 4: cert-manager does not integrate specifically with Let's Encrypt, but more generally with the ACME protocol. This means cert-manager works with any TLS certificate provider that implements the ACME protocol.

This lays the foundation for encouraging more TLS certificate providers to move towards automated certificate management. I find this point really inspiring, as it aligns with Let's Encrypt's vision of a more secure internet, with more sites moving towards short-lived, automatically managed TLS certificates.
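Points 1 and 3 in the list above map onto issuer types that cert-manager supports alongside ACME. As a rough sketch, a self-signed ClusterIssuer and a ClusterIssuer backed by your own private CA might look like the following. The resource names and the root-ca-key-pair Secret are hypothetical; the CA variant assumes that Secret already holds your CA certificate and key.

```yaml
# Hypothetical example for point 3: an issuer that self-signs certificates.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
# Hypothetical example for point 1: an issuer backed by a private CA.
# Assumes a Secret named root-ca-key-pair, containing the CA certificate
# (tls.crt) and private key (tls.key), exists in the namespace that
# ClusterIssuers read Secrets from (cert-manager in a default install).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: private-ca-issuer
spec:
  ca:
    secretName: root-ca-key-pair
```
Sketch of self-signed and private CA ClusterIssuers (hypothetical names)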

Some caveats

Of course, all good things come with caveats. This is an important point to consider if you have been using Traefik's Let's Encrypt DNS-01 validation flow. cert-manager only supports a handful of DNS providers, namely:

  1. Akamai
  2. AzureDNS
  3. Cloudflare
  4. DigitalOcean
  5. Google CloudDNS
  6. Route53

Before you stop reading, there is a workaround if your DNS provider is not listed here.

Keen-eyed readers might have already noticed that the list consists only of major cloud and CDN providers. If you're wondering whether support for your domain registrar's DNS will be added in future, you might be disappointed to hear that the list of DNS providers has not changed in the two years or so that I've been monitoring the project.

Personally, I was using Namecheap DNS, but thankfully I found a way around the lack of Namecheap DNS support. Cloudflare has a free tier for personal sites, through which you can not only use their DNS, but also leverage the performance and security features of their CDN. (Disclaimer: I'm not sponsored by Cloudflare in any form.)

To use Cloudflare as my DNS provider, I created a Cloudflare account and changed the nameservers at my domain registrar to point to Cloudflare's DNS servers. From then on, my domain's DNS could be managed in Cloudflare, which in my opinion has a superior UI/UX compared to most domain registrars.

Changing your domain nameservers to Cloudflare
Documentation on how to change your domain's nameservers to Cloudflare

Detailed instructions and a step-by-step tutorial will also be shown during the onboarding flow in Cloudflare should you be interested.

Deployment

Deploying cert-manager is as simple as running a single command:

$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
Command to deploy the latest-released cert-manager onto your cluster

This command installs the latest version of cert-manager, its associated components (cert-manager, cainjector and webhook) and its Custom Resource Definitions into the cert-manager namespace of your cluster. You can inspect the YAML manifests by opening the URL in the apply command. Further details can be found in cert-manager's excellent documentation here.

For multi-tenant clusters, the choice of namespace matters, especially if your cluster already has a cert-manager installation administered by the cluster owner or another tenant.

Configuration

Here comes the exciting part: How you can integrate cert-manager with Traefik and Let's Encrypt.

Choosing between ClusterIssuer and Issuer

cert-manager provides two different Custom Resource Definitions as abstractions for TLS certificate issuers: ClusterIssuer and Issuer. They are virtually identical, with only one key difference: namespace scoping.

  • ClusterIssuer allows the configured certificate issuer to issue certificates to all namespaces in the cluster
  • Issuer only allows the configured issuer to issue certificates to apps within the same namespace it is defined in

If your cluster is single-tenant, meaning you are not concerned about apps from multiple namespaces requesting certificates from the same CA, you should pick the ClusterIssuer.

If your cluster is multi-tenant, whereby the same cluster hosts apps that serve multiple domain names on separate accounts, you should pick the Issuer. You wouldn't want an unrelated app requesting a certificate for a subdomain that you own, overriding your certificate or, worse, impersonating your app to collect sensitive data from your users.

Create Let's Encrypt staging Issuer

To avoid spamming the production Let's Encrypt server with certificate requests during testing, and thereby hitting your rate limits prematurely, it is highly recommended that you set up the Let's Encrypt staging issuer first.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: <my-letsencrypt-account-email>
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - dns01:
        cloudflare:
          email: <my-cloudflare-account-email>
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
Yaml manifest for the Let's Encrypt staging issuer

Modify the above manifest, replacing the following values with your own:

  • kind: Your choice between ClusterIssuer or Issuer.
  • spec.acme.email: Your Let's Encrypt account email address. If you have requested certificates from Let's Encrypt before, this is the same email address you used to register a private key with them.
    If you're new to Let's Encrypt, you may fill in any email address that you own.
  • spec.acme.solvers[0].dns01.cloudflare.email: Your Cloudflare email address that you use to log into Cloudflare.

Create Cloudflare API Token Secret

Next, create a Secret to store your Cloudflare API token.

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
type: Opaque
stringData:
  api-token: <my-cloudflare-api-token>
Yaml manifest for the Secret storing your Cloudflare API Token

The API token can be generated in Cloudflare under My Profile > API Tokens. For the new API token, assign the following permissions on the respective zone resources:

Permissions:

  • Zone - DNS - Edit
  • Zone - Zone - Read

Zone Resources:

  • Include - All Zones

If you are concerned about giving cert-manager access to all DNS Zones (i.e. domain names), you may restrict it to only the DNS Zones that you'd like cert-manager to manage TLS certificates for.
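The Secret above lives in the cert-manager namespace because, with the default installation, a ClusterIssuer looks up the Secrets it references in cert-manager's own namespace. If you opted for a namespaced Issuer instead, the pair might look roughly like the sketch below. It is my adaptation of the manifests above rather than a tested configuration; both resources sit in <app-namespace>, since an Issuer can only reference Secrets in its own namespace.

```yaml
# Sketch: namespaced variant of the Cloudflare token Secret and staging issuer.
# Both must live in the same namespace as the apps requesting certificates.
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: <app-namespace>
type: Opaque
stringData:
  api-token: <my-cloudflare-api-token>
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: <app-namespace>
spec:
  acme:
    email: <my-letsencrypt-account-email>
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Created by cert-manager in <app-namespace>, next to the Issuer.
      name: letsencrypt-staging-account-key
    solvers:
    - dns01:
        cloudflare:
          email: <my-cloudflare-account-email>
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
```
Sketch of a namespaced staging Issuer and its Cloudflare API token Secret

A Certificate using this issuer would then live in the same namespace and set issuerRef.kind to Issuer.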

Create test Certificate with staging Issuer

In order not to consume precious rate limit quota from the Let's Encrypt production server, we should test our configuration to make sure that it is working by requesting a Certificate from the staging server.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: <mydomain-com>
  namespace: <app-namespace>
spec:
  secretName: <target-tls-cert-secret-name>
  privateKey:
    rotationPolicy: Always
  dnsNames:
  - <mydomain.com>
  - <app.mydomain.com>
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
    group: cert-manager.io
Yaml manifest for the staging TLS certificate 

Modify the above manifest, replacing the following values with your own:

  • metadata.name: Your domain name with hyphens replacing dots.
    You may specify any other name for this Certificate resource, but it should be indicative of the domain(s) it covers and the nature of the issuer.
    For instance, I specified ikarus-sg-staging.
  • metadata.namespace: Namespace of the app(s) using this certificate.
  • spec.secretName: Name of the target secret to store requested certificate in.
    Once again, this should be indicative of the domain(s) covered and the issuer nature.
    For instance, I specified tls-cert-ikarus-sg-staging.
  • spec.dnsNames: DNS names to request the certificate for. Usually this is set to a single element, specifying the subdomain that the app should be reachable at.
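Putting that naming convention together, a filled-in staging Certificate might look like the sketch below. The domain app.example.com, the namespace my-app and the resource names are hypothetical stand-ins; the -staging suffixes are what let you create production equivalents later without overwriting these.

```yaml
# Hypothetical filled-in staging Certificate for app.example.com.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  # Indicates the domain covered and that the issuer is staging.
  name: app-example-com-staging
  namespace: my-app
spec:
  # Secret that cert-manager will create and keep updated in my-app.
  secretName: tls-cert-app-example-com-staging
  privateKey:
    rotationPolicy: Always
  dnsNames:
  - app.example.com
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
    group: cert-manager.io
```
Example staging Certificate with hypothetical names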

Explanations of all possible configuration parameters can be found in the official docs here.

Certificate Resources

Use certificate in app Ingress or IngressRoute

Finally, configure the app Ingress or IngressRoute, depending on your Traefik configuration provider, to read the certificate from the target Secret.

Personally, I'm using IngressRoute, but fortunately both resource kinds configure this under a key named tls, so you may use my configuration as a reference.

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: <app-name>
  namespace: <app-namespace>
spec:
  entryPoints:
  - https
  routes:
  - kind: Rule
    match: Host(`app.mydomain.com`)
    services:
    - name: <app-service-name>
      port: <app-service-port>
  tls:
    secretName: <target-tls-cert-secret-name>
Yaml manifest for an example app IngressRoute. Note the tls key.

Ignore the other elements of this IngressRoute manifest and keep your existing ones; only the tls key configured here matters.

Set spec.tls.secretName to the value of spec.secretName in the Certificate resource defined above. This instructs Traefik to pick up the certificate from that Secret. At the risk of sounding like a broken record, the Ingress/IngressRoute must be in the same namespace as the Certificate.
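If you use a standard Kubernetes Ingress rather than an IngressRoute, note that its tls key is a list, with each entry pairing a secretName with the hosts it covers. A minimal sketch follows; the router.entrypoints annotation assumes your Traefik v2 entry point is named https, as in the IngressRoute above, and the placeholders are the same as before.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: <app-name>
  # Must match the namespace of the Certificate.
  namespace: <app-namespace>
  annotations:
    # Assumes a Traefik v2 entry point named "https".
    traefik.ingress.kubernetes.io/router.entrypoints: https
spec:
  tls:
  # Each entry maps a certificate Secret to the hosts it serves.
  - hosts:
    - app.mydomain.com
    secretName: <target-tls-cert-secret-name>
  rules:
  - host: app.mydomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: <app-service-name>
            port:
              number: <app-service-port>
```
Sketch of an equivalent app Ingress. Note the list under the tls key.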

Test staging certificate

By now, the certificate request process may have already completed. If not, check its progress by running the following command and looking at the Status field.

$ kubectl describe certificate <certificate-name> -n <app-namespace>
Command to check on certificate status
...
Status:
  Conditions:
    Last Transition Time:  2021-07-25T09:28:06Z
    Message:               Certificate is up to date and has not expired
    Observed Generation:   2
    Reason:                Ready
    Status:                True
    Type:                  Ready
  Not After:               2021-10-23T08:28:02Z
  Not Before:              2021-07-25T08:28:04Z
  Renewal Time:            2021-09-23T08:28:02Z
  Revision:                3
Expected output for a successful certificate request

If the status looks like the output above, your certificate should be ready. If it doesn't, double-check your configuration and make sure the parameters described above have been changed appropriately.

If all's well, open your browser and navigate to the subdomain that you requested the certificate for. Inspect the untrusted certificate by clicking on the lock icon to the left of the address bar, and verify that the issuer is Let's Encrypt Staging and not Traefik's default certificate.
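The Renewal Time in the status output above sits 30 days before Not After. For a 90-day Let's Encrypt certificate, that matches cert-manager's default of renewing once two thirds of the certificate's lifetime has elapsed. If you'd rather pin the renewal window explicitly, the Certificate spec accepts a renewBefore duration; here is a sketch of the relevant fragment, with the 720h value chosen purely for illustration.

```yaml
# Fragment of the Certificate spec from earlier; other fields unchanged.
spec:
  secretName: <target-tls-cert-secret-name>
  # Renew 30 days (720 hours) before the certificate expires.
  renewBefore: 720h
```
Certificate spec fragment setting an explicit renewal window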

Create Let's Encrypt production equivalents

Repeat the above steps to create the Let's Encrypt production equivalents of ClusterIssuer/Issuer and Certificate, with the following differences:

  1. Replace spec.acme.server in ClusterIssuer/Issuer with:
    https://acme-v02.api.letsencrypt.org/directory
  2. Remove the -staging suffixes from the resource and Secret names, so that the production resources don't overwrite their staging equivalents.
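Applied to the staging manifests above, those two changes give production equivalents roughly like the sketch below. One detail to watch is privateKeySecretRef: the production issuer should not point at the same Secret as the staging issuer, or both would share one ACME account private key. This sketch uses letsencrypt-account-key and assumes the staging issuer's key Secret has a distinct name such as letsencrypt-staging-account-key; rename one of them if yours collide.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    email: <my-letsencrypt-account-email>
    # Production ACME directory instead of staging.
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Must differ from the staging issuer's account key Secret.
      name: letsencrypt-account-key
    solvers:
    - dns01:
        cloudflare:
          email: <my-cloudflare-account-email>
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  # Same placeholders as before, without any -staging suffix.
  name: <mydomain-com>
  namespace: <app-namespace>
spec:
  # Point your Ingress/IngressRoute at this Secret when switching over.
  secretName: <target-tls-cert-secret-name>
  privateKey:
    rotationPolicy: Always
  dnsNames:
  - <mydomain.com>
  - <app.mydomain.com>
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
    group: cert-manager.io
```
Sketch of the Let's Encrypt production ClusterIssuer and Certificate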

Switch to newly created Certificate

In your existing app Ingress/IngressRoute, set spec.tls.secretName to the value of spec.secretName in the production Certificate. Follow the steps above to check on the progress of the certificate request and to verify the certificate.

Profit!

If all has gone well so far, congratulations! You now have a working deployment of cert-manager managing your TLS certificates.

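You may now delete the staging Certificate, or even its Issuer/ClusterIssuer. By default, cert-manager does not delete a Certificate's Secret along with the Certificate, so remove the staging Secret yourself if you no longer need it.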

Also, don't forget to remove the TLS-related configuration from Traefik so that it no longer attempts to renew, and thus overwrite, the certificates requested by cert-manager.
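What that TLS configuration looks like depends on how you set Traefik up in the first place. As a rough sketch, assuming a certificate resolver hypothetically named letsencrypt in Traefik v2's static configuration file, these are the kinds of entries to delete, together with any certResolver references left in your IngressRoutes. If you pass the static configuration as command-line arguments instead, the equivalent flags start with --certificatesresolvers.letsencrypt.acme.

```yaml
# Traefik v2 static configuration (e.g. traefik.yml). Entries like these
# drive Traefik's built-in ACME support and can be removed.
# The resolver name and storage path are hypothetical.
certificatesResolvers:
  letsencrypt:
    acme:
      email: <my-letsencrypt-account-email>
      storage: /data/acme.json
      dnsChallenge:
        provider: cloudflare

# In each IngressRoute, replace a resolver reference such as:
#   tls:
#     certResolver: letsencrypt
# with the Secret populated by cert-manager:
#   tls:
#     secretName: <target-tls-cert-secret-name>
```
Sketch of Traefik's built-in ACME configuration to remove (hypothetical names)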

What's next

I'm planning to write two follow-up posts branching off this thread, which might prove interesting to infra-minimalists like myself:

  1. Managing wildcard TLS certificates for multiple namespaces with cert-manager.
  2. Configuring Traefik v2 for high-availability.

Stay tuned for more interesting content in the upcoming weeks!