Deploy Traefik on Kubernetes with Wildcard TLS Certs

How to painlessly deploy and configure Traefik v2 on Kubernetes as the Ingress Controller with automated Let's Encrypt ACME wildcard TLS certs.

In the first part of this series, I talked about the motivations behind my choice of Traefik as the Ingress Controller for my Kubernetes cluster.

Kubernetes Ingress Controllers: Why I Chose Traefik
First of a 2-part series on my experience with Traefik for the past 3 years. Part 2 will focus on how you can deploy it on your Kubernetes cluster.

In this 2nd part of the series, I'll walk you through how you can get your own Traefik deployment up and running without having to dig through so much as a single page of the Traefik documentation.

I've dug through the documentation so you don't have to!

With all the chitter chatter aside, let's jump right into deploying Traefik on your Kubernetes cluster. This assumes that you already have a Kubernetes cluster set up. If you don't have one yet, you may refer to my previous post on how you can set one up with k3s.

Run Kubernetes on your Raspberry Pi cluster with k3s
Some fun facts about Kubernetes that you probably didn’t know, caveats when running it on Raspberry Pi, and how you can set up your own cluster with k3s.

With Helm

Installing Traefik with Helm is as easy as running 3 commands. If you haven't installed Helm yet, install the tool with brew install helm and follow its instructions on how to set it up for your cluster.

helm repo add traefik https://helm.traefik.io/traefik
helm repo update
helm install traefik traefik/traefik

I am not a big fan of Helm myself so I have not tried it on my cluster, but I'm putting it here nonetheless for those who favor Helm and for the sake of completeness.

With plain yaml

Personally, I'm a huge fan of explicitness, especially when it comes to my cluster. Hence, I've always opted for plain Kubernetes yaml manifests so that I have full control over each deployment.

Here are my personal deployment files, but simplified in the following ways:

  • single-replica deployment
  • TLS-ALPN-01 challenge instead of DNS-01 challenge (if you don't need wildcard TLS certs)

Using the manifests here would create a traefik-system namespace and deploy Traefik in it. To use them, save each of them in a file, and apply them to your cluster with kubectl apply -f <filename.yaml>.

apiVersion: v1
kind: Namespace
metadata:
  name: traefik-system
traefik-namespace.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik
  namespace: traefik-system
traefik-serviceaccount.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: traefik-system
  labels:
    app: traefik
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik
      containers:
      - image: traefik:2.4.5
        imagePullPolicy: IfNotPresent
        name: traefik
        readinessProbe:
          httpGet:
            path: /ping
            port: 80
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        livenessProbe:
          httpGet:
            path: /ping
            port: 80
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        resources:
          requests:
            memory: 50Mi
            cpu: 100m
          limits:
            memory: 50Mi
            cpu: 500m
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
        volumeMounts:
        - mountPath: /acme
          name: acme
        - mountPath: /etc/traefik/traefik.yml
          name: config-static
          subPath: traefik.yml
      volumes:
      - name: acme
        persistentVolumeClaim:
          claimName: nfs-traefik
      - name: config-static
        configMap:
          name: traefik-static
traefik-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: traefik-system
spec:
  externalTrafficPolicy: Local
  selector:
    app: traefik
  ports:
    - port: 80
      protocol: TCP
      name: http
    - port: 443
      protocol: TCP
      name: https
  type: LoadBalancer
  loadBalancerIP: <your-cluster-public-ip>
traefik-service.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-static
  namespace: traefik-system
  labels:
    app: traefik
data:
  traefik.yml: |
    global:
      checkNewVersion: false
      sendAnonymousUsage: true
    entryPoints:
      http:
        address: :80
        http:
          redirections:
            entryPoint:
              to: https
              scheme: https
              permanent: false
              priority: 1
      https:
        address: :443
        http:
          tls: {}
    providers:
      kubernetesCRD: {}
    api:
      dashboard: true
    ping:
      entryPoint: http
    log:
      level: INFO
    certificatesResolvers:
      default:
        acme:
          email: <your-email-address>
          storage: /acme/acme.json
          tlsChallenge: {}
traefik-configmap.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-traefik
  namespace: traefik-system
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10M
  storageClassName: ''
  volumeName: nfs-traefik

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-traefik
spec:
  capacity:
    storage: 10M
  claimRef:
    name: nfs-traefik
    namespace: traefik-system
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ''
  nfs:
    server: <insert your nfs host>
    path: <insert your nfs path>
traefik-pv-pvc.yaml

When using the above manifests, be sure to change the following according to your infrastructure:

  1. spec.loadBalancerIP field in traefik-service.yaml to your cluster's public IP address. This assumes you already have a load balancer implementation on your cluster. If you don't, I'd recommend that you check out MetalLB.
  2. certificatesResolvers.default.acme.email field in traefik-configmap.yaml to your email address.
    It need not be a functional email address, but a working one makes things easier as Let's Encrypt would notify you by email when your certificate is nearing expiry.
  3. nfs.server field in traefik-pv-pvc.yaml to your NFS server hostname or IP address.
  4. nfs.path field in traefik-pv-pvc.yaml to your NFS server path dedicated to storing Traefik's TLS certs.

Some notes about my Deployment specification for Traefik:

  1. The Traefik image used here is version 2.4.5.
  2. The RollingUpdate strategy allows for zero-downtime updates when you need to upgrade Traefik to a newer version.
  3. The NET_BIND_SERVICE capability is required for the Traefik container to bind to privileged ports (below 1024) such as 80 and 443, since all other capabilities are dropped.

Configuring Traefik for Kubernetes IngressRoute provider

Typically, one applies controller-specific configuration to an ingress controller via annotations on the Ingress object. I found this rather clunky, especially when complex data structures like maps or lists have to be squeezed into an annotation as a string, so I've opted to use Traefik's custom resource definition, the IngressRoute, to specify ingress rules.

I like how IngressRoutes are more expressive and generally much neater than annotations. I'm aware that this essentially locks me in to using Traefik for all my apps, but since I'm not running business-critical apps, I figured it wouldn't be too difficult to migrate back to plain Ingress objects should the need arise.
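
To give a sense of what that expressiveness looks like, here is a minimal sketch of a Middleware and an IngressRoute that references it. The names secure-headers and my-app, the hostname and the header values are placeholders I made up for illustration, and the sketch assumes the CRDs shown further down are already installed on your cluster.

```yaml
# Hypothetical example: names, host and header values are placeholders.
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: secure-headers
spec:
  headers:
    # Structured fields instead of values packed into annotation strings
    stsSeconds: 31536000
    stsIncludeSubdomains: true
    customResponseHeaders:
      X-Robots-Tag: noindex
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: my-app
spec:
  entryPoints:
  - https
  routes:
  - kind: Rule
    match: Host(`my-app.<your-domain-name>`) && PathPrefix(`/api`)
    middlewares:
    - name: secure-headers   # looked up in the same namespace as this IngressRoute
    services:
    - name: my-app
      port: 80
  tls:
    certResolver: default
```

The map under customResponseHeaders and the list under middlewares are exactly the kind of structures that get awkward when they have to live inside an annotation string.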

Kubernetes CRD - Traefik
Traefik Documentation

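Shown below are the CRDs and RBACs for my Traefik deployment. Note that both use v1beta1 API versions (apiextensions.k8s.io/v1beta1 and rbac.authorization.k8s.io/v1beta1), which were removed in Kubernetes 1.22, so if you're on a newer cluster, check their docs for more up-to-date versions of these definitions.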

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressrouteudps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteUDP
    plural: ingressrouteudps
    singular: ingressrouteudp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsstores.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSStore
    plural: tlsstores
    singular: tlsstore
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: traefikservices.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TraefikService
    plural: traefikservices
    singular: traefikservice
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: serverstransports.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: ServersTransport
    plural: serverstransports
    singular: serverstransport
  scope: Namespaced
traefik-crd.yaml as of v2.4.5 (Source)
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik

rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - traefik.containo.us
    resources:
      - middlewares
      - ingressroutes
      - traefikservices
      - ingressroutetcps
      - ingressrouteudps
      - tlsoptions
      - tlsstores
      - serverstransports
    verbs:
      - get
      - list
      - watch

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik
subjects:
  - kind: ServiceAccount
    name: traefik
    namespace: traefik-system
traefik-rbac.yaml as of v2.4.5 (Source)

Deploying an example app

To test app connectivity, TLS certificate validity and load-balancing behaviour, you may use Traefik's official test image whoami.

Apply the following manifests to your cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  replicas: 3
  selector:
    matchLabels:
      app: whoami
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
      - image: traefik/whoami
        name: whoami
        ports:
        - name: http
          containerPort: 80
      restartPolicy: Always
whoami-deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: whoami
  labels:
    app: whoami
spec:
  selector:
    app: whoami
  ports:
  - port: 80
    protocol: TCP
    targetPort: http
    name: http
whoami-service.yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami
spec:
  entryPoints:
  - https
  routes:
  - kind: Rule
    match: Host(`whoami.<your-domain-name>`)
    services:
    - name: whoami
      port: 80
  tls:
    certResolver: default
whoami-ingressroute.yaml
Note: Replace <your-domain-name> in spec.routes[0].match of whoami-ingressroute.yaml with your domain name.

Applying the manifests above will deploy 3 replicas of the whoami image and trigger Traefik to request a certificate for whoami.<your-domain-name>.

If you see certificate errors, wait a couple of minutes as Traefik may still be in the process of requesting a certificate from Let's Encrypt, serving its default self-signed TLS certificate in the meantime. You may monitor the progress of ACME challenges by locating the Traefik pod name and tailing its logs, searching specifically for the acme keyword.

$ kubectl get pods -n traefik-system
NAME                      READY   STATUS    RESTARTS   AGE
traefik-75d5976d8d-224jv  1/1     Running   0          66s

$ kubectl logs -n traefik-system -f traefik-75d5976d8d-224jv
Example to tail logs of traefik pod: traefik-75d5976d8d-224jv
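
While you're still testing, it may help to point the resolver at Let's Encrypt's staging environment so that repeated attempts don't count against the production rate limits, and to raise the log level for more detail on the challenge. The fragment below is a sketch of the relevant keys in traefik-configmap.yaml under those assumptions, not my actual configuration. Remember to remove caServer and clear the staging certificates from acme.json before going live.

```yaml
# Testing-only variant of the log and certificatesResolvers sections of traefik.yml.
log:
  level: DEBUG   # more verbose than INFO; revert once things work
certificatesResolvers:
  default:
    acme:
      email: <your-email-address>
      storage: /acme/acme.json
      # Let's Encrypt staging endpoint; certificates issued here are not
      # trusted by browsers, so use it for testing only.
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory
      tlsChallenge: {}
```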

If nothing has gone horribly wrong thus far, loading the page prints the current pod name that your request has reached. Refreshing the page would ideally rotate you through all 3 whoami pods, demonstrating the default round-robin load balancing that Traefik implements.

(Optional) Wildcard TLS Certificates

Although there's some debate in the community about the potential security concerns of wildcard TLS certificates, in my opinion, there are several non-negligible reasons why one should consider using them.

Single certificate for all subdomains

Wildcard TLS certificates would help simplify TLS certificate management for your cluster immensely by allowing all subdomains to share a single TLS certificate.

This means that if you're segregating your applications by subdomain (which you should be), you do not need to request a new certificate per application. This essentially gets rid of the delay between your application's first deployment and its final availability via HTTPS.

Subdomain obfuscation

Another added benefit of wildcard TLS certificates is subdomain obfuscation. With the traditional method of 1 TLS certificate per subdomain, the presence of your subdomains is exposed to the public via the Certificate Transparency Logs as I briefly demonstrated above with my domain ikrs.link.

crt.sh | Certificate Search
Free CT Log Certificate Search Tool from Sectigo (formerly Comodo CA)

Using a wildcard certificate hides your subdomains from these public logs. All that a malicious actor can glean is that you host services under your domain name over TLS, but not which subdomains they're on. Though obfuscation is not the same as security, not giving away your subdomains upfront is already a plus to me.

Configuring Traefik to request wildcard TLS certificates

To obtain wildcard TLS certificates, one would need to complete the DNS-01 challenge. To do that, you'll need to make 2 changes to Traefik:

Add the following configuration keys in place of tlsChallenge: {} in the static configuration ConfigMap.

dnsChallenge:
  provider: namecheap
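
For context, this is roughly how the whole certificatesResolvers block in traefik-configmap.yaml might look after the swap. The commented-out delayBeforeCheck and resolvers keys are optional dnsChallenge settings that I'm including only as an assumption of what you may want to tune; they are not part of my setup above.

```yaml
# Sketch of the certificatesResolvers section of traefik.yml after switching
# from the TLS-ALPN-01 challenge to the DNS-01 challenge.
certificatesResolvers:
  default:
    acme:
      email: <your-email-address>
      storage: /acme/acme.json
      dnsChallenge:
        provider: namecheap
        # Optional (assumption, not in the original setup): wait before
        # verifying the TXT record, and use specific DNS resolvers for the check.
        # delayBeforeCheck: 30
        # resolvers:
        #   - "1.1.1.1:53"
```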

Depending on your DNS provider, add the required environment variables to your Traefik deployment. Traefik supports many DNS providers such as Cloudflare or Namecheap to name a few, hence it's very likely that your DNS provider is supported. Check their documentation to find out if your provider is supported.

Let’s Encrypt - Traefik
Traefik Documentation

Deploy a Kubernetes Secret containing your provider's credentials for Traefik to pick up.

apiVersion: v1
kind: Secret
metadata:
  name: traefik-secret
  namespace: traefik-system
stringData:
  NAMECHEAP_API_USER: <your_namecheap_username>
  NAMECHEAP_API_KEY: <your_namecheap_api_key>
Kubernetes secret containing your provider's credentials

Paste the env section under spec.template.spec.containers[0] of traefik-deployment.yaml. In this example, I assume that you're using the Namecheap DNS provider.

env:
- name: NAMECHEAP_API_USER
  valueFrom:
    secretKeyRef:
      name: traefik-secret
      key: NAMECHEAP_API_USER
- name: NAMECHEAP_API_KEY
  valueFrom:
    secretKeyRef:
      name: traefik-secret
      key: NAMECHEAP_API_KEY
env section in the deployment spec

Upon applying traefik-deployment.yaml, Kubernetes will replace the Traefik pod with a new one. If you already have an IngressRoute defined that requires a cert but doesn't have one yet, Traefik will start requesting a cert from Let's Encrypt with the DNS-01 challenge.
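
One caveat worth spelling out: as far as I understand Traefik's ACME behaviour, when an IngressRoute's tls section has no domains list, the resolver derives the certificate's domains from the Host() matcher in the route. That gives you a certificate for that one hostname rather than a wildcard. A minimal sketch of the whoami IngressRoute asking for a wildcard certificate explicitly could look like this, reusing the <your-domain-name> placeholder from earlier.

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: whoami
spec:
  entryPoints:
  - https
  routes:
  - kind: Rule
    match: Host(`whoami.<your-domain-name>`)
    services:
    - name: whoami
      port: 80
  tls:
    certResolver: default
    domains:
    # Request one certificate covering the apex domain and every subdomain
    - main: <your-domain-name>
      sans:
      - "*.<your-domain-name>"
```

Other IngressRoutes under the same domain can declare the same domains entry, and Traefik should reuse the certificate it already holds instead of requesting a new one per subdomain.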

And just like that, you have a working TLS-enabled reverse proxy!

What's next

In the next part of this series, I will be talking about cert-manager and how it can help further simplify TLS certificate management. See you in a week or two!