Kubernetes Ingress Controllers: Why I Chose Traefik

First of a 2-part series on my experience with Traefik for the past 3 years. Part 2 will focus on how you can deploy it on your Kubernetes cluster.

Back from a long hiatus from blogging, this time I'll be writing about the ingress controller that I've been using on my Kubernetes cluster since its inception in 2018.

This piece will be the first in a series covering the various infrastructural components of Leviathan, my Arm64 Kubernetes cluster at home.

From this piece on, all content will be written in the context of Kubernetes. If you're looking for how to set up your own Kubernetes cluster, do check out my previous piece on k3s.

Run Kubernetes on your Raspberry Pi cluster with k3s
Some fun facts about Kubernetes that you probably didn’t know, caveats when running it on Raspberry Pi, and how you can set up your own cluster with k3s.

What is an Ingress Controller

To answer this question, one must first understand what an Ingress object is in Kubernetes.

From the Kubernetes official documentation, an Ingress is:

An API object that manages external access to the services in a cluster, typically HTTP.

Ingress may provide load balancing, SSL termination and name-based virtual hosting.

In plain English, an Ingress can be seen as analogous to a reverse proxy and a load balancer, except that Kubernetes adopts a BYOS (Bring-Your-Own-Software) approach and does not provide the software that backs these features. Kubernetes only provides the API as a standardized way of defining rules that dictate which traffic goes to which service.

Here's where an Ingress Controller comes into play. An Ingress Controller is an in-cluster application that you deploy, that:

  1. Plugs into the Kubernetes API
  2. Watches Ingress objects
  3. Reads the ingress rules within
  4. Configures itself to route traffic it receives according to those rules
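
To make this concrete, here's a minimal sketch of the kind of Ingress object such a controller would watch. The hostname blog.example.com, the Service name blog and its port are placeholders I made up for illustration, not values from my cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog                      # hypothetical name
  namespace: default
spec:
  rules:
    - host: blog.example.com      # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blog        # hypothetical Service to route to
                port:
                  number: 80
```

An ingress controller that sees this object would route requests for blog.example.com to the blog Service.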

Needless to say, the Ingress Controller service itself must be configured to receive all traffic for your entire cluster. In the context of HTTP/HTTPS traffic, this means listening on ports 80 and 443 on the public IP address that your cluster will receive traffic from.
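
As a rough sketch of what that could look like, assuming your cluster can hand out an external IP to a LoadBalancer Service (k3s ships with a built-in service load balancer for this), the controller might be exposed as follows. The Service name, Pod label and container ports are assumptions for illustration only.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik                 # hypothetical name
  namespace: traefik-system
spec:
  type: LoadBalancer
  selector:
    app: traefik                # assumed label on the controller Pods
  ports:
    - name: web
      port: 80                  # HTTP as seen from outside the cluster
      targetPort: 8000          # assumed container port
    - name: websecure
      port: 443                 # HTTPS as seen from outside the cluster
      targetPort: 8443          # assumed container port
```

In a home setup, you would then forward ports 80 and 443 on your router to the address this Service is reachable on.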

What is Traefik

Traefik is, as I have already alluded to, an implementation of an Ingress Controller for Kubernetes. It was originally designed as an extensible, lightweight reverse proxy but has since gained the capability to fully integrate itself with a Kubernetes cluster while retaining compatibility with Docker and other interfaces. I will only explore, however, the process of setting up Traefik on Kubernetes here.

traefik/traefik
The Cloud Native Application Proxy. Contribute to traefik/traefik development by creating an account on GitHub.

How Traefik plays a crucial role in self-hosted setups

Usually when it comes to websites, each domain or even subdomain may have its own IP address in an attempt to provide some form of service isolation so that traffic to each service does not impact one another.

In self-hosted setups, you usually have 1 public IP address assigned by your ISP, held by your router, which provides Network Address Translation (NAT) to allow all devices and services in your home network to share that single IP address. As public IP addresses are an expensive commodity, and increasingly so as the world's pool of public IPv4 addresses is depleted, you are unlikely to be able to acquire more than 1 public IP address.

Even if you could somehow acquire more than 1 public IP address, you wouldn't have the equipment to control multiple IP addresses, as typical consumer routers only have 1 WAN interface, and you'd typically have only 1 connection per home.

What this means is that a reverse proxy such as Traefik is an absolute necessity: it allows all your services to share a single IP address, letting you make the most of the limited network resources available for consumer usage.

Alternatives to Traefik

In the process of searching for the best open-source reverse proxy framework for my clusters, I came across several alternatives, with the most notable being ingress-nginx and Caddy.

While they are all great software in their own right, I found Traefik to be the one that fits my use case back then, and yet still had room to be extended with additional features for when my cluster grew. Before we dive into the considerations behind Traefik, we'll first explore the alternatives. For a fuller, comprehensive review of ingress controllers, I'd recommend reading the article from Flant.

ingress-nginx

kubernetes/ingress-nginx
NGINX Ingress Controller for Kubernetes. Contribute to kubernetes/ingress-nginx development by creating an account on GitHub.

Although they share a very similar name, ingress-nginx must not be confused with the NGINX Ingress Controller, which is the offering from the NGINX company and has a commercial edition. ingress-nginx is the community-developed version, with its first stable release (0.9.0) in late 2017, and is now officially maintained by the Kubernetes project, as its GitHub repository lives under the kubernetes organization.

Compared with other ingress controllers, ingress-nginx can be considered the simplest and most straightforward option for someone who needs to get an ingress controller up and ready in the shortest time possible, without the need for much customization. If you're looking for simplicity, definitely check that out first. However, since you're reading this, you're probably looking for something more flexible that will fit your infinitely convoluted setup.

Caddy

caddyserver/caddy
Fast, multi-platform web server with automatic HTTPS - caddyserver/caddy

Caddy was created by Matt Holt and is now a ZeroSSL project (ZeroSSL being a Stack Holdings company). It was open sourced in April 2015, with the first version being v0.5.0. Personally, I do not have any experience running it, so I cannot give any constructive comments about it, but based on first impressions, I found Caddy's documentation a little difficult to parse, with a ton of text but very few examples.

An example of how Caddy documentation looks (Source)

Another downside is that Caddy itself actually does not support configuration discovery via Kubernetes out-of-the-box, meaning it does not read Kubernetes Ingress objects for routing rules, but rather, it reads configuration from a Caddyfile, a file defining all the routes for all your applications in a format not dissimilar to nginx server configurations.

This means if you'd like to deploy a new app or take down an existing app, you'll have to modify the Caddyfile and force caddy to somehow reload the configuration, likely by restarting it. Not ideal in an environment as dynamic as that of Kubernetes.

What is great about Traefik

Traefik is a reverse proxy solution developed by Traefik Labs (previously known as Containous). It was first open sourced in Sep 2015, had its first stable release in 2016, and currently holds the highest number of GitHub stars among reverse proxy frameworks, at 34.1k stars as of the time of writing. Despite its long history, it is still being actively developed, with the last commit being just 16 hours ago.

These impressive numbers underscore how well-received this framework is by the community, and afford some assurance that it will remain in active development for a long time to come. This is an important consideration when choosing open source frameworks to use.

What I really liked about Traefik can be summarized in the following points:

  1. Very extensible with middlewares
  2. Has a dashboard with a pretty UI
  3. Handles TLS certificate auto-renewal painlessly
  4. Documentation is packed with examples for each provider type and each feature

Extensibility

Traefik supports a huge range of middlewares. They have a large collection of built-in middlewares that you can just configure and use right off the bat.

A full list of those middlewares can be found here but here are some notable ones that I currently use in my cluster.

  1. BasicAuth for providing basic authentication on insecure, local endpoints such as the Traefik dashboard itself
  2. ForwardAuth to provide a Single-Sign-On frontend for apps in my cluster that do not support authentication with OpenLDAP
  3. RateLimit to provide basic protection from DDoS attacks for all endpoints

The middlewares are also easy to use and can be configured as Kubernetes custom resources, which is satisfying for a neat freak like me who loves infrastructure-as-code. For example, here's how I configured the BasicAuth middleware:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: admin-auth
  namespace: traefik-system
spec:
  basicAuth:
    secret: traefik-admin-auth-secret

As you can probably glean from the Middleware definition here, it integrates with Kubernetes Secrets and obtains the Basic Auth credentials from the Secret named traefik-admin-auth-secret. This means there's no need to hard-code any password in any file, and the middleware can be configured or re-configured on the fly as I create, modify, or delete it.
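
For completeness, here's a sketch of the two other pieces this relies on: the Secret that the middleware reads, and a route that attaches the middleware. I'm assuming the htpasswd-style users key that Traefik's BasicAuth middleware accepts; the hostname, entrypoint name and password hash are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: traefik-admin-auth-secret
  namespace: traefik-system
stringData:
  # One "user:hashed-password" entry per line, as generated by htpasswd
  users: |
    admin:<htpasswd-generated-hash>
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-dashboard                  # hypothetical name
  namespace: traefik-system
spec:
  entryPoints:
    - websecure                            # assumed entrypoint name
  routes:
    - match: Host(`traefik.example.com`)   # placeholder hostname
      kind: Rule
      middlewares:
        - name: admin-auth                 # the Middleware defined above
      services:
        - name: api@internal               # Traefik's built-in dashboard/API service
          kind: TraefikService
```

Any request matching the route goes through admin-auth first, so the dashboard is only reachable after a successful Basic Auth prompt.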

Pretty dashboard

Traefik has a pretty fancy dashboard built-in that you can use to check on the health of your applications and middlewares. Personally, as much as I love the certainty that command line interfaces afford, I have a soft spot for sleek user interfaces so this really got me sold.

Home page of my Traefik dashboard
Detail view of the HTTP ingress route for this blog

In the detail view you also get to see the ingress rule, Pod name, TLS configuration as well as any middlewares that it is using, offering you great transparency on all the ingress routes currently configured across your entire cluster.

Painless TLS Certificate auto-renewal

Ever since setting up Traefik, I have completely forgotten about the existence of my TLS certificates, and this goes to show how successful Traefik is in managing my Let's Encrypt TLS certs that require renewal every 90 days.

In my setup, I use wildcard TLS certificates provisioned via the DNS-01 ACME (Automatic Certificate Management Environment) challenge, allowing all my ingresses to be reachable over HTTPS automatically, on demand. Traefik automatically renews each certificate it manages well ahead of its expiry (30 days before, by default, for Let's Encrypt's 90-day certificates), allowing you to completely forget the process of TLS certificate renewal.

Goodbye certbot and crontab!

Configuring Traefik to obtain TLS certificates via Let's Encrypt ACME TLS-ALPN-01 challenge is as simple as specifying the following in the static configuration file:

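certificatesResolvers:
  default:
    acme:
      # Placeholder; use your own contact address for Let's Encrypt notices
      email: you@example.com
      # Solve the ACME TLS-ALPN-01 challenge on the HTTPS entrypoint
      tlsChallenge: {}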
Traefik TLS-ALPN-01 challenge static configuration

DNS-01 challenge configuration is slightly more involved, but not by much. I will touch on it in the next piece.
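
Just to give a rough idea ahead of that piece, a DNS-01 resolver might look something like the sketch below. I'm using Cloudflare as the DNS provider purely as an assumption for illustration; the email and storage path are placeholders too.

```yaml
certificatesResolvers:
  default:
    acme:
      email: you@example.com        # placeholder
      storage: /data/acme.json      # assumed path on a persistent volume
      dnsChallenge:
        provider: cloudflare        # assumed provider; its credentials are read from
                                    # environment variables such as CF_DNS_API_TOKEN
```

The DNS-01 challenge is what makes wildcard certificates possible, since Let's Encrypt only issues wildcards to those who can prove control over the domain's DNS records.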

Comprehensive configuration examples

Something that I really appreciate about Traefik is that even though it supports a large range of routing rule configuration providers, such as Docker, Kubernetes and Consul, its maintainers have never been lax with their examples.

For each feature that they have, they provide examples across all configuration providers they support.

Traefik documentation examples for every configuration discovery provider (Source)

Take, for example, the middleware documentation for BasicAuth shown above: they provide examples for Docker, Kubernetes, Consul Catalog, Marathon, Rancher, File (YAML) and File (TOML). With this, there is no excuse for not knowing how to configure Traefik for your application routing.

What is not so great about Traefik

Documentation structure

While I did mention that Traefik's documentation offers configuration examples for each provider type and feature, I find its documentation structure to be rather counter-intuitive at times.

To properly configure and use Traefik, one needs to specify configuration in 3 different aspects:

  1. Static configuration
  2. Dynamic configuration
  3. Route rules configuration
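
As a rough illustration of the first item, static configuration is what Traefik reads once at startup: entrypoints, providers and the like. The sketch below is a minimal static configuration file; the entrypoint names web and websecure are a common convention that I'm assuming here, not something mandated.

```yaml
# traefik.yml: static configuration, read once at startup
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

providers:
  kubernetesIngress: {}   # watch standard Ingress objects
  kubernetesCRD: {}       # watch Traefik's own CRDs (IngressRoute, Middleware, ...)

api:
  dashboard: true         # enable the built-in dashboard
```

Dynamic configuration and routing rules, by contrast, come from the providers at runtime, for instance from the Middleware and Ingress objects in your cluster.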

Traefik's documentation is structured such that each feature is documented separately where such abstractions may not be meaningful in some provider contexts. As a result, when you are looking to configure Traefik for a certain use case, you'll have to refer to multiple different sections of the documentation and aggregate the knowledge from those to be able to form a coherent view of how to configure Traefik.

Not very intuitive Traefik documentation nesting (Source)

For example, if you'd like to run Traefik as an Ingress Controller, to configure Ingress objects, you'll have to refer to 4 different sections: Entrypoints, Routers, Services, and Providers > Kubernetes IngressRoute. But Routers and Services are not visible abstractions in Kubernetes, and are only applicable for the File provider where ingress rules are defined in a file.

Lack of high-availability TLS-enabled setup

Update: I've since found a way to configure and deploy Traefik for high-availability so this point perhaps no longer applies.

While Traefik itself can be highly available as it is a stateless application, it does not have a way to sync Let's Encrypt ACME state across multiple instances. Traefik v1.7 used to support syncing ACME state via a Consul key-value store.

Unfortunately, this feature has since been removed from Traefik v2, citing poor reliability and performance, and is now parked under Traefik Enterprise as a high-availability feature. It does not seem like there are any plans on the horizon to implement this in the community edition.

Though this may seem like a huge caveat, it's not that critical for self-hosted setups: in home scenarios, the cluster usually does not receive so much traffic that it warrants a multi-instance, load-balanced Traefik deployment. In the event that a node hosting Traefik experiences an outage, Kubernetes will reschedule Traefik onto another node, and cluster connectivity will be restored in a couple of minutes.

Nonetheless, I've had good success with running a 3-replica Traefik deployment on my cluster for about a year now, and have not faced any issues thus far, apart from certificate duplication which is not a huge issue in my opinion.

Certificate transparency logs for my domain, ikrs.link (Source)

As you can see in the certificate transparency logs for my link-shortener domain at ikrs.link, there are triplicate certificates, each representing an instance of Traefik requesting a certificate from Let's Encrypt a few minutes apart and overwriting one another. In practice, only the latest certificate will be used by all instances of Traefik, as each instance reloads its own certificate cache and reads the latest certificate.

Wrapping it all up

All in all, Traefik may not be perfect, but it perfectly fits my use case of self-hosting cloud-alternative apps for my family, such as Nextcloud and Vaultwarden, and even hosting this blog.

If you look past its not-so-intuitive documentation, you have in your hands a great reverse proxy setup that can grow with your cluster as you add more apps and require more functionality out of your reverse proxy.

Should you decide to go down this path, hang around for the next piece, in which I will decipher the documentation on your behalf and walk through, step by step, how you can deploy Traefik on your own cluster.