In my previous post, I detailed my struggles with bare-metal setups of applications and how Docker saved me from the pain of installing and configuring PHP. In this post, I shall outline the process through which I (finally) started clustering.
I assume that you already have some knowledge of and experience with Docker and docker-compose. If you don't, I'd suggest referring to my previous post, which should give you some context.
Identity crisis and more Phpain
Even though docker-compose was great, there was still one problem.
Docker-compose runs all the services defined on a single machine.
There's no point to the cluster I built if everything ends up running on a single machine, is there?
Besides the identity crisis my cluster faced, the WordPress site deployed with docker-compose was also going down occasionally for no apparent reason. When that happened, I had to manually SSH into the cluster and run docker-compose up -d again to restart the stack.
For a service to be reliably self-hosted, I needed a way to ensure that my WordPress site would not go down whenever PHP ran out of memory. This forced me to start looking into the concept of high availability.
High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
A quick Google search on docker high-availability yielded my first foray into true cluster management and container orchestration: Docker Swarm.
Hello Docker Swarm!
Docker Swarm is a framework embedded within Docker Engine that allows you to create and manage stacks. Each stack comprises multiple services, and each service runs on arbitrary node(s) in a cluster of swarm nodes.
In Docker Swarm, a swarm consists of a cluster of machines, each of which holds the role of either a manager or a worker, but not both. Much like in a typical organization, managers are responsible for receiving instructions from the administrator and deciding where to run each container, while workers receive instructions from managers and run containers.
The most minimalistic cluster may consist of just 1 manager node. This is because manager nodes are configured to not just run manager-specific, system-critical containers but also worker-destined containers by default, hence doubling as a worker node. In practice though, running worker-destined containers on manager nodes is not advisable given that the liveness of the manager node is critical to the sustenance of the swarm.
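One way to keep worker-destined containers off the manager nodes is a placement constraint in the compose file. The fragment below is only a minimal sketch: the service name web and the nginx:alpine image are placeholders for illustration, not part of any stack described in this post.

```yaml
version: "3.8"

services:
  web:                          # hypothetical service name
    image: nginx:alpine         # placeholder image
    deploy:
      placement:
        constraints:
          - node.role == worker # schedule this service's tasks on worker nodes only
```

With this in place, the scheduler should only consider worker nodes for this service, so a single-manager swarm with no workers would leave its tasks pending.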
Differences from docker-compose
To enable container orchestration in a distributed fashion, Docker Swarm differs from docker-compose in a few key ways.
A container is now known as a task, regardless of whether the container is long-lived or short-lived. Services represent logical components of your application, and may consist of a group of one or more containers running the same image.
Each service as defined in the docker-compose file is no longer constrained to run on a single machine, but rather, scheduled to run on an arbitrary machine in the cluster. One can also specify the number of replicas or copies of tasks to run in each service, allowing traffic to be load-balanced across the replicas and having a failover in the event that one replica goes down.
To maintain an equivalent level of visibility with the new layer of abstraction from managing containers in a cluster, manager nodes now have CRUD access to swarm containers on each node, services within each stack, as well as entire stacks as defined in
Some notable commands I frequently used were:
$ docker stack deploy --compose-file /path/to/docker-compose.yml <stack_name>
$ docker stack ls
# List all tasks running in a stack
$ docker stack ps <stack_name>
$ docker stack rm <stack_name>
$ docker stack services <stack_name>
$ docker node ls
In a single-machine docker-compose setup, an implicit virtual network is created between the services, allowing you to reach another service simply by the service name (e.g. web) defined in the docker-compose.yml file. The same happens in Docker Swarm, except in a cluster-wide scope.
Docker Swarm has an internal DNS component that handles lookups of service names and, by default, returns a virtual IP for the service that fronts all of its tasks. Docker Swarm also handles load balancing and external access: a request for a published service port that lands on any node in the cluster is directed, in a round-robin fashion, to a node running one of the service's tasks, even if the request first landed on a totally irrelevant node.
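To make the service discovery and external access described above a little more concrete, here is a hedged compose fragment that spells out the defaults explicitly. The service name web, the nginx:alpine image, and the port numbers are all assumptions chosen for illustration.

```yaml
version: "3.8"

services:
  web:                      # hypothetical service name; other services reach it as "web"
    image: nginx:alpine     # placeholder image
    ports:
      - target: 80          # port the container listens on
        published: 8080     # port opened on the swarm nodes
        protocol: tcp
        mode: ingress       # default: publish through the swarm's routing mesh
    deploy:
      replicas: 2
      endpoint_mode: vip    # default: the service name resolves to a single virtual IP
```

With these settings, a request to port 8080 on any node should reach one of the two replicas. Setting endpoint_mode to dnsrr instead makes DNS return the individual task addresses and leaves the load balancing to the client.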
Logs are no longer only accessible on the machine where the containers are run. Though the logs are still visible via docker logs <container_name> on that machine, they may now also be accessed on any manager node with docker service logs <service_name>.
Another point of note is that logs are no longer on the container level but on the service level. Logs for multiple replicas of the same service are aggregated and streamed to the manager node. Log entries from each replica can be differentiated by their container ID, which is generated at runtime.
How this changed the game for me
High availability is achieved by adding just a few lines to the docker-compose file I previously prepared, specifically the deploy key. With multiple replicas of each WordPress component, the site becomes not only fault-tolerant but also more capable of handling heavy workloads, as traffic is load-balanced across all replicas.
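As a rough sketch of what those few lines can look like (not my exact file), the fragment below adds a deploy key to a WordPress service. The service name, image tag, replica count, and timings are assumptions, and the database and storage configuration are left out for brevity.

```yaml
version: "3.8"

services:
  wordpress:                    # hypothetical service name
    image: wordpress:latest     # assumed image tag
    ports:
      - "80:80"
    deploy:
      replicas: 3               # run three tasks of this service across the swarm
      restart_policy:
        condition: on-failure   # replace a task whose container exits with an error
      update_config:
        parallelism: 1          # roll out updates one task at a time
        delay: 10s              # wait between updating each task
```

The deploy key only takes effect when the file is deployed with docker stack deploy, so the same file can still describe the services for a single-machine setup.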
Relevance in today's world
Yes, indeed, Docker Swarm has already lost the battle against Kubernetes for supremacy in the container orchestration space, marked by Mirantis's acquisition of Docker's Enterprise business, but those who explored Kubernetes from the get-go would agree with me that Kubernetes remains quite a leap from running local containers.
In my opinion, Docker Swarm provides the much-needed middle ground and makes Kubernetes much more palatable. When compared with Kubernetes, networking in Docker Swarm is a complete black box. Although that means less configurability, it also makes Docker Swarm an ideal entry point for people who are just getting into cluster management and are not interested in nitty-gritty details like Ingress and Service configuration.
Coming from a background in Docker Swarm has also helped me appreciate the design of Kubernetes much better than if I had simply gone straight for it. While Docker Swarm is already a great framework on its own, it falls short in aspects such as access control, multi-tenancy, and options for distributed storage, all of which you will only realize the importance of once you've experienced the need first-hand.
Since the deployment of my first Docker Swarm application, I have moved on to deploy many apps with high-availability and have learnt much about cluster management and container orchestration which prepared me for my next adventure into Kubernetes.
If you're just starting out building apps or deploying open-source projects on a small scale, I'd highly recommend picking up Docker Swarm to familiarize yourself with the concepts of clustering, especially if Kubernetes feels extremely intimidating to you at this moment.
After reading all that, if you're set on diving into Docker Swarm, I have an Ansible role, docker-swarm, at my GitHub repository that may help you install Docker and provision the swarm for your cluster.
I'll write about how I got into Kubernetes and put up a brief instructional on how to get your own Kubernetes cluster set up on a bunch of Raspberry Pis, so stay with me!