I've been lurking in r/selfhosted for about a year now, looking at the apps people self-host and their pretty dashboards. Over time, I noticed some dissent in the community: a group of people vehemently reject the use of Docker.
...what happens if the container is not maintained further? i have no desire to do that. I still prefer to install my dependencies, run the applications via systemd and set up an nginx reverse proxy. i still feel like i can interact with the process. (Reddit)
I'm a person who has never really been too fond of Docker. I like having full control of the applications I'm running, their configuration and their storage. With Docker, I've always felt like I've lost a bit of that control. (Reddit)
Reading those comments made me wonder:
Are there others out there who share the same perspective?
In this piece, I outline the journey that led me to Docker, and I aim to convince at least one reader that Docker is worth learning and using.
LEMP - Linux, Nginx, MySQL, Phpain
Because performance was abysmal on the Raspberry Pi 1 Model B in my Octopi cluster, I spent a few months tuning php-fpm for performance and configuring nginx for security. That process almost never worked as expected. The Raspberry Pi and its OS, Raspbian, differ vastly from the machines most people use to host LEMP stacks, so what worked in guides and tutorials was almost guaranteed not to work on my setup without significant modification.
Sources of headache
- Missing system dependencies
- System dependencies with strange Raspbian defaults
- Outdated system packages in the Raspbian APT repository
- System packages not pre-compiled for the Arm architecture
To make matters worse, even though I committed my configuration files to Git, I often made live changes on the server nodes to iterate quickly, and eventually I lost track of the many ways I had broken the site.
That path almost always ended in a painful full reinstall of the LEMP stack.
Migration and more Phpain
A year after building my first cluster, I built my second, Kraken, made up of 7 Raspberry Pi 3 Model Bs, and started migrating my WordPress installation to it. Armed with my trusty Ansible playbooks, in which every installation and configuration task was written in stone as code, I was confident that installing WordPress would be a breeze.
Unfortunately, in DevOps, Murphy's Law always applies:
"Anything that can go wrong will go wrong"
As it turns out, what had worked on Octopi did not work on Kraken, and I spent 3 full days debugging my playbooks just to get php-fpm installed and configured correctly for WordPress. Deploying PHP seems to be a source of pain for many, and it was especially so for me.
The problem lies not in installing PHP itself, but in installing all the PHP extensions that WordPress needs. To date, there is no definitive list of the extensions WordPress requires or recommends; the best list I found was an 8-year-old StackExchange answer listing 22 extensions.
On top of that, many PHP extensions have their own system dependencies, requiring specific versions of Linux libraries or packages to be installed.
A slice of my headaches
To illustrate how tedious it is to manage PHP extensions and their system dependencies, here are some PHP extensions and the system dependencies they need.
| PHP Extension | System Dependency |
|---|---|
Amid all my struggles with PHP, I discovered a treasure called Docker, and at the time I thought it was the best invention mankind had ever come up with.
Docker is a platform for containerizing applications.
Containerization is the process of bundling an application, its configuration and dependencies, and the operating system userland it needs (libraries and tools, but not the kernel, which is shared with the host) into a single object called an image, which can be downloaded and run in any environment with a container runtime.
Docker abstracts the operating system away from app deployments, which means that an application that worked on Octopi should keep working on any host, regardless of its Linux distribution, as long as the CPU architecture is the same.
From Dockerfiles to Containers and Stacks
Dockerfiles are text files that fully describe your image. They contain the sequence of commands required to install and configure an application, starting from a base image. The base image can be a plain OS image like debian:buster or an application image like php:7.2, which lets you build on the work of others by running additional commands on top of what has already been done.
Dockerfiles can optionally also declare directories to be used as mount points for volumes (the VOLUME instruction) and ports to be exposed (the EXPOSE instruction).
Think of Dockerfiles as analogous to class definitions in programming languages.
A Dockerfile is built into an image: the commands it defines are run on top of the base image, and the resulting changes are packaged into a new image. An image is therefore an instance of the Dockerfile it was built from.
In programming terms, a Docker image is analogous to an object instantiated from a class.
Docker images can be pushed to repositories hosted on registries (the default is Docker Hub), where they can be shared as-is with others who want to install the same application configured the same way.
When you think of Docker registries, think GitHub
An image can then be pulled onto the machine that will run the application and started as a container, and the application will, in theory, work exactly as it did on the machine where the image was built.
A container is comparable to a single copy of your application code, cloned from GitHub and run. Just as you can run multiple copies of your code, you can run multiple containers from the same image.
Multiple related containers representing different components of a single app can be defined in a single docker-compose.yml file that fully describes their runtime configuration, including environment variable values, volume mounts, and host-to-container port mappings.
Think of a docker-compose file as specifying the arguments passed to each function of your application code, which together configure the entire application for your use case.
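To make those three kinds of parameters concrete, here is a minimal, hypothetical docker-compose.yml for a single MySQL service. It is only an illustration: the image tag, the password, the database name, and the volume name db-data are placeholders, not values from any real setup.

```yaml
# Hypothetical example: every value below is a placeholder.
version: "3.7"
services:
  db:
    image: mysql:5.7              # assumed image tag
    environment:                  # environment variable values
      MYSQL_ROOT_PASSWORD: change-me
      MYSQL_DATABASE: example
    volumes:                      # volume mount: named volume -> path inside the container
      - db-data:/var/lib/mysql
    ports:                        # port mapping, HOST:CONTAINER
      - "3306:3306"
volumes:
  db-data:                        # named volume managed by Docker
```

Running docker-compose up -d in the same directory starts the container in the background with exactly these parameters.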
What this meant for my setup
- No longer need to install the system dependencies required by PHP extensions myself
- No need to search for an exhaustive list of PHP extensions required by WordPress
- Configuration that is tried and tested can be packaged together with PHP in an image to be installed elsewhere
- Machines are more secure with PHP running as a non-root user in an isolated container
- Clear separation of static configuration related to the application's operating environment, and runtime configuration such as passwords and hostnames of other components of the same application
What was a mess of installation and configuration steps for MySQL in my Ansible playbooks was reduced to defining a docker-compose.yml file and running a single command.
Deciphering the docker-compose file
- I defined 3 services: php, web and db.
- php shares the wordpress named volume with web, where serving static assets is left to nginx while PHP code is handled by php-fpm.
- In both web and php, there are bind mounts to config files on the host machine, which override the default configuration that comes with the images.
- In php and db, identical values are set for the database credential environment variables, so that the MySQL container creates a database with the same credentials on initialization.
- WORDPRESS_DB_HOST is set to the service name of the MySQL container, db. Docker's built-in DNS resolves that name to the target container on the Docker virtual network.
- There is a named volume mount, db, which keeps the database data persistent should the container be shut down or removed.
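The compose file itself is not reproduced here. As a rough, hypothetical sketch that is merely consistent with the points above, it might look something like this. The image tags, credentials, host config paths (./php/custom.ini, ./nginx/default.conf) and container paths are assumptions, not the original file; on a Raspberry Pi, the chosen images must also have builds for its CPU architecture.

```yaml
# Hypothetical sketch of the described stack; tags, credentials and paths are assumptions.
version: "3.7"
services:
  php:
    image: wordpress:fpm                  # WordPress running under php-fpm (port 9000)
    environment:
      WORDPRESS_DB_HOST: db               # service name, resolved by Docker's DNS
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: change-me    # must match MYSQL_PASSWORD in db
      WORDPRESS_DB_NAME: wordpress
    volumes:
      - wordpress:/var/www/html           # shared with web so nginx can serve static assets
      - ./php/custom.ini:/usr/local/etc/php/conf.d/custom.ini:ro   # bind-mounted override
    depends_on:
      - db
  web:
    image: nginx:stable
    ports:
      - "80:80"                           # the only port published on the host
    volumes:
      - wordpress:/var/www/html:ro
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro      # should fastcgi_pass to php:9000
    depends_on:
      - php
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: change-me-too
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: change-me           # same credentials as in php
    volumes:
      - db:/var/lib/mysql                 # named volume: data outlives the container
volumes:
  wordpress:
  db:
```

Note that db publishes no ports at all: php reaches it over the internal network by name, so only nginx is reachable from outside.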
Why should you use Docker?
If the abstractions of containerization outlined earlier haven't convinced you yet, here's a TL;DR of the benefits of Docker.
- Run apps in an isolated OS environment, independent of host OS
- Does not incur as much overhead as full virtualization with VMs
- Package configuration with the app as a single image
- Expose only ports that you need and block all others
- Change configuration files on the fly with volumes for quick iteration
- Link related containers and configuration in a single docker-compose file
- Option to deploy highly available versions of your apps with simple modifications
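As a hypothetical illustration of such a "simple modification", a deploy section like the one below, which Docker Swarm honours when the file is deployed with docker stack deploy, asks for several replicas of a service and rolling updates. The replica count and update settings are arbitrary choices for the sketch, and stateful services such as a database need more care than this.

```yaml
# Hypothetical fragment: only the deploy section is the "modification"; values are arbitrary.
version: "3.7"
services:
  web:
    image: nginx:stable
    ports:
      - "80:80"
    deploy:
      replicas: 3                # run three copies of the container across the swarm
      update_config:
        parallelism: 1           # replace one replica at a time during an update
        order: start-first       # start the new task before stopping the old one
      restart_policy:
        condition: on-failure    # reschedule a replica if it exits with an error
```

On a swarm manager, docker stack deploy -c docker-compose.yml mystack would deploy it, where mystack is a hypothetical stack name.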
Addressing Reddit Comments
The common theme across arguments against Docker is the loss of control. Indeed, during my early days with Docker, I did feel that I had given up some degree of control.
I no longer had direct access to configuration files on the host machine and had to jump through several hoops to tweak a single configuration variable. Files and logs were also no longer a single cat command away. This was very frustrating for someone used to modifying configuration and code directly on the host machine.
After a month of using Docker, I found that my deployments and configuration were much more organized. This was especially true with Docker Swarm, which I will introduce in a later piece: I no longer needed to SSH into each machine to view logs. Configuration and code can be bind-mounted to host directories or shared storage, which I can access as easily as before, and environment variables no longer need to be defined on the host machine.
Docker has brought much-needed structure to how I separate application code, configuration, and persistent state, and I feel the minor inconvenience it causes is well worth the convenience it offers.
The most important reason to use Docker
In my opinion, the most important reason is this:
Most, if not all, popular software has a Docker image available, and that greatly simplifies installation.
If you intend to do any kind of self-hosting, knowing Docker is crucial to saving yourself from sleepless nights spent debugging your setup.
Even if you don't intend to self-host, if you work in tech, be it as a Data Scientist or a Backend Software Engineer, Docker is very relevant to your career and will remain so until software no longer has dependencies (which will never happen).
At work, I use Docker daily in my workflows, including:
- Maintaining an identical data analysis environment across the team
- Developing and testing applications in a predictable environment
- Ensuring zero-downtime atomic deployments
If I had to sum up this entire piece in a single sentence, it would be this:
If you are serious about working in a tech role, you should pick up Docker.