Docker

How I Started Using Docker and Why You Should Too

I outline the journey, pains included, through which I found Docker, and I aim to convince at least one person reading this that Docker is worth learning and using.

Will Ho

Jul 13, 2020 • 8 min read

I've been lurking around in r/selfhosted for about a year now, spying on the apps that people self-host and their pretty dashboards full of self-hosted apps. Over time I observed some dissent in the community. There exists a group of people who vehemently reject the use of Docker.

...what happens if the container is not maintained further? i have no desire to do that. I still prefer to install my dependencies, run the applications via systemd and set up an nginx reverse proxy. i still feel like i can interact with the process. (Reddit)

I'm a person who has never really been too fond of Docker. I like having full control of the applications I'm running, their configuration and their storage. With Docker, I've always felt like I've lost a bit of that control. (Reddit)

Reading those comments made me wonder:

Are there others out there who share the same perspective?

In this piece, I outline the journey through which I found Docker, and I aim to convince at least one person reading this that Docker is worth learning and using.

LEMP - Linux, Nginx, MySQL, Phpain

As performance was absolutely abysmal on the Raspberry Pi 1 Model B in my Octopi cluster, I spent a few months tuning php-fpm for performance and configuring nginx for security. I found that this process almost never worked as expected. The Raspberry Pi and its OS, Raspbian, differ vastly from the machines most people host LEMP stacks on. Therefore, what worked in guides and tutorials was almost guaranteed not to work on my setup without significant modification.

Sources of headache

Missing system dependencies
System dependencies with strange Raspbian defaults
Outdated system packages in the Raspbian APT repository
System packages not pre-compiled for the Arm architecture

To make matters worse, even though I committed my configuration files to git, I had to make live changes on the server nodes to iterate quickly, and eventually I lost track of which of those changes had broken the site.

That path almost always ended in a painful full reinstall of the LEMP stack.

Migration and more Phpain

A year after I built my first cluster, I built my second, Kraken, made up of 7 Raspberry Pi 3 Model Bs, and I started migrating my WordPress installation to the new cluster. Armed with my trusty Ansible playbooks, where the installation and configuration tasks were all written in stone as code, I was confident that installation of WordPress would be a breeze.

Unfortunately, in DevOps, Murphy's Law always applies:
"Anything that can go wrong will go wrong"

Turns out, what used to work in Octopi did not work in Kraken and I ended up spending 3 full days debugging the playbooks I wrote before I got just php-fpm installed and configured correctly for WordPress. Interestingly, deploying PHP seems to be a source of pain for many, though especially so for me.

The problem does not lie in installing PHP itself, but in installing all the PHP extensions that WordPress requires. To date, there is no definitive list of extensions required and/or recommended by and for WordPress. The best list I found was an answer on StackExchange posted 8 years ago with 22 extensions.

What are PHP extensions and libraries WP needs and/or uses? (WordPress Development Stack Exchange, Rarst)

To make matters worse, each PHP extension has its own set of system dependencies, requiring certain versions of Linux libraries or packages to be installed.

A slice of my headaches

To illustrate how annoying it is to manage PHP extensions and their associated system dependencies, I'll list some PHP extensions and their system dependencies.

PHP Extension	System Dependencies
GD	libgd, libjpeg-turbo, libpng, freetype
Sodium	libsodium
cURL	libcurl
DOM	libxml2
iconv	libiconv
zlib	zlib
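To make the bookkeeping concrete, here is a minimal sketch of how one might check a host for missing extensions. It assumes the PHP CLI is installed and relies on `php -m`, which prints one loaded module per line. The function name `missing_extensions` is made up for this illustration, and the list of extensions passed in is just the one from the table above, not a definitive WordPress requirement list.

```shell
#!/bin/sh
# missing_extensions: print each required extension that is absent from the
# module list supplied on stdin (one module name per line, as `php -m` prints).
# The function name is hypothetical; it is not part of PHP or WordPress.
missing_extensions() {
    loaded=$(cat)
    for ext in "$@"; do
        # -x: whole-line match, -i: ignore case, -F: literal string, -q: quiet
        if ! printf '%s\n' "$loaded" | grep -qixF "$ext"; then
            printf '%s\n' "$ext"
        fi
    done
}

# Typical use on a host with PHP installed:
#   php -m | missing_extensions gd sodium curl dom iconv zlib
```

Every name it prints still has to be traced back to the right system packages by hand, which is exactly the chore described above.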

Through all my struggles with PHP, I found this (then) treasure known as Docker, and at the time I thought it was the best invention that mankind had ever come up with.

Introducing Docker

Docker is a framework that allows containerization of applications.

Containerization is the process of bundling an application together with its configuration, its dependencies, and the operating system userland it relies on into a single object called an image, which can be downloaded and run on any host with a compatible container runtime.

Because containers share the host's kernel, Docker abstracts the rest of the operating system away from app deployments. This means that applications that used to work in Octopi will keep working regardless of the host's Linux distribution, given the same CPU architecture.

From Dockerfiles to Containers and Stacks

Dockerfiles are text files that fully describe your image. They contain the sequence of commands required to install and configure an application starting from a base image. The base image can be a simple base OS image like debian:buster or any other application image like php:7.2 allowing you to build off the work of others, running additional commands on top of what was already done.

Dockerfiles can optionally also declare several OS-related elements, such as directories to be used as mount points for volumes and ports to be exposed.
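As an illustration of the elements just described, here is a minimal sketch that writes a small Dockerfile to disk. It is not the Dockerfile behind any image used in this piece. The base tag php:7.2-fpm, the Debian package names, and the choice of GD are assumptions picked to match the PHP version mentioned here, and the configure flags are the ones I understand PHP versions before 7.4 to use. Such an old tag may no longer build cleanly today. The function name `write_dockerfile` is made up for this example.

```shell
#!/bin/sh
# write_dockerfile DIR: write an illustrative Dockerfile into DIR.
# The base image tag, packages, and extension choice are assumptions made for
# this sketch; they are not taken from a real deployment.
write_dockerfile() {
    mkdir -p "$1" || return 1
    # The quoted 'EOF' keeps the shell from expanding anything in the body.
    cat > "$1/Dockerfile" <<'EOF'
# Start from an existing application image instead of a bare OS image.
FROM php:7.2-fpm

# Install the system libraries that GD needs, then compile the extension
# with the helper scripts shipped in the official php image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        libfreetype6-dev libjpeg62-turbo-dev libpng-dev \
 && docker-php-ext-configure gd \
        --with-freetype-dir=/usr/include/ --with-jpeg-dir=/usr/include/ \
 && docker-php-ext-install gd \
 && rm -rf /var/lib/apt/lists/*

# Declare a mount point for a volume and the port php-fpm listens on.
VOLUME /var/www/html
EXPOSE 9000
EOF
}

# Typical use on a host with Docker installed:
#   write_dockerfile ./my-php && docker build -t my-php:7.2 ./my-php
```

The point of the sketch is that the dependency hunting happens once, inside the Dockerfile, instead of on every machine.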

Think of Dockerfiles as analogous to class definitions in programming languages.

These Dockerfiles can be built into an image: the sequence of commands defined in the Dockerfile is run on the base image, materializing those changes and packaging the result into an image. An image is therefore an instance of the Dockerfile that it was built from.

A Docker image can be seen as analogous to objects instantiated from classes in programming terms.

Docker images can be pushed into repositories hosted on registries, the default being Docker Hub, where they can be shared as is with others who wish to install the same application and configure it in the same way.

When you think of Docker registries, think GitHub.

Docker images can then be pulled onto the machine that will run the application. The image can subsequently be run as a container, and the application will, in theory, work as it did on the machine the image was built on.

A container here is comparable to a single copy of your application code, cloned from GitHub and run. You can run multiple copies of your code, just as you can run multiple containers of the same image.
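The build, push, pull, and run steps can be sketched with the standard Docker CLI as follows. The repository name example/my-php:7.2 and the container names are hypothetical placeholders, and the two helper functions are only wrappers invented for this example.

```shell
#!/bin/sh
# DOCKER can be overridden (for example DOCKER=echo) to print the commands
# instead of running them.
DOCKER=${DOCKER:-docker}
# Hypothetical repository name; replace it with one you can push to.
IMAGE=${IMAGE:-example/my-php:7.2}

# On the build machine: build the image from a Dockerfile directory, then
# push it to the registry (Docker Hub by default).
build_and_push() {
    "$DOCKER" build -t "$IMAGE" "$1" && "$DOCKER" push "$IMAGE"
}

# On the target machine: pull the image, then start one detached container
# per name given. Every container is an independent copy of the same image.
pull_and_run() {
    "$DOCKER" pull "$IMAGE" || return 1
    for name in "$@"; do
        "$DOCKER" run -d --name "$name" "$IMAGE" || return 1
    done
}

# Typical use:
#   build_and_push ./my-php        # on the build machine
#   pull_and_run php-a php-b       # on the machine that runs the app
```

Running `pull_and_run` with two names starts two containers from one image, which is the "multiple copies of your code" idea in practice.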

Multiple related containers representing different components of a single app can be added into a single docker-compose.yml file that fully describes their runtime configuration. A docker-compose file can contain environment variable values, volume mounts and host-container port mappings.

Think of docker-compose files as specifying all the parameters that each function of your application code takes, that uniquely configures the entire application to your usage.

What this meant for my setup

No longer need to bother myself with installing system dependencies required by PHP extensions
No need to search for an exhaustive list of PHP extensions required by WordPress
Configuration that is tried and tested can be packaged together with PHP in an image to be installed elsewhere
Machines are more secure with PHP running as a non-root user in an isolated container
Clear separation of static configuration related to the application's operating environment, and runtime configuration such as passwords and hostnames of other components of the same application

What was a mess of installation and configuration steps for php-fpm, nginx and MySQL in my Ansible playbooks was reduced to defining a docker-compose.yml file and running a single command.

version: '3.1'

services:
  web:
    image: nginx
    ports:
    - 8000:80
    volumes:
    - wordpress:/var/www/html
    - /home/pi/wordpress/nginx.conf:/etc/nginx/nginx.conf

  php:
    image: wordpress:5.4.2-php7.2-fpm-alpine
    restart: always
    environment:
      WORDPRESS_DB_HOST: db
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: myverysecurepassword
      WORDPRESS_DB_NAME: wordpress
    volumes:
    - wordpress:/var/www/html
    - /home/pi/wordpress/php-fpm.conf:/usr/local/etc/php-fpm.d/zzz-kraken.conf

  db:
    image: mysql:5.7
    restart: always
    environment:
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: myverysecurepassword
      MYSQL_RANDOM_ROOT_PASSWORD: '1'
    volumes:
    - db:/var/lib/mysql

volumes:
  wordpress:
  db:

The docker-compose.yml file I used for deploying WordPress, with custom php and nginx configs

$ docker-compose up -d

Deploying my WordPress stack in 1 command

Deciphering the docker-compose file

I defined 3 services, web, php, and db.
php shares the wordpress named volume with web, where serving static assets is left to nginx while php code is handled by php-fpm.
In both web and php, there are bind mounts to config files on the host machine which will override default configuration that comes with the nginx and wordpress images.
In php and db, identical values are set for the database credentials environment variables so that the mysql container creates a database with the same credentials on initialization.
In php, WORDPRESS_DB_HOST is set to the service name of the MySQL container, db. Docker's built-in DNS automatically resolves that name to the target container on the Docker virtual network.
There is a named volume, db, which persists the database should the container be shut down or removed.
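Since the same credentials have to appear under both php and db, one optional variation is to keep the password in a `.env` file next to docker-compose.yml and let Compose substitute it into both services. This is a sketch of that idea, not what the file above does. The variable name DB_PASSWORD and the helper `write_env` are assumptions made for this example.

```shell
#!/bin/sh
# write_env DIR: create DIR/.env with a random DB_PASSWORD unless the file
# already exists. Regenerating it later would not change the password stored
# in an already-initialized database volume, so an existing file is kept.
write_env() {
    env_file="$1/.env"
    [ -e "$env_file" ] && return 0
    # 16 random bytes rendered as 32 hex characters.
    password=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # The subshell's umask makes the new file readable by its owner only.
    (umask 077 && printf 'DB_PASSWORD=%s\n' "$password" > "$env_file")
}

# In docker-compose.yml, both services would then reference the variable:
#   WORDPRESS_DB_PASSWORD: ${DB_PASSWORD}    # under php
#   MYSQL_PASSWORD: ${DB_PASSWORD}           # under db
#
# Typical use, from the directory containing docker-compose.yml:
#   write_env . && docker-compose config -q && docker-compose up -d
```

Here `docker-compose config -q` only validates the file after substitution, so a missing variable is easier to spot before anything is started.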

Why should you use Docker?

If the abstractions of containerization outlined earlier haven't convinced you yet, here's a TL;DR of all the benefits of Docker.

Run apps in an isolated OS environment, independent of host OS
Does not incur as much overhead as full virtualization with VMs
Package configuration with the app as a single image
Expose only ports that you need and block all others
Change configuration files on the fly with volumes for quick iteration
Link related containers and configuration in a single docker-compose file
Option to deploy highly-available renditions of your apps with simple modifications

Addressing Reddit Comments

The common theme across arguments against the use of Docker is the loss of control. Indeed, during the early stages of using Docker, I did feel that there was some degree of control given up.

I no longer had direct access to configuration files on the host machine and had to jump through several hoops to tweak a single configuration variable. Files and logs were also no longer a single tail or cat command away. It was indeed very frustrating for me then, as someone who was used to modifying configuration and code directly on the host machine.

After a month of using Docker, I found that my deployments and configuration were a lot more organized. This is especially so with Docker Swarm, which I will introduce in a later piece, where I no longer need to SSH into each machine to view logs. Configuration and code can be bind-mounted to host directories or shared storage, which I can access as easily as I used to. Lastly, environment variables no longer need to be defined on the host machine.
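For what it's worth, the tail and cat habits carry over with only a little extra typing. Below is a minimal sketch using the `logs` and `exec` subcommands of docker-compose. The two wrapper functions are invented for this example, and the service name and file path in the usage comments are borrowed from the compose file shown earlier.

```shell
#!/bin/sh
# COMPOSE can be overridden (for example COMPOSE=echo) to print the commands
# instead of running them.
COMPOSE=${COMPOSE:-docker-compose}

# Follow the most recent log lines of one service, much like `tail -f`.
tail_logs() {
    "$COMPOSE" logs -f --tail=50 "$1"
}

# Print a file from inside a running service container, much like `cat`.
cat_in_service() {
    "$COMPOSE" exec "$1" cat "$2"
}

# Typical use, from the directory containing docker-compose.yml:
#   tail_logs php
#   cat_in_service php /usr/local/etc/php-fpm.d/zzz-kraken.conf
```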

Docker has brought much-needed structure to how I think about separating application code, configuration, and persistent state, and I feel the minor inconvenience it causes is well worth the convenience it offers.

The most important reason to use Docker

In my opinion, the most important reason is this:

Most, if not all, popular software out there has a Docker image available for use, and that greatly simplifies the installation process.

If you intend to do any kind of self-hosting, knowing Docker is crucial in saving yourself from sleepless nights spent debugging your work.

Even if you don't intend to self-host, assuming you're in the tech field, be it as a Data Scientist or a Backend Software Engineer, Docker is very relevant to your career and will remain so until software no longer has dependencies (which will never happen).

At work, I use Docker very frequently in my daily workflows. Uses include:

Maintaining an identical data analysis environment across the team
Developing and testing applications in a predictable environment
Ensuring zero-downtime atomic deployments

TL;DR

If I were to sum up the entire piece here in a single sentence, I would tell you this:

If you are serious about working in a tech role, you should pick up Docker.