Frictions and Complexities of "Simple" Scripts

Bash scripts (or any type of script) can become unwieldy, fragile, and difficult to maintain

Linux · Bash · Automation · Infrastructure as Code · Homelab

Recently, I have been working on building my home lab. Naturally, I want this to be automated and deterministic. While writing a Bash script to configure and deploy a server running Debian and Paperless-ngx I was reminded of how even “simple” scripts can rapidly become unwieldy, fragile, and challenging to maintain.

Infrastructure-as-code tools such as Ansible, Chef, Salt, Nix/NixOS, Terraform, and Bicep aim to solve these problems. To varying degrees, these tools employ declarative patterns to achieve idempotency, determinism, and consistency. This is in contrast to an imperative Bash script (or any script). Where declarative tools allow for expressing the desired end state, an imperative script is a series of steps to arrive at the end state.

Additionally, and again to varying degrees, many declarative infrastructure-as-code tools support diffing between the desired and current state. This allows you to see what a tool intends to change before it changes anything. The average imperative script does not support this, and if it did, it could easily double or triple the line count.
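To give a sense of what that costs by hand, here is a minimal sketch of a "plan" step in Bash. The function name plan_file and the example repository line are hypothetical and not part of my script. It only reports what would change and never writes to the target.

```shell
#!/bin/bash
set -eu

# plan_file TARGET DESIRED_CONTENT
# Prints a unified diff between the current file and the desired content.
# Returns 0 when nothing would change, 1 when a change is pending.
# Hypothetical helper for illustration; it never modifies TARGET.
plan_file() {
    local target=$1 desired=$2
    local current=/dev/null
    # A missing file is compared as empty, so all content shows as added
    if [ -f "$target" ]; then
        current=$target
    fi
    if printf '%s\n' "$desired" | diff -u "$current" -; then
        echo "ok: $target is up to date"
        return 0
    fi
    echo "changed: $target would be rewritten"
    return 1
}

# Demo against a throwaway directory rather than /etc
demo_dir=$(mktemp -d)
printf 'deb https://example.invalid/debian stable main\n' > "$demo_dir/example.list"

plan_file "$demo_dir/example.list" "deb https://example.invalid/debian stable main" || true
plan_file "$demo_dir/example.list" "deb https://example.invalid/debian testing main" || true
```

That is roughly twenty lines to preview a single file, and every other kind of change (packages, services, firewall rules) would need its own equivalent.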

I thought it would be interesting to annotate the script and describe its steps, exposing its complexities, problematic assumptions, and workarounds. Hopefully, this will help explain why “simple” scripts are, more often than not, far from simple - and hopefully make a strong case for using other tools when needed.

Good intentions

Perhaps optimistically, I envisioned the script as straightforward and only a few lines long. However, as I write this, it’s now over a hundred and thirty lines long. I intend to convert this to Ansible because, honestly, writing long Bash scripts is tedious and not how I want to manage my machines.

While I’m on the topic of being honest, I don’t particularly enjoy the Bash syntax or language, and I always feel a slight wave of despair wash over me when I open a load-bearing Bash script hundreds of lines long.

I’m also not proficient in Bash because when I try to use it in anger, I always need to deal with the same frustrating problems - problems that either don’t exist or have good solutions in other languages and tools. But once again, I fell into the trap of thinking it would be fine for this task.

I considered using one of the various open-source Bash script templates, such as this one. However, this has over seven hundred lines.

Furthermore, the point of this post is that there are more problems here than simply the language choice for a setup script like this. It’s all the edge cases, footguns, error handling, and the imperative nature of the script.

One of my favourite blogs, rachelbythebay, has some great commentary and observations about exactly this. In her post “Your simple script is someone else’s bad day” she says:

If all of those steps actually succeed, then sure, okay, you win, and it’s probably an improvement over the old manual processes.

Without those checks, what happens if the subsequent steps run, and actually manage to get in some weird state because they ran when they shouldn’t have? It might even make it unable to run again later without manual intervention, since now it won’t be starting from a fresh slate.

Assurances and tooling woes

I also would like to be able to test (both unit and integration) more complex scripts, especially as I build more services and servers for my homelab. At that point, mucking around with scripts will become infeasible.

The closest solution for Bash is bash_unit where the getting started instructions are… clone the repo and modify as needed. Ouch! The goal is to reduce line count, not increase it, so that’s not a viable option.
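To be fair, nothing stops me from hand-rolling assertions. A minimal sketch of what that might look like follows, where assert_eq and authorized_keys_path are hypothetical names invented for this example rather than anything from bash_unit or my script:

```shell
#!/bin/bash
set -u

failures=0

# assert_eq EXPECTED ACTUAL MESSAGE (hypothetical helper)
assert_eq() {
    if [ "$1" = "$2" ]; then
        echo "PASS: $3"
    else
        echo "FAIL: $3 (expected '$1', got '$2')"
        failures=$((failures + 1))
    fi
}

# Function under test: builds the path of a user's authorized_keys file
authorized_keys_path() {
    printf '/home/%s/.ssh/authorized_keys' "$1"
}

assert_eq "/home/alice/.ssh/authorized_keys" "$(authorized_keys_path alice)" "path for alice"
assert_eq "/home/bob/.ssh/authorized_keys" "$(authorized_keys_path bob)" "path for bob"

echo "$failures failure(s)"
```

That covers pure functions at best. Checking that apt, sshd, or UFW ended up in the right state still needs a real or virtual machine to run against.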

Both Ansible and Terraform have built-in support for tests.

Requirements

The script needed to perform several tasks:

  • Update package lists and upgrade packages via apt
  • Install needed dependencies: Git, cURL
  • Add package signing keys and repositories for Docker and Tailscale VPN and then install them
  • Authenticate with Tailscale VPN
  • Configure SSH: Add my public key, disable password-based authentication and then restart sshd
    • Create /home/$USER/.ssh/authorized_keys
    • Assign appropriate permissions with chmod
  • Install and configure UFW firewall
  • Create several directories for Paperless-ngx and its users
  • Copy configuration files and Docker files to one of these directories
  • Start Paperless-ngx via docker compose up

Parsing arguments

The first thing I wanted to write was argument handling. Currently, there are two arguments required: a public SSH key and a Tailscale pre-authentication key. Additionally, I’d like to be able to handle missing arguments. It also needs to be run with root (sudo), so it asserts that too.

#!/bin/bash
# Exit on error
set -eu
# Configuration
PUBLIC_KEY=""
TAILSCALE_KEY=""
USER=$(who -m | awk '{print $1}')
cd "/home/$USER"
# Function to show usage
usage() {
    echo "Usage: $0 -k '<public-key-string>' -t '<tailscale-auth-key>'"
    exit 1
}
# Parse command line options
while getopts ":k:t:" opt; do
    case ${opt} in
        k )
            PUBLIC_KEY=$OPTARG
            ;;
        t )
            TAILSCALE_KEY=$OPTARG
            ;;
        \? )
            echo "Invalid Option: -$OPTARG" 1>&2
            usage
            ;;
        : )
            echo "Option -$OPTARG requires an argument." 1>&2
            usage
            ;;
    esac
done
shift $((OPTIND -1))
# Validate required option for Public Key
if [ -z "${PUBLIC_KEY}" ]; then
    echo "Missing required argument: -k '<public_key_string>'"
    usage
fi
# Validate required option for Tailscale Key
if [ -z "${TAILSCALE_KEY}" ]; then
    echo "Missing required argument: -t '<tailscale_auth_key>'"
    usage
fi
# Ensure running as root
if [ "$(id -u)" -ne 0 ]; then
    echo "This script must be run as root."
    exit 1
fi

Fifty-seven lines of code, and we’re not even at the stage of installing any packages. This is simply the paperwork[1] to have the script in a usable state. A particularly obnoxious section of code is the third block, the option-parsing loop built around getopts.

This features a loop, primitive parsing (and not even good parsing, it only supports single characters), a case block, variable assignment, and side effects (writing to STDOUT and STDERR). Positively a nightmare for anyone concerned about functional programming or readable code.

To top it off, it needs a bunch of noisy syntax thrown around like magic runes to make it work.

Bash, being a text-orientated language and shell, deals primarily with characters and strings.[2]

Adding package repositories

The next step is installing packages, which involves adding package repositories for Tailscale and Docker. Ideally, I’d be using Podman for the Paperless-ngx container, but Paperless-ngx does not work in a rootless container.[3]

This is a string manipulation-heavy section, involving echo and tee. This is where the script starts to become more tedious to read and write.

# Update packages
echo "Updating and upgrading packages..."
apt-get update -y && apt-get upgrade -y
# Install git and curl
apt-get install -y git curl podman
# Add Tailscale package
echo "Installing Tailscale..."
curl -fsSL "https://pkgs.tailscale.com/stable/debian/$(lsb_release -cs).noarmor.gpg" | sudo tee /usr/share/keyrings/tailscale-archive-keyring.gpg >/dev/null
curl -fsSL "https://pkgs.tailscale.com/stable/debian/$(lsb_release -cs).tailscale-keyring.list" | tee /etc/apt/sources.list.d/tailscale.list
# Add Docker package
apt-get install -y ca-certificates
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install -y tailscale docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
echo "Authenticating with Tailscale, please wait..."
tailscale up --authkey "$TAILSCALE_KEY"
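One footgun in the two curl | tee lines is worth spelling out. set -eu only looks at the exit status of the last command in a pipeline, so a failed download is masked by tee succeeding. Here is a minimal sketch of that behaviour, using false as a stand-in for a failing curl and a throwaway temp file instead of the real keyring path:

```shell
#!/bin/bash
set -u

out=$(mktemp)

# Without pipefail: the pipeline's status is tee's, so the failure is hidden
# and an empty file is left behind.
false | tee "$out" >/dev/null
status_default=$?

# With pipefail: the pipeline reports the failing command's status, which is
# what `set -e` needs in order to stop the script.
set -o pipefail
false | tee "$out" >/dev/null
status_pipefail=$?

echo "without pipefail: $status_default"
echo "with pipefail:    $status_pipefail"
```

In my script that would mean starting with set -euo pipefail instead of set -eu, which is yet another incantation to remember.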

Setting up SSH and the firewall

The purpose of this section doesn’t need much of an explanation, but I do want to focus on the way that the script has to update the configuration. Here, I’m using sed to replace commented-out default lines with explicit settings, for example turning a “yes” into a “no”. This works, but once again, reminds me that this is all working at the text level.

There’s a certain persistent feeling of fragility when configuring systems with plain text when those systems all use disparate and made-up formats (essentially schemaless), that require regular expressions and string replacement to update. Another example of where infrastructure-as-code is a clear winner.

I can’t pin this one on Bash, though. This is down to OpenSSH and its maintainers thinking that yet another delimited text format is acceptable when several far safer formats exist that benefit from a defined grammar and syntax, and for which parsing libraries are available.

However, this leads to the theme of this article: none of this is particularly elegant or foolproof. What if, for example, this regular expression was used on a file that has one of these “no” or “yes” strings written in a documentation comment?

Well, sed’s substitute command only replaces the first match on each line unless it is given the g flag, and it still applies to every line that matches. With a looser pattern than the ones I use here, running the script could replace the wrong “yes” or “no” strings, and running it twice falls into the idempotency trap again. To accurately modify only the intended string, the script must distinguish the correct line from irrelevant comments.

However, without a structured file format, identifying the target line involves checking line numbers or differentiating between comments and actual code. This is yet more unnecessary complexity in what should be a simple Bash script.

It works fine for now with this particular file, but there’s no guarantee this temerarious and reckless “apply regular expressions to configuration files and hope for the best” approach would work for other files. This underscores the advantages of using infrastructure-as-code or, more simply, good file formats.
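For comparison, here is a minimal sketch of what a more careful edit might look like. set_option is a hypothetical helper, not something in my script. It matches the directive only at the very start of a line (commented out or not), appends it if it is missing, and does nothing when the file already holds the desired value. It assumes GNU sed, a simple "Keyword value" file, and keys and values free of regular expression metacharacters. The demo runs against a temporary file rather than /etc/ssh/sshd_config.

```shell
#!/bin/bash
set -eu

# set_option FILE KEY VALUE (hypothetical helper; assumes GNU sed)
set_option() {
    local file=$1 key=$2 value=$3
    # Already set exactly as desired: leave the file untouched
    if grep -Eq "^${key}[[:space:]]+${value}[[:space:]]*\$" "$file"; then
        return 0
    fi
    if grep -Eq "^#?${key}[[:space:]]" "$file"; then
        # Rewrite lines where the directive starts the line, optionally
        # prefixed by '#'. Comments that mention the keyword later in the
        # line do not match.
        sed -i -E "s|^#?${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        # Directive absent: append it
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}

# Demo file loosely modelled on a stock sshd_config
conf=$(mktemp)
cat > "$conf" <<'EOF'
# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
EOF

set_option "$conf" PasswordAuthentication no
first=$(cat "$conf")
set_option "$conf" PasswordAuthentication no   # second run should change nothing
second=$(cat "$conf")

cat "$conf"
if [ "$first" = "$second" ]; then
    echo "second run was a no-op"
fi
```

Even this is still guessing at the file's grammar: it relies on the convention that commented-out defaults have no space after the #, which nothing in the format enforces.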

# Setup SSH
echo "Setting up SSH key..."
mkdir -p "/home/$USER/.ssh"
echo "$PUBLIC_KEY" > "/home/$USER/.ssh/authorized_keys"
chown -R "$USER":"$USER" /home/"$USER"/.ssh
chmod 700 /home/"$USER"/.ssh && chmod 600 /home/"$USER"/.ssh/authorized_keys
# Update SSH configuration to disable password authentication
sed -i '/^#PasswordAuthentication yes/c\PasswordAuthentication no' /etc/ssh/sshd_config
sed -i '/^#PermitRootLogin prohibit-password/c\PermitRootLogin without-password' /etc/ssh/sshd_config
systemctl restart sshd
# Install and configure UFW
echo "Configuring firewall (UFW)..."
apt-get install -y ufw
ufw allow OpenSSH
ufw allow http
ufw allow https
ufw --force enable

Setting up Paperless-ngx

After all of this paperwork, as I put it earlier, had been written, it was time to achieve the task I’d set out to do: automate the configuration of a fresh system and deployment of Paperless-ngx to it.

The only interesting part here is that I create a custom consumption directory structure that Paperless-ngx uses to automatically tag files, which forms part of the workflow I set up.

# Create consume directories for different people
mkdir -v "./paperless-inbox"
# cd ~/paperless-inbox
for personName in <name list here>; do
    mkdir "./paperless-inbox/$personName"
done
# Create paperless-ngx directory
mkdir -v "./paperless-ngx"
# Copy configuration files
cp -a configuration/linux/dpm/. paperless-ngx/
# Start paperless-ngx superuser creation (will prompt for input)
cd paperless-ngx
docker compose run --rm webserver createsuperuser
# Start paperless-ngx
docker compose up -d
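One small example of the idempotency problem in this last section: with set -e in effect, plain mkdir fails on a second run because the directory already exists, and the script stops there. A minimal sketch of a re-runnable version follows. The names alice and bob are placeholders for the name list I left out above, and the demo works in a temporary directory instead of my home directory.

```shell
#!/bin/bash
set -eu

# Placeholder names; the real list is elided in the script above
people=(alice bob)

# Work in a scratch directory so the demo is safe to run anywhere
base=$(mktemp -d)

create_inboxes() {
    local person
    for person in "${people[@]}"; do
        # -p creates missing parents and succeeds if the directory exists
        mkdir -p "$base/paperless-inbox/$person"
    done
}

create_inboxes
create_inboxes   # second run succeeds instead of aborting the script
ls "$base/paperless-inbox"
```

That fixes one line. The createsuperuser prompt and the cp step each need their own answer to "what if this already ran?".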

Thoughts

Overall, I’m glad that I have this script. It does what I set out to achieve. It’s a starting point or at least a reference for what an iteration of it needs to do because it absolutely needs improving.

I described some downsides: terse and difficult-to-scan syntax, error-prone argument parsing, very error-prone and destructive string manipulation in configuration files, a lack of idempotency, and an overall sense that this is the wrong tool for the task.

So, what will I convert this to? Ansible. It’s very declarative and mostly idempotent. It also has a package manager called Galaxy, which is sorely missing in Bash. I touched on this earlier.

I have decided that Ansible will become the language behind my homelab infrastructure. It fits a nice middle ground that some other languages or tools don’t quite reach.

For example, I like the Terraform model of “it’s all just pseudo-JSON”, but for some incomprehensible reason, no one has thought to apply Terraform to bare metal/plain operating system configuration.[4] Its two main areas are containers and cloud infrastructure, so that’s ruled out.

I also really like the paradigm-shifting Nix/NixOS model: fully declarative, idempotent, with isolated software dependencies, and all the rest. However, it isn’t a tool that would allow me to, for example, network boot totally blank systems with PXE to run a Linux installer.

I’m not at all familiar with Chef, Puppet, or Salt. Some of them require a dedicated server to manage state and monitor nodes, and that’s not really the type of complexity I want to deal with.

This puts Ansible in a favourable position. It has the right balance of being mostly idempotent while allowing for side effects and other system-orientated actions. After all, it simply issues commands over SSH. This makes it a pretty attractive option, and this is why Ansible is so prevalent in Linux system administration circles and with people building infrastructure and homelab environments.

So, my next task is to learn Ansible. It will look something like the following code block. I am pretty excited about the ways I will be able to define and build my homelab servers and services.

Hopefully, you can see the significant jump in capability and readability compared to the Bash script I have currently. The Docker APT repository is added, APT packages are installed, and finally the Docker service is started. Recall how I described a third of the original Bash script as paperwork. Well, a large part of that is taken care of by these few lines.

tasks:
  - name: Add Docker GPG key
    apt_key:
      url: https://download.docker.com/linux/debian/gpg
      state: present
  - name: Add Docker repository
    apt_repository:
      repo: deb [arch=amd64] https://download.docker.com/linux/debian bullseye stable
      state: present
  - name: Install Docker Engine
    apt:
      name:
        - docker-ce
        - docker-ce-cli
        - containerd.io
  - name: Start Docker service
    service:
      name: docker
      state: started

Footnotes

  1. Get it?

  2. PowerShell, an object-orientated language that I am fond of using (though it has a couple of annoyances - I can think of at least three different ways to define a function), works at a higher level than strings and has a more modern approach to syntax, with better error-handling semantics too.

  3. Well, technically, Paperless-ngx supports rootless containers if you don’t want to use multiple OCR languages. I do, so this isn’t an option. This feels like an oversight.

  4. I don’t understand why this is an area Terraform isn’t pursuing. I’ve read very poorly explained and vague “explanations” online that ultimately fail to explain anything helpful.
