The Lion's Den | Why is enabling automatic updates in NixOS so hard?

Jun 1 2024, 11:30 AM 9 min read

An explanation of NixOS' update process, and why something as simple as enabling automatic updates is so problematic. Image from https://www.pexels.com/photo/shattered-ice-3977222/

Pop quiz: how do you enable automatic updates on your computer?

If you’re using Windows or macOS, chances are automatic updates are already enabled for you. Even certain Linux distros enable them by default. If not, enabling and disabling them is often as simple as opening system settings and unchecking a box, or in the worst case, editing a text file.

And then there’s NixOS. Now, imagine you’re a brand-new NixOS user. You’ve managed to learn the basics: how to configure your system, how to apply that configuration, and how to make changes. Now you want to make sure your system gets regular, automatic updates. You search “NixOS automatic updates” and end up on the NixOS wiki 🔗. Looks easy enough, right? Just copy this block into your configuration, run nixos-rebuild, and go!

system.autoUpgrade = {
  enable = true;
  flake = inputs.self.outPath;
  flags = [
    "--update-input"
    "nixpkgs"
    "-L" # print build logs
  ];
  dates = "02:00";
  randomizedDelaySec = "45min";
};

Wow, what an easy and not-at-all-misleading set of instructions! It even supports Flakes!

Well, not quite. As it turns out, there are a few very important things that this code doesn’t do. If you’re using Flakes, and/or sharing this config across multiple systems, it gets even harder. It’s so frustrating that I felt the need to write this blog post. But rather than just complain about problems, I hope to provide a solution that’s fairly easy to implement, but still vastly more complicated than it has any right to be.

All that being said, if I misunderstood something, or you know of an easier way, please let me know 🔗!

First, let’s start with the problems. Why is something as simple as automatic system updates so hard in NixOS?

Problem #1: Git

NixOS (particularly, Flakes) leans heavily on Git 🔗 to manage and track changes to your configuration files. When your configuration lives in a Git repository, Nix only sees the files that Git knows about. If you didn’t know this, create a new .nix file in your configuration, import it without running git add, then try running nixos-rebuild switch --flake . (note the trailing dot). The rebuild will refuse to run until you’ve git added the file.

Why is this a problem? One of the files Flakes uses is a file named flake.lock. This isn’t a file that you normally edit; rather, it’s automatically generated from the inputs in your flake.nix file. For instance, my flake.nix uses Nixpkgs version 24.05 🔗 for its packages. You’ll notice that this URL points to a GitHub repository. When I run nix flake update, Nix goes to that repository, finds the latest commit in the nixos-24.05 branch, and stores the commit hash 🔗 in flake.lock. This essentially pins the version of nixos-24.05 and all of the packages it tracks to that specific commit.
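To make the pinning a bit more concrete, here’s a rough sketch of what the lock file records. The JSON below is a heavily trimmed, hypothetical excerpt (a real flake.lock has more fields), and the rev is an all-zero placeholder, not a real commit. The sed calls are just a quick way to pull the two values out; on a real system, nix flake metadata shows the same information.

```shell
# Hypothetical, trimmed flake.lock excerpt. The rev is a placeholder.
lock=$(mktemp)
cat > "$lock" <<'EOF'
{
  "nodes": {
    "nixpkgs": {
      "locked": {
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "0000000000000000000000000000000000000000",
        "type": "github"
      },
      "original": {
        "owner": "NixOS",
        "ref": "nixos-24.05",
        "repo": "nixpkgs",
        "type": "github"
      }
    }
  }
}
EOF

# "original" records what flake.nix asked for (a branch)...
branch=$(sed -n 's/.*"ref": "\(.*\)".*/\1/p' "$lock")
# ...while "locked" records the exact commit that branch resolved to.
rev=$(sed -n 's/.*"rev": "\([0-9a-f]*\)".*/\1/p' "$lock")

echo "$branch is pinned to $rev"
```

Until something rewrites that rev, every rebuild uses the same snapshot of Nixpkgs, no matter how many commits land on the branch in the meantime.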

The great thing about this is that it enables perfect reproducibility. If I run nixos-rebuild again without touching the lock file, I’ll get the exact same build as when I ran it the first time. I can even copy this folder to a different computer, run nixos-rebuild, and get the exact same build as on the first computer.

HOWEVER👆, remember how I mentioned Nix requires git? Well, git can only track different versions of a file when it’s committed using git add and git commit. That slick block of code that we copied from the NixOS wiki doesn’t do that. So while we may get a fresh flake.lock each time, it’ll get overwritten as soon as the update service kicks off again, effectively killing reproducibility. Strike one.

Problem #2: Multiple systems

Another great benefit of Flakes is the ability to manage multiple computers using a single flake.nix file. I manage a Git repo 🔗 that I use to configure my gaming PC, travel laptop, home server, and Raspberry Pi. All I have to do is clone the repo to the target system, run nixos-rebuild switch --flake .#<hostname>, and I’m good to go!

Oh but wait, what’s this? I had the system.autoUpgrade block configured for both my home server and travel laptop, but they ran at slightly different times. Within that short time span, a bunch of new commits were made to nixpkgs. Now I have two divergent versions of flake.lock. Which one is the right one? Which one should I commit to the repo? How do I tell Git to keep one and overwrite the other? How do I computer??? 😵‍💫
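If you want to see the divergence for yourself without touching a real system, here’s a minimal sketch that fakes it with throwaway Git repos in a temp folder. Everything in it is an assumption for illustration: the server and laptop folder names, the demo identity, and the rev-A/rev-B strings standing in for real lock file contents.

```shell
set -eu
work=$(mktemp -d)
# Helper: run git with a throwaway identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Seed a "central" bare repo with an initial (fake) lock file.
g init --quiet -b main "$work/seed"
echo "rev-0" > "$work/seed/flake.lock"
g -C "$work/seed" add flake.lock
g -C "$work/seed" commit --quiet -m "initial lock"
g clone --quiet --bare "$work/seed" "$work/central.git"

# Two hosts clone the same configuration.
g clone --quiet "$work/central.git" "$work/server"
g clone --quiet "$work/central.git" "$work/laptop"

# Each host "updates" at a different time and lands on a different commit.
echo "rev-A" > "$work/server/flake.lock"
g -C "$work/server" commit --quiet -am "lock update on server"
echo "rev-B" > "$work/laptop/flake.lock"
g -C "$work/laptop" commit --quiet -am "lock update on laptop"

# The first push wins; the second is rejected as a non-fast-forward.
g -C "$work/server" push --quiet origin main
if g -C "$work/laptop" push --quiet origin main 2>/dev/null; then
  laptop_push=accepted
else
  laptop_push=rejected
fi
echo "laptop push: $laptop_push"
```

The laptop’s push gets rejected, and now a human has to step in and decide which lock file survives. Not exactly “automatic.”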

One solution would be to only allow one system to run nix flake update, commit the new lock file, push it to my GitHub repo, then have the other systems pull the latest version using git pull. I think this is the method the Nix developers intended, because there’s a convenient flag we can use to update the lock file and commit it in one go: nix flake update --commit-lock-file.

This could work! Wait, but now we need a central repository to host our configuration file, ideally one that’s always online. Unless you’re cool with running your own code repo, you’re likely gonna use GitHub. You could argue that anyone using Flakes is already using a central repo like GitHub, but still…why should users have to slingshot their systems around an external service just to enable automatic updates? It adds a whole new dimension of complexity for something that should be as easy as checking a box or editing a text file. Strike two.

Problem #3: system.autoUpgrade is incomplete

Ok, so we’ve figured out how to update our lock file and commit it to an external, always-on repository. Our automatic updates are still limited to just one computer though, and if we enable it for every system, we’ll have a bunch of divergent Git repositories all over the place. What’s missing?

Even if we fix system.autoUpgrade to update and commit the lock file, we still need to git push our changes back up to the central repo. This is really where the NixOS wiki’s system.autoUpgrade solution falls short: it only updates the local config, and there’s no option to enable a push.

Basically, we need to reinvent the autoUpgrade wheel by telling NixOS how to pull the latest version of the repo, update and commit the lock file, then push it back up, so other systems can build from it.

One small saving grace is that Nix makes it fairly easy to create and configure automated services via systemd. And by creating our own Nix options 🔗, we can specify which computer should update the lock file, and which ones should just pull and apply the latest version. And that’s what I ended up doing.

The solution

The fix consists of two systemd services: one that updates the lock file, and one that pulls and applies updates. I’ll start with just the scripts, and show the complete services at the end of this post.

Script to update the lock file and push it to the repository

This script is fairly easy to understand. First, it changes the working directory to your configuration folder. Then, it pulls from the repository to make sure it’s working from the latest version of your config, and runs nix flake update. If there are updates, the lock file is committed with a detailed commit message, then the repository is pushed back up. Easy, breezy, beautiful.

cd ${your-nixos-config-folder}
# Make sure we're up-to-date
echo "Pulling the latest version..."
git pull --recurse-submodules
nix flake update --commit-lock-file
git push

Script to pull and apply updates

This script is a little more convoluted. Like the first one, it changes to your config folder and pulls the latest changes. But we do it slightly differently.

Instead of using git pull, we use git fetch. The difference is that pull grabs the latest changes and merges them, whereas fetch just grabs them. This is so we can determine if there are any differences between our current version and the version we pulled down from our repo.
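Here’s a small sketch of that difference using throwaway repos in a temp folder. The upstream and host folder names, the demo identity, and the fake lock file contents are all assumptions for illustration. It fetches, confirms the local main branch hasn’t moved, checks for differences against origin/main, and only then pulls.

```shell
set -eu
work=$(mktemp -d)
# Helper: run git with a throwaway identity so commits work anywhere.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# "upstream" stands in for the central repo; "host" is a machine that cloned it.
g init --quiet -b main "$work/upstream"
echo "rev-0" > "$work/upstream/flake.lock"
g -C "$work/upstream" add flake.lock
g -C "$work/upstream" commit --quiet -m "initial lock"
g clone --quiet "$work/upstream" "$work/host"

# Some other machine updates the lock file upstream.
echo "rev-1" > "$work/upstream/flake.lock"
g -C "$work/upstream" commit --quiet -am "update lock"

# fetch downloads the new commit but leaves the local branch alone...
before=$(g -C "$work/host" rev-parse main)
g -C "$work/host" fetch --quiet
after_fetch=$(g -C "$work/host" rev-parse main)

# ...so main and origin/main now differ, which is what we want to detect.
if g -C "$work/host" diff --quiet main origin/main; then
  state="up to date"
else
  state="updates available"
fi

# pull merges the fetched commit, moving main forward to match origin/main.
g -C "$work/host" pull --quiet --ff-only
after_pull=$(g -C "$work/host" rev-parse main)

echo "$state"
```

After the fetch, main still points at the old commit, and only the pull moves it forward. That gap is what lets the script decide whether a rebuild is worth running.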

We use git diff to make this determination. git diff normally prints the differences, so we use the --quiet and --exit-code flags to tell it to only return an exit code. If there are differences, it returns a 1, and if there are no differences, it returns a 0. Unfortunately, exit code 1 typically indicates a failure, so if the script is running in exit-on-error mode (set -e), a bare git diff will happily stop the script, and systemd will warn us that the service failed. It’s tempting to add || true to swallow the failure, but that also throws away the exit code: $? would always be 0 afterwards, and the update would never run. Instead, we run git diff directly as the condition of the if statement, where a non-zero exit code just means “false” rather than “abort.” It’s kind of hacky and gross, but so is Nix, so ¯\_(ツ)_/¯

The rest of this should be pretty straightforward. If git diff reports differences, we use git pull to merge the changes, then run nixos-rebuild. Otherwise, we do nothing and the script ends.

cd ${your-nixos-config-folder}
# Check if there are changes from Git.
echo "Fetching latest version..."
git fetch
# git diff exits with 1 when main and origin/main differ. Running it as the
# if condition keeps that exit code from being treated as a script failure.
if ! git diff --quiet --exit-code main origin/main; then
	echo "Updates found, running nixos-rebuild..."
	git pull --recurse-submodules
	nixos-rebuild switch --flake .
else
	echo "No updates found. Exiting."
fi
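As a quick sanity check on how the shell treats exit codes in the forms discussed above, here’s a minimal sketch you can paste into any POSIX shell. The has_changes function is a hypothetical stand-in for git diff --quiet when differences exist: all it does is return 1.

```shell
# Hypothetical stand-in for `git diff --quiet main origin/main`
# in the case where the two branches differ: it just exits with 1.
has_changes() { return 1; }

# Form 1: `|| true` replaces the status with that of `true`, so $? is 0.
has_changes || true
status_after_or_true=$?

# Form 2: capture the status explicitly without tripping `set -e`.
has_changes && status_captured=0 || status_captured=$?

# Form 3: test the command directly; non-zero simply means "false" here.
if ! has_changes; then
  branch_taken="updates found"
else
  branch_taken="no updates"
fi

echo "after '|| true': $status_after_or_true"
echo "captured:        $status_captured"
echo "if branch:       $branch_taken"
```

The first form reports 0 even though the command returned 1, which is why checking $? after || true never finds any updates. The second and third forms both see the real result; the third is just less typing.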

Creating systemd services to run automatic NixOS updates

Now, let’s put everything we’ve learned together into a complete solution.

Essentially, we have two systemd services that are activated by timers. One service runs the script to pull and apply updates, and the other runs the script to update the lock file. Both scripts run daily.

The “apply updates” script runs on all systems by default, but can be explicitly disabled by setting host.services.autoUpgrade = false;. The “update lock file” script must be enabled explicitly, and should only be enabled on a single host. You can enable this by setting host.services.autoUpgrade.pushUpdates = true in the host’s config.

I won’t post the entire thing here, but you can find the full module in my Nix configuration on GitHub 🔗.

Note on the upgrade script running as root

My nixos-upgrade module has one more little gotcha. In order to update the system, nixos-rebuild (or in my case, nh) has to run as root. However, my config files are stored in my regular user’s home folder, so I can hack on them when I ~~should be focused on work~~ have some free time. Having this folder owned by root is a no-go, so to work around this, I run the scripts as root and use sudo to switch to my regular user when running the git commands. If you want to use this module yourself, you’ll want to change this user to your own. In the future, I might add some more options, so other folks can use this module more easily.

There’s gotta be a better way

Not gonna lie, this approach feels kinda disgusting. Managing NixOS updates feels more like managing cloud infrastructure than a desktop, which is both a blessing and a curse. Having to set up systemd rules and use external services like GitHub to slingshot files between devices is a burden most regular users won’t be willing to put up with. And granted, you could argue that not using Flakes would make everything easier, but Flakes are such a common part of the NixOS ecosystem now that not using them would be like saying “don’t use apt repositories.” It just cuts you off from too many other options.

That said, if you know of a better way to do this, please let me know! I think this is one of those gotchas that’s holding NixOS back from being a killer distro. I’m hoping if enough of us rub our brain cells together, we can figure out a good solution that preserves reproducibility without sacrificing accessibility.
