Show HN: Nucleus – A security-hardened, Nix-native container runtime

github.com

40 points by 0kenx 1 week ago

Hi HN, I've been building Nucleus, a lightweight Linux container runtime focused on two workloads: ephemeral AI-agent sandboxes and declarative NixOS services. It's a single Rust binary, no daemon.

It is not a Docker replacement and not a strict subset of Docker either. I dropped the entire image-and-distribution half (no Dockerfile, no layers, no registry, no pull/push, no persistent storage layer) in exchange for going deeper on isolation and reproducibility. The rootfs is either a directory copied into tmpfs (agent mode) or a Nix-built closure mounted read-only (production mode). If your mental model is "run my image instead of docker run," this won't fit. If it's "run untrusted or ephemeral workloads with stronger, auditable isolation on a single host," that's the target.

Things that I think are interesting:

  - Defense-in-depth defaults. All capabilities dropped, ~100-syscall seccomp allowlist (vs Docker's ~300), up to 8 namespaces including time/cgroup, Landlock LSM path ACLs per service.
  - Deny-by-default egress. Outbound traffic is denied unless you allow specific CIDRs or DNS-resolved domains. Enforced with namespace-local iptables rules.
  - Externalized, hash-pinned security policies. seccomp (JSON), capabilities (TOML), and Landlock (TOML) live as separate SHA-256-verified files, decoupled from the rootfs build. There's a nucleus seccomp generate command that records syscalls in trace mode and emits a minimal profile.
  - gVisor as a first-class integrated runtime, not an add-on. Explicit network modes including a gvisor-host mode that's intentionally separate from native host networking.
  - Nix-native production path. nucleus.lib.mkRootfs builds locked-down closures; rootfs attestation verifies a per-file SHA-256 manifest at startup; first-class NixOS module.
  - Formal verification. TLA+ specs for the isolation/resource/filesystem/security/gVisor subsystems, checked with Apalache, plus property-based tests that drive the Rust implementation against the specs.
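
To make the deny-by-default egress bullet above concrete, here is a minimal sketch of the decision logic only. It is not Nucleus code: the post says enforcement happens through namespace-local iptables rules, whereas this is a plain Python model, and the function name `egress_allowed` and its parameters are hypothetical.

```python
import ipaddress


def egress_allowed(dest_ip, allowed_cidrs):
    """Return True only if dest_ip falls inside an explicitly allowed CIDR.

    Hypothetical helper for illustration; anything not matched is denied.
    """
    addr = ipaddress.ip_address(dest_ip)
    for cidr in allowed_cidrs:
        # strict=False accepts entries such as "10.1.2.3/8" with host bits set
        net = ipaddress.ip_network(cidr, strict=False)
        # An IPv4 address never matches an IPv6 network, and vice versa
        if addr.version == net.version and addr in net:
            return True
    return False  # default deny: no allowlist entry matched
```

With an empty allowlist every destination is rejected, which is the property "deny-by-default" refers to. DNS-resolved domain rules would need an extra resolution step that this sketch leaves out.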

Honest tradeoffs:

  - Linux x86_64 only. No macOS/Windows/BSD, no plans.
  - No CNI, no overlay networks, no cluster orchestration. nucleus compose is a single-host TOML DAG over systemd, not Swarm/K8s.
  - Ephemeral-by-default storage. Persistence is opt-in via explicit --volume binds.
  - Agent mode applies several mechanisms best-effort by design (warn-and-continue on seccomp/Landlock failure). For fail-closed isolation on ephemeral workloads use --service-mode strict-agent; for long-running services use production mode.
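
The difference between warn-and-continue and fail-closed in the last tradeoff can be sketched roughly as follows. This is an illustrative Python model, not how Nucleus (a Rust binary) is implemented; `apply_mechanisms`, `IsolationError`, and the `fail_closed` flag are hypothetical names.

```python
import logging


class IsolationError(RuntimeError):
    """Raised in fail-closed mode when an isolation mechanism cannot be applied."""


def apply_mechanisms(mechanisms, fail_closed):
    """Apply each (name, callable) pair and return the names that succeeded.

    fail_closed=False models best-effort behaviour: log a warning and go on.
    fail_closed=True models strict behaviour: abort on the first failure.
    """
    applied = []
    for name, apply in mechanisms:
        try:
            apply()
        except OSError as exc:
            if fail_closed:
                raise IsolationError(f"{name} failed: {exc}") from exc
            logging.warning("%s unavailable, continuing without it: %s", name, exc)
            continue
        applied.append(name)
    return applied
```

In the best-effort case the caller gets back a shorter list and the workload still starts; in the strict case nothing starts unless every mechanism was applied.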

Cold-start is ~12ms in the native runtime. Postgres 18 pgbench numbers under Nucleus are within noise of bare metal in our harness (full results in benches/).
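
For readers wondering what a per-file SHA-256 manifest check looks like, here is a minimal sketch under stated assumptions. The manifest format Nucleus actually uses is not described in this post, so the dict of relative path to hex digest, and the names `build_manifest` and `verify_manifest`, are hypothetical.

```python
import hashlib
import os


def sha256_file(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_file(full)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing, modified, or unexpected."""
    actual = build_manifest(root)
    # Files listed in the manifest whose digest differs or that no longer exist
    drift = {path for path in manifest if actual.get(path) != manifest[path]}
    # Files present on disk that the manifest never listed
    drift |= set(actual) - set(manifest)
    return sorted(drift)
```

An empty result means the tree matches the manifest. The sketch ignores permissions, ownership, and symlink targets, which a real attestation step would likely also need to cover.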

waterfisher 1 week ago

Please, guys, I beg of you: even if you're going to let LLMs generate whole wheel-reinventing GitHub repositories for you (I've let them generate many!), at least write your Hacker News posts yourself. The ability to write a Hacker News post without LLM assistance non-trivially relates to the ability to develop good software, because it boils down to skills conceptualising the project in a way that makes sense to humans, such that the project is product-shaped, rather than loose-blob-of-proper-nouns shaped. It's just very difficult to invest trust in a piece of software doing the right thing when it's not clear someone on the other end has enough ability to express their own ends in writing to make clear what that right thing is.

  • mpalmer 1 week ago
        If your mental model is "run my image instead of docker run," this won't fit. If it's "run untrusted or ephemeral workloads with stronger, auditable isolation on a single host," that's the target.
    

    This in particular is barely coherent.

wallzero 1 week ago

This is neat! Is it rootless? Could it pair with devenv?

I've just gone down a rabbit hole with Fedora atomic desktop (Kinoite), Flatpak Zed, devcontainers with podman compose using the Debian container and nix feature, and devenv.

It allows me to keep an immutable OS while still having an infrastructure-as-code development experience. Also, team members on macOS or Windows can choose to use devcontainers to wrap devenv, or just skip devcontainers and the extra isolation. It's pretty portable.

  • lifeisstillgood 1 week ago

    >>> devcontainers with podman compose using the Debian container and nix feature, and devenv.

    Can you expand on that please?

    • wallzero 1 week ago

      Sure!

      Side note: Unfortunately VSCode devcontainers aren't open source and do not work with VSCodium. Upvote if you'd like VSCode devcontainers open sourced. [1] This example should still work with VSCode though. And the devcontainer CLI.

      Also, Zed has some issues around Podman and SELinux with an open PR. [2] And unfortunately Podman Compose does not currently work with Flatpak Zed. [3]

      To enable Podman in Zed, add the following to Zed's 'settings.json':

        "use_podman": true
      

      Then we're just mostly following the guide:

      https://containers.dev/guide/dockerfile

      Create '.devcontainer/devcontainer.json':

        {
          "name": "projectName",
          "runArgs": ["--name", "projectName"],
          "dockerComposeFile": "docker-compose.yml",
          "service": "devcontainer",
          "features": {
            "ghcr.io/devcontainers/features/nix:1": {
              "packages": "devenv"
            }
          },
          "workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
          "onCreateCommand": "nix-env -iA nixpkgs.devenv",
          "postCreateCommand": "git config --global user.name \"${GIT_USER_NAME}\" && git config --global user.email \"${GIT_USER_EMAIL}\" && git config --global --add --bool push.autoSetupRemote true && echo 'eval \"$(devenv hook bash)\"' | tee -a ~/.bashrc"
        
          // If compose isn't needed use the following:
          // "image": "mcr.microsoft.com/devcontainers/base:debian",
          // "containerEnv": {
          //   "GIT_USER_NAME": "${localEnv:GIT_USER_NAME}",
          //   "GIT_USER_EMAIL": "${localEnv:GIT_USER_EMAIL}",
          //   "SSH_AUTH_SOCK": "/run/host-services/ssh-auth.sock",
          // },
          // "mounts": [
          //   "source=${localEnv:XDG_RUNTIME_DIR}/ssh-agent.socket,target=/run/host-services/ssh-auth.sock,type=bind",
          // ],
        }
      

      Then create '.devcontainer/docker-compose.yml':

        name: projectName
        services:
          devcontainer:
            image: mcr.microsoft.com/devcontainers/base:debian
            command: sleep infinity
            userns_mode: keep-id
            environment:
              SSH_AUTH_SOCK: /run/host-services/ssh-auth.sock
              GIT_USER_EMAIL: ${GIT_USER_EMAIL?err}
              GIT_USER_NAME: ${GIT_USER_NAME?err}
              POSTGRES_DB: ${POSTGRES_DB:-projectName}
              POSTGRES_USER: ${POSTGRES_USER:-postgres}
              POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-postgres}
            ports:
              # To connect to postgres running inside the container
              - target: 5432
                published: 5432
                protocol: tcp
                host_ip: 127.0.0.1
                mode: host
            volumes:
              - ${XDG_RUNTIME_DIR}/ssh-agent.socket:/run/host-services/ssh-auth.sock:bind
              - ..:/workspaces/projectName:cached
      

      And lastly create 'devenv.nix':

        { pkgs, config, ... }: {
          env.GREET = "determinism";
        
          enterShell = ''
            echo hello ${config.env.GREET}
          '';
        
          packages = [
            pkgs.nodejs
            pkgs.yarn
          ];
        
          services = {
            postgres = {
              enable = true;
              listen_addresses = "0.0.0.0";
              hbaConf = ''
                # TYPE      DATABASE      USER      ADDRESS       METHOD
                  local       all         all                     peer
                  host        all         all       127.0.0.1/32  trust
                  host        all         all       0.0.0.0/0     md5
              '';
              initialDatabases = [
                {
                  name = "postgres";
                }
                {
                  name = "projectName";
                }
                {
                  name = "projectName_auth";
                }
              ];
              initialScript = ''
                CREATE ROLE postgres SUPERUSER LOGIN PASSWORD 'postgres';
                CREATE ROLE api LOGIN PASSWORD 'api';
                CREATE ROLE auth LOGIN PASSWORD 'auth';
              '';
              settings = {
                wal_level = "logical";
              };
            };
          };
        
          scripts = {
            drizzle.exec = "npx lerna run --scope @projectName/drizzle \"$@\"";
            better-auth.exec = "npx lerna run --scope @projectName/better-auth \"$@\"";
          };
        }
      

      On Linux with SELinux, until the PR [2] is merged, a workaround for Zed needs to be applied:

        # ~/.config/containers/containers.conf
        [containers]
        label = false
      

      After this you can work within a podman container, connect to adjacent compose services, and use nix and devenv. If a collaborator wants to skip containers, they can just run devenv locally. Though I think devcontainers running devenv is actually the easier route, provided that they are set up and working on your OS.

      And this all works pretty much out of the box without root on an immutable OS like Fedora Silverblue/Kinoite.

      ---

      [1](https://github.com/microsoft/vscode-remote-release/issues/11...)

      [2](https://github.com/zed-industries/zed/pull/58500)

      [3](https://github.com/flathub/dev.zed.Zed/pull/342#issuecomment...)

  • 0kenx 1 week ago

    Yes, it's rootless and can pair with devenv. macOS is unfortunately not supported because seccomp is not available.

lavaman131 1 week ago

Very cool to see more security focused tools being built here for the Nix ecosystem. What were some of the biggest roadblocks or challenges you hit when building this?

yjftsjthsd-h 1 week ago

> rootfs attestation verifies a per-file SHA-256 manifest at startup;

What threat model does this protect against? Certainly nice, especially for free, but wondering about utility.

  • 0kenx 1 week ago

    it's a simple integrity check for catching deployment drift/tampering.

alberand 1 week ago

Isn't it the same as using systemd-nspawn? containers.<name> lets you declare containers with nspawn. What's the difference?

  • 0kenx 1 week ago

    my main reason for building this is gvisor/seccomp/capability/landlock

jambay 1 week ago

I'm curious if Linux aarch64 would be difficult to support with this.

Bnjoroge 1 week ago

pretty cool but docker support is a no-brainer. not having it is a deal-breaker