Minikube now supports rootless podman driver for running Kubernetes

225 points by encryptluks2 3 years ago

I'm getting lost in this orchestrator world. Can somebody explain the use case for Minikube? Or Microk8s? All claim to be certified as "perfectly like kubernetes" and show they can be used in production, but are people using them? Why?

vbezhenar 3 years ago

Kubernetes is considered too complex to deploy for mere mortals.
Here's a rough list of things that I had to do, because that's exactly what I'm doing right now: trying to deploy kubernetes.
1. Prepare server by installing and configuring containerd. A few simple steps.
2. Load one kernel module. Configure ip forward sysctl.
3. Install kubernetes binaries from apt repository.
4. Run kubeadm init
5. Install helm (this part is optional, but makes things simpler): download binary or install it from snap.
6. Install network plugin like flannel or calico.
7. Install nginx ingress plugin.
8. Install storage provider plugin. Also you should have some storage service like NFS server, so Kubernetes can ask for storage.
If you want single-node Kubernetes, I think that should be enough. Maybe #6 is not even necessary. If you want a real cluster, you would need to tinker with a load balancer, which I don't have a clear picture of right now. I'm using an external load balancer.
If you're using docker, my understanding is that containerd is already installed.
I actually spent a few weeks trying to understand those parts and I have only a shallow understanding so far.
With that in mind, simple kubernetes solutions probably have their place among those who can't or don't want to use managed kubernetes from popular clouds.
I have no idea about those simple kuberneteses though.
My opinion is that vanilla kubernetes is not that hard and you should have some understanding of its moving parts anyway. But if you want an easy path, I guess it's something worth considering.
- sascha_sl 3 years ago
  
  Flannel and Calico are responsible for assigning pod IPs, so you need them even on a single node.
  One main reason you'd want to run minikube or kind is also that these clusters are easy to reproduce and don't pollute your system's network namespace and sysctl.
- SOLAR_FIELDS 3 years ago
  
  For Load Balancer in your case you would probably provision MetalLB in place of the cloud specific LB solutions that cloud providers deploy. It’s somewhat straightforward, though the steps I believe are specific to each network provider (flannel, calico etc)
  
  pas 3 years ago
  
  Maybe a bit too hacky, but if you only plan to use nginx-ingress + HTTPS (and don't have spare /24 IPs around), then you can set up nginx on each node, run a script that generates a nginx config every few minutes (use the stream module to forward port 80 and 443 TCP/UDP to the ingress nginx)
  Then add the IP addresses of the nodes as a wildcard DNS.
  
  sascha_sl 3 years ago
  
  Or you could just set up the ingress as a daemonset with a NodePort service that has externalTrafficPolicy set to local.
- hda111 3 years ago
  
  Docker’s containerd is different in many ways from the normal version
neurostimulant 3 years ago

If you've used docker compose and are getting frustrated with the lack of some important features, these lightweight kubernetes distributions are actually great. Blue/green deployment, support for a whole bunch of storage volumes, a load balancer with automatic letsencrypt support, and great secret management (the ability to mount secrets as files/directories inside a pod is a killer feature) are the reasons I use kubernetes instead of docker compose for my side projects, even though I ignore the rest of kubernetes' features.
pas 3 years ago

> All claim to be certified as "perfectly like kubernetes"
because they are. they are directly built from the go sources, all are wrappers around the meat of k8s. (which are the various control loops packaged into services like api-server, kubelet, scheduler, controller-manager, etc ... and etcd itself)
minikube does a big monolithic build for convenience. (it can do this because all involved components are in pure go.)
microk8s is also a distribution of k8s.
almost all distributions have convenience features to help you with installation/setup. but all they do is what "the hard way" setups do. fetch/copy binaries, generate/sync keys, setup storage (devicemapper, btrfs volumes, whatever), setup wrappers that then start the binaries with the long list of correct arguments, and set them up to start when the node starts (usually by adding systemd services or something).
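To make the "control loops" idea concrete, here is a toy sketch in Python. It is not Kubernetes code: `Store`, `scheduler_pass`, `kubelet_pass` and `reconcile` are hypothetical names, the store is a plain dict standing in for etcd behind the API server, and the "least loaded node" rule is an assumed placeholder for real scheduling logic. It only illustrates the pattern of independent loops that each read the shared state and write their decision back.

```python
"""Toy model of Kubernetes-style control loops (not real Kubernetes code)."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    node: Optional[str] = None  # filled in by the scheduler loop
    running: bool = False       # flipped by the per-node agent loop


@dataclass
class Store:
    """Stand-in for etcd behind the API server: just a dict of pods."""
    pods: Dict[str, Pod] = field(default_factory=dict)


def scheduler_pass(store: Store, nodes: List[str]) -> None:
    """Assign every unscheduled pod to the node that currently has the fewest pods."""
    for pod in store.pods.values():
        if pod.node is None:
            load = {n: 0 for n in nodes}
            for other in store.pods.values():
                if other.node is not None:
                    load[other.node] += 1
            # Ties go to the node listed first.
            pod.node = min(nodes, key=lambda n: load[n])


def kubelet_pass(store: Store, node: str) -> None:
    """Start every pod assigned to this node that is not running yet."""
    for pod in store.pods.values():
        if pod.node == node and not pod.running:
            pod.running = True


def reconcile(store: Store, nodes: List[str]) -> None:
    """One round of all loops; a real cluster runs them continuously and independently."""
    scheduler_pass(store, nodes)
    for node in nodes:
        kubelet_pass(store, node)
```

Submitting a job is then just writing a `Pod` into the store; the next `reconcile` round notices it, places it, and starts it.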
- stavros 3 years ago
  
  So what are they missing that K8s has? I don't understand where I'd use one vs the other.
  
  remram 3 years ago
  
  In a sense they are distributions. Ubuntu and Fedora can also both do it all, and it's not clear where you'd use one vs the other. You're in the hands of different people.
  
  stavros 3 years ago
  
  Oh hmm, I see, thanks!
  
  semitones 3 years ago
  
  Well also, minikube is meant to facilitate development and testing. We use minikube for local dev, tests, etc. and everything transfers over well to the production k8s cluster
  
  pas 3 years ago
  
  um, they aren't missing anything (but see below). they are k8s, just as you rarely run the Linux kernel without userspace.
  so if you want to get the genuine original mainline experience you go to the project's github repo, they have releases, and mention that the detailed changelog has links to the binaries. yeey. (https://github.com/kubernetes/kubernetes/blob/master/CHANGEL... .. the client is the kubectl binary, the server has the control plane components the node binaries have the worker node stuff), you then have the option to set those up according to the documentation (generate TLS certs, specify the IP address range for pods (containers), install dependencies like etcd, and a CNI compatible container network layer provider -- if you have setup overlay networking eg. VXLAN or geneve or something fancy with openvswitch's OVN -- then the reference CNI plugin is probably sufficient)
  at the end of this process you'll have the REST API (kube-apiserver) up and running and you can start submitting jobs (that will be persisted into etcd, eventually picked up by the scheduler control loop that calculates what should run where and persists it back to etcd, then a control loop on a particular worker will notice that something new is assigned to it, and it'll do the thing, allocate a pod, call CNI to allocate IP, etc.)
  of course if you don't want to do all this by hand you can use a distribution that helps you with setup.
  microk8s is a low-memory low-IO k8s distro by Canonical (Ubuntu folks) and they run dqlite (distributed sqlite) instead of etcd (to lower I/O and memory requirements), many people don't like it because it uses snaps
  k3s is started by Rancher folks (and mostly still developed by them?),
  there's k0s (for bare metal ... I have no idea what that means though), kind (kubernetes in docker), there's also k3d (k3s in docker)
  these distributions work by consuming/wrapping the k8s components as go libraries - https://github.com/kubernetes/kubernetes/blob/master/staging...
  ...
  then there's the whole zoo of various k8s plugins/addons/tools for networking (CNI - https://github.com/containernetworking/cni#3rd-party-plugins), storage (CSI - https://kubernetes-csi.github.io/docs/drivers.html), helm for package management, a ton of security-related things that try to spot errors in all this circus ... and so on.
  
  sascha_sl 3 years ago
  
  Worth mentioning there's a middle path, namely kubeadm. That's the "sanctioned" way to bootstrap clusters without going full from scratch and many other distributions actually use it internally.
  
  stavros 3 years ago
  
  Ahh that clarifies things a lot, thank you!
ihgann 3 years ago

I don't see why you would use Minikube in production (nor have I ever heard of anyone doing this), but Minikube is exceptionally helpful for local development, when you want to test against a real Kubernetes API server (as well as test any of your desired orchestration for your component).
- BobbyJo 3 years ago
  
  This is a great use case I've found as well. If you have a product that is deployed to K8s, the ability to create clusters on demand for testing, whether local or otherwise, is awesome.
120photo 3 years ago

Local development of k8s apps without having to deploy to a k8s cluster that you may or may not need to setup.
- ghaff 3 years ago
  
  Also edge of the network deployments where you want consistency with datacenter deployments but don't have a lot of local compute resources or are otherwise limited.
dekhn 3 years ago

I am adding another executor to a workflow engine. Minikube is a huge help for my dev work (I always test against a Local Real Instance, which is what minikube is). It's helped on more than one occasion to show that a prod k8s instance lacked a feature or was misconfigured.
Patrick_Devine 3 years ago

I've only used Minikube, kind, and k0s as sandboxes for production kubernetes deployments in the cloud (i.e. EKS). Given I'm already using Docker Desktop on my mac laptop though, the easiest thing to do is just use its built-in kubernetes. It works pretty well, and obviates the need for any of these micro-kubernetes distros.
liotier 3 years ago

Recent comparison of Minikube, K3s and MicroK8s: https://goalz.online/pros-cons-of-minikube-k3s-microk8s-ligh...
- sascha_sl 3 years ago
  
  Minikube is in a different category, alongside kind. These clusters are meant to be disposable for development, and at least for kind, can't be updated easily.

lunfard000 3 years ago

I wish distros would stop making "docker" an alias for "podman". They are not the same thing, and it breaks all light-k8s implementations (looking at you, redhat).

candiddevmike 3 years ago

I've encountered cases where podman CLI does not match Docker, specifically for network creation with IPv6--the commands are different. What are you experiencing?
- lunfard000 3 years ago
  
  CLI is not my issue, k3s and kind won't work with podman (or any rootless container for that matter) out of the box, in both you need to do some non-trivial cgroups configuration on the OS to make it work (in k3s this mode is experimental)
rubyist5eva 3 years ago

It's an optional package...just don't install it.

srvmshr 3 years ago

Please excuse my ignorance. I am aware of how Docker generally operates (at a noob level i.e. containerizing an application & uploading to container public registry for general availability).

Given that understanding, could someone please explain what "rootless" would mean? I want to understand these in simpler terms:)

(Thank you in advance)

moody5bundle 3 years ago

Docker runs a daemon with root privileges to start all containers. So if you start a container with "docker run -d ...." you talk to a privileged process. That in turn means all spawned containers can have root privileges (docker run -v /etc/shadow ... to change the root password of your host). "Rootless" actually means running a container process as a normal user (less attack surface because of fewer permissions). So if you ran "podman run -v /etc/shadow ..." as a normal user, you wouldn't have the permissions needed to open the file.
As simple as possible:
Docker ("normally"): run every command inside the container with full root permissions on the host
$root -> Docker -> container
Docker/Podman ("rootless"): run every command as the current user
$user -> container
Maybe take a look here for a better explanation: https://docs.docker.com/engine/security/#docker-daemon-attac...
- mikepurvis 3 years ago
  
  The other big piece is capabilities (specifically CAP_SYS_ADMIN) which as I understand it is related but kind of orthogonal to the question of root/rootless.
  For example, buildah (the container-building part of podman) is daemonless and can use the fuse-overlayfs storage driver to build containers rootlessly— you appear as root inside the container, but from the outside, those processes and any files created are owned by the original invoking user or some shim UID/GID based on a mapping table.
  But critically, this doesn't mean it's possible to just run buildah inside any Kubernetes pod and build a container there, because buildah needs to be able to start a user namespace, and must have the /dev/fuse device mapped in. I believe there continues to be ongoing work in this area (for example Linux 5.11 allows overlayfs in unprivileged containers), but the issue tracking it [1] is closed without, IMO, really being fully resolved, since the linked article [2] from July 2021 is still describing the different scenarios as distinct special cases that each require their own special sets of flags/settings/mounts/whatever.
  [1]: https://github.com/containers/buildah/issues/2554
  [2]: https://www.redhat.com/sysadmin/podman-inside-kubernetes
  
  moody5bundle 3 years ago
  
  Yup, and based on that mapping table the process inside the container is not allowed to create another namespace and/or fuse-overlayfs. That's why you need to mount /dev/fuse into the container (you might also need cap_sys_admin and cap_mknod). There is another link from RedHat which also explains it:
  https://www.redhat.com/sysadmin/podman-inside-container
  You can run "capsh --print" to see your current capabilities. And to run a container without any capabilities:
  podman run --cap-drop ALL -it fedora capsh --print
nonameiguess 3 years ago

Typically, the way a normal Docker installation works is that dockerd (the Docker daemon) is an always-on background service running as root that exposes a socket file with group write privileges owned by the 'docker' group, allowing non-root users to send commands, effectively acting as a privilege-escalation mechanism. There were at least three reasons the daemon needed to run as root, which included needing to modify the host routing table to set up an overlay network, only root being able to create overlay filesystems, and at least some containers themselves having to run as root because they contained files that had to be manipulated in some way by uid 0 in the container.
podman in rootless mode gets around these by using slirp4netns to create pure-userspace overlay networks, fuse-overlayfs to create pure-userspace overlay filesystems (or a driver that can't deduplicate storage on older kernels), and uid/gid mapping in user namespaces to create the illusion inside of a container that an application is running as root when it isn't really root on the host.
Additionally, podman gets rid of the daemon and just uses normal fork/exec of the ephemeral podman process.
The upsides are:
- podman can run entirely in home directories and doesn't need to globally install config files or the container filesystems, making it easier for many users to share the same server.
- Running a malicious or compromised container won't compromise your host (big caveat here is unless it can exploit a vulnerability in user namespaces).
- Users who don't have root at all can still run containers. Note that while this appeared to be true using Docker because you could just be part of the 'docker' group to write to dockerd's socket, effectively this was giving you root.
The biggest downside is the userspace networks and filesystems are slow compared to their in-kernel counterparts, which is why you typically won't see it in any kind of production setting, but minikube is meant to be used as a small-scale mock of production kubernetes run by developers, so it can be a good fit there.
Note that rootless minikube was actually already possible, but way more convoluted than just using rootless podman as the container runtime.
- encryptluks2 3 years ago
  
  I've seen netavark described as a much faster rootless networking stack. Do you know if that is the case? I know that Podman supports it. Does anything like that exist for storage?
coffeekid 3 years ago

Not an expert at all, but here's how I would simplify it. All corrections are welcome!
Docker has two main components. The daemon (you can think of it somewhat like a server) and the client (application you use to run commands).
When you install docker on your machine, it generally installs both. The daemon is a process that runs on your local machine and runs as root.
Rootless refers to the alternative method (used by podman for instance) to run the daemon as a standard user, and delegate root-level tasks to something else, like systemd for instance.
- chrisjc 3 years ago
  
  > Docker has two main components. The daemon (you can think of it somewhat like a server) and the client (application you use to run commands).
  Is the daemon what they call the docker-engine? Is this what's available on Linux natively? Rootless makes sense here bc you wouldn't want one docker image able to interfere with another, or even the Linux system that is running the docker runtime/engine.
  For Windows/Mac docker solutions, where does the daemon live/exist/run? Inside a virtualized Linux instance?
  As I understand it, most of these alternatives to docker-desktop are all just wrappers around a virtualized Linux image running the docker engine/runtime. That's why many of them require a virtualization engine like Virtual Box. So are these non-commercial solutions just wrappers around one or more virtualized Linux runtimes where the docker engine/runtime is running natively?
  If all the above is (approx) correct, then "what" is rootless with this announcement? The docker runtime/engine in the virtualized Linux instance?
  I thought the docker engine/runtime on Linux was always able to run rootless docker images. So what is the news here if all these non-commercial solutions are just wrappers around the docker engine/runtime running in a virtualized Linux?
  
  stonemetal12 3 years ago
  
  Yes for windows and Mac it runs a Linux VM. On windows it can also use WSL2 as the linux vm.
  Docker-engine is the daemon built by Docker. Podman is an open-source work-alike. Docker-engine doesn't support running as a user other than root. Podman does. This announcement says minikube will work with Podman running as non-root.
  
  chrisjc 3 years ago
  
  Thanks for clearing that up.
  I remember hearing that development of docker-engine was ceasing, but could obviously live on as it was forked. I guess rootless is some of the work that Docker (company) wanted to keep proprietary and out of this open-source project.
  Really quite a shame, although understandable from a commercial perspective.
  Assuming that these improvements are finding their way back into an open-source project, I'm glad to hear about this work from minikube and Podman.
  
  AkihiroSuda 3 years ago
  
  > I guess rootless is some of the work that Docker (company) wanted to keep proprietary and out of this open-source project.
  Rootless mode for Docker is completely FLOSS, and its main contributor (me) has never even worked for Docker (company).
  https://github.com/moby/moby/blob/master/contrib/dockerd-roo...
  https://github.com/rootless-containers
  
  AkihiroSuda 3 years ago
  
  > Docker-engine doesn't support running as a user other than root. Podman does.
  Docker engine does.
vocram 3 years ago

It means it uses user namespaces to map a non root user in the top level user namespace (where eg init runs) to a root user inside the container. This allows the container process to run as root inside its user namespace, retaining the full set of capabilities required to call privileged syscalls or access files owned by root.
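A small Python sketch of what that mapping amounts to. The tuple layout mirrors the three columns of a `/proc/<pid>/uid_map` line (first uid inside the namespace, first uid outside, range length); the function name `to_host_uid` and the example numbers (host uid 1000, subordinate range starting at 100000) are assumptions for illustration, not values taken from any particular system.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first uid inside the namespace, first uid outside, length of the range)
UidMap = List[Tuple[int, int, int]]


def to_host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Translate a uid seen inside the namespace to the uid the host sees.

    Returns None when the uid is not covered by any mapped range.
    """
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Assumed example: "root" in the container is really host uid 1000,
# and container uids 1..65536 land in a subordinate range at 100000.
EXAMPLE_MAP: UidMap = [(0, 1000, 1), (1, 100000, 65536)]
```

With this map, a process that believes it is uid 0 is an ordinary uid 1000 as far as the host kernel's file permission checks are concerned.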
rkangel 3 years ago

Docker does all its work in a central daemon running as root. Any docker command you run is just sending messages to that central daemon.
You can see some downsides to this when you do the classic developer setup system of having a docker image with your tools and mounting a volume of your source tree into the container for building. When you build, the build products in your filesystem are owned by root because the code was actually running under the daemon. This can cause all sorts of pain.
When you run something like podman, there's no daemon - it's all just processes running as your user (like any other script) so files created end up on your filesystem owned by you.
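If you want to check whether a build left such files behind, a minimal Python sketch is below. The helper name `files_not_owned_by` is made up for this example; it just walks a directory and lists entries whose owner differs from the given uid.

```python
import os
from typing import List


def files_not_owned_by(root: str, uid: int) -> List[str]:
    """Return paths under `root` whose owner is not `uid`, sorted.

    Uses lstat so that symlinks are reported by their own owner,
    not the owner of whatever they point to.
    """
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

Calling `files_not_owned_by("build", os.getuid())` after a containerized build (where `build` is whatever your output directory is) lists anything your own user would have trouble deleting or overwriting.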
rythie 3 years ago

It means you don't need to be root to run it.
- coffeekid 3 years ago
  
  You can also call docker commands by being part of the docker group IIRC.
  Doesn't this have more to do with the daemon than the user executing commands ?
  
  q3k 3 years ago
  
  > You can also call docker commands by being part of the docker group IIRC.
  Which effectively gives you root on the host.
  
  prmoustache 3 years ago
  
  Which is a horrible practice and has roughly the same attack surface as logging in as root all the time.
  
  rythie 3 years ago
  
  With podman there is no daemon, everything is running as you. The standard setup for docker has a daemon running as root, which means when you start a container it has root privileges.
candiddevmike 3 years ago

Are there security issues with user namespaces? For instance, Arch disables them in their hardened Linux kernel: https://wiki.archlinux.org/title/Linux_Containers#Unprivileg...

moody5bundle 3 years ago

cgroups v1 had some issues:
https://nvd.nist.gov/vuln/detail/CVE-2022-0492
and:
https://nvd.nist.gov/vuln/detail/CVE-2022-0185

anticristi 3 years ago

Just to make sure: "rootless" is really misleading. As far as I researched, podman either relies on suid binaries or privileged capabilities or both to do its magic. You might as well call it "capabilitiesful podman driver".

mishafb 3 years ago

You do need an suid binary to e.g. set a new user id map, since this requires comparing the user id range owned by you to what you're mapping, but you only do it once and it's a simple, secure operation.
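A simplified Python sketch of that comparison. It assumes the usual `name:start:count` line format of `/etc/subuid` and models only two rules: a requested host range must sit entirely inside one range delegated to the user, or be exactly the user's own uid. The function names are hypothetical and the real setuid helper performs additional checks that are not shown here.

```python
from typing import List, Tuple


def parse_subuid(text: str, user: str) -> List[Tuple[int, int]]:
    """Collect (start, count) ranges delegated to `user` from subuid-style text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


def mapping_allowed(ranges: List[Tuple[int, int]], own_uid: int,
                    host_start: int, length: int) -> bool:
    """Decide whether mapping host uids [host_start, host_start + length) is allowed."""
    # Mapping exactly your own uid is always fine.
    if length == 1 and host_start == own_uid:
        return True
    end = host_start + length
    # Otherwise the whole range must fit inside one delegated range.
    return any(start <= host_start and end <= start + count
               for start, count in ranges)
```

The point is that the privileged step is a bounds check against a root-owned file, which is why it can be kept small and auditable.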
encryptluks2 3 years ago

I don't think it is misleading. Just because you need root privileges to enable "rootless" doesn't mean it isn't rootless once configured.
- RandomBK 3 years ago
  
  It's somewhere in between. You definitely need to enable features that are normally out-of-reach of regular users (i.e. user namespaces, network namespace, unprivileged ping, etc.) However it's still a far cry from full root access, and arguably a smaller surface area than regular run-everything-as-root mode.
wronglyprepaid 3 years ago

> podman either relies on suid binaries
I'm fairly sure this is only the case on older systems, if your system is up to date then podman should not rely on suid binaries.
moody5bundle 3 years ago

Maybe you should include this into your "research":
- https://opensource.com/article/19/2/how-does-rootless-podman...
- https://github.com/containers/podman/blob/main/docs/tutorial...
TL;DR
cgroup V2 support
Installing Podman
Install slirp4netns
Ensure fuse-overlayfs is installed
znpy 3 years ago

you can run containers without root, suid bits or special capabilities with podman.
of course, without any of that your containers will be able to do very little (eg: no networking).
- AkihiroSuda 3 years ago
  
  Slirp networking does not need any suid bit or special capability.

gigatexal 3 years ago

A ton of failed tests in that PR and yet it was merged anyway?

aposm 3 years ago

This is awesome, was looking around trying to figure out if this was available yet just last night... and here it is!

baobob 3 years ago

While rootless is a curious technical trick I don't understand why the implementation ever left someone's laptop, both file and networking performance are utterly abysmal, which is completely at odds with one of the primary benefits of containers (near zero overhead).

awoimbee 3 years ago

On servers, yes, rootless doesn't make much sense. But on my dev laptop, "sudo docker" is tiring and adding docker to the sudoers group is a big security hole (why does everyone seem to think that "docker run" giving root privileges is ok ?!).
- snorremd 3 years ago
  
  This indeed. The Docker team should not include the "adding your user to the docker group"-section in the install documentation. It is very unsafe and even though they link to a document on security implications I don't think all users will truly grasp the implications.
  Better to hide this feature and promote the rootless docker mode for local use. On servers you won't be adding any unprivileged user to the docker group in any case.
- candiddevmike 3 years ago
  
  You should add yourself to the docker group...
  
  rcxdude 3 years ago
  
  which has the same effect, the docker group effectively has root access.
- Bayart 3 years ago
  
  sudo usermod -aG docker $USER
  
  prmoustache 3 years ago
  
  this is not safer.
  
  moody5bundle 3 years ago
  
  this is the same as:
  %wheel ALL=(ALL) NOPASSWD: ALL
  effectively disabling sudo completely.
bogwog 3 years ago

This is the first I've heard about serious performance overhead from going rootless. Do you have any links with more info about it?
I haven't encountered any issues like this personally with rootless podman (although I'm not doing any large scale deployments).
alduin32 3 years ago

What causes the file/networking performance degradation when running unprivileged containers ?
- AkihiroSuda 3 years ago
  
  The filesystem performance degradation was resolved in kernel 5.11 which added support for rootless overlayfs.
  The network performance degradation is caused by slirp (usermode TCP/IP) but it is being resolved too : https://github.com/rootless-containers/bypass4netns
mavhc 3 years ago

overlay2 or fuse-overlayfs?

technerder 3 years ago

Tangential, but are there any easy ways to run server applications on bare metal in a way that removes the need for an underlying OS in order to decrease the overall attack surface an attacker can look for exploits in? (Mainly talking about applications written in Go(TinyGo), Rust, and C++ that can be easily compiled to run on bare metal)

qbasic_forever 3 years ago

Unikernel is what you're interested in, but it's not as easy as taking some Linux-based server software and spitting out a bootable image for baremetal. If you strip the kernel and OS out you lose the network stack and all kinds of system services that most software depends on directly.
I think Google's distroless container images are worth checking out as a quasi-alternative: https://github.com/GoogleContainerTools/distroless You use them as a base for a docker image and copy in your server code. These images are tailor made to strip out _everything_ that's not necessary to run the software--there's no shell for example. So you're still running a Linux kernel, libc, etc. but there's nothing there for an attacker to use other than your app code. You yourself can't even get into a shell to debug or examine what the state of your app is (which can actually be kind of aggravating in development).
- mroche 3 years ago
  
  "Distroless" containers are pretty cool for making deployment images. I feel like a better name could have been chosen, because ultimately you are relying on a distribution and how they operate unless you're building an image from scratch and copying in your self-compiled dependencies.
  I build my own distroless-like images for personal use using Fedora and RHEL, though I do follow the ubi-micro[0] build steps and include a tiny bit of user space components to enable debugging.
  [0] https://catalog.redhat.com/software/containers/ubi9-micro/61...
yencabulator 3 years ago

As an alternative to unikernels, that the other replies are talking about, which require special builds and might not work the same, you can also do something pretty simple:
Just run your program as the only process.
As a Linux host with no other software. No /bin/sh, nothing else in the filesystem.
Simple demo: https://github.com/tv42/alone
rvdca 3 years ago

From what I gather, a unikernel is what you are searching for. Many exist; https://github.com/unikraft/unikraft and https://github.com/hermitcore/rusty-hermit are the ones that came up in a quick search.
magicalhippo 3 years ago

IncludeOS was one such approach. Sadly the company behind it perished and it seems unmaintained.
https://includeos.org/
bogwog 3 years ago

Another one: https://mirage.io/

jpswade 3 years ago

Does this mean it’ll be faster?

encryptluks2 3 years ago

No I don't think so and in fact rootless containers can be slower due to user-level networking and overlay storage, but the goal is more isolation and security.

cyberpunk 3 years ago

Please please please let rke follow suit :)
