Install Kubeadm and Set Up Production Kubernetes in 2026
Complete guide to installing kubeadm on AlmaLinux and building a production Kubernetes cluster from scratch. Single-node and multi-node setups with Calico, containerd, and RaidFrame integration.
RaidFrame Team
March 16, 2026 · 32 min read
TL;DR — This guide walks you through installing kubeadm on AlmaLinux 9 and bootstrapping a production-grade Kubernetes cluster from bare metal. You'll install containerd, runc, and CNI plugins by hand, initialize a control plane, deploy Calico for pod networking, and optionally join worker nodes. At the end, you'll connect it to RaidFrame so you can manage deployments from a single dashboard.
Table of contents
- Why self-host Kubernetes in 2026?
- What you'll build
- Single-node vs multi-node: which setup do you need?
- A note on versions
- Prerequisites
- Step 1: Update the OS and set hostnames
- Step 2: Disable SELinux
- Step 3: Disable swap
- Step 4: Open firewall ports
- Step 5: Load kernel modules and configure sysctl
- Step 6: Install containerd
- Step 7: Install runc and CNI plugins
- Step 8: Install kubeadm, kubelet, and kubectl
- Step 9: Initialize the control plane
- Step 10: Configure kubectl access
- Step 11: Install Calico for pod networking
- Step 12: Single-node setup — remove the control-plane taint
- Step 13: Join worker nodes (multi-node)
- Step 14: Verify the cluster
- Step 15: Connect a GUI (Lens or k9s)
- Step 16: Connect your cluster to RaidFrame
- Hardening for production
- Common mistakes that will waste your weekend
- Production readiness checklist
- Resetting a kubeadm cluster
- Troubleshooting
- FAQ
Why self-host Kubernetes in 2026?
Managed Kubernetes (EKS, GKE, AKS) is everywhere. So why would anyone install kubeadm by hand?
Cost. A three-node EKS cluster costs ~$220/month before you run a single pod — $73/month just for the control plane, plus EC2 instances. The same cluster on bare metal or Hetzner dedicated servers costs a fraction of that with no per-cluster fee.
Control. Managed services abstract away the control plane. That's fine until you need a specific kubelet flag, a non-standard CNI configuration, or etcd tuning for a write-heavy workload. With kubeadm, every config file is yours.
Compliance. Some industries (defense, healthcare, financial services) require infrastructure to run on-premises or in specific jurisdictions. Managed Kubernetes doesn't always meet those requirements.
Learning. If you want to actually understand Kubernetes — not just use it — building a cluster from scratch is the fastest way. You'll debug problems in minutes that would take hours if you'd only ever used managed services.
This guide is opinionated. We'll tell you what to install, why, and what to skip. If you just want containers running without the infrastructure work, deploy on RaidFrame instead — it takes 60 seconds.
What you'll build
A self-managed Kubernetes cluster running on bare metal or virtual machines. Here's what gets installed and where:
┌─────────────────────────────────────────────────────────────┐
│ Control Plane Node │
│ │
│ ┌──────────┐ ┌────────────┐ ┌───────────┐ ┌────────────┐ │
│ │ kube-api │ │ controller │ │ scheduler │ │ etcd │ │
│ │ server │ │ manager │ │ │ │ │ │
│ └──────────┘ └────────────┘ └───────────┘ └────────────┘ │
│ ┌──────────┐ ┌────────────┐ ┌───────────┐ │
│ │ kubelet │ │ containerd │ │ Calico │ │
│ └──────────┘ └────────────┘ └───────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Worker Node(s) │
│ │
│ ┌──────────┐ ┌────────────┐ ┌───────────┐ ┌────────────┐ │
│ │ kubelet │ │ containerd │ │ Calico │ │ kube-proxy │ │
│ └──────────┘ └────────────┘ └───────────┘ └────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Your application pods │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
The stack:
- kubeadm — bootstraps and manages the cluster lifecycle
- containerd — container runtime (replaced Docker since Kubernetes v1.24)
- runc — low-level OCI runtime that containerd calls to create containers
- Calico — pod networking and network policy enforcement (CNI)
- IPVS — high-performance service proxy (replaces iptables-based kube-proxy at scale)
- CoreDNS — cluster-internal DNS (installed automatically by kubeadm)
By the end of this guide your cluster will be running, scheduling pods, and optionally connected to RaidFrame for centralized management.
Single-node vs multi-node: which setup do you need?
Before you start, decide what topology fits your use case.
Single-node (all-in-one)
One machine runs the control plane and your workloads. Good for:
- Local development and testing
- Home labs and learning
- Small internal tools that don't need high availability
- CI/CD runners
Trade-off: No fault tolerance. If the node goes down, everything goes down. The control plane taint must be removed so pods can schedule on the same node.
Multi-node (production)
A typical production setup has at least three machines:
| Role | Count | Purpose |
|---|---|---|
| Control plane | 1-3 | Runs the API server, scheduler, controller manager, etcd |
| Worker | 2+ | Runs your application workloads |
Why three control-plane nodes? etcd needs a quorum. With three nodes, one can fail and the cluster keeps running. With one, any failure is total.
Why at least two workers? So you can drain one for maintenance without downtime.
For this guide, every node follows the same preparation steps (Steps 1-8). Only Step 9 (init) runs on the first control-plane node. Worker nodes join using the token from Step 13.
A note on versions
This guide was written and tested in March 2026. Software versions move fast. Here's what we're using and where to check for updates:
| Component | Version in this guide | Check for latest |
|---|---|---|
| Kubernetes | v1.32 | kubernetes.io/releases |
| containerd | v1.7.24 | GitHub releases |
| runc | v1.2.4 | GitHub releases |
| CNI plugins | v1.6.2 | GitHub releases |
| Calico | v3.29.2 | docs.tigera.io |
Rule of thumb: Use the latest patch version of whatever minor version you choose. Don't mix Kubernetes v1.32 with a containerd version that only supports v1.30. Check the compatibility matrices.
Prerequisites
You need:
- AlmaLinux 9 (or RHEL 9, Rocky Linux 9, CentOS Stream 9 — any RHEL-based distro works)
- 2 CPU cores and 2 GB RAM minimum per node (4+ cores and 4+ GB recommended for production)
- Root access on every node
- Network connectivity between all nodes (they must be able to reach each other on the required ports)
- Unique hostname, MAC address, and product UUID on every node
Check your IP and MAC address:
ip addr show
Save your node's IP — you'll need it throughout this guide.
Check the product UUID:
cat /sys/class/dmi/id/product_uuid
Every node in the cluster must have a unique UUID. If you're running VMs from a clone, regenerate the UUID or you'll hit identity conflicts.
Verify the required binaries are available:
for bin in curl tar modprobe iptables; do
command -v $bin &>/dev/null && echo "$bin: OK" || echo "$bin: MISSING"
done
If anything is missing, install it before proceeding.
Step 1: Update the OS and set hostnames
Run all commands as root unless stated otherwise.
Update the system:
sudo dnf update -y
Set a meaningful hostname on each node:
hostnamectl set-hostname cp1.k8s.yourdomain.com # control plane
hostnamectl set-hostname w1.k8s.yourdomain.com # worker 1
hostnamectl set-hostname w2.k8s.yourdomain.com  # worker 2
Verify:
hostname
Configure DNS resolution
If you're on a private network without DNS, edit /etc/hosts on every node so they can resolve each other:
192.168.1.10 cp1.k8s.yourdomain.com
192.168.1.11 w1.k8s.yourdomain.com
192.168.1.12 w2.k8s.yourdomain.com
If you're using public DNS, you can skip this — but make sure /etc/resolv.conf contains valid nameservers.
Why hostnames matter: Kubernetes uses hostnames as node identifiers. If two nodes share a hostname, the second one will overwrite the first in the cluster.
Step 2: Disable SELinux
Kubernetes requires containers to access the host filesystem for pod networking and volume mounts. SELinux blocks this by default.
Switch it off for the current session (this drops SELinux to permissive mode until the reboot below):
setenforce 0
Make it permanent:
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
Verify:
grep ^SELINUX= /etc/selinux/config
You should see SELINUX=disabled.
Why not just set SELinux to permissive? Permissive mode logs violations but doesn't block them. It works, but it generates noise in your logs and wastes CPU cycles evaluating policies that never enforce. Disabled is cleaner.
Reboot now to fully apply the SELinux change. The rest of the guide assumes SELinux is off.
reboot
Step 3: Disable swap
The kubelet does not work properly with swap enabled. Kubernetes expects to manage memory directly — swap introduces unpredictable latency that breaks scheduling guarantees.
Disable swap immediately:
sudo swapoff -a
Prevent swap from re-enabling after reboot by commenting it out in fstab:
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab
Verify swap is off:
cat /proc/swaps
You should see only the header line with no entries:
Filename                Type        Size    Used    Priority
Also verify fstab:
cat /etc/fstab
The swap line should be commented out with #.
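If you want to sanity-check that sed expression before touching the real file, you can rehearse it on a throwaway copy (the file below is made-up example content, not your actual fstab):

```shell
# Create a throwaway fstab with one swap entry (example content only)
cat > /tmp/fstab.test <<'EOF'
/dev/mapper/almalinux-root /     xfs  defaults 0 0
/dev/mapper/almalinux-swap none  swap defaults 0 0
EOF

# Same expression as above: comment out any line containing a
# whitespace-delimited "swap" field
sed -i '/\sswap\s/s/^/#/' /tmp/fstab.test

# The swap line is now commented; the root filesystem line is untouched
grep swap /tmp/fstab.test
```

Note the pattern matches the filesystem-type field, not the device name, so a device named `almalinux-swap` on a non-swap line would not be commented.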
What about Kubernetes swap support? Since Kubernetes 1.28 there's beta support for running with swap via the kubelet's NodeSwap feature gate. Note that it's a kubelet feature gate, not a kubeadm init flag — enabling it means passing a KubeletConfiguration (with failSwapOn: false) to kubeadm init --config.
But this is still not recommended for production. Disable swap unless you have a specific reason to keep it.
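Should you want to experiment anyway, here's a minimal sketch of such a config file (field values are illustrative — check the kubeadm and kubelet configuration references for your exact version):

```yaml
# kubeadm-config.yaml — illustrative sketch, not a production recommendation
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false        # let the kubelet start with swap enabled
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: LimitedSwap
```

You'd then run kubeadm init --config kubeadm-config.yaml. When using a config file, flags like --control-plane-endpoint move into the ClusterConfiguration section instead of the command line.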
Step 4: Open firewall ports
Kubernetes components communicate over specific ports. If you skip this step, kubeadm init will succeed but pods won't schedule and services won't route.
Control-plane node ports
sudo firewall-cmd --permanent --add-port=6443/tcp # API server
sudo firewall-cmd --permanent --add-port=2379-2380/tcp # etcd
sudo firewall-cmd --permanent --add-port=10250/tcp # kubelet API
sudo firewall-cmd --permanent --add-port=10257/tcp # kube-controller-manager
sudo firewall-cmd --permanent --add-port=10259/tcp # kube-scheduler
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort range
sudo firewall-cmd --permanent --add-port=179/tcp # Calico BGP
sudo firewall-cmd --permanent --add-port=4789/udp # VXLAN (Calico/Flannel)
sudo firewall-cmd --reload
Worker node ports
Workers need fewer ports:
sudo firewall-cmd --permanent --add-port=10250/tcp # kubelet API
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort range
sudo firewall-cmd --permanent --add-port=179/tcp # Calico BGP
sudo firewall-cmd --permanent --add-port=8080/tcp # Application traffic
sudo firewall-cmd --permanent --add-port=4789/udp # VXLAN
sudo firewall-cmd --reload
Calico-specific ports
If you're using Calico (recommended), also open these:
sudo firewall-cmd --zone=public --add-protocol=4 --permanent # IP-in-IP
sudo firewall-cmd --zone=public --add-port=5473/tcp --permanent # Typha
sudo firewall-cmd --zone=public --add-port=51820/udp --permanent # WireGuard IPv4
sudo firewall-cmd --zone=public --add-port=51821/udp --permanent # WireGuard IPv6
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent # HTTPS
sudo firewall-cmd --reload
Verify everything is open:
firewall-cmd --list-all
Why not just disable the firewall? You could. Many tutorials tell you to run systemctl stop firewalld. Don't do that in production. An exposed etcd port (2379) gives anyone full read/write access to your cluster state. Open only what you need.
Step 5: Load kernel modules and configure sysctl
Kubernetes networking requires specific kernel modules and sysctl parameters. Without them, pod-to-pod communication fails silently.
Load the required modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
Load them immediately:
sudo modprobe overlay
sudo modprobe br_netfilter
sudo modprobe ip_vs
sudo modprobe ip_vs_rr
sudo modprobe ip_vs_wrr
sudo modprobe ip_vs_sh
sudo modprobe nf_conntrack
What each module does:
| Module | Purpose |
|---|---|
| overlay | Enables OverlayFS, required by containerd for container image layers |
| br_netfilter | Allows iptables to see bridged traffic — essential for pod networking |
| ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh | IPVS (IP Virtual Server) for high-performance service load balancing |
| nf_conntrack | Connection tracking for NAT and stateful firewall rules |
Why IPVS? The default kube-proxy mode is iptables, which creates a rule for every service endpoint. At scale (1000+ services), iptables becomes a bottleneck. IPVS uses a hash table — O(1) lookup regardless of service count.
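Loading the modules alone doesn't change the proxy mode — kube-proxy still defaults to iptables. To actually switch to IPVS, pass a KubeProxyConfiguration at init time. A sketch (append it to your kubeadm init --config file; values are illustrative):

```yaml
# KubeProxyConfiguration sketch — switches kube-proxy to IPVS mode
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: rr   # round-robin; wrr and sh match the other modules loaded above
```

On an existing cluster, you can instead edit the kube-proxy ConfigMap in the kube-system namespace and restart the kube-proxy pods.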
Configure sysctl parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
EOF
Apply without reboot:
sudo sysctl --system
Reload module configuration:
sudo systemctl restart systemd-modules-load
Verify everything is loaded
for module in overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do
if lsmod | grep -q "^${module}"; then
echo "${module}: loaded"
else
echo "${module}: NOT loaded"
fi
done
All modules should show loaded.
Verify sysctl:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward net.ipv6.conf.all.forwarding
All values should be 1.
Step 6: Install containerd
containerd is the container runtime that Kubernetes uses to run containers. Docker used to be the default, but Kubernetes removed dockershim in v1.24. containerd is now the standard.
Download and install containerd:
curl -LO https://github.com/containerd/containerd/releases/download/v1.7.24/containerd-1.7.24-linux-amd64.tar.gz
sudo tar -C /usr/local -xzf containerd-1.7.24-linux-amd64.tar.gz
Set up the systemd service:
sudo mkdir -p /usr/local/lib/systemd/system
sudo curl -Lo /usr/local/lib/systemd/system/containerd.service \
https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
sudo systemctl daemon-reload
sudo systemctl enable --now containerd
Verify it's running:
sudo systemctl status containerd
Generate the containerd config
containerd needs a configuration file that tells it which pause image to use for Kubernetes sandboxes:
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
Update the sandbox (pause) image to match the version kubeadm expects:
sudo sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.k8s.io/pause:3.10"|' /etc/containerd/config.toml
Why does the pause image matter? Every Kubernetes pod has a hidden "pause" container that holds the network namespace. If the pause image version in containerd doesn't match what kubeadm expects, you'll get a warning during init and potential networking issues.
Enable SystemdCgroup (required for systemd-based distros):
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
Restart containerd to pick up the changes:
sudo systemctl restart containerd
Step 7: Install runc and CNI plugins
runc is the low-level container runtime that actually creates and runs containers. containerd calls runc under the hood.
sudo curl -Lo /usr/local/sbin/runc https://github.com/opencontainers/runc/releases/download/v1.2.4/runc.amd64
sudo chmod +x /usr/local/sbin/runc
Verify:
runc --version
Install CNI plugins
CNI (Container Network Interface) plugins provide the basic networking primitives that Calico and other network providers build on.
curl -LO https://github.com/containernetworking/plugins/releases/download/v1.6.2/cni-plugins-linux-amd64-v1.6.2.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xzf cni-plugins-linux-amd64-v1.6.2.tgz
Install nerdctl (optional but recommended)
nerdctl is a Docker-compatible CLI for containerd. Useful for debugging container issues directly on the node:
curl -LO https://github.com/containerd/nerdctl/releases/download/v2.0.3/nerdctl-2.0.3-linux-amd64.tar.gz
sudo tar -C /usr/local/bin -xzf nerdctl-2.0.3-linux-amd64.tar.gz
Configure crictl
crictl is the CLI for CRI-compatible container runtimes. Kubernetes uses it internally, and you'll use it for debugging. Configure it to point at containerd:
cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
This prevents the deprecation warning about default endpoints. Verify:
crictl info
Step 8: Install kubeadm, kubelet, and kubectl
These are the three Kubernetes binaries you need:
- kubelet — the agent that runs on every node and manages containers
- kubeadm — the tool to initialize and manage the cluster
- kubectl — the CLI to interact with the Kubernetes API
Add the Kubernetes repository:
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
Install:
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
Enable kubelet to start on boot:
sudo systemctl enable --now kubelet.service
The kubelet will crash-loop until kubeadm initializes the cluster — that's expected. Don't worry about the errors in journalctl -u kubelet at this point.
Repeat Steps 1-8 on every node in your cluster before proceeding.
Try RaidFrame free
Deploy your first app in 60 seconds. No credit card required.
Step 9: Initialize the control plane
This step runs only on the first control-plane node. It bootstraps the entire cluster.
Pull the images first
Download all required container images before initializing. This makes the init faster and lets you catch registry issues early:
kubeadm config images pull
Run kubeadm init
kubeadm init --control-plane-endpoint=cp1.k8s.yourdomain.com:6443
Replace cp1.k8s.yourdomain.com with your control-plane node's hostname or IP address.
What --control-plane-endpoint does: This sets the stable address that all nodes use to reach the API server. For a single control-plane node, use the node's hostname or IP. For high availability (3 control-plane nodes), use a load balancer's address.
If everything is configured correctly, you'll see preflight checks pass and the init process will take 1-2 minutes:
[init] Using Kubernetes version: v1.32.x
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
...
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd
[wait-control-plane] Waiting for the kubelet to boot up the control plane
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
...
The firewalld warning is fine — we already opened those ports in Step 4.
After a minute or two, you'll get the success output:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join cp1.k8s.yourdomain.com:6443 --token u6wf5g.9xoa7hg2r0hg9k5j \
--discovery-token-ca-cert-hash sha256:3bf88d426aa64837fd... \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join cp1.k8s.yourdomain.com:6443 --token u6wf5g.9xoa7hg2r0hg9k5j \
--discovery-token-ca-cert-hash sha256:3bf88d426aa64837fd...
Save the entire output to a file immediately. You need both join commands — one for control-plane nodes, one for workers. The token expires after 24 hours, but you can generate a new one later with kubeadm token create --print-join-command.
Step 10: Configure kubectl access
Option A: Use root (quick and dirty)
export KUBECONFIG=/etc/kubernetes/admin.conf
This works but isn't best practice for production.
Option B: Create a dedicated k8s user (recommended)
Create a non-root user for cluster management:
sudo adduser k8s
sudo passwd k8s
sudo usermod -aG wheel k8s
Switch to the new user:
su - k8s
Copy the kubeconfig:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Test it:
kubectl get nodes
You'll see your control-plane node in NotReady status:
NAME                     STATUS     ROLES           AGE   VERSION
cp1.k8s.yourdomain.com   NotReady   control-plane   2m    v1.32.x
The node is NotReady because there's no CNI (pod network) installed yet. That's next.
Why create a separate user? Running kubectl as root is a security risk. The admin.conf file has full cluster-admin privileges. A dedicated user with sudo access gives you audit trails and limits blast radius.
Export kubeconfig for remote access
You can copy the kubeconfig to your local machine and manage the cluster from anywhere:
# On the control-plane node
cat $HOME/.kube/config
Copy the output and save it to ~/.kube/config on your local machine (or merge it with an existing kubeconfig). Replace the server: address with the control-plane node's external IP if you're accessing it remotely.
Step 11: Install Calico for pod networking
Without a CNI plugin, pods can't communicate with each other. Calico is the most popular choice — it handles pod networking, network policies, and scales to thousands of nodes.
Download the Calico manifest:
curl https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/calico.yaml -O
Apply it:
kubectl apply -f calico.yaml
Watch the pods come up:
kubectl get pods -n kube-system -w
Wait until all Calico pods and CoreDNS pods show Running and 1/1 Ready. This usually takes 1-3 minutes.
Check the node status:
kubectl get nodes
NAME                     STATUS   ROLES           AGE   VERSION
cp1.k8s.yourdomain.com   Ready    control-plane   5m    v1.32.x
The node is now Ready.
Install calicoctl (optional)
calicoctl gives you detailed control over Calico resources. Install it as a kubectl plugin:
sudo curl -L https://github.com/projectcalico/calico/releases/download/v3.29.2/calicoctl-linux-amd64 \
-o /usr/local/bin/kubectl-calico
sudo chmod +x /usr/local/bin/kubectl-calico
Verify:
kubectl calico -h
Fix inter-pod communication with firewalld
If you're running firewalld (which you should be), Calico interfaces need their own zone to communicate freely:
name=kubeAccept
sudo firewall-cmd --permanent --new-zone=${name}
sudo firewall-cmd --permanent --zone=${name} --set-target=ACCEPT
sudo firewall-cmd --permanent --zone=${name} --add-interface=vxlan.calico
sudo firewall-cmd --permanent --zone=${name} --add-interface="cali+"
sudo firewall-cmd --reload
Why? firewalld's default zone drops traffic on unknown interfaces. Calico creates vxlan.calico and cali* interfaces dynamically. Without this zone, pods on different nodes can't talk to each other — everything looks healthy but cross-node traffic silently fails.
Step 12: Single-node setup — remove the control-plane taint
Skip this step if you have worker nodes. This is only for single-node clusters.
By default, Kubernetes taints the control-plane node with NoSchedule. This prevents application pods from running on the same node as the API server, scheduler, and etcd.
For a single-node setup, you need to remove this taint:
Check current taints:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Remove the taint:
kubectl taint nodes cp1.k8s.yourdomain.com node-role.kubernetes.io/control-plane-
The trailing - removes the taint.
Important: If you later add worker nodes to this cluster, re-apply the taint to protect your control plane:
kubectl taint nodes cp1.k8s.yourdomain.com node-role.kubernetes.io/control-plane:NoSchedule
Running application workloads on the control plane is fine for development but risky in production. A memory-hungry pod could starve etcd or the API server, taking the entire cluster down.
Step 13: Join worker nodes (multi-node)
Skip this step for single-node setups.
On each worker node, run the join command that kubeadm init printed in Step 9:
kubeadm join cp1.k8s.yourdomain.com:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
Token expired?
If more than 24 hours have passed, generate a new token on the control-plane node:
kubeadm token create --print-join-command
This prints a fresh, complete kubeadm join command you can copy and run on the worker.
Joining additional control-plane nodes
For high availability, you can join additional control-plane nodes:
kubeadm join cp1.k8s.yourdomain.com:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane
The --control-plane flag tells kubeadm to set up the full control-plane stack (API server, scheduler, controller-manager, etcd) on this node too. You need to copy the certificate authorities first — kubeadm prints instructions for this during init.
Step 14: Verify the cluster
Back on the control-plane node, verify all nodes are ready:
kubectl get nodes -o wide
NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP    OS-IMAGE
cp1 Ready control-plane 10m v1.32.x 192.168.1.10 AlmaLinux 9
w1 Ready <none> 5m v1.32.x 192.168.1.11 AlmaLinux 9
w2     Ready    <none>          5m    v1.32.x   192.168.1.12   AlmaLinux 9
Check system pods:
kubectl get pods -n kube-system
All pods should be Running with 1/1 or 2/2 ready. The critical ones:
| Pod | What it does |
|---|---|
| etcd-* | Cluster state store |
| kube-apiserver-* | API endpoint for all cluster operations |
| kube-controller-manager-* | Reconciles desired vs actual state |
| kube-scheduler-* | Assigns pods to nodes |
| kube-proxy-* | Network routing rules (one per node) |
| calico-node-* | Pod networking (one per node) |
| coredns-* | Cluster DNS |
Run a test deployment
Deploy nginx to verify end-to-end functionality:
kubectl create deployment nginx --image=nginx --replicas=2
kubectl expose deployment nginx --port=80 --type=NodePort
Check the deployment:
kubectl get pods -o wide
kubectl get svc nginx
The pods should be running across your worker nodes (or on the control plane if single-node). Curl the NodePort to verify:
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
curl http://localhost:$NODE_PORT
You should see the nginx welcome page. Your cluster is working.
Clean up:
kubectl delete deployment nginx
kubectl delete svc nginx
Step 15: Connect a GUI (Lens or k9s)
The command line is great, but a visual dashboard makes it faster to spot problems.
Lens (desktop app)
Lens is a desktop Kubernetes IDE. Install it on your local machine, then:
- Open Lens
- Click Add Cluster
- Paste the kubeconfig you exported in Step 10
- Your cluster appears in the sidebar with real-time pod status, logs, and metrics
k9s (terminal UI)
If you prefer the terminal, k9s gives you a full-screen TUI for cluster management:
# Install on the control-plane node or your local machine
curl -sS https://webinstall.dev/k9s | bash
# Launch
k9s
k9s shows pods, services, deployments, and logs in a vim-like interface. Press : to switch views, l to view logs, d to describe a resource.
Both tools use the same kubeconfig — no additional access configuration needed.
Step 16: Connect your cluster to RaidFrame
You have a running cluster. You can manage it entirely with kubectl, Lens, and k9s. But if you want to avoid building your own deployment pipeline, monitoring stack, and secrets management from scratch, connect it to RaidFrame.
The RaidFrame agent is a lightweight DaemonSet that connects your self-hosted cluster to the RaidFrame platform. It makes an outbound connection only — no inbound ports needed, so it works behind firewalls and NAT.
Install and register
# Install the RaidFrame CLI if you haven't already
npm install -g @raidframe/cli
# Login to your RaidFrame account
rf login
# Register your cluster — this generates a unique cluster token
rf clusters add --name "production" --context $(kubectl config current-context)
Apply the agent manifest with your cluster token:
kubectl apply -f https://agent.raidframe.com/install?token=<your-cluster-token>
Verify the agent is running:
kubectl get pods -n raidframe-system
NAME                    READY   STATUS    RESTARTS   AGE
raidframe-agent-xxxxx 1/1 Running 0 30s
raidframe-agent-xxxxx   1/1     Running   0          30s
What changes after connecting
| Without RaidFrame | With RaidFrame agent |
|---|---|
| Build your own CI/CD pipeline | Push to main, RaidFrame builds and deploys |
| Install Prometheus + Grafana for metrics | CPU, memory, network metrics out of the box |
| kubectl rollout undo for rollbacks | One-click rollback in dashboard |
| kubectl create secret for every env var | Secrets management with encryption at rest |
| Manage each cluster separately | Single dashboard for all clusters |
You keep full kubectl access. The agent doesn't replace anything — it adds a management layer on top.
Deploy your first app through RaidFrame
# From your application directory
rf init
rf deploy --cluster production
RaidFrame detects your framework, builds a container image, pushes it to your cluster, and sets up health checks and auto-restart. Same rf deploy workflow whether you're deploying to RaidFrame-managed infrastructure or your own bare metal.
Your self-hosted Kubernetes cluster is now a first-class deployment target — sign up for free if you haven't already.
Hardening for production
A running cluster isn't a production-ready cluster. Here's what you need before real traffic hits it.
Back up etcd
etcd stores your entire cluster state. If etcd dies without a backup, you're rebuilding from scratch.
# Create a snapshot
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Verify the snapshot:
ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup.db --write-table
Automate this. Set up a cron job that runs daily and copies snapshots to an off-node location (S3, NFS, another server). A backup that lives on the same disk as etcd is not a backup.
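One possible shape for that script — a sketch with assumed paths and a seven-day retention window; adapt both to your environment:

```shell
# etcd-backup.sh — sketch of a daily snapshot helper (paths and retention
# are assumptions, not requirements)
etcd_backup() {
  local backup_dir="${1:-/var/backups/etcd}"
  local keep_days="${2:-7}"
  mkdir -p "$backup_dir"

  # Timestamped snapshot filename, e.g. etcd-20260316-020000.db
  local snap
  snap="$backup_dir/etcd-$(date +%Y%m%d-%H%M%S).db"

  ETCDCTL_API=3 etcdctl snapshot save "$snap" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key || return 1

  # Prune snapshots older than the retention window
  find "$backup_dir" -name 'etcd-*.db' -mtime +"$keep_days" -delete
  echo "saved $snap"
}
```

Save it as /usr/local/bin/etcd-backup.sh with a final line that calls etcd_backup, make it executable, and add a step that copies the newest snapshot off the node.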
# Example: daily backup via cron
0 2 * * * /usr/local/bin/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1
Restore from an etcd backup
If you need to restore:
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db \
--data-dir=/var/lib/etcd-restored
# Then update the etcd static pod manifest to point at the new data directory
Enable RBAC (it's already on by default)
kubeadm enables RBAC automatically. But the default admin.conf gives cluster-admin privileges to anyone who has it. For production:
- Create per-user or per-team kubeconfigs with limited permissions
- Never share admin.conf — use it only for cluster administration
- Set up ServiceAccounts for CI/CD pipelines with scoped roles
```bash
# Example: create a read-only role for a monitoring team
kubectl create clusterrolebinding monitoring-view \
  --clusterrole=view \
  --user=monitoring-team
```
Enable Pod Security Standards
Since Kubernetes v1.25, Pod Security Admission replaces PodSecurityPolicy:
```bash
# Label a namespace to enforce restricted pod security
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted
```
This prevents pods from running as root, using hostNetwork, or mounting hostPath volumes in that namespace.
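Under the restricted profile, pods must declare their security posture explicitly. A minimal spec that passes admission — the pod name and image are just examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: production
spec:
  containers:
  - name: app
    image: nginxinc/nginx-unprivileged:1.27   # example unprivileged image
    securityContext:
      runAsNonRoot: true                 # required by restricted
      allowPrivilegeEscalation: false    # required by restricted
      capabilities:
        drop: ["ALL"]                    # required by restricted
      seccompProfile:
        type: RuntimeDefault             # required by restricted
```

If any of those fields are missing, the admission controller rejects (or warns about) the pod, depending on the labels you set.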
Set resource requests and limits
Without resource limits, a single pod can consume all CPU and memory on a node, starving everything else:
```yaml
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```
Set these on every production workload. Consider using LimitRange to enforce defaults per namespace.
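A LimitRange that applies those values as namespace-wide defaults might look like this (the numbers are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:      # applied as requests when a container sets none
      cpu: "100m"
      memory: "128Mi"
    default:             # applied as limits when a container sets none
      cpu: "500m"
      memory: "512Mi"
```

Defaults are a safety net, not a substitute: workloads with known resource profiles should still set their own values explicitly.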
Common mistakes that will waste your weekend
We've seen (and made) all of these. Save yourself the pain.
1. Disabling the firewall instead of opening ports
Every other tutorial tells you to run systemctl stop firewalld. That works until someone port-scans your cluster and gets full etcd access on port 2379. Open specific ports. Keep the firewall on.
2. Forgetting to set SystemdCgroup = true
If you're on a systemd-based distro (AlmaLinux, Ubuntu 22+, RHEL) and containerd's cgroup driver doesn't match kubelet's, you'll get mysterious pod crashes. The symptom: pods start, run for 30 seconds, then get killed. Always set SystemdCgroup = true in containerd's config.toml.
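For containerd 1.x (config version 2), the relevant fragment of /etc/containerd/config.toml looks roughly like this — note that the plugin path changed in containerd 2.x, so check against your generated config:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```

After editing, restart containerd (systemctl restart containerd) so the change takes effect.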
3. Skipping the Calico firewalld zone
Your cluster looks healthy. Pods on the same node can talk to each other. But cross-node traffic is silently dropped. The fix is the kubeAccept zone in Step 11. This one trips up almost everyone running firewalld with Calico.
4. Using the wrong pause image version
The pause image in containerd's config.toml must match what kubeadm expects. Mismatched versions cause a warning during init and can break pod sandbox creation. Check what kubeadm needs with kubeadm config images list and match it.
5. Running everything as root
It works. Until someone accidentally runs kubectl delete namespace kube-system. Create a dedicated k8s user. Use RBAC. Give CI/CD pipelines the minimum permissions they need.
6. Not saving the kubeadm join output
The token expires in 24 hours. If you didn't save it, you need to generate a new one with kubeadm token create --print-join-command. Not the end of the world, but annoying at 2am when you're adding a node.
7. Forgetting etcd backups
etcd is a single point of failure in a single control-plane setup. No backup means a disk failure destroys your entire cluster state — all deployments, services, secrets, config maps, everything. Set up automated backups before you deploy anything that matters.
8. Mixing Kubernetes minor versions across nodes
All nodes in a cluster should run the same Kubernetes minor version. kubelet on workers may lag the control plane (up to three minor versions behind since Kubernetes v1.28), but must never be ahead of it. Letting versions drift beyond the supported skew causes subtle API compatibility issues.
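A quick way to see the kubelet version on every node and spot skew (assumes working kubectl access):

```shell
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
```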
Production readiness checklist
Before you route real traffic to this cluster, verify every item:
| Category | Item | Status |
|---|---|---|
| Networking | All nodes can reach each other on required ports | |
| Networking | Calico pods are Running on every node | |
| Networking | Calico firewalld zone is configured | |
| Networking | Cross-node pod communication works | |
| Security | SELinux disabled (or configured for K8s) | |
| Security | Firewall enabled with only required ports open | |
| Security | admin.conf is not shared — per-user kubeconfigs exist | |
| Security | Pod Security Standards enforced on production namespaces | |
| Reliability | etcd backups are automated and tested | |
| Reliability | At least 2 worker nodes (can drain one for maintenance) | |
| Reliability | Resource requests and limits set on all workloads | |
| Monitoring | kubelet and containerd logs are accessible | |
| Monitoring | Node and pod metrics are being collected | |
| Cluster | Swap is disabled and stays disabled after reboot | |
| Cluster | Kernel modules load on boot (modules-load.d) | |
| Cluster | sysctl parameters persist after reboot | |
| Cluster | containerd config has correct pause image and SystemdCgroup | |
If you connected to RaidFrame, monitoring and deployment pipeline items are handled automatically.
Resetting a kubeadm cluster
If you need to tear everything down and start over, here's the complete reset procedure. Run this on every node.
```bash
# Reset kubeadm state
kubeadm reset --force
# Clear IPVS rules
ipvsadm --clear || true
# Remove CNI configuration
rm -rf /etc/cni/net.d/*
# Remove Kubernetes state
rm -rf /etc/kubernetes /var/lib/etcd /var/lib/kubelet
# Delete Calico network interfaces (strip the @ifN suffix before deleting)
for iface in $(ip -o link show | awk -F': ' '{print $2}' | grep '^cali' | cut -d@ -f1); do
  ip link delete "$iface"
done
ip link delete tunl0 2>/dev/null || true
ip link delete vxlan.calico 2>/dev/null || true
# Flush iptables — the filter, nat, and mangle tables all hold kube-proxy rules
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# Stop kubelet
systemctl stop kubelet
# Verify iptables are clean
iptables --list
```
Reboot after reset:
```bash
reboot
```
After reboot, you can run kubeadm init again to create a fresh cluster.
Troubleshooting
Node stuck in NotReady
Check kubelet logs:
```bash
journalctl -u kubelet -f
```
Most common causes:
- CNI plugin not installed (install Calico)
- containerd not running (systemctl status containerd)
- Swap still enabled (cat /proc/swaps)
CoreDNS pods stuck in Pending
Usually means no CNI is installed. After you apply Calico, CoreDNS starts automatically.
kubeadm init fails at preflight
Read the preflight errors carefully. Common issues:
```bash
# Port already in use — something else is listening on 6443
ss -tlnp | grep 6443
# Container runtime not running
systemctl status containerd
# Swap is on
swapoff -a
```
Pods can't communicate across nodes
Check that firewalld zones are configured for Calico interfaces (see Step 11). Also verify that the ip_vs modules are loaded:
```bash
lsmod | grep ip_vs
```
Token expired for node join
Generate a new one:
```bash
kubeadm token create --print-join-command
```
FAQ
Can I use Ubuntu instead of AlmaLinux?
Yes. The kernel modules, sysctl, and containerd steps are identical. Replace yum with apt, use ufw instead of firewalld, and adjust the Kubernetes repository URL to the Debian/Ubuntu variant (https://pkgs.k8s.io/core:/stable:/v1.32/deb/). The official kubeadm docs cover both.
Should I use containerd or CRI-O?
Both work. containerd is more widely adopted and has better tooling (nerdctl, crictl). CRI-O is lighter weight and purpose-built for Kubernetes. For most teams, containerd is the safer choice — more documentation, more community support, easier to debug.
Can I run this on Raspberry Pi or ARM?
Yes, but download the arm64 variants of containerd, runc, CNI plugins, and nerdctl. The kubeadm packages from the official repo include ARM builds. Use the same steps with the correct architecture binaries.
How do I upgrade Kubernetes after installation?
kubeadm handles upgrades. On the control-plane node:
```bash
sudo yum install -y kubeadm-1.33.x --disableexcludes=kubernetes
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.33.x
```
Then upgrade kubelet and kubectl on every node. Always upgrade one minor version at a time (1.32 to 1.33, not 1.32 to 1.34). Back up etcd before every upgrade.
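The per-worker steps, sketched below — the version pins are placeholders, and `<node-name>` is the worker you're upgrading:

```shell
# On the worker: upgrade kubeadm, then the node's kubelet config
sudo yum install -y kubeadm-1.33.x --disableexcludes=kubernetes
sudo kubeadm upgrade node

# From a machine with kubectl access: move workloads off the node
kubectl drain <node-name> --ignore-daemonsets

# Back on the worker: upgrade kubelet and kubectl, restart kubelet
sudo yum install -y kubelet-1.33.x kubectl-1.33.x --disableexcludes=kubernetes
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Bring the node back into rotation
kubectl uncordon <node-name>
```

Upgrade workers one at a time so the rest of the cluster keeps serving traffic.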
Is kubeadm production-ready?
Yes. kubeadm is the official Kubernetes bootstrapping tool maintained by the Kubernetes project. Companies run production clusters on kubeadm. The main thing it doesn't handle is high availability for etcd — you need to manage that yourself or use external etcd.
Why not just use a managed Kubernetes service (EKS, GKE, AKS)?
Managed services are great if you want someone else to handle upgrades, etcd, and the control plane. Self-managed kubeadm clusters make sense when you need full control over your infrastructure, want to avoid cloud vendor lock-in, need to run on bare metal, or have compliance requirements that prevent using public cloud. Also: cost. A self-managed cluster on dedicated hardware is 3-5x cheaper than managed Kubernetes at the same scale.
How much does a self-hosted cluster cost?
The software is free. Your cost is the hardware or VMs. Example setups:
| Setup | Spec | Approximate cost |
|---|---|---|
| Home lab (single node) | Used mini PC, 16GB RAM | $150 one-time |
| Dev/staging (3 nodes) | 3x Hetzner dedicated | ~$90/month |
| Production (5 nodes) | 5x bare metal, 32GB each | ~$250-400/month |
Compare that to EKS: $73/month control plane + EC2 instances + data transfer = $400-800/month for the same spec.
What about persistent storage?
kubeadm doesn't include a storage provisioner. For production, you need one:
- Local volumes — fastest, but tied to a specific node. Good for databases on dedicated nodes.
- Longhorn — distributed storage from Rancher. Easy to install, handles replication.
- Rook-Ceph — production-grade distributed storage. More complex to operate but battle-tested.
- NFS — simple and works, but single point of failure without HA setup.
Install one before deploying stateful workloads.
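For the local-volume option, the building blocks are a no-provisioner StorageClass plus a PersistentVolume pinned to a specific node. A sketch — the path, sizes, and node name are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer   # bind only once a pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-data
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/db-data        # pre-created directory or mount on the node
  nodeAffinity:                     # required for local volumes
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["worker-1"]      # the node that owns the disk
```

A pod claiming this volume will always schedule onto worker-1, which is exactly the tradeoff the list above describes.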
How do I monitor the cluster without Prometheus?
If you connected to RaidFrame, metrics are collected automatically. If not, your options:
- Metrics Server — lightweight, gives kubectl top support. Good starting point.
- Prometheus + Grafana — the standard stack. Powerful but takes time to set up.
- k9s — terminal UI with built-in resource monitoring per pod.
At minimum, install Metrics Server:
```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
How does connecting to RaidFrame differ from other management tools?
RaidFrame's agent is lightweight — a single DaemonSet, not a full platform installation. It connects outbound to RaidFrame (no inbound ports needed), so it works behind firewalls and NAT. You keep full kubectl access while gaining a deployment pipeline, monitoring, and secrets management on top. No Helm charts, no CRDs sprawl, no operator framework to learn.
What's next?
You have a production Kubernetes cluster running on your own hardware. That's a real skill — most developers never get past managed services.
Here's where to go from here:
- Set up CI/CD for zero-downtime deploys
- Scale your database on Kubernetes
- Monitor your cluster without the Prometheus learning curve
- Kubernetes vs Serverless: when to use each
Or skip the infrastructure management entirely and deploy to RaidFrame — same Kubernetes under the hood, none of the maintenance. Start for free.
Ship faster with RaidFrame
Auto-scaling compute, managed databases, global CDN, and zero-config CI/CD. Free tier included.