
Install Kubeadm and Set Up Production Kubernetes in 2026

Complete guide to installing kubeadm on AlmaLinux and building a production Kubernetes cluster from scratch. Single-node and multi-node setups with Calico, containerd, and RaidFrame integration.


RaidFrame Team

March 16, 2026 · 32 min read

TL;DR — This guide walks you through installing kubeadm on AlmaLinux 9 and bootstrapping a production-grade Kubernetes cluster from bare metal. You'll install containerd, runc, and CNI plugins by hand, initialize a control plane, deploy Calico for pod networking, and optionally join worker nodes. At the end, you'll connect it to RaidFrame so you can manage deployments from a single dashboard.


Why self-host Kubernetes in 2026?

Managed Kubernetes (EKS, GKE, AKS) is everywhere. So why would anyone install kubeadm by hand?

Cost. A three-node EKS cluster costs ~$220/month before you run a single pod — $73/month just for the control plane, plus EC2 instances. The same cluster on bare metal or Hetzner dedicated servers costs a fraction of that with no per-cluster fee.

Control. Managed services abstract away the control plane. That's fine until you need a specific kubelet flag, a non-standard CNI configuration, or etcd tuning for a write-heavy workload. With kubeadm, every config file is yours.

Compliance. Some industries (defense, healthcare, financial services) require infrastructure to run on-premises or in specific jurisdictions. Managed Kubernetes doesn't always meet those requirements.

Learning. If you want to actually understand Kubernetes — not just use it — building a cluster from scratch is the fastest way. You'll debug problems in minutes that would take hours if you'd only ever used managed services.

This guide is opinionated. We'll tell you what to install, why, and what to skip. If you just want containers running without the infrastructure work, deploy on RaidFrame instead — it takes 60 seconds.

What you'll build

A self-managed Kubernetes cluster running on bare metal or virtual machines. Here's what gets installed and where:

┌─────────────────────────────────────────────────────────────┐
│                    Control Plane Node                        │
│                                                             │
│  ┌──────────┐ ┌────────────┐ ┌───────────┐ ┌────────────┐  │
│  │ kube-api │ │ controller │ │ scheduler │ │    etcd    │  │
│  │  server  │ │  manager   │ │           │ │            │  │
│  └──────────┘ └────────────┘ └───────────┘ └────────────┘  │
│  ┌──────────┐ ┌────────────┐ ┌───────────┐                 │
│  │ kubelet  │ │ containerd │ │   Calico  │                 │
│  └──────────┘ └────────────┘ └───────────┘                 │
├─────────────────────────────────────────────────────────────┤
│                     Worker Node(s)                           │
│                                                             │
│  ┌──────────┐ ┌────────────┐ ┌───────────┐ ┌────────────┐  │
│  │ kubelet  │ │ containerd │ │   Calico  │ │ kube-proxy │  │
│  └──────────┘ └────────────┘ └───────────┘ └────────────┘  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Your application pods                  │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

The stack:

  • kubeadm — bootstraps and manages the cluster lifecycle
  • containerd — container runtime (the standard since dockershim was removed in Kubernetes v1.24)
  • runc — low-level OCI runtime that containerd calls to create containers
  • Calico — pod networking and network policy enforcement (CNI)
  • IPVS — high-performance kube-proxy mode (an alternative to the default iptables mode at scale)
  • CoreDNS — cluster-internal DNS (installed automatically by kubeadm)

By the end of this guide your cluster will be running, scheduling pods, and optionally connected to RaidFrame for centralized management.

Single-node vs multi-node: which setup do you need?

Before you start, decide what topology fits your use case.

Single-node (all-in-one)

One machine runs the control plane and your workloads. Good for:

  • Local development and testing
  • Home labs and learning
  • Small internal tools that don't need high availability
  • CI/CD runners

Trade-off: No fault tolerance. If the node goes down, everything goes down. The control plane taint must be removed so pods can schedule on the same node.

Multi-node (production)

A typical production setup has at least three machines:

Role            Count   Purpose
Control plane   1-3     Runs the API server, scheduler, controller manager, etcd
Worker          2+      Runs your application workloads

Why three control-plane nodes? etcd needs a quorum. With three nodes, one can fail and the cluster keeps running. With one, any failure is total.

Why at least two workers? So you can drain one for maintenance without downtime.

For this guide, every node follows the same preparation steps (Steps 1-8). Only Step 9 (init) runs on the first control-plane node. Worker nodes join using the token from Step 13.

A note on versions

This guide was written and tested in March 2026. Software versions move fast. Here's what we're using and where to check for updates:

Component     Version in this guide   Check for latest
Kubernetes    v1.32                   kubernetes.io/releases
containerd    v1.7.24                 GitHub releases
runc          v1.2.4                  GitHub releases
CNI plugins   v1.6.2                  GitHub releases
Calico        v3.29.2                 docs.tigera.io

Rule of thumb: Use the latest patch version of whatever minor version you choose. Don't mix Kubernetes v1.32 with a containerd version that only supports v1.30. Check the compatibility matrices.
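A quick sanity check for minimum versions can be done in plain shell. This is a sketch with made-up version strings — in practice, feed it the real output of commands like kubeadm version -o short or containerd --version:

```shell
# Sketch: check an installed version against a required minimum.
# ver_ge A B succeeds when version A >= version B (relies on sort -V).
ver_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example values — substitute real output from your installed binaries.
installed="1.7.24"
required="1.7.20"

if ver_ge "$installed" "$required"; then
    echo "containerd $installed: OK (>= $required)"
else
    echo "containerd $installed: too old (need >= $required)"
fi
```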

Prerequisites

You need:

  • AlmaLinux 9 (or RHEL 9, Rocky Linux 9, CentOS Stream 9 — any RHEL-based distro works)
  • 2 CPU cores and 2 GB RAM minimum per node (4+ cores and 4+ GB recommended for production)
  • Root access on every node
  • Network connectivity between all nodes (they must be able to reach each other on the required ports)
  • Unique hostname, MAC address, and product UUID on every node

Check your IP and MAC address:

ip addr show

Save your node's IP — you'll need it throughout this guide.

Check the product UUID:

cat /sys/class/dmi/id/product_uuid

Every node in the cluster must have a unique UUID. If you're running VMs from a clone, regenerate the UUID or you'll hit identity conflicts.
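If you're checking a whole fleet, duplicates are easy to catch with sort and uniq. A sketch with placeholder UUIDs — in practice, each line would come from ssh <node> cat /sys/class/dmi/id/product_uuid:

```shell
# Placeholder UUIDs for the demo; gather real ones from each node over ssh.
uuids="4c4c4544-0032-3510-8051-b4c04f4e3232
4c4c4544-0041-4b10-8034-c2c04f573033
4c4c4544-0032-3510-8051-b4c04f4e3232"   # deliberate duplicate for the demo

# uniq -d prints only lines that appear more than once in sorted input
dupes=$(printf '%s\n' "$uuids" | sort | uniq -d)
if [ -n "$dupes" ]; then
    echo "Duplicate product UUIDs found:"
    echo "$dupes"
else
    echo "All product UUIDs are unique"
fi
```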

Verify the required binaries are available:

for bin in curl tar modprobe iptables; do
    command -v $bin &>/dev/null && echo "$bin: OK" || echo "$bin: MISSING"
done

If anything is missing, install it before proceeding.

Step 1: Update the OS and set hostnames

Run all commands as root unless stated otherwise.

Update the system:

sudo dnf update -y

Set a meaningful hostname on each node:

hostnamectl set-hostname cp1.k8s.yourdomain.com   # control plane
hostnamectl set-hostname w1.k8s.yourdomain.com     # worker 1
hostnamectl set-hostname w2.k8s.yourdomain.com     # worker 2

Verify:

hostname

Configure DNS resolution

If you're on a private network without DNS, edit /etc/hosts on every node so they can resolve each other:

192.168.1.10  cp1.k8s.yourdomain.com
192.168.1.11  w1.k8s.yourdomain.com
192.168.1.12  w2.k8s.yourdomain.com

If you're using public DNS, you can skip this — but make sure /etc/resolv.conf contains valid nameservers.

Why hostnames matter: Kubernetes uses hostnames as node identifiers. If two nodes share a hostname, the second one will overwrite the first in the cluster.

Step 2: Disable SELinux

Kubernetes requires containers to access the host filesystem for pod networking and volume mounts. SELinux blocks this by default.

Disable it immediately:

setenforce 0

Make it permanent:

sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

Verify:

grep ^SELINUX= /etc/selinux/config

You should see SELINUX=disabled.

Why not just set SELinux to permissive? Permissive mode logs violations but doesn't block them. It works, but it generates noise in your logs and wastes CPU cycles evaluating policies that never enforce. Disabled is cleaner.

Reboot now to fully apply the SELinux change. The rest of the guide assumes SELinux is off.

reboot

Step 3: Disable swap

The kubelet does not work properly with swap enabled. Kubernetes expects to manage memory directly — swap introduces unpredictable latency that breaks scheduling guarantees.

Disable swap immediately:

sudo swapoff -a

Prevent swap from re-enabling after reboot by commenting it out in fstab:

sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab

Verify swap is off:

cat /proc/swaps

You should see only the header line with no entries:

Filename    Type        Size    Used    Priority

Also verify fstab:

cat /etc/fstab

The swap line should be commented out with #.

What about Kubernetes swap support? Since Kubernetes 1.28, there's beta support for running with swap via the NodeSwap feature. Note that NodeSwap is a kubelet feature gate, not a kubeadm one — you enable it through a KubeletConfiguration (failSwapOn: false plus the feature gate) passed to kubeadm init with --config, not via a command-line flag.

But this is still not recommended for production. Disable swap unless you have a specific reason to keep it.
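If you do want to experiment with swap support, a sketch of the config-file approach looks like this — field names follow the kubeadm v1beta4 and kubelet.config.k8s.io/v1beta1 APIs, so verify them against your release:

```yaml
# kubeadm-config.yaml — pass with: kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
controlPlaneEndpoint: "cp1.k8s.yourdomain.com:6443"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false          # let the kubelet start with swap enabled
featureGates:
  NodeSwap: true           # may already default to true on newer releases
memorySwap:
  swapBehavior: LimitedSwap
```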

Step 4: Open firewall ports

Kubernetes components communicate over specific ports. If you skip this step, kubeadm init will succeed but pods won't schedule and services won't route.

Control-plane node ports

sudo firewall-cmd --permanent --add-port=6443/tcp       # API server
sudo firewall-cmd --permanent --add-port=2379-2380/tcp  # etcd
sudo firewall-cmd --permanent --add-port=10250/tcp      # kubelet API
sudo firewall-cmd --permanent --add-port=10257/tcp      # kube-controller-manager
sudo firewall-cmd --permanent --add-port=10259/tcp      # kube-scheduler
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort range
sudo firewall-cmd --permanent --add-port=179/tcp        # Calico BGP
sudo firewall-cmd --permanent --add-port=4789/udp       # VXLAN (Calico/Flannel)
sudo firewall-cmd --reload

(You may see older guides open 8080, 10251, and 10252. Those were the legacy insecure ports of the API server, scheduler, and controller manager; they no longer exist in current Kubernetes releases, so there's nothing to open there.)

Worker node ports

Workers need fewer ports:

sudo firewall-cmd --permanent --add-port=10250/tcp      # kubelet API
sudo firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort range
sudo firewall-cmd --permanent --add-port=179/tcp        # Calico BGP
sudo firewall-cmd --permanent --add-port=8080/tcp       # App traffic (only if you expose apps on 8080 directly)
sudo firewall-cmd --permanent --add-port=4789/udp       # VXLAN
sudo firewall-cmd --reload

Calico-specific ports

If you're using Calico (recommended), also open these:

sudo firewall-cmd --zone=public --add-protocol=4 --permanent       # IP-in-IP
sudo firewall-cmd --zone=public --add-port=5473/tcp --permanent    # Typha
sudo firewall-cmd --zone=public --add-port=51820/udp --permanent   # WireGuard IPv4
sudo firewall-cmd --zone=public --add-port=51821/udp --permanent   # WireGuard IPv6
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent     # HTTPS
sudo firewall-cmd --reload

Verify everything is open:

firewall-cmd --list-all

Why not just disable the firewall? You could. Many tutorials tell you to run systemctl stop firewalld. Don't do that in production. An exposed etcd port (2379) is one client-certificate misconfiguration away from handing out full read/write access to your cluster state. Open only what you need.

Step 5: Load kernel modules and configure sysctl

Kubernetes networking requires specific kernel modules and sysctl parameters. Without them, pod-to-pod communication fails silently.

Load the required modules

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF

Load them immediately:

sudo modprobe overlay
sudo modprobe br_netfilter
sudo modprobe ip_vs
sudo modprobe ip_vs_rr
sudo modprobe ip_vs_wrr
sudo modprobe ip_vs_sh
sudo modprobe nf_conntrack

What each module does:

Module                                 Purpose
overlay                                Enables OverlayFS, required by containerd for container image layers
br_netfilter                           Allows iptables to see bridged traffic — essential for pod networking
ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh   IPVS (IP Virtual Server) for high-performance service load balancing
nf_conntrack                           Connection tracking for NAT and stateful firewall rules

Why IPVS? The default kube-proxy mode is iptables, which creates a rule for every service endpoint. At scale (1000+ services), iptables becomes a bottleneck. IPVS uses a hash table — O(1) lookup regardless of service count.
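Loading the ip_vs modules only makes IPVS available — kube-proxy still defaults to iptables until you opt in. A sketch of selecting IPVS mode at init time via a kubeadm config file (API versions as of recent releases; verify against yours):

```yaml
# kubeadm-config.yaml — sketch; pass to: kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
controlPlaneEndpoint: "cp1.k8s.yourdomain.com:6443"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: rr    # round-robin, served by the ip_vs_rr module loaded above
```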

Configure sysctl parameters

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
net.ipv6.conf.all.forwarding        = 1
EOF

Apply without reboot:

sudo sysctl --system

Reload module configuration:

sudo systemctl restart systemd-modules-load

Verify everything is loaded

for module in overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do
    if lsmod | grep -q "^${module}"; then
        echo "${module}: loaded"
    else
        echo "${module}: NOT loaded"
    fi
done

All modules should show loaded.

Verify sysctl:

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward net.ipv6.conf.all.forwarding

All values should be 1.

Step 6: Install containerd

containerd is the container runtime that Kubernetes uses to run containers. Docker used to be the default, but Kubernetes removed dockershim in v1.24. containerd is now the standard.

Download and install containerd:

curl -LO https://github.com/containerd/containerd/releases/download/v1.7.24/containerd-1.7.24-linux-amd64.tar.gz
tar -C /usr/local -xzf containerd-1.7.24-linux-amd64.tar.gz

Set up the systemd service:

sudo mkdir -p /usr/local/lib/systemd/system
sudo curl -Lo /usr/local/lib/systemd/system/containerd.service \
    https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
sudo systemctl daemon-reload
sudo systemctl enable --now containerd

Verify it's running:

sudo systemctl status containerd

Generate the containerd config

containerd needs a configuration file that tells it which pause image to use for Kubernetes sandboxes:

sudo mkdir -p /etc/containerd
sudo containerd config default > /etc/containerd/config.toml

Update the sandbox (pause) image to match the version kubeadm expects:

sudo sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.k8s.io/pause:3.10"|' /etc/containerd/config.toml

Why does the pause image matter? Every Kubernetes pod has a hidden "pause" container that holds the network namespace. If the pause image version in containerd doesn't match what kubeadm expects, you'll get a warning during init and potential networking issues.

Enable SystemdCgroup (required for systemd-based distros):

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

Restart containerd to pick up the changes:

sudo systemctl restart containerd

Step 7: Install runc and CNI plugins

runc is the low-level container runtime that actually creates and runs containers. containerd calls runc under the hood.

curl -Lo /usr/local/sbin/runc https://github.com/opencontainers/runc/releases/download/v1.2.4/runc.amd64
chmod +x /usr/local/sbin/runc

Verify:

runc --version

Install CNI plugins

CNI (Container Network Interface) plugins provide the basic networking primitives that Calico and other network providers build on.

curl -LO https://github.com/containernetworking/plugins/releases/download/v1.6.2/cni-plugins-linux-amd64-v1.6.2.tgz
sudo mkdir -p /opt/cni/bin
sudo tar -C /opt/cni/bin -xzf cni-plugins-linux-amd64-v1.6.2.tgz

Install nerdctl (optional)

nerdctl is a Docker-compatible CLI for containerd. Useful for debugging container issues directly on the node:

curl -LO https://github.com/containerd/nerdctl/releases/download/v2.0.3/nerdctl-2.0.3-linux-amd64.tar.gz
sudo tar -C /usr/local/bin -xzf nerdctl-2.0.3-linux-amd64.tar.gz

Configure crictl

crictl is the CLI for CRI-compatible container runtimes. Kubernetes uses it internally, and you'll use it for debugging. Configure it to point at containerd:

cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF

This prevents the deprecation warning about default endpoints. Verify:

crictl info

Step 8: Install kubeadm, kubelet, and kubectl

These are the three Kubernetes binaries you need:

  • kubelet — the agent that runs on every node and manages containers
  • kubeadm — the tool to initialize and manage the cluster
  • kubectl — the CLI to interact with the Kubernetes API

Add the Kubernetes repository:

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF

Install:

sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

Enable kubelet to start on boot:

sudo systemctl enable --now kubelet.service

The kubelet will crash-loop until kubeadm initializes the cluster — that's expected. Don't worry about the errors in journalctl -u kubelet at this point.

Repeat Steps 1-8 on every node in your cluster before proceeding.

Try RaidFrame free

Deploy your first app in 60 seconds. No credit card required.

Start free

Step 9: Initialize the control plane

This step runs only on the first control-plane node. It bootstraps the entire cluster.

Pull the images first

Download all required container images before initializing. This makes the init faster and lets you catch registry issues early:

kubeadm config images pull

Run kubeadm init

kubeadm init --control-plane-endpoint=cp1.k8s.yourdomain.com:6443

Replace cp1.k8s.yourdomain.com with your control-plane node's hostname or IP address.

What --control-plane-endpoint does: This sets the stable address that all nodes use to reach the API server. For a single control-plane node, use the node's hostname or IP. For high availability (3 control-plane nodes), use a load balancer's address.

If everything is configured correctly, you'll see preflight checks pass and the init process will take 1-2 minutes:

[init] Using Kubernetes version: v1.32.x
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
...
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd
[wait-control-plane] Waiting for the kubelet to boot up the control plane
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
...

The firewalld warning is fine — we already opened those ports in Step 4.

After a minute or two, you'll get the success output:

Your Kubernetes control-plane has initialized successfully!
 
To start using your cluster, you need to run the following as a regular user:
 
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
 
Alternatively, if you are the root user, you can run:
 
  export KUBECONFIG=/etc/kubernetes/admin.conf
 
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
 
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
 
  kubeadm join cp1.k8s.yourdomain.com:6443 --token u6wf5g.9xoa7hg2r0hg9k5j \
        --discovery-token-ca-cert-hash sha256:3bf88d426aa64837fd... \
        --control-plane
 
Then you can join any number of worker nodes by running the following on each as root:
 
kubeadm join cp1.k8s.yourdomain.com:6443 --token u6wf5g.9xoa7hg2r0hg9k5j \
        --discovery-token-ca-cert-hash sha256:3bf88d426aa64837fd...

Save the entire output to a file immediately. You need both join commands — one for control-plane nodes, one for workers. The token expires after 24 hours, but you can generate a new one later with kubeadm token create --print-join-command.

Step 10: Configure kubectl access

Option A: Use root (quick and dirty)

export KUBECONFIG=/etc/kubernetes/admin.conf

This works but isn't best practice for production.

Option B: Create a dedicated user (recommended)

Create a non-root user for cluster management:

sudo adduser k8s
sudo passwd k8s
sudo usermod -aG wheel k8s

Switch to the new user:

su - k8s

Copy the kubeconfig:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Test it:

kubectl get nodes

You'll see your control-plane node in NotReady status:

NAME                     STATUS     ROLES           AGE   VERSION
cp1.k8s.yourdomain.com  NotReady   control-plane   2m    v1.32.x

The node is NotReady because there's no CNI (pod network) installed yet. That's next.

Why create a separate user? Running kubectl as root is a security risk. The admin.conf file has full cluster-admin privileges. A dedicated user with sudo access gives you audit trails and limits blast radius.

Export kubeconfig for remote access

You can copy the kubeconfig to your local machine and manage the cluster from anywhere:

# On the control-plane node
cat $HOME/.kube/config

Copy the output and save it to ~/.kube/config on your local machine (or merge it with an existing kubeconfig). Replace the server: address with the control-plane node's external IP if you're accessing it remotely.
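That server swap can be scripted with sed. A sketch demonstrated on a scratch file so you can dry-run it — 203.0.113.10 is a placeholder for your control-plane node's external IP:

```shell
# Demo on a scratch copy; in practice run the sed against ~/.kube/config.
cat > /tmp/kubeconfig-demo <<'EOF'
    server: https://192.168.1.10:6443
EOF

# Rewrite the API server address to the externally reachable one
sed -i 's|server: https://.*:6443|server: https://203.0.113.10:6443|' /tmp/kubeconfig-demo
cat /tmp/kubeconfig-demo
```

One caveat: the API server's TLS certificate must list the external address in its SANs, or kubectl will refuse the connection. kubeadm init accepts --apiserver-cert-extra-sans for this.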

Step 11: Install Calico for pod networking

Without a CNI plugin, pods can't communicate with each other. Calico is the most popular choice — it handles pod networking, network policies, and scales to thousands of nodes.

Download the Calico manifest:

curl https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/calico.yaml -O

Apply it:

kubectl apply -f calico.yaml

Watch the pods come up:

kubectl get pods -n kube-system -w

Wait until all Calico pods and CoreDNS pods show Running and 1/1 Ready. This usually takes 1-3 minutes.

Check the node status:

kubectl get nodes
NAME                     STATUS   ROLES           AGE   VERSION
cp1.k8s.yourdomain.com  Ready    control-plane   5m    v1.32.x

The node is now Ready.

Install calicoctl (optional)

calicoctl gives you detailed control over Calico resources. Install it as a kubectl plugin:

sudo curl -L https://github.com/projectcalico/calico/releases/download/v3.29.2/calicoctl-linux-amd64 \
    -o /usr/local/bin/kubectl-calico
sudo chmod +x /usr/local/bin/kubectl-calico

Verify:

kubectl calico -h

Fix inter-pod communication with firewalld

If you're running firewalld (which you should be), Calico interfaces need their own zone to communicate freely:

name=kubeAccept
sudo firewall-cmd --permanent --new-zone=${name}
sudo firewall-cmd --permanent --zone=${name} --set-target=ACCEPT
sudo firewall-cmd --permanent --zone=${name} --add-interface=vxlan.calico
sudo firewall-cmd --permanent --zone=${name} --add-interface="cali+"
sudo firewall-cmd --reload

Why? firewalld's default zone drops traffic on unknown interfaces. Calico creates vxlan.calico and cali* interfaces dynamically. Without this zone, pods on different nodes can't talk to each other — everything looks healthy but cross-node traffic silently fails.

Step 12: Single-node setup — remove the control-plane taint

Skip this step if you have worker nodes. This is only for single-node clusters.

By default, Kubernetes taints the control-plane node with NoSchedule. This prevents application pods from running on the same node as the API server, scheduler, and etcd.

For a single-node setup, you need to remove this taint:

Check current taints:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Remove the taint:

kubectl taint nodes cp1.k8s.yourdomain.com node-role.kubernetes.io/control-plane-

The trailing - removes the taint.

Important: If you later add worker nodes to this cluster, re-apply the taint to protect your control plane:

kubectl taint nodes cp1.k8s.yourdomain.com node-role.kubernetes.io/control-plane:NoSchedule

Running application workloads on the control plane is fine for development but risky in production. A memory-hungry pod could starve etcd or the API server, taking the entire cluster down.

Step 13: Join worker nodes (multi-node)

Skip this step for single-node setups.

On each worker node, run the join command that kubeadm init printed in Step 9:

kubeadm join cp1.k8s.yourdomain.com:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>

Token expired?

If more than 24 hours have passed, generate a new token on the control-plane node:

kubeadm token create --print-join-command

This prints a fresh, complete kubeadm join command you can copy and run on the worker.

Joining additional control-plane nodes

For high availability, you can join additional control-plane nodes:

kubeadm join cp1.k8s.yourdomain.com:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane

The --control-plane flag tells kubeadm to set up the full control-plane stack (API server, scheduler, controller-manager, etcd) on this node too. You need to copy the certificate authorities first — kubeadm prints instructions for this during init.
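Copying the certificate authorities by hand is error-prone; kubeadm can also distribute them for you through an encrypted in-cluster Secret. A sketch of that flow, with placeholder values:

```shell
# On an existing control-plane node: upload the control-plane certificates
# to the cluster and print a one-time decryption key (expires after 2 hours)
sudo kubeadm init phase upload-certs --upload-certs

# On the new control-plane node, join with the printed key:
# kubeadm join cp1.k8s.yourdomain.com:6443 --token <token> \
#     --discovery-token-ca-cert-hash sha256:<hash> \
#     --control-plane --certificate-key <key-from-above>
```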

Step 14: Verify the cluster

Back on the control-plane node, verify all nodes are ready:

kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE   VERSION    INTERNAL-IP     OS-IMAGE
cp1     Ready    control-plane   10m   v1.32.x    192.168.1.10    AlmaLinux 9
w1      Ready    <none>          5m    v1.32.x    192.168.1.11    AlmaLinux 9
w2      Ready    <none>          5m    v1.32.x    192.168.1.12    AlmaLinux 9

Check system pods:

kubectl get pods -n kube-system

All pods should be Running with 1/1 or 2/2 ready. The critical ones:

Pod                          What it does
etcd-*                       Cluster state store
kube-apiserver-*             API endpoint for all cluster operations
kube-controller-manager-*    Reconciles desired vs actual state
kube-scheduler-*             Assigns pods to nodes
kube-proxy-*                 Network routing rules (one per node)
calico-node-*                Pod networking (one per node)
coredns-*                    Cluster DNS

Run a test deployment

Deploy nginx to verify end-to-end functionality:

kubectl create deployment nginx --image=nginx --replicas=2
kubectl expose deployment nginx --port=80 --type=NodePort

Check the deployment:

kubectl get pods -o wide
kubectl get svc nginx

The pods should be running across your worker nodes (or on the control plane if single-node). Curl the NodePort to verify:

NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
curl http://localhost:$NODE_PORT

You should see the nginx welcome page. Your cluster is working.

Clean up:

kubectl delete deployment nginx
kubectl delete svc nginx

Step 15: Connect a GUI (Lens or k9s)

The command line is great, but a visual dashboard makes it faster to spot problems.

Lens (desktop app)

Lens is a desktop Kubernetes IDE. Install it on your local machine, then:

  1. Open Lens
  2. Click Add Cluster
  3. Paste the kubeconfig you exported in Step 10
  4. Your cluster appears in the sidebar with real-time pod status, logs, and metrics

k9s (terminal UI)

If you prefer the terminal, k9s gives you a full-screen TUI for cluster management:

# Install on the control-plane node or your local machine
curl -sS https://webinstall.dev/k9s | bash
 
# Launch
k9s

k9s shows pods, services, deployments, and logs in a vim-like interface. Press : to switch views, l to view logs, d to describe a resource.

Both tools use the same kubeconfig — no additional access configuration needed.


Step 16: Connect your cluster to RaidFrame

You have a running cluster. You can manage it entirely with kubectl, Lens, and k9s. But if you want to avoid building your own deployment pipeline, monitoring stack, and secrets management from scratch, connect it to RaidFrame.

The RaidFrame agent is a lightweight DaemonSet that connects your self-hosted cluster to the RaidFrame platform. It makes an outbound connection only — no inbound ports needed, so it works behind firewalls and NAT.

Install and register

# Install the RaidFrame CLI if you haven't already
npm install -g @raidframe/cli
 
# Login to your RaidFrame account
rf login
 
# Register your cluster — this generates a unique cluster token
rf clusters add --name "production" --context $(kubectl config current-context)

Apply the agent manifest with your cluster token:

kubectl apply -f https://agent.raidframe.com/install?token=<your-cluster-token>

Verify the agent is running:

kubectl get pods -n raidframe-system
NAME                              READY   STATUS    RESTARTS   AGE
raidframe-agent-xxxxx             1/1     Running   0          30s
raidframe-agent-xxxxx             1/1     Running   0          30s

What changes after connecting

Without RaidFrame                            With RaidFrame agent
Build your own CI/CD pipeline                Push to main, RaidFrame builds and deploys
Install Prometheus + Grafana for metrics     CPU, memory, network metrics out of the box
kubectl rollout undo for rollbacks           One-click rollback in dashboard
kubectl create secret for every env var      Secrets management with encryption at rest
Manage each cluster separately               Single dashboard for all clusters

You keep full kubectl access. The agent doesn't replace anything — it adds a management layer on top.

Deploy your first app through RaidFrame

# From your application directory
rf init
rf deploy --cluster production

RaidFrame detects your framework, builds a container image, pushes it to your cluster, and sets up health checks and auto-restart. Same rf deploy workflow whether you're deploying to RaidFrame-managed infrastructure or your own bare metal.

Your self-hosted Kubernetes cluster is now a first-class deployment target — sign up for free if you haven't already.

Hardening for production

A running cluster isn't a production-ready cluster. Here's what you need before real traffic hits it.

Back up etcd

etcd stores your entire cluster state. If etcd dies without a backup, you're rebuilding from scratch.

# Create a snapshot
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key

Verify the snapshot:

ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup.db --write-out=table

Automate this. Set up a cron job that runs daily and copies snapshots to an off-node location (S3, NFS, another server). A backup that lives on the same disk as etcd is not a backup.

# Example: daily backup via cron
0 2 * * * /usr/local/bin/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1
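A minimal sketch of what /usr/local/bin/etcd-backup.sh might contain — the backup directory, retention count, and the off-node copy step are assumptions to adapt, while the certificate paths are kubeadm's defaults from the snapshot command above:

```shell
#!/usr/bin/env bash
# etcd-backup.sh — sketch; adjust BACKUP_DIR, KEEP, and the off-node copy.
set -euo pipefail

BACKUP_DIR=/var/backups/etcd   # assumed location, not a kubeadm default
KEEP=7                         # number of local snapshots to retain

mkdir -p "$BACKUP_DIR"
snap="$BACKUP_DIR/etcd-$(date +%Y%m%d-%H%M%S).db"

ETCDCTL_API=3 etcdctl snapshot save "$snap" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key

# Prune old snapshots, keeping the newest $KEEP
ls -1t "$BACKUP_DIR"/etcd-*.db | tail -n +$((KEEP + 1)) | xargs -r rm -f

# TODO: copy "$snap" off-node (rsync, S3, ...) — a local-only backup
# is not a backup.
```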

Restore from an etcd backup

If you need to restore:

ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db \
    --data-dir=/var/lib/etcd-restored
 
# Then update the etcd static pod manifest to point at the new data directory

Enable RBAC (it's already on by default)

kubeadm enables RBAC automatically. But the default admin.conf gives cluster-admin privileges to anyone who has it. For production:

  1. Create per-user or per-team kubeconfigs with limited permissions
  2. Never share admin.conf — use it only for cluster administration
  3. Set up ServiceAccounts for CI/CD pipelines with scoped roles
# Example: create a read-only role for a monitoring team
kubectl create clusterrolebinding monitoring-view \
    --clusterrole=view \
    --user=monitoring-team
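For item 3, a scoped CI/CD ServiceAccount might look like this — a sketch where the ci-deployer and production names are placeholders, bound to the built-in edit ClusterRole (pick something narrower if your pipeline only deploys specific resource types):

```yaml
# Hypothetical: a namespace-scoped deployer account for a CI/CD pipeline
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer-edit
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit  # built-in role: create/update workloads, no RBAC changes
subjects:
  - kind: ServiceAccount
    name: ci-deployer
    namespace: production
```

On Kubernetes v1.24+, issue short-lived credentials for the pipeline with kubectl create token ci-deployer -n production rather than long-lived Secret tokens.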

Enable Pod Security Standards

Since Kubernetes v1.25, Pod Security Admission replaces PodSecurityPolicy:

# Label a namespace to enforce restricted pod security
kubectl label namespace production \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/warn=restricted

This prevents pods from running as root, using hostNetwork, or mounting hostPath volumes in that namespace.
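For reference, a pod that passes the restricted profile needs an explicit securityContext. A minimal sketch — the image and names are illustrative, and the image itself must run as a non-root user (or you must set runAsUser to a non-zero UID):

```yaml
# Hypothetical pod that satisfies the "restricted" Pod Security Standard
apiVersion: v1
kind: Pod
metadata:
  name: restricted-example
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/my-app:latest  # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

With enforce=restricted on the namespace, pods missing any of these fields are rejected at admission time; the warn label surfaces the same violations without blocking.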

Set resource requests and limits

Without resource limits, a single pod can consume all CPU and memory on a node, starving everything else:

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Set these on every production workload. Consider using LimitRange to enforce defaults per namespace.
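A LimitRange sketch that applies the values above as namespace defaults — containers that omit resources get these filled in automatically at admission (the namespace name is a placeholder):

```yaml
# Hypothetical: default requests/limits for containers in the production namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container sets no requests
        cpu: "100m"
        memory: "128Mi"
      default:          # applied when a container sets no limits
        cpu: "500m"
        memory: "512Mi"
```

Defaults are a safety net, not a substitute for tuning — workloads with known resource profiles should still declare their own values.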

Common mistakes that will waste your weekend

We've seen (and made) all of these. Save yourself the pain.

1. Disabling the firewall instead of opening ports

Every second tutorial says systemctl stop firewalld. That works right up until someone portscans your cluster and finds etcd exposed on port 2379. Open the specific ports you need and keep the firewall on.

2. Forgetting to set SystemdCgroup = true

If you're on a systemd-based distro (AlmaLinux, Ubuntu 22+, RHEL) and containerd's cgroup driver doesn't match kubelet's, you'll get mysterious pod crashes. The symptom: pods start, run for 30 seconds, then get killed. Always set SystemdCgroup = true in containerd's config.toml.
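The relevant stanza, assuming containerd 1.x with a config version 2 config.toml (the plugin path differs on containerd 2.0):

```toml
# /etc/containerd/config.toml — containerd 1.x, config version 2
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

Restart containerd after editing (systemctl restart containerd) or the change won't take effect.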

3. Skipping the Calico firewalld zone

Your cluster looks healthy. Pods on the same node can talk to each other. But cross-node traffic is silently dropped. The fix is the kubeAccept zone in Step 11. This one trips up almost everyone running firewalld with Calico.

4. Using the wrong pause image version

The pause image in containerd's config.toml must match what kubeadm expects. Mismatched versions cause a warning during init and can break pod sandbox creation. Check what kubeadm needs with kubeadm config images list and match it.
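The pause image lives in the same config.toml — a sketch assuming containerd 1.x; the exact tag shown here is an example, so copy whatever kubeadm config images list reports for your version:

```toml
# /etc/containerd/config.toml — containerd 1.x, config version 2
[plugins."io.containerd.grpc.v1.cri"]
  # Must match the pause image from `kubeadm config images list`
  sandbox_image = "registry.k8s.io/pause:3.10"
```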

5. Running everything as root

It works. Until someone accidentally runs kubectl delete namespace kube-system. Create a dedicated k8s user. Use RBAC. Give CI/CD pipelines the minimum permissions they need.

6. Not saving the kubeadm join output

The token expires in 24 hours. If you didn't save it, you need to generate a new one with kubeadm token create --print-join-command. Not the end of the world, but annoying at 2am when you're adding a node.

7. Forgetting etcd backups

etcd is a single point of failure in a single control-plane setup. No backup means a disk failure destroys your entire cluster state — all deployments, services, secrets, config maps, everything. Set up automated backups before you deploy anything that matters.

8. Mixing Kubernetes minor versions across nodes

All nodes in a cluster should run the same Kubernetes minor version. The official skew policy lets worker kubelets lag the control plane (by up to three minor versions on recent releases), but a kubelet must never be newer than the API server. In practice, keep everything aligned — mixing v1.32 and v1.30 invites subtle API compatibility issues.

Production readiness checklist

Before you route real traffic to this cluster, verify every item:

Networking

  • All nodes can reach each other on required ports
  • Calico pods are Running on every node
  • Calico firewalld zone is configured
  • Cross-node pod communication works

Security

  • SELinux disabled (or configured for K8s)
  • Firewall enabled with only required ports open
  • admin.conf is not shared — per-user kubeconfigs exist
  • Pod Security Standards enforced on production namespaces

Reliability

  • etcd backups are automated and tested
  • At least 2 worker nodes (can drain one for maintenance)
  • Resource requests and limits set on all workloads

Monitoring

  • kubelet and containerd logs are accessible
  • Node and pod metrics are being collected

Cluster

  • Swap is disabled and stays disabled after reboot
  • Kernel modules load on boot (modules-load.d)
  • sysctl parameters persist after reboot
  • containerd config has correct pause image and SystemdCgroup

If you connected to RaidFrame, monitoring and deployment pipeline items are handled automatically.

Resetting a kubeadm cluster

If you need to tear everything down and start over, here's the complete reset procedure. Run this on every node.

# Reset kubeadm state
kubeadm reset --force
 
# Clear IPVS rules
ipvsadm --clear || true
 
# Remove CNI configuration
rm -rf /etc/cni/net.d/*
 
# Remove Kubernetes state
rm -rf /etc/kubernetes /var/lib/etcd /var/lib/kubelet
 
# Delete Calico network interfaces
ip link | grep cali | awk -F: '{print $2}' | cut -d@ -f1 | xargs -r -I{} ip link delete {}
ip link delete tunl0 2>/dev/null || true
ip link delete vxlan.calico 2>/dev/null || true
 
# Flush iptables (kube-proxy and Calico write to the nat and mangle tables too)
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X
 
# Stop kubelet
systemctl stop kubelet
 
# Verify iptables are clean
iptables --list

Reboot after reset:

reboot

After reboot, you can run kubeadm init again to create a fresh cluster.

Troubleshooting

Node stuck in NotReady

Check kubelet logs:

journalctl -u kubelet -f

Most common causes:

  • CNI plugin not installed (install Calico)
  • containerd not running (systemctl status containerd)
  • Swap still enabled (cat /proc/swaps)

CoreDNS pods stuck in Pending

Usually means no CNI is installed. After applying Calico, CoreDNS will automatically start.

kubeadm init fails at preflight

Read the preflight errors carefully. Common issues:

# Port already in use — something else is listening on 6443
ss -tlnp | grep 6443
 
# Container runtime not running
systemctl status containerd
 
# Swap is on
swapoff -a

Pods can't communicate across nodes

Check that firewalld zones are configured for Calico interfaces (see Step 11). Also verify that the ip_vs modules are loaded:

lsmod | grep ip_vs

Token expired for node join

Generate a new one:

kubeadm token create --print-join-command

FAQ

Can I use Ubuntu instead of AlmaLinux?

Yes. The kernel modules, sysctl, and containerd steps are identical. Replace yum with apt, use ufw instead of firewalld, and adjust the Kubernetes repository URL to the Debian/Ubuntu variant (https://pkgs.k8s.io/core:/stable:/v1.32/deb/). The official kubeadm docs cover both.

Should I use containerd or CRI-O?

Both work. containerd is more widely adopted and has better tooling (nerdctl, crictl). CRI-O is lighter weight and purpose-built for Kubernetes. For most teams, containerd is the safer choice — more documentation, more community support, easier to debug.

Can I run this on Raspberry Pi or ARM?

Yes, but download the arm64 variants of containerd, runc, CNI plugins, and nerdctl. The kubeadm packages from the official repo include ARM builds. Use the same steps with the correct architecture binaries.

How do I upgrade Kubernetes after installation?

kubeadm handles upgrades. Because pkgs.k8s.io publishes a separate repository per minor version, first update /etc/yum.repos.d/kubernetes.repo to point at the new version (change v1.32 to v1.33 in the URL). Then, on the control-plane node:

sudo yum install -y kubeadm-1.33.x --disableexcludes=kubernetes
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.33.x

Then upgrade kubelet and kubectl on every node. Always upgrade one minor version at a time (1.32 to 1.33, not 1.32 to 1.34). Back up etcd before every upgrade.

Is kubeadm production-ready?

Yes. kubeadm is the official Kubernetes bootstrapping tool maintained by the Kubernetes project, and plenty of companies run production clusters on it. It supports highly available control planes (with stacked or external etcd), but the surrounding pieces — the API server load balancer and etcd operations — are yours to manage.

Why not just use a managed Kubernetes service (EKS, GKE, AKS)?

Managed services are great if you want someone else to handle upgrades, etcd, and the control plane. Self-managed kubeadm clusters make sense when you need full control over your infrastructure, want to avoid cloud vendor lock-in, need to run on bare metal, or have compliance requirements that prevent using public cloud. Also: cost. A self-managed cluster on dedicated hardware is 3-5x cheaper than managed Kubernetes at the same scale.

How much does a self-hosted cluster cost?

The software is free. Your cost is the hardware or VMs. Example setups:

Setup | Spec | Approximate cost
Home lab (single node) | Used mini PC, 16GB RAM | $150 one-time
Dev/staging (3 nodes) | 3x Hetzner dedicated | ~$90/month
Production (5 nodes) | 5x bare metal, 32GB each | ~$250-400/month

Compare that to EKS: $73/month control plane + EC2 instances + data transfer = $400-800/month for the same spec.

What about persistent storage?

kubeadm doesn't include a storage provisioner. For production, you need one:

  • Local volumes — fastest, but tied to a specific node. Good for databases on dedicated nodes.
  • Longhorn — distributed storage from Rancher. Easy to install, handles replication.
  • Rook-Ceph — production-grade distributed storage. More complex to operate but battle-tested.
  • NFS — simple and works, but single point of failure without HA setup.

Install one before deploying stateful workloads.

How do I monitor the cluster without Prometheus?

If you connected to RaidFrame, metrics are collected automatically. If not, your options:

  • Metrics Server — lightweight, gives kubectl top support. Good starting point.
  • Prometheus + Grafana — the standard stack. Powerful but takes time to set up.
  • k9s — terminal UI with built-in resource monitoring per pod.

At minimum, install Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

How does connecting to RaidFrame differ from other management tools?

RaidFrame's agent is lightweight — a single DaemonSet, not a full platform installation. It connects outbound to RaidFrame (no inbound ports needed), so it works behind firewalls and NAT. You keep full kubectl access while gaining a deployment pipeline, monitoring, and secrets management on top. No Helm charts, no CRD sprawl, no operator framework to learn.

What's next?

You have a production Kubernetes cluster running on your own hardware. That's a real skill — most developers never get past managed services.

Or skip the infrastructure management entirely and deploy to RaidFrame — same Kubernetes under the hood, none of the maintenance. Start for free.

Kubernetes · kubeadm · DevOps · self-hosted · infrastructure · containers

Ship faster with RaidFrame

Auto-scaling compute, managed databases, global CDN, and zero-config CI/CD. Free tier included.