Phase 4: k3s Cluster Setup¶
Time estimate: ~20 minutes
Prerequisites: All 4 VMs are online on Tailscale (verified in Phase 3)
What is k3s?¶
k3s is a lightweight version of Kubernetes (the container orchestration system). Kubernetes manages running multiple containers (applications) across multiple machines, restarting them if they crash, balancing load, and much more.
k3s runs on 3 nodes:
- k3s-server: the "brain" - makes all decisions, stores cluster state
- k3s-agent-1 and k3s-agent-2: the "workers" - actually run the containers
Not familiar with Kubernetes or k3s? See the Kubernetes/k3s technology guide.
Why Flannel Over Tailscale?¶
k3s uses a networking layer called Flannel to route traffic between containers on different nodes. Flannel normally uses the node's main LAN IP address.
The problem: Home LAN IPs can change if the router hands out new DHCP leases. When a node's IP changes, Flannel's internal networking tables become outdated and pods stop being able to communicate.
The solution: Configure Flannel to use the Tailscale interface (tailscale0)
instead of the LAN interface. Tailscale IPs (100.x.x.x) never change as long as
the device is enrolled in the tailnet.
For a detailed explanation, see Flannel over Tailscale.
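To check whether an address really is a Tailscale address, the sketch below may help. It assumes the address comes from the 100.64.0.0/10 range that Tailscale assigns from (a narrower test than the 100.x.x.x shorthand used here); the helper name is_tailscale_ip is hypothetical.

```shell
# Hypothetical helper: succeed if the IPv4 address lies in 100.64.0.0/10,
# i.e. first octet 100 and second octet between 64 and 127.
is_tailscale_ip() {
  case $1 in
    100.*.*.*) ;;
    *) return 1 ;;
  esac
  _octet=${1#100.}       # strip the leading "100."
  _octet=${_octet%%.*}   # keep only the second octet
  case $_octet in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$_octet" -ge 64 ] && [ "$_octet" -le 127 ]
}
```

On a node, this could be used as `is_tailscale_ip "$(tailscale ip -4)" && echo "Tailscale address"`.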
4.1 Option A - Via GitHub Actions (Recommended)¶
Step 1: Deploy the k3s Server¶
- Go to Actions → Ansible - Deploy k3s → Run workflow
- Set target_host = k3s-server
- Click Run workflow
What this does:
- Connects to k3s-server.tailnet.ts.net via Tailscale SSH
- Retrieves the node's Tailscale IP (tailscale ip -4)
- Writes /etc/rancher/k3s/config.yaml:
write-kubeconfig-mode: "644"
tls-san:
  - "k3s-server"
  - "<tailscale-ip>"
node-ip: "<tailscale-ip>"
flannel-iface: tailscale0
- Waits for the node to reach Ready status
What is config.yaml?
This file tells k3s how to configure itself when it starts:
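- write-kubeconfig-mode: File permissions for the generated kubeconfig (644 lets non-root users read it)
- tls-san: Extra hostnames and IP addresses that are valid for the Kubernetes API certificate (allows connecting by hostname, not just IP)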
- node-ip: Forces k3s to use the Tailscale IP instead of the LAN IP
- flannel-iface: Forces Flannel to use the tailscale0 network interface
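As an illustration of how these settings fit together, here is a minimal sketch that renders the same file from a hostname and a Tailscale IP. The function name render_k3s_config is hypothetical, and the playbook's actual template is not shown in this guide, so treat this as an approximation of its output.

```shell
# Hypothetical helper: print a k3s server config to stdout.
# $1 = hostname to add to the API certificate, $2 = the node's Tailscale IP.
render_k3s_config() {
  cat <<EOF
write-kubeconfig-mode: "644"
tls-san:
  - "$1"
  - "$2"
node-ip: "$2"
flannel-iface: tailscale0
EOF
}
```

On the server this might be used as `render_k3s_config k3s-server "$(tailscale ip -4)" | sudo tee /etc/rancher/k3s/config.yaml`.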
Step 2: Deploy k3s Workers¶
Run Actions → Ansible - Add k3s Worker Node (Tailscale) twice:
| Run | Input: worker_host | Input: server_host |
|---|---|---|
| 1st run | k3s-agent-1 | k3s-server |
| 2nd run | k3s-agent-2 | k3s-server |
What this does for each worker:
- Fetches the node token from /var/lib/rancher/k3s/server/node-token on the server
- Sets the server URL: K3S_URL=https://k3s-server.tailnet.ts.net:6443
- Connects to the worker node via Tailscale SSH
- Retrieves the worker's Tailscale IP
- Writes /etc/rancher/k3s/config.yaml with Tailscale IP configuration
- Installs the k3s agent and joins it to the cluster
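For orientation, the final join step roughly corresponds to the upstream k3s install script being run with K3S_URL and K3S_TOKEN set. The sketch below only prints that command instead of running it; the function name k3s_agent_join_cmd is hypothetical, and the playbook may perform the install differently.

```shell
# Hypothetical helper: print (do not execute) the upstream agent install
# command. $1 = server URL, $2 = node token.
k3s_agent_join_cmd() {
  printf "curl -sfL https://get.k3s.io | K3S_URL='%s' K3S_TOKEN='%s' sh -\n" "$1" "$2"
}
```

Printing the command first makes it easy to review the URL before anything is installed on the worker. Avoid pasting the real token into shared logs.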
4.2 Option B - Manual Ansible¶
Use this if GitHub Actions is unavailable.
# Install Ansible
pip install ansible
# Ensure you are on the tailnet
tailscale up
# Add the host keys of all three nodes to known_hosts via ssh-keyscan
ssh-keyscan k3s-server.tailnet.ts.net k3s-agent-1.tailnet.ts.net \
  k3s-agent-2.tailnet.ts.net >> ~/.ssh/known_hosts 2>/dev/null
# Create Ansible inventory file for the server
cat > /tmp/inventory-server.yml <<EOF
---
all:
  children:
    k3s:
      hosts:
        k3s-server:
          ansible_host: k3s-server.tailnet.ts.net
          ansible_user: ubuntu
          ansible_ssh_common_args: '-o StrictHostKeyChecking=yes'
      vars:
        ansible_python_interpreter: /usr/bin/python3
EOF
# Navigate to the Ansible directory
cd /path/to/homelab/ansible
# Deploy the k3s server
ansible-playbook playbooks/deploy_k3s.yml -i /tmp/inventory-server.yml
# Get the cluster join token from the server
TOKEN=$(ssh ubuntu@k3s-server.tailnet.ts.net \
  "sudo cat /var/lib/rancher/k3s/server/node-token")
K3S_URL="https://k3s-server.tailnet.ts.net:6443"
echo "Token: $TOKEN"
# Deploy agent-1
cat > /tmp/inventory-agent1.yml <<EOF
---
all:
  children:
    k3s_workers:
      hosts:
        k3s-agent-1:
          ansible_host: k3s-agent-1.tailnet.ts.net
          ansible_user: ubuntu
      vars:
        ansible_python_interpreter: /usr/bin/python3
EOF
K3S_TOKEN="$TOKEN" K3S_URL="$K3S_URL" \
  ansible-playbook playbooks/deploy_k3s_worker_tailscale.yml \
  -i /tmp/inventory-agent1.yml
# Deploy agent-2
cat > /tmp/inventory-agent2.yml <<EOF
---
all:
  children:
    k3s_workers:
      hosts:
        k3s-agent-2:
          ansible_host: k3s-agent-2.tailnet.ts.net
          ansible_user: ubuntu
      vars:
        ansible_python_interpreter: /usr/bin/python3
EOF
K3S_TOKEN="$TOKEN" K3S_URL="$K3S_URL" \
  ansible-playbook playbooks/deploy_k3s_worker_tailscale.yml \
  -i /tmp/inventory-agent2.yml
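The three inventory files differ only in group and host name, so they could be generated by one small function. This is a sketch under assumptions: write_inventory is a hypothetical helper, it places vars under the group, and it omits the ansible_ssh_common_args line used for the server.

```shell
# Hypothetical helper: write a one-host Ansible inventory.
# $1 = group name (k3s or k3s_workers), $2 = host name, $3 = output file.
write_inventory() {
  cat > "$3" <<EOF
---
all:
  children:
    $1:
      hosts:
        $2:
          ansible_host: $2.tailnet.ts.net
          ansible_user: ubuntu
      vars:
        ansible_python_interpreter: /usr/bin/python3
EOF
}
```

For example, `write_inventory k3s_workers k3s-agent-2 /tmp/inventory-agent2.yml` would produce the agent-2 inventory.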
4.3 Apply the Tailscale Startup Ordering Fix¶
⚠️ Required after every fresh k3s deployment. Without this, k3s nodes may fail to initialize Flannel correctly after a reboot, causing pods on different nodes to lose network connectivity.
The problem: On reboot, k3s starts before Tailscale finishes connecting. When
k3s starts and Tailscale isn't ready yet, Flannel can't bind to tailscale0 and
creates a broken network setup.
The fix: A systemd "drop-in" file that makes the k3s service wait for Tailscale to be fully connected before starting.
# Via GitHub Actions (recommended):
# Actions → Ansible - Fix k3s Tailscale Startup Order → Run workflow
# Leave target_hosts as default: k3s-server,k3s-agent-1,k3s-agent-2
What this creates on each node:
# /etc/systemd/system/k3s.service.d/after-tailscale.conf
# (k3s-agent.service.d/after-tailscale.conf on worker nodes)
[Unit]
After=tailscaled.service
Wants=tailscaled.service
[Service]
ExecStartPre=/bin/sh -c 'until ip addr show tailscale0 2>/dev/null | grep -q "inet 100\."; do sleep 2; done'
This causes k3s to:
1. Start only after the tailscaled service has started
2. Poll every 2 seconds until tailscale0 shows a valid 100.x.x.x IP address before proceeding
Reference: See Flannel over Tailscale for a full explanation of the race condition and manual recovery commands.
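When testing the same condition by hand, a bounded wait is often more convenient than an open-ended loop. The following is a minimal sketch, not part of the drop-in: wait_until and tailscale0_has_ip are hypothetical names, and the elapsed time is approximated by counting sleep intervals.

```shell
# Hypothetical helper: run a probe command every <interval> seconds until it
# succeeds or roughly <timeout> seconds of sleeping have passed.
# $1 = timeout, $2 = interval, remaining arguments = probe command.
wait_until() {
  _timeout=$1 _interval=$2
  shift 2
  _elapsed=0
  until "$@"; do
    [ "$_elapsed" -ge "$_timeout" ] && return 1
    sleep "$_interval"
    _elapsed=$((_elapsed + _interval))
  done
}

# Same probe as the ExecStartPre line: does tailscale0 hold a 100.x address?
tailscale0_has_ip() {
  ip addr show tailscale0 2>/dev/null | grep -q 'inet 100\.'
}
```

On a node, `wait_until 60 2 tailscale0_has_ip || echo "tailscale0 has no address after 60s"` would report a problem instead of blocking indefinitely.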
4.4 Install Longhorn Prerequisites¶
Longhorn is the distributed storage system used by Kubernetes workloads (databases, persistent data). It requires some packages to be installed on all nodes before it can work.
# Via GitHub Actions (recommended):
# Actions → Ansible - Install Longhorn Prerequisites → Run workflow
What this installs on each node:
- open-iscsi - block storage protocol used by Longhorn
- nfs-common - NFS client for potential NFS mounts
- util-linux - system utilities
- Loads the iscsi_tcp kernel module
Not familiar with Longhorn? See the Longhorn technology guide.
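To confirm the packages landed on a node, something like the following could be used. It is a sketch that assumes Debian/Ubuntu nodes (the guide uses the ubuntu user); missing_longhorn_pkgs is a hypothetical helper that reads a list of installed package names on stdin.

```shell
# Hypothetical helper: read installed package names (one per line) on stdin,
# print each required package that is absent, and return 1 if any is missing.
missing_longhorn_pkgs() {
  _installed=$(cat)
  _missing=0
  for _pkg in open-iscsi nfs-common util-linux; do
    if ! printf '%s\n' "$_installed" | grep -qxF -- "$_pkg"; then
      echo "$_pkg"
      _missing=1
    fi
  done
  return "$_missing"
}
```

A possible invocation is `dpkg-query -W -f='${Package}\n' | missing_longhorn_pkgs`, followed by `lsmod | grep -q '^iscsi_tcp'` to check the kernel module. Note that dpkg-query may also list packages that were removed but not purged, so this is only an approximate check.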
4.5 Verify Cluster Health¶
SSH to the k3s server and run kubectl get nodes -o wide to confirm all nodes are Ready:
Expected output:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP ...
k3s-server Ready control-plane,master 5m v1.x.x <k3s-server-ts-ip> <none> ...
k3s-agent-1 Ready <none> 3m v1.x.x <k3s-agent-1-ts-ip> <none> ...
k3s-agent-2 Ready <none> 3m v1.x.x <k3s-agent-2-ts-ip> <none> ...
Key: All INTERNAL-IP values should be Tailscale addresses (100.x.x.x). If they show LAN IPs, the node-ip and flannel-iface settings in config.yaml were not applied. Re-run the k3s deployment Ansible playbook.
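The INTERNAL-IP check can also be scripted. A minimal sketch, assuming the table comes from `kubectl get nodes -o wide` and using the 100.x.x.x shorthand rather than the exact Tailscale range; the helper name check_internal_ips is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes -o wide` output on stdin and
# report every node whose INTERNAL-IP does not start with "100.".
# Returns 1 if any such node is found or the header column is missing.
check_internal_ips() {
  awk '
    NR == 1 {
      # Locate the INTERNAL-IP column by its header name.
      for (i = 1; i <= NF; i++) if ($i == "INTERNAL-IP") col = i
      next
    }
    col && $col !~ /^100\./ {
      print $1 " has non-Tailscale INTERNAL-IP " $col
      bad = 1
    }
    END { exit (bad || !col) }
  '
}
```

On the server this might be run as `kubectl get nodes -o wide | check_internal_ips && echo "all nodes use Tailscale IPs"`.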
Verify all system pods are running (kubectl get pods -n kube-system):
Look for all pods in kube-system namespace to be Running or Completed.
4.6 Configure Local kubectl Access (Optional)¶
If you want to run kubectl commands from your laptop instead of SSHing to the server:
# Copy kubeconfig from the server
scp ubuntu@k3s-server.tailnet.ts.net:/etc/rancher/k3s/k3s.yaml ~/.kube/k3s-config
# Update the server address to the Tailscale hostname
sed -i 's|https://127.0.0.1:6443|https://k3s-server.tailnet.ts.net:6443|' \
~/.kube/k3s-config
# Use this kubeconfig
export KUBECONFIG=~/.kube/k3s-config
# Verify it works
kubectl get nodes
What is kubeconfig?
A kubeconfig file tells kubectl where to find the Kubernetes API server and how to authenticate. By default, k3s stores it at /etc/rancher/k3s/k3s.yaml on the server. The file contains a client certificate and private key.
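Two practical points follow from this. First, `sed -i` without a suffix argument is GNU syntax, and the BSD sed shipped with macOS expects `-i ''`. Second, because the file embeds a private key, it is worth restricting its permissions. The sketch below handles both by writing through a temporary file; retarget_kubeconfig is a hypothetical helper name.

```shell
# Hypothetical helper: point a copied k3s kubeconfig at a new API server URL
# and make it readable by the owner only.
# $1 = kubeconfig path, $2 = new server URL.
retarget_kubeconfig() {
  _tmp=$(mktemp) || return 1
  # Write to a temp file instead of using sed -i, which differs between
  # GNU and BSD sed.
  sed "s|https://127.0.0.1:6443|$2|" "$1" > "$_tmp" && mv "$_tmp" "$1" || return 1
  chmod 600 "$1"
}
```

For example: `retarget_kubeconfig ~/.kube/k3s-config https://k3s-server.tailnet.ts.net:6443`.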
Summary Checklist¶
Before proceeding to Phase 5:
- k3s server deployed - visible in kubectl get nodes as Ready
- k3s-agent-1 joined and shows as Ready in kubectl get nodes
- k3s-agent-2 joined and shows as Ready in kubectl get nodes
- All nodes show Tailscale IPs (100.x.x.x) as their INTERNAL-IP
- Tailscale startup fix applied to all 3 nodes
- Longhorn prerequisites installed on all nodes
- All kube-system pods are Running