Kubernetes Planning
Planned Kubernetes and OpenShift deployment on Proxmox VMs across both sites: cluster topology, node placement, machine-network allocations, pod/service CIDR design, and the component stack.
Kubernetes and OpenShift clusters are planned for both sites as Proxmox VMs, backed by Site B Ceph for block storage; no cluster has been deployed as of 2026-06-28. Planning covers cluster topology, control-plane and worker placement, VLAN and machine-network allocations, pod/service CIDR design, and the component stack.
Architecture Constraints
These rules are fixed. They must not be violated when planning or deploying clusters.
No cross-site Kubernetes clusters
Keep clusters site-local. Do not stretch a Kubernetes or OpenShift cluster across the WireGuard VPN. Cross-site continuity uses application-level replication, GitOps sync (ArgoCD), and disaster-recovery procedures — not cluster federation.
No stretched Ceph for Kubernetes storage
Site B Ceph is local-only. Do not create PersistentVolumes that require cross-site Ceph replication. Use Proxmox Backup Server replication for cross-site data protection.
No stretched L2 between sites
Kubernetes node networks stay within the site-local /16 supernet, and pod/service CIDRs stay internal to each cluster's overlay. Never bridge L2 across the WireGuard tunnel.
Do not reuse infrastructure subnets as pod/service CIDRs
Pod CIDRs and service CIDRs must not overlap with 10.10.0.0/16, 10.20.0.0/16, or 10.255.0.0/24. These are real routed ranges. Assign pod/service CIDRs from separate, non-overlapping blocks.
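The overlap rule can be checked mechanically before a CIDR is committed to an install config. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `conflicts` is hypothetical, and the reserved list contains only the three ranges named above.

```python
import ipaddress

# Routed infrastructure ranges named in the rule above.
RESERVED = [
    ipaddress.ip_network(n)
    for n in ("10.10.0.0/16", "10.20.0.0/16", "10.255.0.0/24")
]

def conflicts(candidate: str) -> list:
    """Return the reserved ranges that the candidate CIDR overlaps."""
    net = ipaddress.ip_network(candidate)
    return [str(r) for r in RESERVED if net.overlaps(r)]

print(conflicts("10.128.0.0/14"))  # []
print(conflicts("10.20.64.0/18"))  # ['10.20.0.0/16']
```

An empty list means the candidate is clear of the three routed ranges. It does not prove the candidate is clear of ranges outside this list.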
Cluster Topology
The plan supports up to three OpenShift clusters per site, each with a dedicated /22 machine network carved from the Lab / Trusted Client space. Site A (10.10.0.0/16) and Site B (10.20.0.0/16) maintain independent clusters with no shared control plane or storage.
Using a single flat /16 as a machine network for all nodes is explicitly avoided: it increases ARP/broadcast noise, weakens security zone boundaries between workloads, and makes troubleshooting harder. Each cluster uses a dedicated routed /22; the /16 serves only as a site-level summary prefix for routing.
Build sequence
Do not deploy Kubernetes until the network foundation (OPNsense, VLANs 20/40/50) and Site B Ceph are stable. Deploying clusters on an unstable network or storage layer produces failures that mask underlying infrastructure problems. See Build Phases.
Node Placement (Site B)
Site B hosts all Kubernetes and OpenShift workloads. The five Site B compute nodes (sb-cmp-01–sb-cmp-05) split between control-plane and worker roles alongside their Ceph duties.
| Host | Hardware | K8s Role | Ceph Role |
|---|---|---|---|
| sb-cmp-01 | SYS-5019D-4C-FN8TP | Control-plane | MON / MGR |
| sb-cmp-02 | SYS-5019D-4C-FN8TP | Control-plane | MON / MGR |
| sb-cmp-03 | SYS-5018D-FN4T | Worker | OSD |
| sb-cmp-04 | SYS-5018D-FN4T | Worker | OSD |
| sb-cmp-05 | SYS-5018D-FN4T | Worker | OSD |
sb-edge-01 runs sb-fw-01 (OPNsense VM) and lightweight infrastructure only. It must not run Kubernetes workloads — see Architecture Overview for E200 workload guidance.
VM sizing not specified
Specific vCPU and RAM allocations for Kubernetes node VMs have not been defined in the planning docs. Finalize sizing after Ceph is stable and per-node headroom is measured.
Networking Architecture
Kubernetes networking splits into two independent layers: the real infrastructure network (VLANs, OPNsense-routed) and the CNI overlay (pod-to-pod and service routing inside the cluster). These must not overlap.
K8s-Relevant VLANs
| VLAN | Name | Use in Kubernetes |
|---|---|---|
| 40 | Kubernetes Nodes | Node/machine addresses for all K8s VMs (10.x0.40.0/22) |
| 50 | K8s LB / VIPs | API VIP, ingress VIP, MetalLB pools (10.x0.50.0/24) |
| 80 | Monitoring | Metrics and alerting traffic (Prometheus, Grafana) |
| 90 | Backup / Replication | Cluster-backup traffic |
| 100 | Lab / Trusted Client | OpenShift machine networks carved from 10.x0.100.0/22+ |
VLAN 40 (10.x0.40.0/22) follows the host-octet convention: 10.x0.40.<host-octet>. For example, sb-cmp-01 (host octet 20) gets 10.20.40.20.
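As an illustration of the host-octet convention, the sketch below builds a VLAN 40 address from a site octet and a host octet. The function name `vlan40_address` is hypothetical, and the site octets (10 for Site A, 20 for Site B) are taken from the site supernets used throughout this page.

```python
import ipaddress

def vlan40_address(site_octet: int, host_octet: int) -> ipaddress.IPv4Address:
    """Build 10.<site>.40.<host> following the host-octet convention."""
    if site_octet not in (10, 20):
        raise ValueError("site octet must be 10 (Site A) or 20 (Site B)")
    if not 1 <= host_octet <= 254:
        raise ValueError("host octet must be between 1 and 254")
    return ipaddress.ip_address(f"10.{site_octet}.40.{host_octet}")

print(vlan40_address(20, 20))  # 10.20.40.20
```

The convention only uses the first /24 of the VLAN 40 /22, so the remaining three /24s stay free for addresses that do not map to a host octet.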
Service VIPs (VLAN 50)
| VIP | Site A | Site B |
|---|---|---|
| Kubernetes API server | 10.10.50.10 | 10.20.50.10 |
| Ingress (HTTP/S) | 10.10.50.11 | 10.20.50.11 |
| MetalLB pool | 10.10.50.200–10.10.50.250 | 10.20.50.200–10.20.50.250 |
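A quick sanity check on the VIP plan: the sketch below confirms that the fixed VIPs and the MetalLB pool boundaries sit inside the VLAN 50 /24, that the fixed VIPs are outside the dynamic pool, and how many addresses the pool offers. The function name `check_site_vips` and the result keys are hypothetical; the addresses come from the table above.

```python
import ipaddress

def check_site_vips(site_octet: int) -> dict:
    """Validate the planned VLAN 50 VIPs for one site (10 = Site A, 20 = Site B)."""
    vlan50 = ipaddress.ip_network(f"10.{site_octet}.50.0/24")
    fixed = {
        "api": ipaddress.ip_address(f"10.{site_octet}.50.10"),
        "ingress": ipaddress.ip_address(f"10.{site_octet}.50.11"),
    }
    pool_start = ipaddress.ip_address(f"10.{site_octet}.50.200")
    pool_end = ipaddress.ip_address(f"10.{site_octet}.50.250")
    return {
        # Every VIP and both pool boundaries must sit inside the VLAN 50 /24.
        "all_in_vlan50": all(
            a in vlan50 for a in (*fixed.values(), pool_start, pool_end)
        ),
        # Fixed VIPs must not fall inside the range MetalLB allocates from.
        "fixed_vips_in_pool": [
            name for name, a in fixed.items() if pool_start <= a <= pool_end
        ],
        # Inclusive count of addresses in the pool.
        "pool_size": int(pool_end) - int(pool_start) + 1,
    }

print(check_site_vips(20))
# {'all_in_vlan50': True, 'fixed_vips_in_pool': [], 'pool_size': 51}
```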
Machine Network Allocations
Each OpenShift cluster gets a dedicated /22 machine network from the Lab / Trusted Client space. Node VMs receive addresses from these blocks, separate from the VLAN 40 node addresses used by the underlying host network.
Site A
| Slot | Network | Purpose |
|---|---|---|
| Cluster 1 | 10.10.100.0/22 | OpenShift cluster 1 machine network |
| Cluster 2 | 10.10.104.0/22 | OpenShift cluster 2 machine network |
| Cluster 3 | 10.10.108.0/22 | OpenShift cluster 3 machine network |
| Expansion | 10.10.112.0/21 | Additional clusters or large lab expansion |
Site B
| Slot | Network | Purpose |
|---|---|---|
| Cluster 1 | 10.20.100.0/22 | OpenShift cluster 1 machine network |
| Cluster 2 | 10.20.104.0/22 | OpenShift cluster 2 machine network |
| Cluster 3 | 10.20.108.0/22 | OpenShift cluster 3 machine network |
| Expansion | 10.20.112.0/21 | Additional clusters or large lab expansion |
Tentative — future space `10.x0.128.0/17`
The planning vault reserves 10.10.128.0/17 (Site A) and 10.20.128.0/17 (Site B) as future lab and OpenShift expansion. These ranges carry no current assignments and are not confirmed in the core architecture docs.
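The allocations above can be cross-checked for overlap. This is a sketch only: the helper names are hypothetical, the per-cluster blocks are derived from the pattern in the tables (third octet 100, 104, 108), and the comparison set is limited to the machine networks, the expansion /21, the tentative /17, and VLANs 40 and 50.

```python
import ipaddress
from itertools import combinations

def machine_networks(site_octet: int, clusters: int = 3) -> list:
    """Per-cluster /22 machine networks: 10.<site>.100.0/22, .104.0/22, .108.0/22."""
    return [
        ipaddress.ip_network(f"10.{site_octet}.{100 + 4 * i}.0/22")
        for i in range(clusters)
    ]

def overlapping_pairs(site_octet: int) -> list:
    """Return every pair of planned ranges at a site that overlap (expected: none)."""
    named = {
        f"cluster-{i + 1}": net
        for i, net in enumerate(machine_networks(site_octet))
    }
    named["expansion"] = ipaddress.ip_network(f"10.{site_octet}.112.0/21")
    named["future"] = ipaddress.ip_network(f"10.{site_octet}.128.0/17")
    named["vlan40"] = ipaddress.ip_network(f"10.{site_octet}.40.0/22")
    named["vlan50"] = ipaddress.ip_network(f"10.{site_octet}.50.0/24")
    return [
        (a, b)
        for a, b in combinations(named, 2)
        if named[a].overlaps(named[b])
    ]

print([str(n) for n in machine_networks(20)])
# ['10.20.100.0/22', '10.20.104.0/22', '10.20.108.0/22']
print(overlapping_pairs(20))  # []
```

Note that a fourth /22 generated by the same pattern would start at 10.x0.112.0, which is the first quarter of the expansion /21, so adding a cluster means consuming part of that block.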
Pod and Service CIDRs
Pod and service CIDRs must not overlap with any real LAN, VLAN subnet, or WireGuard range at either site. The examples below are non-overlapping blocks from outside the 10.10.0.0/16 and 10.20.0.0/16 supernets.
| CIDR type | Example range | Note |
|---|---|---|
| Pod CIDR — cluster 1 | 10.128.0.0/14 | Clear of all site subnets; one block per cluster |
| Pod CIDR — cluster 2 | 10.132.0.0/14 | Offset by /14 per cluster to avoid overlap |
| Service CIDR | 172.30.0.0/16 | Stay within 172.16.0.0/12; do not use 172.32.x.x (not RFC 1918) |
Examples only — not assigned
The CIDRs above are planning examples. Confirm actual cluster CIDRs against all current and planned site subnets before deployment.
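To illustrate the "one /14 per cluster" offset and the RFC 1918 note for the service CIDR, here is a small sketch. The function names `pod_cidr` and `service_cidr_ok` are hypothetical, and the base block is the example value from the table, not an assigned range.

```python
import ipaddress

POD_BASE = ipaddress.ip_network("10.128.0.0/14")       # example block for cluster 1
RFC1918_172 = ipaddress.ip_network("172.16.0.0/12")    # private 172.x space

def pod_cidr(cluster_index: int) -> ipaddress.IPv4Network:
    """Return the example pod CIDR for a 1-based cluster index, one /14 per cluster."""
    if cluster_index < 1:
        raise ValueError("cluster_index is 1-based")
    start = int(POD_BASE.network_address) + (cluster_index - 1) * POD_BASE.num_addresses
    return ipaddress.ip_network(f"{ipaddress.ip_address(start)}/14")

def service_cidr_ok(cidr: str) -> bool:
    """True when the service CIDR sits entirely inside 172.16.0.0/12."""
    return ipaddress.ip_network(cidr).subnet_of(RFC1918_172)

print(pod_cidr(1))                        # 10.128.0.0/14
print(pod_cidr(2))                        # 10.132.0.0/14
print(service_cidr_ok("172.30.0.0/16"))   # True
print(service_cidr_ok("172.32.0.0/16"))   # False
```

The sketch does not guard against high cluster indexes: continuing the /14 sequence far enough reaches 10.252.0.0/14, which contains 10.255.0.0/24, so any real allocation still needs the overlap check against routed ranges.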
Stack Components
The planned component stack covers CNI, load balancing, certificates, external DNS, GitOps delivery, and HTTP/S ingress. All components are tentative until the network and Ceph foundations are complete.
| Component | Role |
|---|---|
| Cilium | Preferred CNI; overlay mode; pod networking and NetworkPolicy enforcement |
| MetalLB | LoadBalancer VIP allocation from VLAN 50 (10.x0.50.200–10.x0.50.250) |
| cert-manager | Automatic TLS certificates via DNS-01 challenge through Cloudflare |
| external-dns | Syncs Kubernetes service hostnames into core.aorxi.io |
| ArgoCD | GitOps continuous delivery; all cluster state driven from Git |
| ingress-nginx or Traefik | HTTP/S ingress; TLS termination for in-cluster services |
Stack tentative — choices may shift
Final component selection, particularly the ingress controller and whether to run vanilla Kubernetes or OpenShift, may change based on operational experience. The stack above reflects early planning.
Design Decisions
Site-local clusters
Clusters are kept site-local to avoid WAN latency in the control plane and to simplify failure domains. Cross-site workload continuity is achieved by syncing identical workloads at both sites with ArgoCD and by using backup/restore for stateful data.
Why not a single flat /16 per site
A flat /16 for all cluster nodes and services increases ARP/broadcast noise, weakens security zone boundaries between workloads, and complicates network troubleshooting. Dedicated /22 blocks per cluster keep failure domains tight and routing tables clean.
Cilium in overlay mode
In overlay mode, Cilium encapsulates pod traffic in node-to-node tunnels (VXLAN or Geneve), so it crosses the existing infrastructure VLANs as ordinary traffic between node addresses. This avoids advertising pod CIDRs through OPNsense and keeps infrastructure routing decoupled from CNI state. Native routing mode would require OPNsense to carry pod routes, coupling firewall config to cluster churn.
Separation of machine network from VLAN 40
VLAN 40 carries the host-network addresses of Kubernetes node VMs (the 10.x0.40.0/22 block). OpenShift machine networks (10.x0.100.0/22+) are a distinct allocation for the cluster installer's machine network concept. The two serve different roles and are not interchangeable.
Related Pages
- Machine Networks: detailed OpenShift machine-network allocations and per-cluster CIDR assignments
- VLAN Reference: full VLAN table with subnets and gateways
- Site B Ceph: local Ceph cluster that backs Kubernetes PersistentVolumes
- Proxmox Clusters: sb-pve and sa-pve cluster configuration
- Build Phases: sequenced build order including the Kubernetes bring-up phase