
Kubernetes Planning

Planned Kubernetes and OpenShift deployment on Proxmox VMs across both sites: cluster topology, node placement, machine-network allocations, pod/service CIDR design, and the component stack.

Kubernetes and OpenShift clusters are planned for both sites as Proxmox VMs, backed by Site B Ceph for block storage; no cluster has been deployed as of 2026-06-28. Planning covers cluster topology, control-plane and worker placement, VLAN and machine-network allocations, pod/service CIDR design, and the component stack.

Architecture Constraints

These rules are fixed. They must not be violated when planning or deploying clusters.

No cross-site Kubernetes clusters

Keep clusters site-local. Do not stretch a Kubernetes or OpenShift cluster across the WireGuard VPN. Cross-site continuity uses application-level replication, GitOps sync (ArgoCD), and disaster-recovery procedures — not cluster federation.

No stretched Ceph for Kubernetes storage

Site B Ceph is local-only. Do not create PersistentVolumes that require cross-site Ceph replication. Use Proxmox Backup Server replication for cross-site data protection.

No stretched L2 between sites

Kubernetes node networks and pod/service CIDRs stay within the site-local /16 supernet. Never bridge L2 across the WireGuard tunnel.

Do not reuse infrastructure subnets as pod/service CIDRs

Pod CIDRs and service CIDRs must not overlap with 10.10.0.0/16, 10.20.0.0/16, or 10.255.0.0/24. These are real routed ranges. Assign pod/service CIDRs from separate, non-overlapping blocks.
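The overlap rule above can be checked mechanically before any cluster install. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `overlapping_ranges` is hypothetical and the list of reserved ranges is copied from the rule, not from a live inventory.

```python
import ipaddress

# Real routed ranges named in the rule above.
INFRA_RANGES = [
    ipaddress.ip_network("10.10.0.0/16"),   # Site A supernet
    ipaddress.ip_network("10.20.0.0/16"),   # Site B supernet
    ipaddress.ip_network("10.255.0.0/24"),  # third reserved range listed in the rule
]


def overlapping_ranges(candidate: str) -> list[str]:
    """Return the infrastructure ranges that a candidate pod/service CIDR overlaps."""
    net = ipaddress.ip_network(candidate)
    return [str(r) for r in INFRA_RANGES if net.overlaps(r)]


if __name__ == "__main__":
    for cidr in ("10.128.0.0/14", "172.30.0.0/16", "10.20.64.0/18"):
        hits = overlapping_ranges(cidr)
        status = "OK" if not hits else "CONFLICT with " + ", ".join(hits)
        print(f"{cidr}: {status}")
```

An empty result means the candidate is clear of the three listed ranges only; any subnets added to the sites later would need to be appended to `INFRA_RANGES`.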

Cluster Topology

The plan supports up to three OpenShift clusters per site, each with a dedicated /22 machine network carved from the Lab / Trusted Client space. Site A (10.10.0.0/16) and Site B (10.20.0.0/16) maintain independent clusters with no shared control plane or storage.

Using a single flat /16 as a machine network for all nodes is explicitly avoided: it increases ARP/broadcast noise, weakens security zone boundaries between workloads, and makes troubleshooting harder. Each cluster uses a dedicated routed /22; the /16 serves only as a site-level summary prefix for routing.

Build sequence

Do not deploy Kubernetes until the network foundation (OPNsense, VLANs 20/40/50) and Site B Ceph are stable. Deploying clusters on an unstable network or storage layer produces failures that mask underlying infrastructure problems. See Build Phases.

Node Placement (Site B)

Site B hosts all Kubernetes and OpenShift workloads. The five Site B compute nodes (sb-cmp-01 through sb-cmp-05) are split between control-plane and worker roles alongside their Ceph duties.

| Host | Hardware | K8s Role | Ceph Role |
| --- | --- | --- | --- |
| sb-cmp-01 | SYS-5019D-4C-FN8TP | Control-plane | MON / MGR |
| sb-cmp-02 | SYS-5019D-4C-FN8TP | Control-plane | MON / MGR |
| sb-cmp-03 | SYS-5018D-FN4T | Worker | OSD |
| sb-cmp-04 | SYS-5018D-FN4T | Worker | OSD |
| sb-cmp-05 | SYS-5018D-FN4T | Worker | OSD |

sb-edge-01 runs sb-fw-01 (OPNsense VM) and lightweight infrastructure only. It must not run Kubernetes workloads — see Architecture Overview for E200 workload guidance.

VM sizing not specified

Specific vCPU and RAM allocations for Kubernetes node VMs have not been defined in the planning docs. Finalize sizing after Ceph is stable and per-node headroom is measured.

Networking Architecture

Kubernetes networking splits into two independent layers: the real infrastructure network (VLANs, OPNsense-routed) and the CNI overlay (pod-to-pod and service routing inside the cluster). These must not overlap.

K8s-Relevant VLANs

| VLAN | Name | Use in Kubernetes |
| --- | --- | --- |
| 40 | Kubernetes Nodes | Node/machine addresses for all K8s VMs (10.x0.40.0/22) |
| 50 | K8s LB / VIPs | API VIP, ingress VIP, MetalLB pools (10.x0.50.0/24) |
| 80 | Monitoring | Metrics and alerting traffic (Prometheus, Grafana) |
| 90 | Backup / Replication | Cluster-backup traffic |
| 100 | Lab / Trusted Client | OpenShift machine networks carved from 10.x0.100.0/22+ |

VLAN 40 (10.x0.40.0/22) follows the host-octet convention: 10.x0.40.<host-octet>. For example, sb-cmp-01 (host octet 20) gets 10.20.40.20.
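The host-octet convention can be expressed as a small helper. This is a sketch under stated assumptions: the function name `node_address` is hypothetical, the site octets 10 and 20 are inferred from the two site supernets, and only the sb-cmp-01 host octet (20) comes from the text above.

```python
import ipaddress


def node_address(site_octet: int, host_octet: int) -> ipaddress.IPv4Address:
    """Build the VLAN 40 address 10.<site_octet>.40.<host_octet>."""
    if site_octet not in (10, 20):  # Site A = 10, Site B = 20
        raise ValueError(f"unknown site octet: {site_octet}")
    if not 1 <= host_octet <= 254:
        raise ValueError(f"host octet out of range: {host_octet}")

    addr = ipaddress.ip_address(f"10.{site_octet}.40.{host_octet}")
    vlan40 = ipaddress.ip_network(f"10.{site_octet}.40.0/22")
    if addr not in vlan40:  # sanity check against the VLAN 40 block
        raise ValueError(f"{addr} is outside {vlan40}")
    return addr


if __name__ == "__main__":
    print(node_address(20, 20))  # sb-cmp-01 at Site B -> 10.20.40.20
```

The convention pins the third octet to 40, so it addresses only the first /24 of the /22; the remaining space in the block is left for whatever the VLAN plan assigns to it.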

Service VIPs (VLAN 50)

| VIP | Site A | Site B |
| --- | --- | --- |
| Kubernetes API server | 10.10.50.10 | 10.20.50.10 |
| Ingress (HTTP/S) | 10.10.50.11 | 10.20.50.11 |
| MetalLB pool | 10.10.50.200–10.10.50.250 | 10.20.50.200–10.20.50.250 |
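A quick consistency check on the VLAN 50 plan: the fixed VIPs should sit inside the /24 and outside the MetalLB pool, so that MetalLB can never hand out the API or ingress address. The sketch below assumes the addresses in the table; `vlan50_plan` is a hypothetical helper, and the `start-end` range string it returns is the form MetalLB address pools are generally configured with.

```python
import ipaddress


def vlan50_plan(site_octet: int) -> dict:
    """Validate the VLAN 50 VIP layout for one site and describe the MetalLB pool."""
    subnet = ipaddress.ip_network(f"10.{site_octet}.50.0/24")
    fixed_vips = {
        "api": ipaddress.ip_address(f"10.{site_octet}.50.10"),
        "ingress": ipaddress.ip_address(f"10.{site_octet}.50.11"),
    }
    pool_start = ipaddress.ip_address(f"10.{site_octet}.50.200")
    pool_end = ipaddress.ip_address(f"10.{site_octet}.50.250")

    if pool_start not in subnet or pool_end not in subnet:
        raise ValueError("MetalLB pool leaves the VLAN 50 subnet")
    for name, vip in fixed_vips.items():
        if vip not in subnet:
            raise ValueError(f"{name} VIP {vip} is outside {subnet}")
        if pool_start <= vip <= pool_end:
            raise ValueError(f"{name} VIP {vip} collides with the MetalLB pool")

    return {
        "pool_range": f"{pool_start}-{pool_end}",
        "pool_size": int(pool_end) - int(pool_start) + 1,  # inclusive range
    }


if __name__ == "__main__":
    print(vlan50_plan(20))
```

Each site's pool holds 51 addresses, which bounds the number of concurrently exposed LoadBalancer services per site unless the pool is widened.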

Machine Network Allocations

Each OpenShift cluster gets a dedicated /22 machine network from the Lab / Trusted Client space. Node VMs receive addresses from these blocks, separate from the VLAN 40 node addresses used by the underlying host network.

Site A

| Slot | Network | Purpose |
| --- | --- | --- |
| Cluster 1 | 10.10.100.0/22 | OpenShift cluster 1 machine network |
| Cluster 2 | 10.10.104.0/22 | OpenShift cluster 2 machine network |
| Cluster 3 | 10.10.108.0/22 | OpenShift cluster 3 machine network |
| Expansion | 10.10.112.0/21 | Additional clusters or large lab expansion |

Site B

| Slot | Network | Purpose |
| --- | --- | --- |
| Cluster 1 | 10.20.100.0/22 | OpenShift cluster 1 machine network |
| Cluster 2 | 10.20.104.0/22 | OpenShift cluster 2 machine network |
| Cluster 3 | 10.20.108.0/22 | OpenShift cluster 3 machine network |
| Expansion | 10.20.112.0/21 | Additional clusters or large lab expansion |
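The allocation tables follow a regular pattern: three consecutive /22 blocks starting at 10.x0.100.0, followed by a /21 expansion block. As an illustration, the sketch below regenerates the plan from a site supernet and verifies that every block is correctly aligned, sits inside the supernet, and does not overlap its neighbours. The function name `machine_networks` and the constants are hypothetical.

```python
import ipaddress

CLUSTERS = 3       # cluster slots per site, per the tables above
BASE_OFFSET = 100  # third octet of the first machine network


def machine_networks(supernet: str) -> dict:
    """Derive the per-cluster /22 blocks and the /21 expansion block for a site."""
    site = ipaddress.ip_network(supernet)
    base = int(site.network_address) + (BASE_OFFSET << 8)  # 10.x0.100.0

    plan = {}
    for i in range(CLUSTERS):
        # A /22 holds 1024 addresses; ip_network() rejects misaligned blocks.
        plan[f"Cluster {i + 1}"] = ipaddress.ip_network((base + i * 1024, 22))
    plan["Expansion"] = ipaddress.ip_network((base + CLUSTERS * 1024, 21))

    nets = list(plan.values())
    for i, a in enumerate(nets):
        if not a.subnet_of(site):
            raise ValueError(f"{a} is outside {site}")
        for b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")
    return plan


if __name__ == "__main__":
    for slot, net in machine_networks("10.20.0.0/16").items():
        print(f"{slot}: {net} ({net.num_addresses - 2} usable hosts)")
```

Each /22 provides 1022 usable host addresses, far more than a single lab cluster is likely to need, which leaves room for node replacement and scale-out without renumbering.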

Tentative — future space `10.x0.128.0/17`

The planning vault reserves 10.10.128.0/17 (Site A) and 10.20.128.0/17 (Site B) as future lab and OpenShift expansion. These ranges carry no current assignments and are not confirmed in the core architecture docs.

Pod and Service CIDRs

Pod and service CIDRs must not overlap with any real LAN, VLAN subnet, or WireGuard range at either site. The examples below are non-overlapping blocks from outside the 10.10.0.0/16 and 10.20.0.0/16 supernets.

| CIDR type | Example range | Note |
| --- | --- | --- |
| Pod CIDR (cluster 1) | 10.128.0.0/14 | Clear of all site subnets; one block per cluster |
| Pod CIDR (cluster 2) | 10.132.0.0/14 | Offset by /14 per cluster to avoid overlap |
| Service CIDR | 172.30.0.0/16 | Stay within 172.16.0.0/12; do not use 172.32.x.x (not RFC 1918) |

Examples only — not assigned

The CIDRs above are planning examples. Confirm actual cluster CIDRs against all current and planned site subnets before deployment.
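To make the "offset by /14 per cluster" pattern and the 172.16.0.0/12 boundary concrete, here is a small sketch. It reuses the example ranges from the table, which are not assigned values; `pod_cidr` and `service_cidr_ok` are hypothetical helper names.

```python
import ipaddress

RFC1918_172 = ipaddress.ip_network("172.16.0.0/12")  # 172.16.0.0 to 172.31.255.255


def pod_cidr(cluster_index: int, first: str = "10.128.0.0/14") -> ipaddress.IPv4Network:
    """Return the pod CIDR for cluster N by stepping one whole block per cluster."""
    if cluster_index < 1:
        raise ValueError("cluster_index starts at 1")
    base = ipaddress.ip_network(first)
    offset = (cluster_index - 1) * base.num_addresses
    return ipaddress.ip_network((int(base.network_address) + offset, base.prefixlen))


def service_cidr_ok(cidr: str) -> bool:
    """True if the service CIDR lies entirely inside 172.16.0.0/12."""
    return ipaddress.ip_network(cidr).subnet_of(RFC1918_172)


if __name__ == "__main__":
    print(pod_cidr(1), pod_cidr(2))          # 10.128.0.0/14 10.132.0.0/14
    print(service_cidr_ok("172.30.0.0/16"))  # True
    print(service_cidr_ok("172.32.0.0/16"))  # False
```

A /14 spans four consecutive /16 blocks, which is why the second octet advances by 4 from one cluster to the next.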

Stack Components

The planned component stack covers CNI, load balancing, certificates, external DNS, GitOps delivery, and HTTP/S ingress. All components are tentative until the network and Ceph foundations are complete.

| Component | Role |
| --- | --- |
| Cilium | Preferred CNI; overlay mode; pod networking and NetworkPolicy enforcement |
| MetalLB | LoadBalancer VIP allocation from VLAN 50 (10.x0.50.200–10.x0.50.250) |
| cert-manager | Automatic TLS certificates via DNS-01 challenge through Cloudflare |
| external-dns | Syncs Kubernetes service hostnames into core.aorxi.io |
| ArgoCD | GitOps continuous delivery; all cluster state driven from Git |
| ingress-nginx or Traefik | HTTP/S ingress; TLS termination for in-cluster services |

Stack tentative — choices may shift

Final component selection, particularly the ingress controller and whether to run vanilla Kubernetes or OpenShift, may change based on operational experience. The stack above reflects early planning.

Design Decisions

Site-local clusters

Clusters are kept site-local to avoid WAN latency in the control plane and to simplify failure domains. Cross-site workload continuity is achieved by syncing identical workloads at both sites with ArgoCD and by using backup/restore for stateful data.

Why not a single flat /16 per site

A flat /16 for all cluster nodes and services increases ARP/broadcast noise, weakens security zone boundaries between workloads, and complicates network troubleshooting. Dedicated /22 blocks per cluster keep failure domains tight and routing tables clean.

Cilium in overlay mode

Cilium in overlay mode encapsulates pod traffic (VXLAN or Geneve) between node addresses, so pod packets cross the existing infrastructure VLANs as ordinary node-to-node traffic. This avoids advertising pod CIDRs through OPNsense and keeps infrastructure routing decoupled from CNI state. Native routing mode would require OPNsense to carry pod routes, coupling firewall config to cluster churn.

Separation of machine network from VLAN 40

VLAN 40 carries the host-network addresses of Kubernetes node VMs (the 10.x0.40.0/22 block). OpenShift machine networks (10.x0.100.0/22+) are a distinct allocation for the cluster installer's machine network concept. The two serve different roles and are not interchangeable.

  • Machine Networks: detailed OpenShift machine-network allocations and per-cluster CIDR assignments
  • VLAN Reference: full VLAN table with subnets and gateways
  • Site B Ceph: local Ceph cluster that backs Kubernetes PersistentVolumes
  • Proxmox Clusters: sb-pve and sa-pve cluster configuration
  • Build Phases: sequenced build order including the Kubernetes bring-up phase
