Secrets — OpenBao
Two independent OpenBao instances (sa-bao-01, sb-bao-01) on the edge E200s: secret tiers, cross-site transit auto-unseal, AppRole consumers, and the break-glass fallback.
One OpenBao (Vault OSS fork) instance per site, hosted on the edge E200s. It is a good E200 workload: a single small Go binary in a ~2 GB VM. Decided 2026-07-02. There is deliberately no stretched Raft cluster: OpenBao has no cross-site replication, a 2-node Raft cluster over WireGuard loses quorum as soon as either node or the tunnel drops, and the design mirrors the one-cluster-per-site rule.
Topology
| | Site A | Site B |
|---|---|---|
| VM | sa-bao-01 on sa-edge-01 | sb-bao-01 on sb-edge-01 |
| Final address | 10.10.30.40/24 (VLAN 30) | 10.20.30.40/24 (VLAN 30) |
| Specs | 2 vCPU / 2 GB / 20 GB, Ubuntu noble | same |
| Storage | single-node integrated Raft | same |
| Version | 2.5.4 (pinned tarball, sha256-verified) | same |
Cross-site disaster recovery relies on scheduled bao operator raft snapshot save runs. Each snapshot file is shipped to the other site over the backup path (VLAN 90, PBS pattern), so each site's secrets are restorable at the other.
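As a rough illustration of the scheduled snapshot step, the sketch below only builds the command line and a timestamped target path; it does not run anything or ship the file. The output directory and the file naming scheme are hypothetical, not taken from the actual setup.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def snapshot_command(site: str, out_dir: str, now: datetime) -> list[str]:
    """Build the argv for one Raft snapshot of a site's bao instance.

    `site` is a short prefix such as "sa" or "sb". The directory and the
    "<site>-bao-<UTC timestamp>.snap" naming are assumptions for this sketch.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = PurePosixPath(out_dir) / f"{site}-bao-{stamp}.snap"
    # `bao operator raft snapshot save <file>` is the command named in the text.
    return ["bao", "operator", "raft", "snapshot", "save", str(target)]


if __name__ == "__main__":
    print(" ".join(snapshot_command("sa", "/var/backups/bao", datetime.now(timezone.utc))))
```

A timer unit or cron job could pass the resulting list to a process runner and then hand the file to whatever ships backups over VLAN 90.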
Seal / Unseal
Each instance auto-unseals via the other site's transit engine over the WireGuard tunnel: sa-bao-01 points at sb-bao-01 and vice versa. Because the tunnel is the unseal path, the OPNsense VM must start before the bao VM on each E200 (Proxmox startup order: OPNsense order=1, bao order=2 plus delay).
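The cross-pointing seal configuration can be sketched as a small renderer that emits a transit seal stanza for one instance, aimed at its peer. This is a sketch under assumptions: the key name and mount path are hypothetical, the addresses are the final VLAN 30 addresses from the topology table, and the token the seal needs is deliberately left out because it must be supplied separately and never templated into a file like this.

```python
# Each instance unseals against the *other* site's transit engine.
PEERS = {"sa-bao-01": "sb-bao-01", "sb-bao-01": "sa-bao-01"}
ADDRESSES = {"sa-bao-01": "10.10.30.40", "sb-bao-01": "10.20.30.40"}


def transit_seal_stanza(
    instance: str,
    key_name: str = "autounseal",   # hypothetical transit key name
    mount_path: str = "transit/",   # assumed transit mount path
    scheme: str = "https",          # assumes TLS is in place by then
) -> str:
    """Render the seal "transit" stanza for `instance`, pointing at its peer."""
    peer = PEERS[instance]
    address = f"{scheme}://{ADDRESSES[peer]}:8200"
    return (
        'seal "transit" {\n'
        f'  address    = "{address}"\n'
        f'  key_name   = "{key_name}"\n'
        f'  mount_path = "{mount_path}"\n'
        "}\n"
    )


if __name__ == "__main__":
    print(transit_seal_stanza("sa-bao-01"))
```

Keeping the peer map in one place makes it harder to point an instance at itself by accident, which would defeat the cross-site design.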
Cold-start deadlock — documented break-glass
If both sites are down at once, neither can auto-unseal. Recovery: seal-migrate one instance back to Shamir (temporary seal stanza swap plus recovery keys), unseal it manually, let the other auto-unseal against it, then migrate back. Both sites' recovery keys live in the password manager — this is the documented recovery path, not an improvisation.
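The decision behind the break-glass path can be written down as a tiny function: given which instances are currently unsealed, it returns the steps that remain. It is only a restatement of the procedure above in code form, with the step wording paraphrased; it does not talk to OpenBao.

```python
INSTANCES = {"sa-bao-01": "sb-bao-01", "sb-bao-01": "sa-bao-01"}

# The documented cold-start recovery, in order.
BREAK_GLASS = [
    "seal-migrate one instance to Shamir (temporary seal stanza swap plus recovery keys)",
    "unseal that instance manually",
    "let the other instance auto-unseal against it",
    "migrate the first instance back to transit auto-unseal",
]


def recovery_steps(unsealed: set[str]) -> list[str]:
    """Return the remaining steps given the set of currently unsealed instances."""
    unknown = unsealed - INSTANCES.keys()
    if unknown:
        raise ValueError(f"unknown instances: {sorted(unknown)}")
    sealed = sorted(INSTANCES.keys() - unsealed)
    if not sealed:
        return []  # nothing to do
    if unsealed:
        # One side is up, so the normal transit path still works.
        return [
            f"start {name}: it auto-unseals via the transit engine on {INSTANCES[name]}"
            for name in sealed
        ]
    # Both sealed: neither transit engine is reachable, so use break-glass.
    return list(BREAK_GLASS)
```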
Interim — Site A only (2026-07-02)
Site B is non-operational, so sa-bao-01 runs standalone on Shamir manual unseal: 3 of the 5 unseal keys must be entered after each VM reboot, which is rare. This is bootstrap step 1, not throwaway work: the transit auto-unseal steps execute when Site B comes online. The site-b Pulumi stack stays parked at enabled: false.
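To make the "3 of 5" threshold concrete, here is a toy Shamir secret-sharing sketch over a prime field. It illustrates the general technique only; it is not OpenBao's implementation and must not be used to handle real key material.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; large enough for a toy integer secret


def split(secret: int, shares: int = 5, threshold: int = 3) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


if __name__ == "__main__":
    parts = split(123456789)
    print(combine(parts[:3]) == 123456789)
```

Any three of the five points reconstruct the polynomial's constant term, while two points leave it undetermined, which is why fewer than three key holders cannot unseal.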
Secret Tiers
| Tier | Store | Contents |
|---|---|---|
| 0 — bootstrap | repo-root .env.local (gitignored) | Proxmox creds, PULUMI_CONFIG_PASSPHRASE, ANSIBLE_VAULT_PASSWORD — everything needed to (re)build bao itself |
| 1 — runtime | OpenBao KV v2 (per site) | app/service credentials, API keys, cert material |
Tier 0 can never migrate into bao: bao cannot hold the secrets that build bao.
Consumers (bao-first, executed 2026-07-01)
The KV v2 mount is homelab, with paths opnsense/site-a, unifi/site-a, and pulumi/proxmox; Site B mirrors under */site-b on sb-bao-01 later. Access is per-consumer read-only via AppRole (opnsense-config, unifi-config, pulumi-provision), with role_id/secret_id in root .env.local.
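A consumer's read path can be sketched with the standard library: log in with AppRole, then read one KV v2 path from the homelab mount. This assumes the AppRole auth method is mounted at its default path (auth/approle) and that the Vault-compatible HTTP API and token header are in use; how role_id and secret_id are loaded from .env.local is left to the caller.

```python
import json
import urllib.request


def approle_login_request(addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Build the AppRole login request (assumes the default auth/approle mount)."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/auth/approle/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def kv2_read_request(addr: str, token: str, path: str, mount: str = "homelab") -> urllib.request.Request:
    """Build a KV v2 read request; KV v2 inserts `data/` between mount and path."""
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/{mount}/data/{path}",
        headers={"X-Vault-Token": token},  # Vault-compatible token header
    )


def read_secret(addr: str, role_id: str, secret_id: str, path: str, timeout: float = 5.0) -> dict:
    """Log in, then return the secret's key/value pairs for `path`."""
    with urllib.request.urlopen(approle_login_request(addr, role_id, secret_id), timeout=timeout) as resp:
        token = json.load(resp)["auth"]["client_token"]
    with urllib.request.urlopen(kv2_read_request(addr, token, path), timeout=timeout) as resp:
        # KV v2 nests the payload under data.data, next to version metadata.
        return json.load(resp)["data"]["data"]
```

For example, `read_secret(addr, role_id, secret_id, "opnsense/site-a")` would return the OPNsense credentials for Site A as a dictionary, provided the role's policy allows reading that path.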
- Ansible (opnsense/config, unifi/config): secret vars are KV v2 lookups that fall back to the encrypted vault-credentials.yml files when bao is unreachable or sealed. The files stay in place as documented break-glass, re-synced via make openbao-config-seed.
- Pulumi: make targets overlay PROXMOX_VE_* from homelab/pulumi/proxmox, silently falling back to .env.local on any failure. Stack encryption stays on the local passphrase provider: a bao-backed provider cannot fall back when bao is sealed, and aorxi-openbao could never use it anyway.
- Unsetting BAO_ADDR forces pure fallback mode everywhere.
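The bao-first behaviour shared by these consumers reduces to one rule: try bao when BAO_ADDR is set, and use the local fallback on any failure or when it is unset. The sketch below shows that rule in isolation; the function name and the two callables are hypothetical stand-ins, not the actual Ansible lookup or make target logic.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_secret(
    fetch_from_bao: Callable[[str], str],
    fallback: Callable[[], str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the secret from bao if possible, otherwise from the fallback.

    `fetch_from_bao` receives the BAO_ADDR value; `fallback` reads the local
    source (for example an encrypted vars file or .env.local).
    """
    env = os.environ if env is None else env
    addr = env.get("BAO_ADDR")
    if not addr:
        # Unset (or empty) BAO_ADDR means pure fallback mode.
        return fallback()
    try:
        return fetch_from_bao(addr)
    except Exception:
        # Unreachable, sealed, denied: every failure falls back silently.
        return fallback()
```

Catching every exception mirrors the "silently falling back on any failure" behaviour described above; a stricter variant might log the failure so that a sealed bao does not go unnoticed.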
Still Deferred
Transit engines and seal migration (needs Site B), TLS via Let's Encrypt DNS-01 (sa-bao-01.core.aorxi.io; never expose port 8200 beyond the local segment while TLS is off), the Raft snapshot timer with cross-site shipping, and the api_addr correction at the VLAN 30 cutover. Init and recovery-key handling stay a manual runbook.
Related Pages
- Initial Site Bootstrap — where bao provisioning and the bao-first migration sit in the build order
- DNS VMs — the internal zone that will name the bao endpoints
- Decisions Log — dated entries for the 2026-07-01/02 secrets decisions