Kafka & Database Performance
Kafka blue/green cutover patterns, ordered consumer processing strategy, and PostgreSQL observability and transaction tuning guidance.
Design guidance for Kafka blue/green deployments, maintaining message ordering across consumer parallelism, and diagnosing and tuning PostgreSQL performance — including the impact of audit logging on transaction latency.
Kafka Blue/Green Cutover
Separate Clusters
Run blue and green as fully independent Kafka clusters. Do not have green brokers join the existing blue cluster unless the intent is to expand the same cluster — joining merges them rather than creating a true green environment, which makes rollback and isolation harder.
Blue app -> Blue Kafka cluster
Green app -> Green Kafka cluster
Use a separate green cluster when the goal is any of the following:
| Goal | Why separate clusters |
|---|---|
| Isolation | No shared state between environments |
| Version or config testing | Green can run a different broker version |
| Safe rollback | Revert to blue without re-migrating topics |
| Independent cutover | Switch traffic in one step, not incrementally |
Cutover Steps
- Mirror or replicate topics from the blue cluster to the green cluster using MirrorMaker 2, Cluster Linking, or an equivalent replication tool.
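- Start green consumers from replicated or translated offsets, or from controlled starting points. Offsets are not guaranteed to match across clusters: MirrorMaker 2 translates them through checkpoints, while Cluster Linking preserves them.
- Pause or stop blue producers. Alternatively, dual-write to both clusters for a short window, but only if the dual-write design is carefully controlled.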
- Switch producers and consumers to green via config change, DNS, or service discovery.
- Validate consumer lag, offset positions, and database side effects before declaring cutover complete.
Validate before declaring complete
Check consumer lag, offsets, and downstream DB state explicitly. A cutover that appears complete at the Kafka layer may still have uncommitted DB writes in flight.
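The validation step can be sketched as a simple gate that combines Kafka-side lag with a database-side signal. This is a minimal illustration, not a definitive implementation: the `PartitionState` shape, the `pending_db_writes` count, and the `max_lag` threshold are all assumptions, and in practice the inputs would come from your Kafka admin tooling and your own database instrumentation.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Snapshot of one topic partition on the green cluster (hypothetical shape)."""
    topic: str
    partition: int
    log_end_offset: int    # latest offset written to the partition
    committed_offset: int  # offset committed by the green consumer group


def cutover_blockers(partitions, pending_db_writes, max_lag=0):
    """Return human-readable reasons the cutover is not yet complete."""
    blockers = []
    for p in partitions:
        lag = p.log_end_offset - p.committed_offset
        if lag > max_lag:
            blockers.append(f"{p.topic}[{p.partition}] lag {lag} exceeds {max_lag}")
    # Kafka can look caught up while database writes are still in flight.
    if pending_db_writes > 0:
        blockers.append(f"{pending_db_writes} database writes still in flight")
    return blockers


snapshot = [
    PartitionState("orders", 0, log_end_offset=120, committed_offset=120),
    PartitionState("orders", 1, log_end_offset=95, committed_offset=90),
]
print(cutover_blockers(snapshot, pending_db_writes=2))
```

An empty list means no blockers were found under these assumptions; it does not replace checking downstream data for correctness.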
Why Not Join Green Brokers to Blue
Joining green brokers to the blue cluster makes them part of the same Kafka cluster. The result is a shared partition assignment and shared consumer group state — not two independent environments. Rolling back then requires migrating topics back to a pure-blue topology rather than simply stopping green consumers.
Ordered Consumer Processing
Partition-per-Key Pattern
Kafka ordering is guaranteed only within a single partition. To parallelize consumer processing while preserving order for each entity:
- Partition topics by the ordering key (entity ID, account ID, order ID, or equivalent).
- Assign a consumer group with multiple consumers; each consumer owns one or more partitions.
- Process each partition strictly serially.
- Never process two messages for the same entity concurrently if order matters for that entity.
Topic partitions by entity/customer/account/order key
Consumer group with multiple consumers
Each consumer owns one or more partitions
Each partition processed sequentially
DB updates occur in message order per key
Data integrity: ordering
Parallelizing processing inside a single partition, or across messages that share the same ordering key, breaks the ordering guarantee. For consumers that update a database, this can corrupt data: an earlier event may be applied after a later one and overwrite newer state.
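The partition-per-key pattern can be illustrated without a broker. The sketch below is a simulation under stated assumptions: `partition_for` uses CRC32 as a stand-in for a stable hash (the Kafka Java client's default partitioner hashes keyed records with murmur2, so real partition numbers will differ), and the message tuples and `state` dict are hypothetical. The point it shows is that all messages for one key land in one partition and are applied in arrival order.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map an ordering key to a partition with a stable hash (CRC32 stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (key, value) messages by partition, keeping arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def apply_serially(partition_messages, state):
    """Process one partition strictly in order; the last write per key wins."""
    for key, value in partition_messages:
        state[key] = value


messages = [
    ("acct-1", "open"),
    ("acct-2", "open"),
    ("acct-1", "credit"),
    ("acct-1", "close"),
]
state = {}
for batch in route(messages, num_partitions=4).values():
    apply_serially(batch, state)  # partitions are independent of each other
print(state["acct-1"])  # close
```

Different partitions could be handed to different workers safely; splitting a single batch across workers is what would break per-key order.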
Hot Partitions
If a single partition becomes a throughput bottleneck because one key generates disproportionate volume:
- Reevaluate the keying strategy.
- Split by a finer-grained ordering key if application correctness allows the finer granularity.
- Do not blindly add parallelism inside the hot partition — that breaks ordering for that key.
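Before changing the keying strategy, it helps to confirm that skew actually exists. The following is a rough sketch, assuming you can sample message keys from the topic; the `threshold` value and the `finer_key` composition are hypothetical and only valid if the application can tolerate ordering per order rather than per account.

```python
from collections import Counter


def hot_keys(keys, threshold=0.5):
    """Return keys whose share of the sampled messages is at least `threshold`."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: c / total for k, c in counts.items() if c / total >= threshold}


def finer_key(account_id: str, order_id: str) -> str:
    """Hypothetical finer-grained key: ordering is kept per order, not per account."""
    return f"{account_id}:{order_id}"


sample = ["acct-1"] * 8 + ["acct-2", "acct-3"]
print(hot_keys(sample))  # {'acct-1': 0.8}
```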
Database Performance
Observability
Establish visibility before tuning. For PostgreSQL-style databases, instrument the following:
| Signal | Tool / mechanism |
|---|---|
| Slow queries | pg_stat_statements; slow query log |
| Lock waits and deadlocks | pg_locks, pg_stat_activity |
| Transaction duration | pg_stat_activity (state, xact_start, query_start) |
| Connection pool saturation | Pool metrics (PgBouncer stats or application pool) |
| Index usage and sequential scans | pg_stat_user_indexes, pg_stat_user_tables |
| WAL pressure and commit latency | pg_stat_bgwriter, pg_stat_wal (PostgreSQL 14+), WAL metrics |
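Use EXPLAIN (ANALYZE, BUFFERS) on individual slow queries to compare estimated and actual row counts, check buffer hits versus reads, and see whether the planner chose a sequential or an index scan. ANALYZE executes the statement, so run data-modifying statements inside a transaction that is rolled back.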
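As one example of turning these signals into a check, the sketch below flags long-running transactions from `pg_stat_activity`. It is illustrative only: the SQL uses standard `pg_stat_activity` columns, but the 30-second threshold is an arbitrary assumption, and the sample rows are made up to stand in for what a driver would return (many drivers map an interval to a `timedelta`, which is assumed here).

```python
from datetime import timedelta

# Query to run against PostgreSQL; not executed in this sketch.
LONG_TXN_SQL = """
SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
"""


def flag_long_transactions(rows, max_age=timedelta(seconds=30)):
    """Return (pid, state, age_seconds) for transactions older than `max_age`."""
    flagged = []
    for pid, state, xact_age, _query in rows:
        if xact_age > max_age:
            flagged.append((pid, state, xact_age.total_seconds()))
    return flagged


# Made-up rows in the shape the query above would produce.
rows = [
    (101, "active", timedelta(seconds=2), "UPDATE accounts SET ..."),
    (102, "idle in transaction", timedelta(minutes=5), "SELECT ..."),
]
print(flag_long_transactions(rows))
```

Sessions in the `idle in transaction` state are worth particular attention, since they hold locks while doing no work.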
Transaction Tuning
| Approach | Notes |
|---|---|
| Keep transactions short | Reduces lock hold time and contention window |
| Batch writes where safe | Fewer round-trips; check idempotency requirements |
| Index the columns used in UPDATE and SELECT predicates | Eliminate sequential scans on high-volume tables |
| Avoid unnecessary read-before-write | Reduces round-trips and lock contention |
| Right-size the connection pool | Too few = queuing; too many = context switch overhead |
| Avoid huge transactions | Long-running transactions block autovacuum and hold locks |
| Idempotency keys | Support effectively-once processing from Kafka consumers, which can redeliver messages |
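The idempotency-key row can be sketched as follows. This is a minimal illustration using the standard-library `sqlite3` module so that it runs anywhere; the table names, the `topic:partition:offset` key format, and the `apply_credit` helper are assumptions. On PostgreSQL the equivalent statement would typically be `INSERT ... ON CONFLICT DO NOTHING`. The key insert and the business update share one short transaction, so a redelivered message changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()


def apply_credit(conn, topic, partition, offset, account, amount):
    """Apply a credit once; redelivery of the same message is a no-op."""
    key = f"{topic}:{partition}:{offset}"  # hypothetical key format
    with conn:  # one transaction: key insert and balance update commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (idempotency_key) VALUES (?)", (key,)
        )
        if cur.rowcount == 0:
            return False  # key already recorded, so skip the update
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (amount, account),
        )
    return True


first = apply_credit(conn, "payments", 0, 42, "acct-1", 10)
second = apply_credit(conn, "payments", 0, 42, "acct-1", 10)  # simulated redelivery
balance = conn.execute("SELECT amount FROM balances WHERE account = 'acct-1'").fetchone()[0]
print(first, second, balance)  # True False 10
```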
pgAudit Concern
pgAudit (the pgaudit extension, sometimes written pg_audit) is a plausible contributor to transaction latency when audit logging volume is high, but its impact must be measured before any changes are made.
Measure before disabling or reducing auditing
The impact of pgAudit on latency is workload-dependent. Disabling or reducing auditing may have regulatory or security implications. Confirm policy constraints before changing audit scope.
Approaches to investigate in a staging environment:
- Compare end-to-end transaction latency with auditing enabled versus disabled.
- Audit only the required roles, databases, or statement types if policy permits narrower scope.
- Reduce verbose auditing for high-volume application roles if compliant with the applicable policy.
- Check whether audit log output is creating disk I/O, WAL, or logging pipeline bottlenecks rather than in-transaction CPU overhead.
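The first comparison in the list above could be summarized along these lines. This is a sketch under assumptions: the latency samples are invented, the `summarize` and `compare` helpers are hypothetical, and a real comparison would need far larger samples from a representative staging workload before the tail percentile means much.

```python
from statistics import median, quantiles


def summarize(latencies_ms):
    """Return the median and 95th percentile of a latency sample in milliseconds."""
    p95 = quantiles(latencies_ms, n=20)[-1]  # last of 19 cut points = 95th percentile
    return {"p50": median(latencies_ms), "p95": p95}


def compare(audit_on_ms, audit_off_ms):
    """Return the latency increase (on minus off) for each summary statistic."""
    on, off = summarize(audit_on_ms), summarize(audit_off_ms)
    return {name: on[name] - off[name] for name in on}


# Invented samples for illustration only.
audit_off = [10, 11, 12, 13, 14]
audit_on = [12, 13, 14, 15, 16]
delta = compare(audit_on, audit_off)
print(delta)
```

A difference that shows up mainly in the tail rather than the median would point toward I/O or logging-pipeline stalls rather than steady per-statement overhead.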
Related Pages
- Proxmox Clusters — compute and storage layer running these platform services
- Kubernetes Planning — K8s workloads that will host Kafka and database services
- Container Infra — container build and registry infrastructure