Kafka & Database Performance

Kafka blue/green cutover patterns, ordered consumer processing strategy, and PostgreSQL observability and transaction tuning guidance.

Design guidance for Kafka blue/green deployments, maintaining message ordering across consumer parallelism, and diagnosing and tuning PostgreSQL performance — including the impact of audit logging on transaction latency.

Kafka Blue/Green Cutover

Separate Clusters

Run blue and green as fully independent Kafka clusters. Do not have green brokers join the existing blue cluster unless the intent is to expand the same cluster — joining merges them rather than creating a true green environment, which makes rollback and isolation harder.

Blue app  ->  Blue Kafka cluster
Green app ->  Green Kafka cluster

Use a separate green cluster when the goal is any of the following:

Goal	Why separate clusters
Isolation	No shared state between environments
Version or config testing	Green can run a different broker version
Safe rollback	Revert to blue without re-migrating topics
Independent cutover	Switch traffic in one step, not incrementally

Cutover Steps

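Mirror or replicate topics from the blue cluster to the green cluster using MirrorMaker 2, Confluent Cluster Linking, or an equivalent replication tool.
Start green consumers from replicated or translated offsets, or from controlled starting points. MirrorMaker 2 does not preserve offset numbers across clusters, so blue offsets must be translated before they are used on green.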
Pause or stop blue producers. Use temporary dual-writing only if the dual-write design is carefully controlled, because it can duplicate or reorder messages across the two clusters.
Switch producers and consumers to green via config change, DNS, or service discovery.
Validate consumer lag, offset positions, and database side effects before declaring cutover complete.
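
The switch step above can be reduced to a single configuration value. The following is a minimal sketch, not a prescribed mechanism: the `ACTIVE_KAFKA_CLUSTER` environment variable and both broker addresses are hypothetical placeholders.

```python
import os

# Hypothetical bootstrap addresses; substitute the real blue and green brokers.
CLUSTERS = {
    "blue": "blue-kafka.example.internal:9092",
    "green": "green-kafka.example.internal:9092",
}


def bootstrap_servers(env=None):
    """Return the bootstrap address of the active cluster.

    ACTIVE_KAFKA_CLUSTER is an assumed variable name. It defaults to blue so
    that clearing the variable is also the rollback path.
    """
    env = os.environ if env is None else env
    active = env.get("ACTIVE_KAFKA_CLUSTER", "blue")
    if active not in CLUSTERS:
        raise ValueError(f"unknown cluster {active!r}; expected one of {sorted(CLUSTERS)}")
    return CLUSTERS[active]
```

Producers and consumers would read this value at startup, so cutover and rollback are both a one-line config change followed by a restart or reload.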

Validate before declaring complete

Check consumer lag, offsets, and downstream DB state explicitly. A cutover that appears complete at the Kafka layer may still have uncommitted DB writes in flight.
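
As an illustration of that check, the sketch below combines per-partition lag with a count of in-flight database writes. It assumes the offsets and the write count have already been collected by whatever tooling is in use; the function names and the `(topic, partition)` tuple keys are illustrative choices, not part of any Kafka API.

```python
def partition_lag(end_offsets, committed_offsets):
    """Return lag per partition: log end offset minus committed offset."""
    return {
        tp: end - committed_offsets.get(tp, 0)
        for tp, end in end_offsets.items()
    }


def cutover_ready(end_offsets, committed_offsets, in_flight_db_writes, max_lag=0):
    """Return (ready, lagging_partitions).

    ready is True only when no partition exceeds max_lag AND no database
    writes are still in flight, since Kafka-level completion is not enough.
    """
    lag = partition_lag(end_offsets, committed_offsets)
    lagging = {tp: n for tp, n in lag.items() if n > max_lag}
    ready = not lagging and in_flight_db_writes == 0
    return ready, lagging


# Example input keyed by (topic, partition); the values are made up.
end = {("orders", 0): 100, ("orders", 1): 50}
committed = {("orders", 0): 100, ("orders", 1): 48}
ready, lagging = cutover_ready(end, committed, in_flight_db_writes=0)
print(ready, lagging)
```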

Why Not Join Green Brokers to Blue

Joining green brokers to the blue cluster makes them part of the same Kafka cluster. The result is a shared partition assignment and shared consumer group state — not two independent environments. Rolling back then requires migrating topics back to a pure-blue topology rather than simply stopping green consumers.

Ordered Consumer Processing

Partition-per-Key Pattern

Kafka ordering is guaranteed only within a single partition. To parallelize consumer processing while preserving order for each entity:

Partition topics by the ordering key (entity ID, account ID, order ID, or equivalent).
Assign a consumer group with multiple consumers; each consumer owns one or more partitions.
Process each partition strictly serially.
Never process two messages for the same entity concurrently if order matters for that entity.

Topic partitions by entity/customer/account/order key
Consumer group with multiple consumers
Each consumer owns one or more partitions
Each partition processed sequentially
DB updates occur in message order per key
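
The pattern can be simulated without a broker. The sketch below is a simplified model, not a Kafka client: `zlib.crc32` stands in for the client's real partitioner (the Java client hashes key bytes with murmur2), and one thread per partition stands in for the consumers in a group.

```python
import queue
import threading
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition. The same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def process_in_order(messages, num_partitions, handler):
    """Fan (key, value) messages out to one serial worker per partition."""
    queues = [queue.Queue() for _ in range(num_partitions)]

    def worker(q):
        # Strictly serial: one message at a time, in arrival order.
        while True:
            item = q.get()
            if item is None:  # sentinel: no more messages
                return
            handler(*item)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for key, value in messages:
        queues[partition_for(key, num_partitions)].put((key, value))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```

Partitions run in parallel with each other, but every key is handled by exactly one worker, so the handler sees each key's messages in the order they were produced.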

Data integrity — ordering

Parallelizing processing inside a single partition, or across messages that share the same ordering key, breaks the ordering guarantee. For consumers that update a database, this can corrupt data: an earlier event applied after a later one overwrites newer state with stale values.

Hot Partitions

If a single partition becomes a throughput bottleneck because one key generates disproportionate volume:

Reevaluate the keying strategy.
Split by a finer-grained ordering key if application correctness allows the finer granularity.
Do not blindly add parallelism inside the hot partition — that breaks ordering for that key.
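
To see what a finer-grained key changes, the sketch below counts messages per partition under two keying strategies. The workload, the key format, and the use of `zlib.crc32` as a stand-in partitioner are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def partition_load(keys, num_partitions):
    """Count messages per partition for a sequence of message keys."""
    return Counter(zlib.crc32(k.encode("utf-8")) % num_partitions for k in keys)


# Hypothetical workload: a single account produces every message.
events = [("acct-1", f"order-{i}") for i in range(1000)]

coarse = partition_load([account for account, _ in events], 8)
fine = partition_load([f"{account}:{order}" for account, order in events], 8)

print("keyed by account:", dict(coarse))       # every message on one partition
print("keyed by account+order:", dict(fine))   # load spread over more partitions
```

The finer key only guarantees order within each order, not across the whole account, which is why it is acceptable only when application correctness allows it.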

Database Performance

Observability

Establish visibility before tuning. For PostgreSQL-style databases, instrument the following:

Signal	Tool / mechanism
Slow queries	`pg_stat_statements`; slow query log
Lock waits and deadlocks	`pg_locks`, `pg_stat_activity`
Transaction duration	`pg_stat_activity` (`state`, `xact_start`, `query_start`)
Connection pool saturation	Pool metrics (PgBouncer stats or application pool)
Index usage and sequential scans	`pg_stat_user_indexes`, `pg_stat_user_tables`
WAL pressure and commit latency	`pg_stat_wal` (PostgreSQL 14+), `pg_stat_bgwriter`, WAL metrics

Use EXPLAIN (ANALYZE, BUFFERS) on individual slow queries to compare estimated and actual row counts, check buffer hits versus reads, and confirm whether the planner chose a sequential or an index scan.
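
Two of the signals above can be pulled with plain SQL through any DB-API cursor. This is a sketch under stated assumptions: the `%s` placeholders follow the psycopg parameter style, the `total_exec_time` and `mean_exec_time` columns are the PostgreSQL 13+ names in `pg_stat_statements`, and `fetch_rows` is a hypothetical helper.

```python
# Transactions open longer than a threshold given in seconds.
LONG_TRANSACTIONS_SQL = """
SELECT pid, state, now() - xact_start AS xact_age,
       wait_event_type, left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > make_interval(secs => %s)
ORDER BY xact_age DESC
"""

# Statements with the highest cumulative execution time.
# Requires the pg_stat_statements extension to be installed and loaded.
TOP_STATEMENTS_SQL = """
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT %s
"""


def fetch_rows(cursor, sql, params=()):
    """Run a query on a DB-API cursor and return rows as dictionaries."""
    cursor.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

With a live connection, `fetch_rows(cur, LONG_TRANSACTIONS_SQL, (30,))` would list transactions open for more than 30 seconds, and `fetch_rows(cur, TOP_STATEMENTS_SQL, (10,))` the ten most expensive statements.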

Transaction Tuning

Approach	Notes
Keep transactions short	Reduces lock hold time and contention window
Batch writes where safe	Fewer round-trips; check idempotency requirements
Index update and select predicates	Eliminate sequential scans on high-volume tables
Avoid unnecessary read-before-write	Reduces round-trips and avoids holding locks between the read and the write
Right-size the connection pool	Too few = queuing; too many = context switch overhead
Avoid huge transactions	Long-running transactions hold locks and prevent autovacuum from removing dead rows
Idempotency keys	Support effectively-once processing when Kafka redelivers messages to consumers
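
The short-transaction and idempotency-key rows can be combined in one pattern: record the message identity and apply its effect in the same transaction. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs anywhere; the table names, column names, and the `(topic, part, msg_offset)` key are hypothetical.

```python
import sqlite3


def apply_once(conn, topic, part, msg_offset, account_id, delta):
    """Apply a balance change at most once per (topic, part, msg_offset).

    Returns True if the change was applied, False if the message was a
    redelivery that had already been processed.
    """
    with conn:  # one short transaction: commit on success, rollback on error
        cur = conn.execute(
            "INSERT INTO processed_messages (topic, part, msg_offset) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (topic, part, msg_offset) DO NOTHING",
            (topic, part, msg_offset),
        )
        if cur.rowcount == 0:
            return False  # idempotency key already recorded; skip the write
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (delta, account_id),
        )
        return True


def make_demo_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processed_messages ("
        "topic TEXT, part INTEGER, msg_offset INTEGER, "
        "PRIMARY KEY (topic, part, msg_offset))"
    )
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('acct-1', 0)")
    conn.commit()
    return conn
```

Because the key insert and the update commit together, a consumer that crashes after the commit but before committing its Kafka offset can safely reprocess the message. On PostgreSQL the same `ON CONFLICT ... DO NOTHING` clause applies, with the driver's own placeholder style in place of `?`.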

pgAudit Concern

pgAudit (the `pgaudit` extension) is a plausible contributor to transaction latency when audit logging volume is high, but confirm its impact by measurement before making changes.

Measure before disabling or reducing auditing

The impact of pgAudit on latency is workload-dependent. Disabling or reducing auditing may have regulatory or security implications. Confirm policy constraints before changing audit scope.

Approaches to investigate in a staging environment:

Compare end-to-end transaction latency with auditing enabled versus disabled.
Audit only the required roles, databases, or statement types if policy permits narrower scope.
Reduce verbose auditing for high-volume application roles if compliant with the applicable policy.
Check whether audit log output is creating disk I/O, WAL, or logging pipeline bottlenecks rather than in-transaction CPU overhead.
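
For the first comparison, tail percentiles are usually more informative than averages. The sketch below assumes latency samples in milliseconds have already been collected from two staging runs of the same workload; the function names and the nearest-rank percentile method are illustrative choices.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def audit_overhead_pct(enabled_ms, disabled_ms, p=95):
    """Relative latency increase at percentile p with auditing enabled, in percent."""
    base = percentile(disabled_ms, p)
    with_audit = percentile(enabled_ms, p)
    return (with_audit - base) * 100 / base
```

Running the same workload several times in each configuration helps separate the effect of auditing from ordinary run-to-run noise before any change to audit scope is proposed.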

Proxmox Clusters — compute and storage layer running these platform services
Kubernetes Planning — K8s workloads that will host Kafka and database services
Container Infra — container build and registry infrastructure
