Kafka & Database Performance

Design guidance for Kafka blue/green deployments, maintaining message ordering across consumer parallelism, and diagnosing and tuning PostgreSQL performance — including the impact of audit logging on transaction latency.

Kafka Blue/Green Cutover

Separate Clusters

Run blue and green as fully independent Kafka clusters. Do not have green brokers join the existing blue cluster unless the intent is to expand the same cluster — joining merges them rather than creating a true green environment, which makes rollback and isolation harder.

Blue app  ->  Blue Kafka cluster
Green app ->  Green Kafka cluster

Use a separate green cluster when the goal is any of the following:

Goal                       Why separate clusters
Isolation                  No shared state between environments
Version or config testing  Green can run a different broker version
Safe rollback              Revert to blue without re-migrating topics
Independent cutover        Switch traffic in one step, not incrementally

Cutover Steps

  1. Mirror or replicate topics from the blue cluster to the green cluster using MirrorMaker 2, Cluster Linking, or an equivalent replication tool.
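  2. Start green consumers from translated (replicated) offsets or controlled starting points. Offsets are not identical across clusters, so do not reuse blue offset numbers directly.
  3. Pause or stop blue producers. Dual-write temporarily only if the dual-write design is carefully controlled, since uncoordinated writes to both clusters can produce duplicates or diverging order.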
  4. Switch producers and consumers to green via config change, DNS, or service discovery.
  5. Validate consumer lag, offset positions, and database side effects before declaring cutover complete.

Validate before declaring complete

Check consumer lag, offsets, and downstream DB state explicitly. A cutover that appears complete at the Kafka layer may still have uncommitted DB writes in flight.
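The validation step can be expressed as an explicit gate. The following is a minimal sketch, assuming the per-partition end offsets and committed offsets have already been collected from the green cluster by whatever tooling is in use; `PartitionState`, `cutover_blockers`, and the `pending_db_writes` counter are hypothetical names introduced for illustration, not part of any Kafka client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionState:
    topic: str
    partition: int
    end_offset: int        # log end offset on the green cluster
    committed_offset: int  # green consumer group's committed offset


def partition_lag(p: PartitionState) -> int:
    # Lag is never reported as negative, even if the two readings raced.
    return max(p.end_offset - p.committed_offset, 0)


def cutover_blockers(states, max_lag=0, pending_db_writes=0):
    """Return human-readable reasons the cutover is not yet complete."""
    blockers = []
    for p in states:
        lag = partition_lag(p)
        if lag > max_lag:
            blockers.append(f"{p.topic}[{p.partition}] lag {lag} > {max_lag}")
    # Hypothetical counter: writes the application has issued but not committed.
    if pending_db_writes > 0:
        blockers.append(f"{pending_db_writes} DB writes still in flight")
    return blockers
```

An empty list means every check passed. Returning reasons rather than a boolean keeps the failing partition visible in the cutover log.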

Why Not Join Green Brokers to Blue

Joining green brokers to the blue cluster makes them part of the same Kafka cluster. The result is shared partition assignment and shared consumer group state, not two independent environments. Rolling back then means reassigning partitions off the green brokers to restore a pure-blue topology, rather than simply stopping green consumers.

Ordered Consumer Processing

Partition-per-Key Pattern

Kafka ordering is guaranteed only within a single partition. To parallelize consumer processing while preserving order for each entity:

  • Partition topics by the ordering key (entity ID, account ID, order ID, or equivalent).
  • Assign a consumer group with multiple consumers; each consumer owns one or more partitions.
  • Process each partition strictly serially.
  • Never process two messages for the same entity concurrently if order matters for that entity.

Topic partitions by entity/customer/account/order key
Consumer group with multiple consumers
Each consumer owns one or more partitions
Each partition processed sequentially
DB updates occur in message order per key
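The pattern can be illustrated without a broker. This is a sketch under stated assumptions: `zlib.crc32` stands in for the partitioner hash (Kafka's default Java partitioner uses murmur2 for keyed records), the partition count is arbitrary, and `route` and `consume` are hypothetical helpers that model a topic as in-memory per-partition lists.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: deterministic, so one key always maps to one partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append (key, value) events to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, value in events:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


def consume(logs):
    """Process every partition strictly serially, in log order."""
    state = {}
    for partition in sorted(logs):
        for key, value in logs[partition]:
            state[key] = value  # later events for a key overwrite earlier ones
    return state
```

Because a key never spans partitions, different partitions can be handed to different consumers without changing the per-key result.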

Data integrity — ordering

Parallelizing processing inside a single partition — or across messages that share the same ordering key — breaks the ordering guarantee. For consumers that update a database, this causes data corruption: later events overwrite earlier events out of sequence.
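A small worked example of the failure mode, using a hypothetical order status column and a last-write-wins update. The reordered list simulates two workers racing on the same key.

```python
def apply_events(events):
    """Apply (key, status) updates in the order given; last write wins."""
    row = {}
    for key, status in events:
        row[key] = status
    return row


in_order = [("order-7", "created"), ("order-7", "paid"), ("order-7", "shipped")]
# Simulated race: "shipped" is applied before "paid".
raced = [in_order[0], in_order[2], in_order[1]]

print(apply_events(in_order))  # {'order-7': 'shipped'}
print(apply_events(raced))     # {'order-7': 'paid'}
```

Both runs consumed the same three messages, yet the second leaves the row in a state the source system never ended in.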

Hot Partitions

If a single partition becomes a throughput bottleneck because one key generates disproportionate volume:

  • Reevaluate the keying strategy.
  • Split by a finer-grained ordering key if application correctness allows the finer granularity.
  • Do not blindly add parallelism inside the hot partition — that breaks ordering for that key.
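Before changing the keying strategy, it helps to quantify the skew. The sketch below assumes a sample of message keys has been collected somehow (for example from a consumer log); `key_share`, `hot_keys`, the 50% threshold, and the `account_id:order_id` composite key are all illustrative assumptions.

```python
from collections import Counter


def key_share(keys):
    """Fraction of sampled messages contributed by each key."""
    counts = Counter(keys)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def hot_keys(keys, threshold=0.5):
    """Keys whose share of the sample meets or exceeds the threshold."""
    return sorted(k for k, share in key_share(keys).items() if share >= threshold)


def finer_key(account_id: str, order_id: str) -> str:
    # Only valid if ordering is required per order, not per account.
    return f"{account_id}:{order_id}"
```

If a single key dominates and correctness only needs ordering at the finer granularity, the composite key spreads that volume across partitions while keeping each order's events in sequence.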

Database Performance

Observability

Establish visibility before tuning. For PostgreSQL-style databases, instrument the following:

Signal                            Tool / mechanism
Slow queries                      pg_stat_statements; slow query log (log_min_duration_statement)
Lock waits and deadlocks          pg_locks, pg_stat_activity; deadlock counter in pg_stat_database
Transaction duration              pg_stat_activity (state, xact_start, query_start)
Connection pool saturation        Pool metrics (PgBouncer stats or application pool)
Index usage and sequential scans  pg_stat_user_indexes, pg_stat_user_tables
WAL pressure and commit latency   pg_stat_wal (PostgreSQL 14+), pg_stat_bgwriter, WAL metrics

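Use EXPLAIN (ANALYZE, BUFFERS) on individual slow queries to compare estimated and actual row counts, check buffer hits versus reads, and see whether the planner chose a sequential or an index scan. ANALYZE executes the statement, so run data-modifying queries inside a transaction and roll back.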
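Two of the signals above can be pulled with short queries. This is a sketch, assuming a DB-API connection whose driver uses the `%s` parameter style (as psycopg does) and that the pg_stat_statements extension is installed. The column names `mean_exec_time` and `total_exec_time` apply to PostgreSQL 13 and later; older versions use `mean_time` and `total_time`. The helper names and default thresholds are hypothetical.

```python
SLOWEST_STATEMENTS = """
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT %s
"""

LONG_TRANSACTIONS = """
SELECT pid, state, now() - xact_start AS xact_age, left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > %s * interval '1 second'
ORDER BY xact_age DESC
"""


def fetch_all(conn, sql, params):
    """Run one read-only query and return all rows, closing the cursor after."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


def slowest_statements(conn, limit=10):
    return fetch_all(conn, SLOWEST_STATEMENTS, (limit,))


def long_transactions(conn, min_seconds=30):
    return fetch_all(conn, LONG_TRANSACTIONS, (min_seconds,))
```

Sorting by total time rather than mean time surfaces cheap statements that are called very often, which is frequently where consumer-driven load concentrates.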

Transaction Tuning

Approach                             Notes
Keep transactions short              Reduces lock hold time and contention window
Batch writes where safe              Fewer round-trips; check idempotency requirements
Index update and select predicates   Eliminate sequential scans on high-volume tables
Avoid unnecessary read-before-write  Reduces round-trips and shortens lock hold time
Right-size the connection pool       Too few = queuing; too many = context switch overhead
Avoid huge transactions              Long-running transactions hold locks and keep autovacuum from removing dead rows
Idempotency keys                     Support exactly-once-ish processing from Kafka consumers
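Several of these approaches combine naturally: one short transaction per batch, no read-before-write, and an idempotency key so a redelivered batch is harmless. The sketch below uses the standard-library `sqlite3` module purely as a stand-in so it runs anywhere; the `ON CONFLICT ... DO NOTHING` clause has the same form in PostgreSQL (SQLite needs 3.24 or later). The table names, the `apply_batch` helper, and the event shape are hypothetical.

```python
import sqlite3


def apply_batch(conn, events):
    """Apply (event_id, account_id, amount) events in one short transaction.

    Returns how many events were new, i.e. not applied by an earlier delivery.
    """
    applied = 0
    with conn:  # commits on success, rolls back on exception
        for event_id, account_id, amount in events:
            cur = conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?) "
                "ON CONFLICT (event_id) DO NOTHING",
                (event_id,),
            )
            if cur.rowcount == 1:  # first time this event_id is seen
                conn.execute(
                    "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, account_id),
                )
                applied += 1
    return applied


# In-memory stand-in schema for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('acct-1', 0)")
conn.commit()
```

Recording the event ID and applying its effect in the same transaction is what makes a replay safe: either both happen or neither does.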

pgAudit Concern

pgAudit (the pgaudit extension) is a plausible contributor to transaction latency when audit logging volume is high, but confirm this by measurement before making changes.

Measure before disabling or reducing auditing

The impact of pgAudit on latency is workload-dependent. Disabling or reducing auditing may have regulatory or security implications. Confirm policy constraints before changing audit scope.

Approaches to investigate in a staging environment:

  • Compare end-to-end transaction latency with auditing enabled versus disabled.
  • Audit only the required roles, databases, or statement types if policy permits narrower scope.
  • Reduce verbose auditing for high-volume application roles if compliant with the applicable policy.
  • Check whether audit log output is creating disk I/O, WAL, or logging pipeline bottlenecks rather than in-transaction CPU overhead.
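For the enabled-versus-disabled comparison, tail percentiles usually say more than averages. The sketch below assumes two lists of end-to-end transaction latencies in milliseconds have been captured from the same staging workload; `percentile` uses the nearest-rank method, and `compare` is a hypothetical helper name.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil(n * pct / 100)
    return ordered[int(rank) - 1]


def compare(audit_off_ms, audit_on_ms, pcts=(50, 95, 99)):
    """Return {pct: (off, on, on - off)} for each requested percentile."""
    result = {}
    for pct in pcts:
        off = percentile(audit_off_ms, pct)
        on = percentile(audit_on_ms, pct)
        result[pct] = (off, on, on - off)
    return result
```

A delta that appears only at the high percentiles points toward intermittent stalls such as log I/O, while a uniform shift across percentiles suggests per-statement overhead.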
