Recommended architecture for “income/outcome ops + big reports”
1) OLTP is the source of truth (writes)
Your service consumes Kafka events (income/outcome) and writes them to an OLTP database:
- Postgres (or similar)
- Proper constraints, FK, idempotency keys, transactions
- Optimized for correctness + small/fast queries
This DB is where you can answer:
- “What is the current balance?”
- “Show last 50 transactions”
- “Did we already process this Kafka message?”
2) OLAP is for heavy reports (reads)
Big reports (aggregations over long periods, many dimensions) should read from an OLAP store:
- ClickHouse / BigQuery / Redshift / Snowflake (depending on your stack)
- Denormalized model, partitions, pre-aggregations
- Handles huge scans + GROUP BY efficiently
3) Data moves from OLTP → OLAP automatically
There are two common patterns:
✅ Pattern A: CDC (best in many cases)
- Debezium reads Postgres WAL (change data capture)
- Publishes changes to Kafka
- OLAP ingestor consumes and loads to ClickHouse/warehouse
Benefits:
- App writes only OLTP
- Strong audit trail
- Near real-time analytics
✅ Pattern B: Dual consume from Kafka (also common)
- Your OLTP writer service consumes Kafka → writes Postgres
- Separately, an analytics pipeline consumes Kafka → writes OLAP
Benefits:
- No CDC complexity
- Analytics can evolve independently
Tradeoff:
- You must handle duplicates/out-of-order in both sinks
Should it be “one DB” to avoid writing twice?
One DB is great until reports get heavy. Then it becomes a bottleneck.
If you keep everything in Postgres, you'll eventually hit:
- long-running aggregation queries blocking resources
- buffer/page cache eviction caused by large scans
- I/O pressure from full scans
- pressure to add “analytics indexes” that slow writes
So the principle is:
One write path (OLTP), many read models (OLAP, caches, materialized views).
This is basically CQRS-ish: separate write model from read model.
Practical “staged” approach (what I’d do in a real team)
Stage 1 (MVP / moderate data)
- Postgres only
- Keep raw operations table
- Use:
  - indexes on (client_id, time)
  - maybe a few materialized views / summary tables updated incrementally
Good if reports are not insane.
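The incrementally updated summary table idea can be sketched in memory. This is a minimal illustration, not a real implementation: in Postgres the `daily_summary` equivalent would be a table updated in the same transaction as the raw insert (all names here are made up for the example).

```python
from collections import defaultdict
from datetime import date

raw_operations = []  # stands in for the raw operations table
# stands in for a daily_summary table keyed by (client_id, day)
daily_summary = defaultdict(lambda: {"income": 0, "outcome": 0})

def record_operation(client_id: str, day: date, kind: str, amount: int) -> None:
    """Insert the raw row and update the per-client daily summary in one step."""
    raw_operations.append((client_id, day, kind, amount))
    daily_summary[(client_id, day)][kind] += amount

record_operation("c1", date(2024, 1, 5), "income", 100)
record_operation("c1", date(2024, 1, 5), "outcome", 30)
record_operation("c1", date(2024, 1, 6), "income", 50)

# Reports read the small summary table instead of scanning raw operations.
assert daily_summary[("c1", date(2024, 1, 5))] == {"income": 100, "outcome": 30}
```

The point is that the summary stays cheap to read because it is maintained write-by-write, not recomputed by scanning the raw table.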
Stage 2 (reports grow / data grows)
Add OLAP:
- Keep OLTP as truth
- Stream data into ClickHouse (or your OLAP)
- Dashboards/reports query OLAP only
Key engineering requirements (Kafka → DB)
Idempotency (must-have)
Kafka is at-least-once in practice. You will see duplicates.
Use:
- an event_id column with UNIQUE(event_id), or
- a unique constraint on (client_id, external_tx_id)
Write pattern:
- insert transaction row (or upsert)
- update aggregates if insert succeeded
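The write pattern above can be sketched with plain Python standing in for the database. In Postgres this maps to INSERT ... ON CONFLICT (event_id) DO NOTHING followed by the aggregate update only when the row was new; the structure and names here are illustrative.

```python
seen_event_ids = set()  # stands in for UNIQUE(event_id)
transactions = []       # stands in for the transactions table
balances = {}           # stands in for the balances aggregate

def apply_event(event_id: str, client_id: str, amount: int) -> bool:
    """Idempotent write: duplicates are silently dropped."""
    if event_id in seen_event_ids:
        return False  # redelivered message: no-op
    seen_event_ids.add(event_id)
    transactions.append((event_id, client_id, amount))
    # update aggregates only because the insert succeeded
    balances[client_id] = balances.get(client_id, 0) + amount
    return True

assert apply_event("e1", "c1", 100) is True
assert apply_event("e1", "c1", 100) is False  # duplicate delivery ignored
assert balances["c1"] == 100                  # not double-counted
```

Because the dedup check and the aggregate update are tied together, at-least-once delivery from Kafka cannot corrupt the balance.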
Ordering / time
Events can arrive late/out of order:
- store event time separately from processing time
- OLAP reports should use event time with late-arrival handling
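A small sketch of the two-timestamps idea, assuming an in-memory store: each row keeps event time and processing time separately, and the report buckets by event time, so a late arrival still lands in the period it belongs to.

```python
from datetime import datetime, timezone

rows = []  # stands in for the fact table

def ingest(event_id: str, event_time: datetime, amount: int) -> None:
    """Store event time (when it happened) apart from processing time (when we saw it)."""
    processed_at = datetime.now(timezone.utc)
    rows.append({"event_id": event_id, "event_time": event_time,
                 "processed_at": processed_at, "amount": amount})

def daily_totals():
    """Aggregate on event time, not processing time."""
    totals = {}
    for r in rows:
        day = r["event_time"].date()
        totals[day] = totals.get(day, 0) + r["amount"]
    return totals

ingest("e1", datetime(2024, 1, 5, 10, tzinfo=timezone.utc), 100)
# "e2" arrives late (processed much later) but still counts toward Jan 5:
ingest("e2", datetime(2024, 1, 5, 23, tzinfo=timezone.utc), 40)
```

Real OLAP engines add watermarks or partition re-merges on top of this, but the core rule is the same: never aggregate financial reports on processing time.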
Exactly-once? Don’t chase it blindly
For most systems:
- at-least-once + idempotent writes is the winning combo.
A concrete blueprint (clean)
- Kafka topic: transactions
- Consumer service ledger-writer:
  - validates, deduplicates (event_id)
  - writes to Postgres ledger_transactions
  - (optional) updates balances table
- Analytics pipeline:
  - either CDC from Postgres
  - or another Kafka consumer
  - loads into OLAP fact table fact_transactions
- Reports service queries OLAP
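Pulling the blueprint together as an in-memory sketch: Kafka, Postgres, and the OLAP store are replaced by plain Python structures, and the names (ledger-writer, fact_transactions, etc.) come from the blueprint above, so treat this as a shape, not an implementation.

```python
topic = [  # Kafka topic "transactions" (note e1 is delivered twice)
    {"event_id": "e1", "client_id": "c1", "amount": 100},
    {"event_id": "e1", "client_id": "c1", "amount": 100},
    {"event_id": "e2", "client_id": "c1", "amount": -40},
]

ledger_transactions, balances, seen = [], {}, set()  # OLTP side
fact_transactions = []                               # OLAP side

def ledger_writer(event: dict) -> None:
    """Validates, deduplicates on event_id, writes OLTP, updates balances."""
    if event["event_id"] in seen:
        return  # at-least-once delivery: drop the duplicate
    seen.add(event["event_id"])
    ledger_transactions.append(event)
    client = event["client_id"]
    balances[client] = balances.get(client, 0) + event["amount"]
    # Analytics sink; in Pattern A this row would arrive via CDC instead.
    fact_transactions.append(event)

for e in topic:
    ledger_writer(e)

assert len(ledger_transactions) == 2  # duplicate e1 dropped
assert balances["c1"] == 60
assert len(fact_transactions) == 2    # OLAP sees the same deduped facts
```

The reports service would query only fact_transactions; nothing in the hot write path ever scans it.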
Interview-ready answer
“I wouldn’t make the app write to both OLTP and OLAP as sources of truth. I’d write once to OLTP and replicate to OLAP using CDC or a separate Kafka consumer. OLTP stays correct and transactional; OLAP is optimized for large aggregations and reports.”