One Stream, Two Very Different Destinations
Karma treats anything that emits changes — databases, infrastructure tools, metrics, logs, file drops — as a CDC-like source.
All of these sources are normalized into a common event envelope and published to Kafka (`events.normalized`).
From there, the normalized stream can be consumed by multiple sinks — but ClickHouse and a Graph DB have such different needs that they get separate pipelines.
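As a minimal sketch of that normalization step, assuming a confluent-kafka producer: the envelope fields below (source, kind, op, ts, payload) are illustrative placeholders, not Karma's actual schema.

```python
# Hypothetical normalization step: wrap any raw change in a common envelope
# and publish it to the shared topic. Assumes confluent-kafka.
import json
import time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def normalize(source: str, kind: str, op: str, payload: dict) -> dict:
    """Wrap a raw change from any source in the common event envelope."""
    return {
        "source": source,    # e.g. "postgres", "terraform", "filedrop"
        "kind": kind,        # what kind of entity changed
        "op": op,            # "create" | "update" | "delete"
        "ts": time.time(),
        "payload": payload,  # source-specific details, passed through as-is
    }

def publish(event: dict) -> None:
    producer.produce("events.normalized", json.dumps(event).encode())

publish(normalize("postgres", "orders", "update", {"id": 42, "status": "shipped"}))
producer.flush()
```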
Why Separate Pipelines?
1. Different Data Models
- ClickHouse: flat, append-only tables; perfect for large-scale time-series analytics, baselines, and aggregations.
- Graph DB: nodes, edges, and relationship properties; built for traversals, lineage, and dependency analysis.
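To make the contrast concrete, here is roughly what the same normalized event turns into on each side. Both snippets are hypothetical sketches: the table, node labels, and relationship names are illustrative, not Karma's schema.

```python
# The same normalized event lands very differently in each store.

# ClickHouse: one flat, append-only row per event, ordered for
# fast time-series scans and aggregations.
CLICKHOUSE_DDL = """
CREATE TABLE events (
    ts      DateTime,
    source  LowCardinality(String),
    kind    LowCardinality(String),
    op      LowCardinality(String),
    payload String
) ENGINE = MergeTree
ORDER BY (source, kind, ts)
"""

# Graph DB (Cypher): the event is folded into nodes and relationships,
# so traversals like "what depends on this service?" stay cheap.
CYPHER_UPSERT = """
MERGE (s:Service {name: $service})
MERGE (d:Service {name: $dependency})
MERGE (s)-[r:DEPENDS_ON]->(d)
SET r.last_seen = $ts
"""
```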
2. Independent Scaling
- The ClickHouse sink might handle millions of events per minute in batch inserts.
- The Graph DB sink might process fewer events but do heavier transformations (merging nodes, recalculating edges).
- Each can scale up or down without affecting the other.
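A sketch of what that asymmetry looks like in practice, assuming confluent-kafka and clickhouse-connect; the batch size, group name, and table name are illustrative:

```python
# Hypothetical ClickHouse sink: buffer events and flush in large batches,
# which is the insert pattern ClickHouse is built for. A graph sink would
# instead do heavier per-event work (merging nodes, recalculating edges).
import json
from datetime import datetime
from confluent_kafka import Consumer
import clickhouse_connect

ch = clickhouse_connect.get_client(host="localhost")
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "clickhouse-sink",    # separate consumer group from the graph sink
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events.normalized"])

batch, BATCH_SIZE = [], 10_000
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    e = json.loads(msg.value())
    batch.append([datetime.fromtimestamp(e["ts"]), e["source"], e["kind"],
                  e["op"], json.dumps(e["payload"])])
    if len(batch) >= BATCH_SIZE:
        # One big insert instead of 10,000 small ones.
        ch.insert("events", batch,
                  column_names=["ts", "source", "kind", "op", "payload"])
        batch.clear()
```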
3. Different Connectors
- ClickHouse: Kafka Connect sink, Kafka table engine.
- Graph DB: native streaming ingest (Neo4j Kafka Connect, Neptune Streams, JanusGraph with Gremlin).
- Keeping them separate avoids coupling unrelated logic.
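For example, the Kafka table engine option keeps ingestion entirely inside ClickHouse: a Kafka-engine table consumes the topic, and a materialized view drains it into the ledger table. A sketch with illustrative names:

```python
# ClickHouse's built-in Kafka table engine: no external connector process.
import clickhouse_connect

ch = clickhouse_connect.get_client(host="localhost")

# A Kafka-engine table acts as a consumer of events.normalized.
ch.command("""
CREATE TABLE events_queue (
    ts DateTime, source String, kind String, op String, payload String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list  = 'events.normalized',
         kafka_group_name  = 'ch-kafka-engine',
         kafka_format      = 'JSONEachRow'
""")

# The materialized view continuously moves rows into the MergeTree table.
ch.command("""
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT ts, source, kind, op, payload FROM events_queue
""")
```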
4. Easier Experimentation
- You can evolve your graph schema without touching the main ledger.
- You can temporarily disable graph ingestion without losing ClickHouse history.
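Kafka Connect makes the last point cheap: a sink connector can be paused and resumed through the Connect REST API without touching the other pipeline. A small sketch, with a hypothetical connector name and host:

```python
# Pause only the graph sink; ClickHouse keeps consuming and nothing is lost,
# since the paused connector's offsets are retained.
import requests

CONNECT = "http://localhost:8083"

# Stop graph ingestion while experimenting with the graph schema.
requests.put(f"{CONNECT}/connectors/graph-sink/pause")

# ...later, resume from where it left off.
requests.put(f"{CONNECT}/connectors/graph-sink/resume")
```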
The Karma Pattern
One normalized stream in Kafka, two independent sinks: ClickHouse keeps the flat, append-only ledger for analytics, while the Graph DB maintains the relationship model for traversals. Each consumes events.normalized at its own pace, with its own connector and consumer group.
Real-World Graph DB Use Cases in Karma
- Root cause tracing: traverse dependencies to see what led to an anomaly (sketched below).
- Impact analysis: predict what will break if a service fails.
- State machine modeling: track transitions and detect unexpected states.
- Multi-hop alerts: notify based on cascading effects, not just single metrics.
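Root cause tracing, the first use case, is where the graph model earns its keep. A sketch using the Neo4j Python driver against the hypothetical Service/DEPENDS_ON schema from earlier; the anomaly flag and hop limit are illustrative:

```python
# Walk the dependency graph downstream from an alerting service and
# return every path that ends at a node already flagged as anomalous.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

QUERY = """
MATCH path = (s:Service {name: $service})-[:DEPENDS_ON*1..4]->(cause:Service)
WHERE cause.anomalous = true
RETURN [n IN nodes(path) | n.name] AS chain
"""

with driver.session() as session:
    for record in session.run(QUERY, service="checkout"):
        print(" -> ".join(record["chain"]))
```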
When to Skip the Graph DB
- If your needs are limited to metrics, baselines, and statistical anomaly detection, ClickHouse alone may suffice (see the sketch below).
- A Graph DB makes sense when relationships and path-dependent reasoning are core to the problem.
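For the ClickHouse-only path, a baseline check can be a single query: compare each source's last hour of events against its one-week hourly average. A sketch, with an illustrative threshold and the events table from earlier:

```python
# Flag sources whose last hour is well above their weekly hourly baseline.
# No graph involved: pure time-series aggregation.
import clickhouse_connect

ch = clickhouse_connect.get_client(host="localhost")

result = ch.query("""
SELECT
    source,
    countIf(ts >= now() - INTERVAL 1 HOUR) AS last_hour,
    count() / 168                          AS hourly_baseline
FROM events
WHERE ts >= now() - INTERVAL 7 DAY
GROUP BY source
HAVING last_hour > 3 * hourly_baseline
""")

for source, last_hour, baseline in result.result_rows:
    print(f"{source}: {last_hour} events vs baseline {baseline:.1f}/h")
```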