Once your MongoDB change streams are flowing through Kafka and landing in ClickHouse, you’ve got a live, queryable event history for every state change in your systems.

The obvious next step: start using it immediately — even before you build full-blown dashboards or machine learning models.

1. Detect Missing or Late Events

One of the fastest wins is catching when something doesn’t happen. If you know a collection normally sees events every day, you can compare today against a recent baseline and flag anything that has gone silent:

SELECT ns_db, ns_coll,
       countIf(ts >= today()) AS events_today
FROM record_raw
WHERE ts >= today() - 7
GROUP BY ns_db, ns_coll
HAVING events_today = 0;

You can also set up a query to check if an expected supplier or consumer is late:

SELECT JSON_VALUE(tags,'$.consumer_id') AS consumer,
       max(ts) AS last_event
FROM record_raw
WHERE ts >= now() - INTERVAL 2 DAY
GROUP BY consumer
HAVING last_event < now() - INTERVAL 1 HOUR;

2. Track Latency and Duration Trends

If you store durations as an attribute (e.g. latency_sec), you can plot changes over time:

SELECT toStartOfInterval(ts, INTERVAL 15 MINUTE) AS bucket,
       avg(toFloat64OrNull(JSON_VALUE(attrs,'$.latency_sec'))) AS avg_latency
FROM record_raw
WHERE ts >= now() - INTERVAL 24 HOUR
GROUP BY bucket
ORDER BY bucket;

This is immediately useful for spotting slowdowns before they cause SLA violations.
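
If your SLAs are defined on tail latency rather than averages, the same shape of query works with ClickHouse’s quantile functions. A minimal sketch, assuming the same latency_sec attribute in attrs:

SELECT toStartOfInterval(ts, INTERVAL 15 MINUTE) AS bucket,
       quantile(0.95)(toFloat64OrNull(JSON_VALUE(attrs,'$.latency_sec'))) AS p95_latency
FROM record_raw
WHERE ts >= now() - INTERVAL 24 HOUR
GROUP BY bucket
ORDER BY bucket;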

3. Trace Multi-Step Workflows

Because you kept tags flexible, you can reconstruct sequences of events for a specific job, consumer, or transaction:

SELECT ts, event_type, tags
FROM record_raw
WHERE JSON_VALUE(tags,'$.job_id') = '20250809-abc123'
ORDER BY ts;

This turns ClickHouse into a forensics tool — no extra instrumentation needed.

4. Build Simple SLO Dashboards

Even without Grafana, you can run queries that give you health snapshots:

SELECT event_type,
       count() AS events,
       round(100 * events / (SELECT count() FROM record_raw WHERE ts >= now() - INTERVAL 1 HOUR), 2) AS pct
FROM record_raw
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY event_type
ORDER BY pct DESC;

This works for system-wide monitoring until you formalize SLIs/SLOs.

5. Train Your First Baseline Models

With historical data accumulating, you can start with lightweight predictive checks:

  • ARIMA on event counts to forecast expected volume
  • Shallow neural nets to flag unusual patterns
  • Simple rules like “if supplier X’s latency > mean + 3σ, alert” (sketched in SQL below)

Because ClickHouse is fast, you can export data for training without disrupting queries.
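
The 3σ rule in particular can run directly in ClickHouse. This is a sketch, assuming a supplier_id tag (not shown in the earlier examples) alongside the latency_sec attribute; it compares each supplier’s last hour against its trailing seven-day baseline:

SELECT supplier,
       recent_latency,
       mean_latency,
       sigma
FROM
(
    SELECT JSON_VALUE(tags,'$.supplier_id') AS supplier,
           avg(toFloat64OrNull(JSON_VALUE(attrs,'$.latency_sec'))) AS recent_latency
    FROM record_raw
    WHERE ts >= now() - INTERVAL 1 HOUR
    GROUP BY supplier
) AS recent
INNER JOIN
(
    SELECT JSON_VALUE(tags,'$.supplier_id') AS supplier,
           avg(toFloat64OrNull(JSON_VALUE(attrs,'$.latency_sec'))) AS mean_latency,
           stddevPop(toFloat64OrNull(JSON_VALUE(attrs,'$.latency_sec'))) AS sigma
    FROM record_raw
    WHERE ts >= now() - INTERVAL 7 DAY
    GROUP BY supplier
) AS hist
USING (supplier)
WHERE recent_latency > mean_latency + 3 * sigma;

Any supplier the query returns is running outside its own historical norm, which is exactly the kind of check worth wiring to an alert.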

6. Automate Interventions

Once you have a stream of tagged, timestamped events, you can:

  • Trigger follow-up jobs if certain sequences are incomplete (see the query sketch after this list)
  • Send alerts only when patterns deviate, not on every single event
  • Replace some manual SRE “eyes-on-glass” work with automated state checks
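
For the first of these, a scheduled query can surface jobs that started but never finished. A sketch only: the event_type values 'job_started' and 'job_completed' are placeholders for whatever your pipeline actually emits, and job_id is the tag used in the workflow-tracing example above:

SELECT JSON_VALUE(tags,'$.job_id') AS job_id,
       min(ts) AS started_at
FROM record_raw
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY job_id
HAVING job_id != ''
   AND countIf(event_type = 'job_started') > 0
   AND countIf(event_type = 'job_completed') = 0
   AND started_at < now() - INTERVAL 30 MINUTE;

The result set can feed a cron job or alerting script that retries or escalates, so interventions key off state rather than individual events.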

Bottom line: As soon as CDC data hits ClickHouse, you can monitor, alert, and analyze — before you invest in bigger architecture changes. It’s the shortest path from “we captured the events” to “we’re making better decisions in real time.”

