Goal
Take Karma from recording events to spotting surprises.
We’re not aiming for a PhD-level probabilistic model yet — just enough to:
- Learn what “normal” looks like for each tracked process.
- Flag deviations as entropy.deviation events in real time.
- Feed those events into the action loop.
Scope of the MVP
- Data Source: ClickHouse table populated by normalized events (via Kafka).
- Target Entities: Any (entity_id, step) combination, e.g. supplier response time, file arrival, job completion.
- Metric: Latency between defined step boundaries (or between event types).
- Output (one possible record shape is sketched after this list):
  - Current entropy score.
  - A boolean “out of expectation” flag.
  - Context tags for correlation.
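For concreteness, here is a minimal sketch of what one per-(entity_id, step) output record could look like in downstream code; the class and field names are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class EntropyCheckResult:
    """Illustrative shape of one (entity_id, step) check result."""
    entity_id: str
    step: str
    entropy: float                  # current entropy score
    is_surprising: bool             # "out of expectation" flag
    tags: dict = field(default_factory=dict)  # context tags for correlation

# Example instance (values are made up)
result = EntropyCheckResult("supplier_A", "quote_response", 2.35, True, {"priority": "high"})
```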
Calculating Entropy
We start with bucketed latency distributions:
SELECT
entity_id,
step,
intDiv(latency_ms, 500) * 500 AS latency_bucket,
count() AS bucket_count
FROM events
WHERE event_ts >= now() - INTERVAL 7 DAY
GROUP BY entity_id, step, latency_bucket
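As a quick sanity check outside ClickHouse, the same 500 ms bucketing can be reproduced in a few lines of Python; this is only an illustrative sketch, not part of the pipeline:

```python
from collections import Counter

def bucket_latencies(latencies_ms, width_ms=500):
    """Mirror ClickHouse's intDiv(latency_ms, width) * width bucketing."""
    return Counter((ms // width_ms) * width_ms for ms in latencies_ms)

# Toy data: most responses near 1 s, one slow outlier
buckets = bucket_latencies([950, 1020, 1100, 980, 4300])
print(buckets)  # counts per 500 ms bucket, e.g. {500: 2, 1000: 2, 4000: 1}
```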
Then apply Shannon’s entropy formula in SQL or downstream code:
SELECT
entity_id,
step,
-sum(p * log2(p)) AS entropy
FROM (
SELECT
entity_id,
step,
latency_bucket,
bucket_count / sum(bucket_count) OVER (PARTITION BY entity_id, step) AS p
FROM latency_distribution  -- the bucketed counts produced by the previous query
) t
GROUP BY entity_id, step
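The same formula is easy to mirror in downstream code when debugging; a minimal sketch that reproduces the -sum(p * log2(p)) expression from bucket counts (the example numbers are made up):

```python
import math

def shannon_entropy(bucket_counts):
    """Shannon entropy (bits) of a bucketed latency distribution."""
    total = sum(bucket_counts.values())
    probs = [c / total for c in bucket_counts.values()]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A tight distribution scores low; a spread-out one scores high
print(shannon_entropy({1000: 98, 1500: 2}))                       # ~0.14 bits
print(shannon_entropy({500: 25, 1000: 25, 1500: 25, 2000: 25}))   # 2.0 bits
```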
Defining “Surprise”
We don’t want to fire on every fluctuation. Instead:
- Baseline: Median entropy over the last N days.
- Tolerance: Median Absolute Deviation (MAD) or % change threshold.
- Trigger: Current entropy > baseline + tolerance.
Example:
SELECT
entity_id,
step,
current_entropy,
baseline_entropy,
(current_entropy - baseline_entropy) > 0.5 AS is_surprising
FROM ...
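The fixed 0.5 above stands in for the tolerance. A minimal sketch of the median + MAD variant, assuming we already have one entropy value per day for the last N days and an arbitrary sensitivity factor k:

```python
import statistics

def is_surprising(current_entropy, daily_entropies, k=3.0):
    """Flag if current entropy exceeds baseline (median) + tolerance (k * MAD).

    daily_entropies: one entropy value per day for the last N days.
    k: assumed sensitivity knob, not part of the spec.
    """
    baseline = statistics.median(daily_entropies)
    mad = statistics.median(abs(e - baseline) for e in daily_entropies)
    return current_entropy > baseline + k * mad

# Seven quiet days around ~1.5 bits, then a jump to 2.35
history = [1.48, 1.52, 1.50, 1.55, 1.47, 1.51, 1.53]
print(is_surprising(2.35, history))  # True
```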
Publishing the Event
Once is_surprising = 1, publish a normalized event to Kafka:
{
"event_type": "entropy.deviation",
"entity_id": "supplier_A",
"step": "quote_response",
"entropy": 2.35,
"baseline_entropy": 1.5,
"tags": {
"priority": "high",
"detected_by": "karma.entropy.v1"
},
"ts": "2025-08-09T19:58:00Z"
}
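A minimal publishing sketch using kafka-python; the topic name (karma.events) and broker address are assumptions, not part of this spec:

```python
import json
from datetime import datetime, timezone
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "event_type": "entropy.deviation",
    "entity_id": "supplier_A",
    "step": "quote_response",
    "entropy": 2.35,
    "baseline_entropy": 1.5,
    "tags": {"priority": "high", "detected_by": "karma.entropy.v1"},
    "ts": datetime.now(timezone.utc).isoformat(),
}

producer.send("karma.events", value=event)  # assumed topic name
producer.flush()
```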
This flows into:
- The action loop (automatic responses, notifications); see the consumer sketch after this list.
- The same ClickHouse table for historical tracking.
- Any downstream analytics or dashboards.
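For the first of those consumers, here is a minimal sketch of how the action loop might pick the events up, again assuming the karma.events topic from the publishing sketch:

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "karma.events",                      # assumed topic name
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:
    event = msg.value
    if event.get("event_type") != "entropy.deviation":
        continue
    # Placeholder action: route high-priority deviations to a notifier
    if event.get("tags", {}).get("priority") == "high":
        print(f"ALERT {event['entity_id']}/{event['step']}: entropy {event['entropy']}")
```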
Why This Works Now
- No new infrastructure: Uses existing ClickHouse + Kafka stack.
- Simple math: Baselines + deviations, no heavy ML yet.
- Extensible: You can swap in more sophisticated models later.
Roadmap After MVP
- Add sequence entropy (order of steps, not just latency); a rough sketch follows this list.
- Model joint entropy of multiple tags (supplier + product type).
- Incorporate forecasting for expected future entropy.
- Feed deviations into self-healing playbooks.
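As a rough illustration of the first roadmap item, sequence entropy could start as Shannon entropy over observed step-to-step transitions rather than latency buckets; a sketch with made-up step names:

```python
import math
from collections import Counter

def transition_entropy(step_sequences):
    """Shannon entropy (bits) over (previous_step, next_step) transition frequencies."""
    transitions = Counter(
        (prev, nxt)
        for seq in step_sequences
        for prev, nxt in zip(seq, seq[1:])
    )
    total = sum(transitions.values())
    return -sum((c / total) * math.log2(c / total) for c in transitions.values())

# A process that always follows the same step order scores low;
# one whose steps arrive in shifting orders scores higher.
print(transition_entropy([["quote_request", "quote_response", "order_placed"]] * 10))  # 1.0
```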