The meta-dashboard that watches the health-score formula itself. Below: a 5-minute walkthrough so you can use it today, then concrete decision rules and per-tab references.
A short walkthrough in concrete steps, each taking ~30 seconds. Run it once and you'll know what every tab is for and how to act on what you see.

1. Open /health-lab.
2. Look at the 4 KPI cards (AUC churn, AUC trial conversion, monotonicity, coverage).
3. Run npm run audit:reliability -- --only A.12.
4. Refresh /health-lab/insights.

The Lab has 7 pages. After the walkthrough above, this table is the cheatsheet.
| Tab | Answers | When to open |
|---|---|---|
| Pulse (warming up) | Is the formula in good shape today? | Mondays. 30 seconds to know if the formula has drifted vs the prior month. |
| Predictive Power | Does the score actually predict the outcome? | When AUC moves and you need to know which range of scores is mis-predicting. |
| Pillar Lens (warming up) | Which of the 5 pillars carries the signal? | When you suspect a pillar is dead weight, or to find emerging hero/counter features (recreates the archived analysis live). |
| Cohort Lens (warming up) | Do thresholds work across all segments? | To spot non-monotone segments (e.g. healthy sandboxes that churn anyway, at-risk multi-seat that don't). |
| What-If Simulator ⭐ | What would happen if we changed the formula? | Any time you want to test a hypothesis. Replaces the manual SQL recalibration runbook. |
| Auto-Insights | What did the system notice on its own? | Mondays after the weekly audit (A.12) runs. Triage what needs action. |
| Versions | What formulas have we tried, and how did they perform? | When promoting a candidate to active, or auditing why we changed weights last quarter. |
Is the formula in good shape today?
What you see: 4 KPIs (AUC churn, AUC conversion, bucket monotonicity, outcome coverage) + 12-week AUC trend chart.
When to use: Mondays. 30 seconds to know if the formula has drifted vs the prior month.
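For intuition, the outcome coverage KPI is roughly the share of rows that are old enough to have a mature 60-day outcome and actually have one. A minimal sketch, assuming location_health_daily carries a churned_60d flag once the outcome matures (that column name is an assumption):

```sql
-- Sketch of the "outcome coverage" KPI. churned_60d is an assumed column
-- that stays NULL until the 60-day outcome has matured.
SELECT ROUND(AVG((churned_60d IS NOT NULL)::int), 3) AS outcome_coverage
FROM location_health_daily
WHERE score_day <= CURRENT_DATE - 60;
```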
Open Pulse →
Does the score actually predict the outcome?
What you see: Decile table (10 score buckets × outcome rate, should be monotone), calibration scatter (predicted vs observed), confusion matrix at the at-risk threshold (< 25).
When to use: When AUC moves and you need to know which range of scores is mis-predicting.
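To see what the decile table is doing, here is a minimal sketch that rebuilds it by hand, assuming location_health_daily holds health_score and a mature churned_60d flag (both column names are assumptions):

```sql
-- Hypothetical reconstruction of the Predictive Power decile table.
-- Assumed columns: health_score, churned_60d (NULL until the outcome matures).
WITH mature AS (
  SELECT health_score, churned_60d,
         NTILE(10) OVER (ORDER BY health_score) AS decile
  FROM location_health_daily
  WHERE churned_60d IS NOT NULL
    AND score_day = CURRENT_DATE - 60
)
SELECT decile,
       MIN(health_score) AS score_min,
       MAX(health_score) AS score_max,
       COUNT(*)          AS n,
       ROUND(AVG(churned_60d::int), 3) AS churn_rate  -- should fall as decile rises
FROM mature
GROUP BY decile
ORDER BY decile;
```

If churn_rate does not decrease from decile 1 to decile 10, that is the broken monotonicity Pulse flags.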
Open Predictive Power →
Which of the 5 pillars carries the signal?
What you see: Per-pillar Pearson correlation + permutation importance (AUC drop when this pillar is randomized), hero features (positive lift), counter-signals (negative lift).
When to use: When you suspect a pillar is dead weight, or to find emerging hero/counter features (recreates the archived analysis live).
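A minimal sketch of the correlation half of this tab, assuming each pillar sub-score is a column on location_health_daily (pillar_core_crm_score is an invented name used only for illustration); permutation importance is computed by the nightly job, not shown here:

```sql
-- Hypothetical per-pillar Pearson correlation against the churn outcome.
-- pillar_core_crm_score and churned_60d are assumed columns; the nightly job
-- health_pillar_performance_0445 runs the equivalent for all 5 pillars, plus
-- permutation importance (AUC drop when that pillar's values are shuffled).
SELECT CORR(pillar_core_crm_score, churned_60d::int) AS pearson_r
FROM location_health_daily
WHERE churned_60d IS NOT NULL
  AND score_day BETWEEN CURRENT_DATE - 90 AND CURRENT_DATE - 60;
-- A pillar that carries signal should come out clearly non-zero
-- (negative against churn: higher sub-score, less churn).
```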
Open Pillar Lens →
Do thresholds work across all segments?
What you see: Heatmap bucket × segment (sandbox / trial / sub_small / sub_mid / sub_large) with outcome rate per cell.
When to use: To spot non-monotone segments (e.g. healthy sandboxes that churn anyway, at-risk multi-seat that don't).
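The heatmap is just a bucket × segment group-by. A minimal sketch, assuming bucket, segment and churned_60d columns on location_health_daily:

```sql
-- Hypothetical flat version of the Cohort Lens heatmap: one row per
-- (bucket, segment) cell with its outcome rate. Column names are assumptions.
SELECT bucket,
       segment,        -- sandbox / trial / sub_small / sub_mid / sub_large
       COUNT(*) AS n,
       ROUND(AVG(churned_60d::int), 3) AS churn_rate
FROM location_health_daily
WHERE churned_60d IS NOT NULL
GROUP BY bucket, segment
ORDER BY bucket, segment;
```

Within each segment, churn_rate should worsen as the bucket worsens; a cell that breaks that order is a non-monotone segment.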
Open Cohort Lens →
What would happen if we changed the formula?
What you see: Sliders for the 5 pillar weights, saturation thresholds, core CRM feature set. After Run: AUC canonical vs candidate, distribution shift, top movers (locations whose score changes most).
When to use: Any time you want to test a hypothesis. Replaces the manual SQL recalibration runbook.
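Conceptually, Run rescores every location with the candidate parameters and compares it to the canonical score. A heavily simplified sketch, assuming pillar sub-score columns and a plain weighted sum (the pillar names and weights below are invented; the real formula also applies saturation thresholds and the core CRM set):

```sql
-- Hypothetical sketch of what "Run" does conceptually: rescore with candidate
-- weights and surface the top movers. Pillar column names, the weights, and the
-- plain weighted-sum shape are assumptions for illustration only.
WITH rescored AS (
  SELECT location_id,
         health_score AS canonical_score,
         0.30 * pillar_core_crm_score
       + 0.25 * pillar_engagement_score
       + 0.20 * pillar_adoption_score
       + 0.15 * pillar_support_score
       + 0.10 * pillar_billing_score AS candidate_score
  FROM location_health_daily
  WHERE score_day = CURRENT_DATE - 1
)
SELECT location_id,
       canonical_score,
       ROUND(candidate_score::numeric, 1) AS candidate_score,
       ROUND((candidate_score - canonical_score)::numeric, 1) AS delta
FROM rescored
ORDER BY ABS(candidate_score - canonical_score) DESC   -- "top movers"
LIMIT 20;
```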
Open What-If Simulator ⭐ →
What did the system notice on its own?
What you see: Reverse-chronological feed of drifts, new signals, counter-signals, segment breaks, anomalies. Severity (info / warn / critical) + status (open / acked / dismissed / actioned).
When to use: Mondays after the weekly audit (A.12) runs. Triage what needs action.
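If you prefer SQL to the UI, a hypothetical Monday triage query on health_insights_log (detected_at, insight_type and summary are assumed column names; severity and status values are the ones listed above):

```sql
-- Hypothetical triage query: open insights, worst first.
SELECT detected_at, severity, insight_type, summary
FROM health_insights_log
WHERE status = 'open'
ORDER BY CASE severity WHEN 'critical' THEN 0 WHEN 'warn' THEN 1 ELSE 2 END,
         detected_at DESC;
```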
Open Auto-Insights →
What formulas have we tried, and how did they perform?
What you see: Timeline of all formulas (active / candidate / shadow / retired) with their latest AUC + the JSON of their parameters.
When to use: When promoting a candidate to active, or auditing why we changed weights last quarter.
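The same timeline as a query on the catalog table, assuming latest_auc, params and created_at column names (all assumptions):

```sql
-- Hypothetical query behind the Versions timeline.
SELECT formula_id, status, created_at, latest_auc, params
FROM health_formula_versions
ORDER BY created_at DESC;
```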
Open Versions →
The 60-day outcome maturity window explains why Pulse is 'warming up'.
| When | What becomes available |
|---|---|
| Today | v1 legacy has 42 AUC rows (different formula, kept as historical proxy). v3 has 0 AUC because it shipped 2026-04-20 and outcomes mature 60 days later. |
| ~2026-05-05 (J+14) | First v3 score_day with mature 60d outcome (2026-03-06 + 60). AUC starts being non-null but very noisy (N≈1 day). |
| ~2026-06-04 (J+44) | 30 days of v3 outcomes — first stable AUC measurement. Pulse trend chart starts being meaningful. |
| ~2026-07-04 (J+74) | 60 days of mature v3 outcomes — full rolling window. AUC becomes robust. |
Want signal sooner? Use horizon=30 + outcome=conversion on the trial cohort — trial conversion settles in 14-30 days, much faster than churn.
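To check how far along the warm-up actually is, a minimal sketch counting v3 score_days that already have a mature 60-day outcome (using 'v3' as the literal formula_id value and churned_60d as the outcome column are both assumptions):

```sql
-- Hypothetical warm-up check: how many v3 score_days have a mature 60d outcome?
SELECT COUNT(DISTINCT score_day) AS mature_v3_score_days
FROM location_health_daily
WHERE formula_id = 'v3'
  AND score_day <= CURRENT_DATE - 60
  AND churned_60d IS NOT NULL;
```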
AUC, calibration, lift, monotonicity… in plain language.
What runs nightly to keep the Lab fresh.
| Job | Schedule | Role |
|---|---|---|
| aggregate_feature_30d_0345 | 03:45 Paris daily | Refresh feature_daily_30d_agg + prune > 30d (perf cache for the simulator) |
| health_outcomes_backfill_0400 | 04:00 Paris daily | Compute outcomes for score_day = today-60 + today-90, then aggregate distribution_daily |
| health_formula_performance_0430 | 04:30 Paris daily | AUC + correlation for the active formula × cohorts × outcomes |
| health_pillar_performance_0445 | 04:45 Paris daily | Per-pillar Pearson + permutation importance |
| health_feature_lift_weekly_0500 | Monday 05:00 Paris | Per-feature lift (hero / counter signals) |
| A.12 health-formula-drift | Audit framework (reliability lane, weekly) | Detect AUC drift > 5% / 10%, broken monotonicity, emerging hero/counter features → write into health_insights_log |
Trigger A.12 manually: npm run audit:reliability -- --only A.12
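For orientation, the 60-day leg of the outcomes backfill amounts to flagging whether each location scored 60 days ago has churned since. A hypothetical sketch (churn_events and churned_60d are assumed names; only location_health_daily is documented above):

```sql
-- Hypothetical sketch of the 60-day leg of health_outcomes_backfill_0400.
UPDATE location_health_daily lhd
SET churned_60d = EXISTS (
      SELECT 1
      FROM churn_events ce
      WHERE ce.location_id = lhd.location_id
        AND ce.churned_at >= lhd.score_day
        AND ce.churned_at <  lhd.score_day + 60
    )
WHERE lhd.score_day = CURRENT_DATE - 60;
```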
Why both columns exist on location_health_daily.
formula_id is the new source of truth (catalog in health_formula_versions). score_version is kept in parallel for back-compat (173 existing consumers). Dropping score_version is planned in a follow-up migration once prod has been stable on formula_id for ≥7 days.
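A hypothetical sanity check before the follow-up migration drops score_version: every recent row should carry a formula_id. The 7-day window mirrors the stability condition above.

```sql
-- Hypothetical pre-drop check: recent rows missing a formula_id, per score_version.
SELECT score_version,
       COUNT(*) AS n_rows,
       COUNT(*) FILTER (WHERE formula_id IS NULL) AS missing_formula_id
FROM location_health_daily
WHERE score_day >= CURRENT_DATE - 7
GROUP BY score_version
ORDER BY score_version;
```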
Backfill mapping (M106):
Cause → meaning → action. Print this if you don't trust your memory.
| If you see | It means | Do this |
|---|---|---|
| AUC < 0.6 | Formula barely better than random. | Open Pillar Lens — at least one pillar likely has corr ~0. Test removing or de-weighting it in the Simulator. |
| AUC 0.6 – 0.7 | OK predictor with room to improve. | Try 2-3 candidates in the Simulator. Save the best (Δ AUC ≥ +0.02) as a candidate. |
| AUC ≥ 0.7 | Strong predictor — don't touch. | Watch Insights weekly. Don't promote candidates unless they're +0.03 better. |
| Δ AUC ≥ +0.03 in Simulator | Real improvement on the backtest. | Click Save as candidate. Wait the 30-day shadow period. If still better in 30 days → promote via migration. |
| Δ AUC < +0.01 | Within noise floor. Could be coincidence. | Don't save. Either try a bigger change (move a weight by ≥ 5) or accept the formula as-is. |
| Monotonicity = Broken on Pulse | Buckets aren't ordered by outcome rate (e.g. healthy churns more than steady). | Open Cohort Lens — find the segment that breaks the order. Often sandbox (always exclude) or sub_large (different dynamics that need their own threshold). |
| Counter-signal in Insights with n ≥ 30, p < 0.1 | Users of that feature do WORSE than non-users on the outcome. | Open Simulator. Toggle that feature off in the Core CRM set. Re-run. If AUC goes up → save the candidate. |
| Candidate in Versions with shadow < 30 days | Not promotable yet. | Wait until min_shadow_days elapses (default 30). Watch its AUC on Versions until it stabilises ≥ active formula. |
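As a sketch of the promotion gate in the last two rows, the check can be expressed as a query on health_formula_versions (latest_auc, created_at and min_shadow_days are assumed column names):

```sql
-- Hypothetical promotion-gate check: shadow period elapsed and AUC not worse
-- than the active formula.
SELECT c.formula_id,
       c.latest_auc - a.latest_auc           AS auc_delta,
       CURRENT_DATE - c.created_at::date     AS shadow_days,
       (CURRENT_DATE - c.created_at::date >= c.min_shadow_days
        AND c.latest_auc >= a.latest_auc)    AS promotable
FROM health_formula_versions c
JOIN health_formula_versions a ON a.status = 'active'
WHERE c.status = 'candidate';
```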
Common 'no data' cases and how to unblock them.
Why build the Lab instead of keeping the manual runbooks?
After v3 shipped (2026-04-20), the recalibration loop was manual: a runbook to re-tune thresholds at J+60, a future logistic regression at J+180. Each step lost momentum. The Lab automates the measure → explain → iterate cycle so we never have to ask 'is the formula still working?' — the answer is on /health-lab Pulse, fresh every morning.
Does 'Save as candidate' change the live formula?
No. Save as candidate inserts a row with status=candidate and a 30-day shadow period. Promotion to status=active still requires a manual migration (intentional gate — formula activation has CSM and CEO impact).
Exception: v3 was promoted directly without the Lab existing. All future candidates inherit the default 30-day shadow (and you can override per candidate).
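For reference, the manual promotion migration is essentially a status swap in the catalog, done in one transaction. A hypothetical sketch ('<candidate_id>' is a placeholder):

```sql
-- Hypothetical shape of the manual promotion migration (the intentional gate):
-- retire the current active formula, then activate the candidate.
BEGIN;
UPDATE health_formula_versions SET status = 'retired' WHERE status = 'active';
UPDATE health_formula_versions SET status = 'active'
 WHERE formula_id = '<candidate_id>' AND status = 'candidate';
COMMIT;
```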
What if an insight looks like a false positive?
Mark it dismissed on /health-lab/insights. The thresholds in A.12 (drift > 5%, lift > 4pt counter, > 6pt hero) can be tuned in scripts/audit/checks/a12-health-formula-drift.ts.
Is there push alerting for new insights?
Not implemented (decision Q6 of review). The /health-lab/insights feed is the source of truth — read it Mondays.
/summary-health is the operational dashboard (what's the state of each location TODAY). /health-lab is the meta-dashboard (is the FORMULA itself doing its job?). Same data, different question.
Can I add a new pillar from the Lab?
No. Adding a pillar changes the formula structure (which is code-defined). Only parameters (weights, saturation, thresholds, core_crm set) are data-driven. New pillars require a migration + code review.
How do I delete a bad candidate?
DELETE FROM health_formula_versions WHERE formula_id = '<the_id>' AND status = 'candidate'. The catalog is small enough that direct SQL is fine.