
Health Lab — how to use it

The meta-dashboard that watches the health-score formula itself. Below: a 5-minute walkthrough so you can use it today, then concrete decision rules and per-tab references.

5-minute first walkthrough

First time opening the Lab? Do this.

Five concrete steps. Each takes ~30 seconds. Run them once and you'll know what every tab is for and how to act on what you see.

  1. Open Pulse
     Action: Open /health-lab. Look at the 4 KPI cards (AUC churn, AUC trial conversion, monotonicity, coverage).
     What you'll see: Today the AUC values likely show '—' for v3 because outcomes need 60 days to mature (the formula shipped 2026-04-20). That's expected until ~2026-06-04 — the Lab is wired up and waiting for data, not broken. Coverage % shows how many score rows already have a known outcome.
  2. Check the Versions baseline
     Action: From the Explore section, click 'Versions'.
     What you'll see: v1 (legacy, retired) shows ~42 AUC rows — that's your historical baseline. v3 (active) has no perf data yet but is the formula in use. Any candidate you propose later should beat v1's AUC and aim higher than v3 once it has data.
  3. Look at the signal you have today
     Action: Click 'Predictive'. Set Cohort=trial, Outcome=conversion, Horizon=30. This is the only combo with usable signal right now (trial conversion settles in 14-30 days, much faster than churn).
     What you'll see: A decile table — locations split into 10 score buckets, each with its conversion rate. If the rate column rises monotonically with score, the formula correctly separates good trials from bad. The calibration scatter below shows whether predicted % matches observed %.
  4. Test a tweak in the Simulator
     Action: Click 'Simulator'. Don't change anything yet — click Run. You'll see Δ AUC = 0 (the formula vs itself). Now drag the 'depth' weight from 20 → 30 (drop another pillar by 10 to keep Σ = 100). Click Run again.
     What you'll see: Δ AUC, distribution shift, and top movers (locations whose score changed most). Δ AUC ≥ +0.02 means the change is worth saving as a candidate. Δ AUC near 0 = noise (don't save). Δ AUC < 0 = your tweak made the formula worse.
  5. Populate the Insights feed
     Action: From a terminal: npm run audit:reliability -- --only A.12. Then refresh /health-lab/insights.
     What you'll see: Drift / hero / counter-signal / segment-broken items. If the feed is empty after the first run, there are no surprises in your data — that's good. The audit also runs automatically as part of the weekly reliability lane (Mondays).
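The Σ = 100 bookkeeping in step 4 is easy to get wrong by hand. A minimal TypeScript sketch of the move (the pillar names and the equal 20-point canonical split are assumptions, not the real catalog values):

```typescript
type Weights = Record<string, number>;

// Move `by` points from one pillar to another so the total stays at 100,
// mirroring the Simulator rule that pillar weights must sum to 100.
function shiftWeight(w: Weights, from: string, to: string, by: number): Weights {
  if (!(from in w) || !(to in w)) throw new Error("unknown pillar");
  if (w[from] - by < 0) throw new Error("weight would go negative");
  return { ...w, [from]: w[from] - by, [to]: w[to] + by };
}

// Hypothetical canonical split across the 5 pillars (equal weights assumed).
const canonical: Weights = { adoption: 20, depth: 20, engagement: 20, activation: 20, support: 20 };
const candidate = shiftWeight(canonical, "adoption", "depth", 10); // depth 20 → 30
const total = Object.values(candidate).reduce((s, v) => s + v, 0); // still 100
```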
Reference — every tab

See a question → open the right tab

The Lab has 7 pages. After the walkthrough above, this table is the cheatsheet.

Tab | Answers | When to open
Pulse (warming up) | Is the formula in good shape today? | Mondays. 30 seconds to know if the formula has drifted vs the prior month.
Predictive Power | Does the score actually predict the outcome? | When AUC moves and you need to know which range of scores is mis-predicting.
Pillar Lens (warming up) | Which of the 5 pillars carries the signal? | When you suspect a pillar is dead weight, or to find emerging hero/counter features (recreates the archived analysis live).
Cohort Lens (warming up) | Do thresholds work across all segments? | To spot non-monotone segments (e.g. healthy sandboxes that churn anyway, at-risk multi-seat that don't).
What-If Simulator ⭐ | What would happen if we changed the formula? | Any time you want to test a hypothesis. Replaces the manual SQL recalibration runbook.
Auto-Insights | What did the system notice on its own? | Mondays after the weekly audit (A.12) runs. Triage what needs action.
Versions | What formulas have we tried, and how did they perform? | When promoting a candidate to active, or auditing why we changed weights last quarter.
Heads-up: the v3 formula shipped 2026-04-20. Outcomes need 60 days to materialise → first stable AUC ~2026-06-04 (see the timeline below). v1 legacy already has 42 AUC rows as a historical proxy.

Concrete workflows

Monday morning
You: Open /health-lab Pulse → see if any KPI is red.
Then: If monotonicity = Broken or AUC dropped: open /health-lab/predictive to see which decile broke. If a counter-signal fired: open /health-lab/insights.
Trial conversion is dropping
You: Open /health-lab/predictive with cohort=trial, outcome=conversion, horizon=30.
Then: Look at the decile table — if conversion rate is flat across deciles, the formula isn't separating trials. Open /health-lab/pillars to see if 'activation' lost its predictive power.
You want to add a new feature to core_crm
You: Open /health-lab/simulator. Toggle the new feature in 'Core CRM features'. Click Run.
Then: If Δ AUC is positive AND distribution shift is reasonable: Save as candidate. Track its AUC for 30 days on /health-lab/versions before promoting.
A CSM says 'this 78-score account just churned'
You: Open /health-lab/cohorts to see if their segment (e.g. sub_small) is non-monotone in current data.
Then: If yes, this is a known segment weakness — flag it in Insights or test a candidate that adjusts weights for that segment.

The 7 tabs in detail

Pulse (live, warming up)

Is the formula in good shape today?

What you see: 4 KPIs (AUC churn, AUC conversion, bucket monotonicity, outcome coverage) + 12-week AUC trend chart.

When to use: Mondays. 30 seconds to know if the formula has drifted vs the prior month.

Predictive Power (live)

Does the score actually predict the outcome?

What you see: Decile table (10 score buckets × outcome rate, should be monotone), calibration scatter (predicted vs observed), confusion matrix at the at-risk threshold (< 25).

When to use: When AUC moves and you need to know which range of scores is mis-predicting.
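The confusion matrix this tab shows can be sketched as follows; the < 25 cutoff comes from the description above, while the function name and input shapes are hypothetical:

```typescript
// Counts at the at-risk cutoff: score < threshold flags a location,
// outcome 1 means the outcome (e.g. churn) actually occurred.
function confusionAtThreshold(scores: number[], outcomes: number[], threshold = 25) {
  const m = { tp: 0, fp: 0, fn: 0, tn: 0 };
  scores.forEach((s, i) => {
    const flagged = s < threshold;      // predicted at-risk
    const happened = outcomes[i] === 1; // outcome actually occurred
    if (flagged && happened) m.tp++;
    else if (flagged && !happened) m.fp++;
    else if (!flagged && happened) m.fn++;
    else m.tn++;
  });
  return m;
}

const m = confusionAtThreshold([10, 20, 40, 80], [1, 0, 1, 0]);
// one of each: true positive (10), false positive (20), false negative (40), true negative (80)
```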

Pillar Lens (live, warming up)

Which of the 5 pillars carries the signal?

What you see: Per-pillar Pearson correlation + permutation importance (AUC drop when this pillar is randomized), hero features (positive lift), counter-signals (negative lift).

When to use: When you suspect a pillar is dead weight, or to find emerging hero/counter features (recreates the archived analysis live).

Cohort Lens (live, warming up)

Do thresholds work across all segments?

What you see: Heatmap bucket × segment (sandbox / trial / sub_small / sub_mid / sub_large) with outcome rate per cell.

When to use: To spot non-monotone segments (e.g. healthy sandboxes that churn anyway, at-risk multi-seat that don't).

What-If Simulator ⭐ (live)

What would happen if we changed the formula?

What you see: Sliders for the 5 pillar weights, saturation thresholds, core CRM feature set. After Run: AUC canonical vs candidate, distribution shift, top movers (locations whose score changes most).

When to use: Any time you want to test a hypothesis. Replaces the manual SQL recalibration runbook.

Auto-Insights (live)

What did the system notice on its own?

What you see: Reverse-chronological feed of drifts, new signals, counter-signals, segment breaks, anomalies. Severity (info / warn / critical) + status (open / acked / dismissed / actioned).

When to use: Mondays after the weekly audit (A.12) runs. Triage what needs action.

Versions (live)

What formulas have we tried, and how did they perform?

What you see: Timeline of all formulas (active / candidate / shadow / retired) with their latest AUC + the JSON of their parameters.

When to use: When promoting a candidate to active, or auditing why we changed weights last quarter.

Data ramp-up

When will v3 AUC be available?

The 60-day outcome maturity window explains why Pulse is 'warming up'.

When | What becomes available
Today | v1 legacy has 42 AUC rows (different formula, kept as historical proxy). v3 has 0 AUC because it shipped 2026-04-20 and outcomes mature 60 days later.
~2026-05-05 (J+14) | First v3 score_day with mature 60d outcome (2026-03-06 + 60). AUC starts being non-null but very noisy (N≈1 day).
~2026-06-04 (J+44) | 30 days of v3 outcomes — first stable AUC measurement. Pulse trend chart starts being meaningful.
~2026-07-04 (J+74) | 60 days of mature v3 outcomes — full rolling window. AUC becomes robust.
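The timeline dates are plain date arithmetic. A sketch in TypeScript (dates from the table, UTC assumed):

```typescript
// Add n days to an ISO date string (UTC) and return an ISO date string.
function addDays(iso: string, days: number): string {
  const d = new Date(iso + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

// feature_daily starts 2026-03-06; its 60d outcome matures 60 days later.
const firstMatureScoreDay = addDays("2026-03-06", 60); // → 2026-05-05
// First stable AUC needs ~30 more days of mature outcomes on top of that.
const firstStableAuc = addDays(firstMatureScoreDay, 30); // → 2026-06-04
```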

Want signal sooner? Use horizon=30 + outcome=conversion on the trial cohort — trial conversion settles in 14-30 days, much faster than churn.

Reference

Glossary — stats terms used in the Lab

AUC, calibration, lift, monotonicity… in plain language.

AUC (area under ROC)
Probability that, for two random locations (one with the outcome, one without), the formula gives a higher score to the one with the outcome. 0.5 = random. 0.7 = useful. 0.8+ = excellent. For 'churn', the displayed value is flipped (1 - raw AUC) so higher always means better.
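The pairwise definition translates directly into code. A sketch of the definition (not the Lab's actual implementation), including the churn flip:

```typescript
// AUC via pairwise comparison (Mann-Whitney): the probability that a random
// location WITH the outcome gets a higher score than a random one without.
// Ties count 0.5; NaN when a class is missing (cf. 'insufficient_sample').
function auc(scores: number[], outcomes: number[]): number {
  let pairs = 0;
  let wins = 0;
  for (let i = 0; i < scores.length; i++) {
    if (outcomes[i] !== 1) continue; // positives: outcome occurred
    for (let j = 0; j < scores.length; j++) {
      if (outcomes[j] !== 0) continue; // negatives
      pairs++;
      if (scores[i] > scores[j]) wins++;
      else if (scores[i] === scores[j]) wins += 0.5;
    }
  }
  return pairs === 0 ? NaN : wins / pairs;
}

// Conversion: a high score should predict converting, so raw AUC is displayed.
const conversionAuc = auc([80, 65, 40, 30], [1, 1, 0, 0]); // perfect split → 1
// Churn: a high score should predict NOT churning, so the Lab shows 1 - raw.
const churnDisplayed = 1 - auc([80, 65, 40, 30], [0, 0, 1, 1]); // → 1
```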
Pearson correlation
Linear correlation between score and outcome (binary 0/1). Range -1 to 1. Positive when high score correlates with the outcome occurring.
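For reference, the textbook formula in code (a sketch, not the Lab's implementation; with a 0/1 outcome this is the point-biserial special case):

```typescript
// Pearson correlation between a score column and a 0/1 outcome column.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) ** 2;
    dy += (y[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// Higher scores coincide with the outcome → strong positive correlation.
const r = pearson([1, 2, 3, 4], [0, 0, 1, 1]); // 2/√5 ≈ 0.894
```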
Calibration
When the predicted probability matches the observed rate. A score of 70 should mean ~70% chance of the outcome. Visualised as a scatter with a y=x reference — the closer to the diagonal, the better calibrated.
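The points behind such a scatter can be sketched by bucketing predictions and comparing mean predicted to observed rate per bucket; the equal-width 10-bucket split is an assumption:

```typescript
// One scatter point per non-empty bucket: mean predicted % vs observed %.
function calibrationPoints(
  predicted: number[], // predicted probabilities in [0, 1]
  observed: number[],  // actual outcomes, 0 or 1
  nBuckets = 10,
): { meanPredicted: number; observedRate: number; n: number }[] {
  const buckets = Array.from({ length: nBuckets }, () => ({ sumP: 0, sumY: 0, n: 0 }));
  predicted.forEach((p, i) => {
    const b = Math.min(nBuckets - 1, Math.floor(p * nBuckets));
    buckets[b].sumP += p;
    buckets[b].sumY += observed[i];
    buckets[b].n++;
  });
  return buckets
    .filter((b) => b.n > 0)
    .map((b) => ({ meanPredicted: b.sumP / b.n, observedRate: b.sumY / b.n, n: b.n }));
}

// Predicting 75% but observing 50% → point sits below the y=x diagonal.
const pts = calibrationPoints([0.75, 0.75, 0.15, 0.15], [1, 0, 0, 0]);
```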
Bucket monotonicity
For churn: at_risk should churn more than steady, which should churn more than healthy, which should churn more than thriving. If broken, the buckets aren't doing their job.
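A sketch of the check, using the bucket names above; the strict ordering and the input shape are my assumptions:

```typescript
// Buckets ordered from worst to best health; for churn, each bucket should
// churn strictly less than the one before it.
const bucketOrder = ["at_risk", "steady", "healthy", "thriving"];

function bucketsMonotone(churnRateByBucket: Record<string, number>): boolean {
  for (let i = 1; i < bucketOrder.length; i++) {
    const prev = churnRateByBucket[bucketOrder[i - 1]];
    const curr = churnRateByBucket[bucketOrder[i]];
    if (prev === undefined || curr === undefined) return false; // missing bucket
    if (curr >= prev) return false; // later bucket must churn less
  }
  return true;
}

const ok = bucketsMonotone({ at_risk: 0.31, steady: 0.12, healthy: 0.05, thriving: 0.02 });
// Healthy churning more than steady breaks the ordering:
const broken = bucketsMonotone({ at_risk: 0.31, steady: 0.08, healthy: 0.12, thriving: 0.02 });
```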
Permutation importance
AUC drop when a single pillar is replaced by random noise. Big drop = pillar carries real signal. Near-zero drop = pillar is dead weight. Negative = pillar adds noise.
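A self-contained sketch of the idea; the tiny AUC helper, the additive pillar combination, and the seeded shuffle are all illustrative, not the Lab's code:

```typescript
// Pairwise AUC (ties count 0.5); NaN when a class is missing.
function auc(scores: number[], outcomes: number[]): number {
  let pairs = 0, wins = 0;
  for (let i = 0; i < scores.length; i++) {
    if (outcomes[i] !== 1) continue;
    for (let j = 0; j < scores.length; j++) {
      if (outcomes[j] !== 0) continue;
      pairs++;
      if (scores[i] > scores[j]) wins++;
      else if (scores[i] === scores[j]) wins += 0.5;
    }
  }
  return pairs === 0 ? NaN : wins / pairs;
}

// Small seeded PRNG (mulberry32) so the permutation is reproducible.
function mulberry32(a: number): () => number {
  return function () {
    a |= 0; a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle one pillar's column, rebuild the score, measure the AUC drop.
function permutationImportance(
  pillar: number[],    // the pillar under test
  otherSum: number[],  // sum of the remaining pillars' contributions
  outcomes: number[],
  seed = 1,
): number {
  const base = auc(pillar.map((p, i) => p + otherSum[i]), outcomes);
  const rand = mulberry32(seed);
  const shuffled = [...pillar];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const permuted = auc(shuffled.map((p, i) => p + otherSum[i]), outcomes);
  return base - permuted; // big drop = pillar carries real signal
}

// This pillar fully determines the outcome, so shuffling it can only hurt.
const drop = permutationImportance([50, 40, 10, 5], [0, 0, 0, 0], [1, 1, 0, 0]);
```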
Lift (feature lift)
(outcome rate among feature users) - (outcome rate among non-users), in percentage points. +9pt for 'contacts' on conversion means trial users who touched contacts converted 9pts more than those who didn't. Negative lift = the feature predicts the OPPOSITE of what we'd expect.
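The definition in code, with hypothetical data (a +25pt example, not the real 'contacts' figure):

```typescript
type Row = { usedFeature: boolean; outcome: number }; // outcome: 1 if it occurred

// Feature lift in percentage points: rate among feature users minus rate
// among non-users, per the definition above.
function liftPts(rows: Row[]): number {
  const rate = (xs: Row[]) =>
    xs.length === 0 ? NaN : xs.reduce((s, r) => s + r.outcome, 0) / xs.length;
  const users = rows.filter((r) => r.usedFeature);
  const nonUsers = rows.filter((r) => !r.usedFeature);
  return (rate(users) - rate(nonUsers)) * 100;
}

// 2 of 4 users converted (50%) vs 1 of 4 non-users (25%) → +25pt lift.
const lift = liftPts([
  { usedFeature: true, outcome: 1 }, { usedFeature: true, outcome: 1 },
  { usedFeature: true, outcome: 0 }, { usedFeature: true, outcome: 0 },
  { usedFeature: false, outcome: 1 }, { usedFeature: false, outcome: 0 },
  { usedFeature: false, outcome: 0 }, { usedFeature: false, outcome: 0 },
]);
```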
Horizon (60d, 90d)
How many days after the score we wait before declaring an outcome (churn / converted / etc). Longer horizons are more reliable but slower to materialise.
Cohort filter
no_sandbox = excludes MRR ≤ 2€ test accounts (default). all = everything. trial / subscription = lifecycle filter. sandbox = only the test accounts.
Behind the scenes

Crons & audit checks

What runs nightly to keep the Lab fresh.

Job | Schedule | Role
aggregate_feature_30d_0345 | 03:45 Paris daily | Refresh feature_daily_30d_agg + prune > 30d (perf cache for the simulator)
health_outcomes_backfill_0400 | 04:00 Paris daily | Compute outcomes for score_day = today-60 and today-90, then aggregate distribution_daily
health_formula_performance_0430 | 04:30 Paris daily | AUC + correlation for the active formula × cohorts × outcomes
health_pillar_performance_0445 | 04:45 Paris daily | Per-pillar Pearson + permutation importance
health_feature_lift_weekly_0500 | Monday 05:00 Paris | Per-feature lift (hero / counter signals)
A.12 health-formula-drift | Audit framework (reliability lane, weekly) | Detect AUC drift > 5% / 10%, broken monotonicity, emerging hero/counter features → write into health_insights_log

Trigger A.12 manually: npm run audit:reliability -- --only A.12

Behind the scenes

Versioning: formula_id vs score_version

Why both columns exist on location_health_daily.

formula_id is the new source of truth (catalog in health_formula_versions). score_version is kept in parallel for back-compat (173 existing consumers). Dropping score_version is planned in a follow-up migration once prod has been stable on formula_id for ≥7 days.

Backfill mapping (M106):

  • docs.healthLab.versioning.backfill.v1
  • docs.healthLab.versioning.backfill.v2Legacy
  • docs.healthLab.versioning.backfill.v3
Decision rules

If you see X → do Y

Cause → meaning → action. Print this if you don't trust your memory.

If you see | It means | Do this
AUC < 0.6 | Formula barely better than random. | Open Pillar Lens — at least one pillar likely has corr ~0. Test removing or de-weighting it in the Simulator.
AUC 0.6 – 0.7 | OK predictor with room to improve. | Try 2-3 candidates in the Simulator. Save the best (Δ AUC ≥ +0.02) as a candidate.
AUC ≥ 0.7 | Strong predictor — don't touch. | Watch Insights weekly. Don't promote candidates unless they're +0.03 better.
Δ AUC ≥ +0.03 in Simulator | Real improvement on the backtest. | Click Save as candidate. Wait the 30-day shadow period. If still better in 30 days → promote via migration.
Δ AUC < +0.01 | Within noise floor. Could be coincidence. | Don't save. Either try a bigger change (move a weight by ≥ 5) or accept the formula as-is.
Monotonicity = Broken on Pulse | Buckets aren't ordered by outcome rate (e.g. healthy churns more than steady). | Open Cohort Lens — find the segment that breaks the order. Often sandbox (always exclude) or sub_large (different dynamics that need their own threshold).
Counter-signal in Insights with n ≥ 30, p < 0.1 | Users of that feature do WORSE than non-users on the outcome. | Open Simulator. Untoggle that feature from Core CRM. Re-run. If AUC goes up → save the candidate.
Candidate in Versions with shadow < 30 days | Not promotable yet. | Wait until min_shadow_days elapses (default 30). Watch its AUC on Versions until it stabilises ≥ active formula.
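The three simulator rows above can be encoded as a tiny decision helper. The thresholds come from the table; the label strings and the in-between 'gray zone' band are my reading, not official Lab output:

```typescript
type Verdict = "save as candidate" | "gray zone" | "noise" | "worse";

// Map a simulator Δ AUC onto the decision-rule table.
function simulatorVerdict(deltaAuc: number): Verdict {
  if (deltaAuc >= 0.03) return "save as candidate"; // real improvement on the backtest
  if (deltaAuc >= 0.01) return "gray zone";         // above noise, below the promote bar
  if (deltaAuc > -0.01) return "noise";             // within the noise floor: don't save
  return "worse";                                   // the tweak hurt the formula
}
```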
Troubleshooting

Why is X empty?

Common 'no data' cases and how to unblock them.

Pulse shows 'warming up'
Cause: No row in health_formula_performance with non-null AUC for the active formula. Either crons haven't run yet, or v3 doesn't have enough mature outcomes (see the timeline above).
Fix: Wait for the nightly cron (04:30 Paris). To force: SELECT public.compute_formula_performance('v3', CURRENT_DATE, 60, 'no_sandbox', 'churn');
/health-lab/pillars shows no feature lift
Cause: compute_feature_lift needs feature_daily history covering the score-day window [eval-90, eval-60]. feature_daily started 2026-03-06.
Fix: Wait until 2026-05-05 for the first overlap. Or change the cohort to one with shorter outcome horizon (trial conversion).
Simulator returns 'insufficient_sample'
Cause: Less than 5 positives or 5 negatives in the cohort × outcome × window combination.
Fix: Widen the cohort (try 'all') or shorten the horizon (30d). The simulator needs both classes to compute AUC.
/health-lab/insights is empty
Cause: A.12 audit check hasn't run yet, or didn't detect anything.
Fix: npm run audit:reliability -- --only A.12 (one-shot). Otherwise it runs as part of the weekly reliability lane.
Versions page shows v3 with no perf
Cause: Same as Pulse — v3 has no mature outcomes yet.
Fix: Wait for v3 outcomes to mature. v1 legacy already has perf data because it ran 2026-01-06 → 2026-02-04, which fully overlaps with mature outcomes.

FAQ

Why was the Health Lab built?

After v3 shipped (2026-04-20), the recalibration loop was manual: a runbook to re-tune thresholds at J+60, a future logistic regression at J+180. Each step lost momentum. The Lab automates the measure → explain → iterate cycle so we never have to ask 'is the formula still working?' — the answer is on /health-lab Pulse, fresh every morning.


Can the Simulator activate a candidate formula by itself?

No. Save as candidate inserts a row with status=candidate and a 30-day shadow period. Promotion to status=active still requires a manual migration (intentional gate — formula activation has CSM and CEO impact).


Why min_shadow_days = 0 for v3?

Exception: v3 was promoted directly without the Lab existing. All future candidates inherit the default 30-day shadow (and you can override per candidate).


What if an insight is wrong / a false positive?

Mark it dismissed on /health-lab/insights. The thresholds in A.12 (drift > 5%, lift > 4pt counter, > 6pt hero) can be tuned in scripts/audit/checks/a12-health-formula-drift.ts.


What about Slack notifications?

Not implemented (decision Q6 of review). The /health-lab/insights feed is the source of truth — read it Mondays.


What's the difference vs /summary-health?

/summary-health is the operational dashboard (what's the state of each location TODAY). /health-lab is the meta-dashboard (is the FORMULA itself doing its job?). Same data, different question.


Can I add a 6th pillar via the simulator?

No. Adding a pillar changes the formula structure (which is code-defined). Only parameters (weights, saturation, thresholds, core_crm set) are data-driven. New pillars require a migration + code review.


How do I delete a candidate I no longer want?

DELETE FROM health_formula_versions WHERE formula_id = '<the_id>' AND status = 'candidate'. The catalog is small enough that direct SQL is fine.
