Calculate net revenue retention and build cohort retention curves in Python — on a realistic MRR movement ledger where the numbers actually behave like a SaaS book. About 35 minutes.
This uses a SaaS MRR movement ledger: one row per new, expansion, contraction, or churn event, with plan, seats and mrr_delta. Plan-dependent churn and expansion mean cohorts decline realistically — so your retention curves have signal.
import pandas as pd
df = pd.read_csv("saas_mrr.csv", parse_dates=["month"])
# Each account's MRR after its latest movement in each month
acct_month = (df.sort_values("month")
.groupby(["account_id", "month"])["mrr"].last()
.reset_index())
waterfall = df.pivot_table(index="month", columns="movement",
values="mrr_delta", aggfunc="sum").fillna(0)
waterfall["net_new_mrr"] = waterfall.sum(axis=1)
waterfall.head()
NRR for a period = (starting MRR + expansion − contraction − churn) ÷ starting MRR, measured on the cohort that existed at the start:
m = waterfall.copy()
exp = m.get("expansion", 0)
con = m.get("contraction", 0)
chn = m.get("churn", 0)
start_mrr = acct_month.groupby("month")["mrr"].sum().shift(1)
nrr = (start_mrr + exp + con + chn) / start_mrr # con & chn are negative
nrr.dropna().tail(12) # trailing-12-month NRR by month
Tag each account by its signup month, then track surviving MRR by months-since-signup:
signup = (df[df.movement=="new"]
.groupby("account_id")["month"].min().rename("cohort"))
am = acct_month.join(signup, on="account_id")
am["period"] = ((am.month.dt.year - am.cohort.dt.year)*12
+ (am.month.dt.month - am.cohort.dt.month))
cohort = am.pivot_table(index=am.cohort.dt.to_period("M"),
columns="period", values="mrr", aggfunc="sum")
retention = cohort.div(cohort[0], axis=0) # vs. month-0 MRR
retention.round(2).head()
Read the cohort triangle left-to-right: each row is a signup cohort, each column a month of tenure. Healthy books hold near 100% (expansion offsetting churn); leaky ones decay fast. Because Starter plans churn far more than Enterprise here, slicing by plan shows dramatically different NRR — exactly the kind of insight a real analysis surfaces.
Accounts sign up across 24 months and face realistic, plan-dependent churn and expansion each month. That means NRR lands in a believable range, cohorts decay at different rates, and segment cuts (plan, region, industry) actually differ — instead of the flat, structureless output you get from independent random rows.