NRR & cohort analysis with a free SaaS MRR dataset

Calculate net revenue retention and build cohort retention curves in Python — on a realistic MRR movement ledger where the numbers actually behave like a SaaS book. About 35 minutes.

Get the dataset

This uses a SaaS MRR movement ledger: one row per new, expansion, contraction, or churn event, with plan, seats and mrr_delta. Plan-dependent churn and expansion mean cohorts decline realistically — so your retention curves have signal.

Download SaaS MRR data (CSV) → Customize in the generator

Steps

Load & rebuild monthly MRR per account

import pandas as pd
df = pd.read_csv("saas_mrr.csv", parse_dates=["month"])

# Each account's MRR after its latest movement in each month
acct_month = (df.sort_values("month")
                .groupby(["account_id", "month"])["mrr"].last()
                .reset_index())

Compute total MRR & the movement waterfall

waterfall = df.pivot_table(index="month", columns="movement",
                           values="mrr_delta", aggfunc="sum").fillna(0)
waterfall["net_new_mrr"] = waterfall.sum(axis=1)
waterfall.head()

Calculate Net Revenue Retention (NRR)

NRR for a period = (starting MRR + expansion − contraction − churn) ÷ starting MRR, measured on the cohort that existed at the start:

m = waterfall.copy()
exp  = m.get("expansion", 0)
con  = m.get("contraction", 0)
chn  = m.get("churn", 0)
start_mrr = acct_month.groupby("month")["mrr"].sum().shift(1)

nrr = (start_mrr + exp + con + chn) / start_mrr   # con & chn are negative
nrr.dropna().tail(12)   # trailing-12-month NRR by month

Build cohort retention

Tag each account by its signup month, then track surviving MRR by months-since-signup:

signup = (df[df.movement=="new"]
            .groupby("account_id")["month"].min().rename("cohort"))
am = acct_month.join(signup, on="account_id")
am["period"] = ((am.month.dt.year - am.cohort.dt.year)*12
                + (am.month.dt.month - am.cohort.dt.month))

cohort = am.pivot_table(index=am.cohort.dt.to_period("M"),
                        columns="period", values="mrr", aggfunc="sum")
retention = cohort.div(cohort[0], axis=0)   # vs. month-0 MRR
retention.round(2).head()

Interpret

Read the cohort triangle left-to-right: each row is a signup cohort, each column a month of tenure. Healthy books hold near 100% (expansion offsetting churn); leaky ones decay fast. Because Starter plans churn far more than Enterprise here, slicing by plan shows dramatically different NRR — exactly the kind of insight a real analysis surfaces.

Why this dataset has signal

Accounts sign up across 24 months and face realistic, plan-dependent churn and expansion each month. That means NRR lands in a believable range, cohorts decay at different rates, and segment cuts (plan, region, industry) actually differ — instead of the flat, structureless output you get from independent random rows.

Keep going

Power BI sales dashboard tutorial Market-basket analysis tutorial SaaS churn dataset (labeled) Sample data for cohort analysis