NRR & cohort analysis with a free SaaS MRR dataset

Calculate net revenue retention and build cohort retention curves in Python — on a realistic MRR movement ledger where the numbers actually behave like a SaaS book. About 35 minutes.

Get the dataset

This uses a SaaS MRR movement ledger: one row per new, expansion, contraction, or churn event, with plan, seats and mrr_delta. Plan-dependent churn and expansion mean cohorts decline realistically — so your retention curves have signal.

Download SaaS MRR data (CSV) → Customize in the generator

Steps

  1. Load & rebuild monthly MRR per account

    import pandas as pd
    df = pd.read_csv("saas_mrr.csv", parse_dates=["month"])
    
    # Each account's MRR after its latest movement in each month
    acct_month = (df.sort_values("month")
                    .groupby(["account_id", "month"])["mrr"].last()
                    .reset_index())
  2. Compute total MRR & the movement waterfall

    waterfall = df.pivot_table(index="month", columns="movement",
                               values="mrr_delta", aggfunc="sum").fillna(0)
    waterfall["net_new_mrr"] = waterfall.sum(axis=1)
    waterfall.head()
  3. Calculate Net Revenue Retention (NRR)

    NRR for a period = (starting MRR + expansion − contraction − churn) ÷ starting MRR, measured on the cohort that existed at the start:

    m = waterfall.copy()
    exp  = m.get("expansion", 0)
    con  = m.get("contraction", 0)
    chn  = m.get("churn", 0)
    start_mrr = acct_month.groupby("month")["mrr"].sum().shift(1)
    
    nrr = (start_mrr + exp + con + chn) / start_mrr   # con & chn are negative
    nrr.dropna().tail(12)   # trailing-12-month NRR by month
  4. Build cohort retention

    Tag each account by its signup month, then track surviving MRR by months-since-signup:

    signup = (df[df.movement=="new"]
                .groupby("account_id")["month"].min().rename("cohort"))
    am = acct_month.join(signup, on="account_id")
    am["period"] = ((am.month.dt.year - am.cohort.dt.year)*12
                    + (am.month.dt.month - am.cohort.dt.month))
    
    cohort = am.pivot_table(index=am.cohort.dt.to_period("M"),
                            columns="period", values="mrr", aggfunc="sum")
    retention = cohort.div(cohort[0], axis=0)   # vs. month-0 MRR
    retention.round(2).head()
  5. Interpret

    Read the cohort triangle left-to-right: each row is a signup cohort, each column a month of tenure. Healthy books hold near 100% (expansion offsetting churn); leaky ones decay fast. Because Starter plans churn far more than Enterprise here, slicing by plan shows dramatically different NRR — exactly the kind of insight a real analysis surfaces.

Why this dataset has signal

Accounts sign up across 24 months and face realistic, plan-dependent churn and expansion each month. That means NRR lands in a believable range, cohorts decay at different rates, and segment cuts (plan, region, industry) actually differ — instead of the flat, structureless output you get from independent random rows.

Keep going