A realistic SaaS dataset with missing values scattered through it, plus typos and duplicates, for practicing imputation, validation and other missing-data strategies.
This is a free, reproducible SaaS / MRR dataset that you can generate on this page and download as CSV, Excel, JSON or SQL. It is built for testing missing-value imputation, data validation and cleaning pipelines. Because the fields are correlated with one another rather than drawn independently at random, the numbers stay internally consistent when you analyze them.
Accounts sign up over the last two years, each assigned to a plan tier by weighted sampling. Every month, an account then faces a plan-dependent churn hazard plus some chance of expansion or contraction. This reproduces the typical shape of a SaaS book (a leaky low end and sticky enterprise accounts), so retention metrics carry real signal.
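The generation process described above can be sketched as a small Monte Carlo simulation. This is a minimal illustration, not the generator's actual code: the plan weights, monthly churn rates and expansion/contraction probabilities (`PLAN_WEIGHTS`, `MONTHLY_CHURN`, `P_EXPAND`, `P_CONTRACT`) are hypothetical values chosen to make the leaky-low-end pattern visible, and the function names are made up for this example.

```python
import random
from collections import Counter

# Hypothetical parameters for illustration; not the generator's real values.
PLAN_WEIGHTS = {"Starter": 0.50, "Pro": 0.30, "Business": 0.15, "Enterprise": 0.05}
MONTHLY_CHURN = {"Starter": 0.05, "Pro": 0.03, "Business": 0.015, "Enterprise": 0.005}
P_EXPAND, P_CONTRACT = 0.04, 0.02


def simulate_movements(n_accounts, months=24, seed="demo"):
    """Return (month_index, account_id, plan, movement) tuples."""
    rng = random.Random(seed)  # string seed -> deterministic output
    plans = list(PLAN_WEIGHTS)
    weights = list(PLAN_WEIGHTS.values())
    rows = []
    for account_id in range(1, n_accounts + 1):
        plan = rng.choices(plans, weights=weights)[0]  # weighted plan tier
        start = rng.randrange(months)                   # signup month in the window
        rows.append((start, account_id, plan, "new"))
        churn = MONTHLY_CHURN[plan]                     # plan-dependent hazard
        for month in range(start + 1, months):
            u = rng.random()
            if u < churn:
                rows.append((month, account_id, plan, "churn"))
                break                                   # churned accounts leave the book
            if u < churn + P_EXPAND:
                rows.append((month, account_id, plan, "expansion"))
            elif u < churn + P_EXPAND + P_CONTRACT:
                rows.append((month, account_id, plan, "contraction"))
    return rows


def churned_share(rows):
    """Fraction of accounts on each plan that eventually churned."""
    plan_of = {r[1]: r[2] for r in rows if r[3] == "new"}
    churned = {r[1] for r in rows if r[3] == "churn"}
    totals = Counter(plan_of.values())
    lost = Counter(plan for acc, plan in plan_of.items() if acc in churned)
    return {plan: lost[plan] / totals[plan] for plan in totals}
```

With these made-up rates, `churned_share(simulate_movements(2000))` shows a much higher churned fraction for Starter than for Enterprise, which is the kind of pattern that gives retention analysis something to find.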
Schema for the SaaS / MRR export (the anomaly column appears only when labels are switched on):
| Column | Type | Description |
|---|---|---|
| month | date | First of the month the movement occurred. |
| account_id / account | int / text | The subscribing company. |
| movement | text | new, expansion, contraction, or churn. |
| plan | text | Starter / Pro / Business / Enterprise. |
| seats | integer | Active seats after the movement (0 on churn). |
| mrr | number | Account MRR after the movement. |
| mrr_delta | number | Change in MRR (negative for contraction/churn). |
| region / industry | text | Firmographic dimensions. |
| anomaly | 0/1 | Present with labels on; flags suspicious churn. |
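The column descriptions imply a few rules that a validation step can check directly. The sketch below is one hypothetical way to encode them with pandas; `schema_violations` is an illustrative name, and it only covers rules stated in the table (allowed `movement` and `plan` values, `month` on the first of the month, zero seats on churn, negative `mrr_delta` for contraction and churn). Missing values are deliberately not flagged, since they belong to the imputation step.

```python
import pandas as pd

MOVEMENTS = {"new", "expansion", "contraction", "churn"}
PLANS = {"Starter", "Pro", "Business", "Enterprise"}


def schema_violations(df: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per rule; True marks a violating row.

    Missing values are not treated as violations; they are left for
    the imputation step.
    """
    # Dates pandas cannot parse become NaT and are reported as bad_month.
    month = pd.to_datetime(df["month"], errors="coerce")
    movement = df["movement"]
    shrinking = movement.isin(["contraction", "churn"])
    return pd.DataFrame({
        "bad_movement": movement.notna() & ~movement.isin(MOVEMENTS),
        "bad_plan": df["plan"].notna() & ~df["plan"].isin(PLANS),
        "bad_month": df["month"].notna() & month.isna(),
        "month_not_first": month.notna() & (month.dt.day != 1),
        "churn_with_seats": movement.eq("churn") & df["seats"].notna() & (df["seats"] != 0),
        "delta_not_negative": shrinking & df["mrr_delta"].notna() & (df["mrr_delta"] >= 0),
    })
```

Calling `schema_violations(df).sum()` gives a count of offending rows per rule, which is a quick way to see which kinds of errors the export contains.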
```python
import pandas as pd

# Load the exported CSV; empty cells are read as NaN.
df = pd.read_csv("saas_mrr.csv")
df.head()
```
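Once the CSV is loaded, the MRR columns support a simple rule-based imputation before reaching for statistical methods. If each account's rows form a complete ledger, then an account's MRR after a movement should equal its previous MRR plus `mrr_delta`, with a `new` movement starting from zero. The sketch below applies that identity; the ledger assumption, the use of the `account_id` column and the helper names are assumptions for illustration. Duplicated or mistyped rows can break the identity, so it is safest to validate and deduplicate first.

```python
import pandas as pd


def _previous_mrr(out: pd.DataFrame) -> pd.Series:
    """MRR before each movement: the account's prior row, or 0 for 'new'."""
    prev = out.groupby("account_id")["mrr"].shift(1)
    return prev.where(out["movement"] != "new", 0.0)


def impute_mrr(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps using the ledger identity mrr = previous mrr + mrr_delta.

    Assumes each account's rows, sorted by month, form a complete ledger.
    Runs a single pass, so long runs of consecutive gaps may stay partly empty.
    """
    out = df.sort_values(["account_id", "month"]).copy()

    prev = _previous_mrr(out)
    fill = out["mrr"].isna() & prev.notna() & out["mrr_delta"].notna()
    out.loc[fill, "mrr"] = prev[fill] + out.loc[fill, "mrr_delta"]

    # Recompute so deltas can use MRR values filled just above.
    prev = _previous_mrr(out)
    fill = out["mrr_delta"].isna() & prev.notna() & out["mrr"].notna()
    out.loc[fill, "mrr_delta"] = out.loc[fill, "mrr"] - prev[fill]
    return out
```

Values recovered this way are exact under the ledger assumption, so they make a useful ground truth when comparing statistical imputers (mean, median, model-based) on the remaining gaps.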
The default export has around 8,000 rows. To get more or fewer, change the row count in the generator above and re-export; anything up to about 200k rows works in the browser.
Available formats are CSV, Excel (.xlsx), JSON, and SQL (a `CREATE TABLE` statement plus `INSERT` statements), so you can pick whichever fits your workflow.
The download is reproducible: this page uses the fixed seed `missing-demo`, so the file is byte-identical on every machine. Clear the seed in the generator to get fresh random data.
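To confirm that two downloads really are byte-identical (for example, yours and a colleague's), comparing cryptographic hashes of the files is enough. A minimal sketch using only the Python standard library; the helper names are illustrative:

```python
import hashlib
from pathlib import Path


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def same_download(path_a: str, path_b: str) -> bool:
    """True if two downloaded files are byte-identical."""
    return file_sha256(path_a) == file_sha256(path_b)
```

For instance, `same_download("saas_mrr.csv", "saas_mrr_copy.csv")` (hypothetical file names) should return `True` for two exports made with the same seed and settings.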
The generator can be customized further: use Tables → Excel/SQL for a normalized multi-table export, switch on Messy / dirty data in Advanced options for nulls, typos and inconsistent dates, and choose CSV, Excel, JSON or SQL on any download.
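For the typos and duplicates this dataset contains, a first cleaning pass might snap misspelled plan names back onto the four documented tiers and drop exact duplicate rows. This is a hypothetical sketch: it assumes typos are near misses of the real names (matched with `difflib.get_close_matches` at an arbitrary 0.75 cutoff) and that duplicates are exact row copies, which may not describe every injected error.

```python
import difflib
import pandas as pd

CANONICAL_PLANS = ["Starter", "Pro", "Business", "Enterprise"]
_BY_LOWER = {p.lower(): p for p in CANONICAL_PLANS}


def normalize_plan(value, cutoff=0.75):
    """Snap a possibly misspelled plan name onto a canonical tier.

    Missing values and strings with no close match are returned unchanged.
    """
    if pd.isna(value):
        return value
    text = str(value).strip()
    if text.lower() in _BY_LOWER:          # case or whitespace differences only
        return _BY_LOWER[text.lower()]
    match = difflib.get_close_matches(text.title(), CANONICAL_PLANS, n=1, cutoff=cutoff)
    return match[0] if match else value


def clean_plans_and_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize the plan column, then drop exact duplicate rows."""
    out = df.copy()
    out["plan"] = out["plan"].map(normalize_plan)
    return out.drop_duplicates().reset_index(drop=True)
```

Normalizing before deduplicating matters: two copies of a row that differ only in a misspelled plan name are counted as duplicates only after the spelling has been fixed.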
The data contains no real people or companies: it is 100% synthetic and generated in your browser. It is free to use commercially.