Sample data for a data warehouse

A fact-style sales dataset with clean dimensions (customer, product, date) — ideal for star-schema and dimensional-modeling practice.

B2B distribution · Seeded - reproducible · CSV / Excel / JSON / SQL · 100% in-browser

Generate & download

About this dataset

This is a free, reproducible B2B distribution dataset you can generate and download right here as CSV, Excel, JSON or SQL. It is built for dimensional modeling, star schemas and dbt practice — and because every field is correlated rather than random, the numbers actually hold together when you analyze them.

Each customer is assigned a segment per product category that drives how often and how much they buy, with relationship momentum, occasional large-buy spikes, and category-specific markups — so the file behaves like a real distributor sales export, not random noise.

Columns in this dataset

Schema for the B2B distribution export (the anomaly column appears only when labels are switched on):

| Column | Type | Description |
| --- | --- | --- |
| order_date | date | Business day the order was placed. |
| invoice_no | integer | Invoice id; multiple lines share one invoice. |
| customer_id / customer | int / text | The buying business. |
| product_id / product | int / text | The SKU ordered. |
| category | text | One of five distributor categories. |
| segment | text | Customer role for that category (segA largest to segD smallest). |
| quantity | integer | Units ordered; scales with segment and volume. |
| unit_cost / unit_price | number | Your cost and the price charged. |
| revenue / cost / margin | number | Line economics. |
| ship_date | date | Fulfilment date (usually next business day). |
| anomaly | 0/1 | Present with labels on; flags inflated-price large orders. |
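
As a rough illustration of how this flat schema could be split into a star schema, here is a minimal Python sketch using the standard-library sqlite3 module. The sample rows, the category names, and the table names (dim_customer, dim_product, dim_date, fact_sales) are assumptions made up for the example, not values from the generator, and revenue is recomputed as quantity × unit_price purely for demonstration.

```python
import sqlite3

# Hypothetical rows shaped like a subset of the export's columns.
# All values (names, categories, prices) are invented for illustration.
ROWS = [
    # order_date, invoice_no, customer_id, customer, product_id, product, category, quantity, unit_price
    ("2024-01-02", 1001, 1, "Acme Supply", 10, "Widget", "Hardware", 5, 12.0),
    ("2024-01-02", 1001, 1, "Acme Supply", 11, "Gasket", "Plumbing", 2, 3.5),
    ("2024-01-03", 1002, 2, "Birch & Co", 10, "Widget", "Hardware", 1, 12.5),
]

def build_star(conn, rows):
    """Create three dimension tables and one fact table, then load the rows."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
        CREATE TABLE dim_date     (order_date TEXT PRIMARY KEY);
        CREATE TABLE fact_sales (
            invoice_no INTEGER, order_date TEXT, customer_id INTEGER,
            product_id INTEGER, quantity INTEGER, unit_price REAL
        );
    """)
    for (order_date, invoice_no, customer_id, customer,
         product_id, product, category, quantity, unit_price) in rows:
        # INSERT OR IGNORE keeps one row per dimension key.
        conn.execute("INSERT OR IGNORE INTO dim_customer VALUES (?, ?)",
                     (customer_id, customer))
        conn.execute("INSERT OR IGNORE INTO dim_product VALUES (?, ?, ?)",
                     (product_id, product, category))
        conn.execute("INSERT OR IGNORE INTO dim_date VALUES (?)", (order_date,))
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
                     (invoice_no, order_date, customer_id, product_id,
                      quantity, unit_price))
    conn.commit()

def revenue_by_category(conn):
    """Join the fact table to dim_product and total revenue per category."""
    return conn.execute("""
        SELECT p.category, SUM(f.quantity * f.unit_price)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn, ROWS)
print(revenue_by_category(conn))
```

Because segment is assigned per customer and per category, it does not belong on dim_customer alone; one option is a separate customer-by-category table, which this sketch leaves out.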

Create the table & load

-- Easiest: click Download SQL above for CREATE TABLE + INSERTs.
-- Or load the CSV with your DB's bulk-import command.
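
If no bulk-import command is handy, the CSV can also be loaded with a few lines of Python. The sketch below uses only the standard library and SQLite; the function name load_csv, the table name sales, and the in-memory sample are hypothetical, and the sample uses only a subset of the columns. For a downloaded file you would pass open(path, newline="") instead of the StringIO object.

```python
import csv
import io
import sqlite3

def load_csv(conn, fileobj, table="sales"):
    """Create a table from the CSV header and insert every data row.

    Columns are created without declared types, so SQLite stores the
    values as text; cast them in queries (or declare types) as needed.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

# Tiny in-memory stand-in for the downloaded file (values are invented).
SAMPLE = io.StringIO(
    "order_date,invoice_no,quantity,unit_price\n"
    "2024-01-02,1001,5,12.0\n"
    "2024-01-02,1001,2,3.5\n"
)

conn = sqlite3.connect(":memory:")
row_count = load_csv(conn, SAMPLE)
total_units = conn.execute(
    "SELECT SUM(CAST(quantity AS INTEGER)) FROM sales"
).fetchone()[0]
print(row_count, total_units)
```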

Good for

Dimensional modeling · Star schemas · dbt practice · Warehouse demos

FAQ

How big is this dataset?

Around 15,000 rows by default. Change the row count in the generator above and re-export — anything up to ~200k works in the browser.

What formats can I download?

CSV, Excel (.xlsx), JSON, and SQL (a CREATE TABLE plus INSERT statements). Pick whatever fits your workflow.

Will I get the same file every time?

Yes. This page uses the fixed seed dwh-demo, so the download is byte-identical on every machine. Clear the seed in the generator for fresh random data.
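
To see why a fixed seed leads to identical output, here is a toy Python sketch. It is not the page's actual generator: generate_csv and its three columns are hypothetical, and only the seed string dwh-demo comes from the text above. The same hashing step can be applied to two real downloads to confirm they match.

```python
import csv
import hashlib
import io
import random

def generate_csv(seed, n_rows=5):
    """Toy seeded generator (hypothetical): same seed, same bytes."""
    rng = random.Random(seed)  # private RNG, independent of global state
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["invoice_no", "quantity", "unit_price"])
    for i in range(n_rows):
        writer.writerow([1000 + i, rng.randint(1, 50),
                         round(rng.uniform(1, 100), 2)])
    return buf.getvalue().encode("utf-8")

def digest(data):
    """SHA-256 hex digest, handy for comparing two files byte for byte."""
    return hashlib.sha256(data).hexdigest()

first = digest(generate_csv("dwh-demo"))
second = digest(generate_csv("dwh-demo"))
other = digest(generate_csv("another-seed"))
print(first == second, first == other)
```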

Can I get separate tables, messy data, or other formats?

Yes. Use Tables → Excel/SQL for a normalized multi-table export, switch on Messy / dirty data in Advanced options for nulls, typos and inconsistent dates, and choose CSV, Excel, JSON or SQL on any download.
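
When practicing cleanup on the messy export, inconsistent dates are a natural first target. The Python sketch below tries a short list of candidate formats and returns an ISO date or None. The format list is an assumption, since the page does not say which date styles the messy mode produces, and slash-separated dates are ambiguous (day-first is assumed here).

```python
from datetime import datetime

# Assumed candidate formats; adjust after inspecting the actual messy file.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d/%m/%Y")

def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if it is empty or unparseable."""
    text = (raw or "").strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None

cleaned = [normalize_date(v) for v in ["2024-01-02", " 02/01/2024 ", "n/a", None]]
print(cleaned)
```

Returning None rather than raising keeps bad values visible, so they can be counted or routed to a reject table instead of silently dropped.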

Is the data real?

No — it is 100% synthetic, generated in your browser, with no real people or companies. Free to use commercially.