Sample data for Snowflake

A clean, well-typed sales dataset ready to stage and COPY INTO Snowflake — realistic enough to demo warehouses, transformations and BI on top.

B2B distribution · Seeded, reproducible · CSV / Excel / JSON / SQL · 100% in-browser

Generate & download

About this dataset

This is a free, reproducible B2B distribution dataset you can generate and download right here as CSV, Excel, JSON or SQL. It is built for Snowflake demos, warehouse loading and dbt / transformation practice. Because the fields are correlated rather than independently random, the numbers hold together when you analyze them.

Each customer is assigned a segment per product category that drives how often and how much they buy, with relationship momentum, occasional large-buy spikes, and category-specific markups — so the file behaves like a real distributor sales export, not random noise.

Columns in this dataset

Schema for the B2B distribution export (the anomaly column appears only when labels are switched on):

| Column | Type | Description |
| --- | --- | --- |
| order_date | date | Business day the order was placed. |
| invoice_no | integer | Invoice id; multiple lines share one invoice. |
| customer_id / customer | int / text | The buying business. |
| product_id / product | int / text | The SKU ordered. |
| category | text | One of five distributor categories. |
| segment | text | Customer role for that category (segA largest to segD smallest). |
| quantity | integer | Units ordered; scales with segment and volume. |
| unit_cost / unit_price | number | Your cost and the price charged. |
| revenue / cost / margin | number | Line economics. |
| ship_date | date | Fulfilment date (usually next business day). |
| anomaly | 0/1 | Present with labels on; flags inflated-price large orders. |
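
As a rough illustration of how the schema above might map onto Snowflake types, here is a hand-written sketch. It is not the generator's own DDL (the Download SQL export remains the authoritative version); the column order, the NUMBER precisions and the two-decimal scale are all assumptions.

```sql
-- Approximate DDL derived from the column list; types and order are assumptions.
CREATE TABLE b2b_invoices (
    order_date  DATE,
    invoice_no  INTEGER,
    customer_id INTEGER,
    customer    VARCHAR,
    product_id  INTEGER,
    product     VARCHAR,
    category    VARCHAR,
    segment     VARCHAR,        -- segA .. segD
    quantity    INTEGER,
    unit_cost   NUMBER(12,2),   -- assumed precision and scale
    unit_price  NUMBER(12,2),
    revenue     NUMBER(14,2),
    cost        NUMBER(14,2),
    margin      NUMBER(14,2),
    ship_date   DATE
    -- , anomaly NUMBER(1,0)    -- add only if the file was exported with labels on
);
```

A CSV load with COPY INTO matches fields to columns by position, so compare the column order here with the header row of your download before loading.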

Load into Snowflake

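-- Create the table first; the Download SQL export contains the full DDL.
CREATE TABLE b2b_invoices ( /* see Download SQL for full DDL */ );
-- PUT must run from a client such as SnowSQL (not a web worksheet); it uploads to the table stage.
PUT file://b2b_invoices.csv @%b2b_invoices;
COPY INTO b2b_invoices FROM @%b2b_invoices
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');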
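
Once the load finishes, a few queries can confirm that the data arrived intact. The sketch below assumes the table and column names from the schema above, and it assumes that revenue equals quantity times unit_price and that margin equals revenue minus cost. Those formulas are a plausible reading of "line economics", not something this page states, so treat a non-zero mismatch count as a prompt to check the assumption rather than as proof of a bad load.

```sql
-- Row count; the default export is described as roughly 10,000 rows.
SELECT COUNT(*) AS row_count
FROM b2b_invoices;

-- Lines whose economics do not reconcile under the assumed formulas.
SELECT COUNT(*) AS mismatched_lines
FROM b2b_invoices
WHERE ABS(revenue - quantity * unit_price) > 0.01
   OR ABS(margin - (revenue - cost)) > 0.01;

-- Revenue and margin rate by category and segment.
SELECT
    category,
    segment,
    COUNT(DISTINCT invoice_no)                      AS invoices,
    SUM(revenue)                                    AS total_revenue,
    ROUND(SUM(margin) / NULLIF(SUM(revenue), 0), 4) AS margin_rate
FROM b2b_invoices
GROUP BY category, segment
ORDER BY category, segment;
```

The last query is also a reasonable starting point for a first dbt model or BI tile, since segment and category are the fields that drive order frequency and size in this dataset.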

Good for

Snowflake demos · Warehouse loading · dbt / transformation practice · BI on the warehouse

FAQ

How big is this dataset?

Around 10,000 rows by default. Change the row count in the generator above and re-export — anything up to ~200k works in the browser.

What formats can I download?

CSV, Excel (.xlsx), JSON, and SQL (a CREATE TABLE plus INSERT statements). Pick whatever fits your workflow.

Will I get the same file every time?

Yes. This page uses the fixed seed snowflake-demo, so the download is byte-identical on every machine. Clear the seed in the generator for fresh random data.
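
If you want to confirm reproducibility inside Snowflake rather than by comparing files, one option is to load two separate downloads into two tables and compare an order-insensitive hash of their contents. In this sketch, b2b_invoices_reload is a hypothetical second table with the same definition, loaded from a fresh download made with the same seed and settings.

```sql
-- HASH_AGG ignores row order, so equal hashes indicate the same set of rows.
SELECT a.table_hash = b.table_hash AS identical_contents
FROM (SELECT HASH_AGG(*) AS table_hash FROM b2b_invoices) AS a
CROSS JOIN (SELECT HASH_AGG(*) AS table_hash FROM b2b_invoices_reload) AS b;
```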

Can I get separate tables, messy data, or other formats?

Yes. Use Tables → Excel/SQL for a normalized multi-table export, switch on Messy / dirty data in Advanced options for nulls, typos and inconsistent dates, and choose CSV, Excel, JSON or SQL on any download.
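
For the messy variant, a common pattern is to land the file in an all-text staging table first and clean it in SQL. The sketch below is illustrative only: b2b_invoices_raw is a hypothetical staging table whose columns are all VARCHAR, and the three date formats are guesses at what "inconsistent dates" might look like, not a list taken from the generator.

```sql
-- Parse dates that may arrive in several formats; unparseable values become NULL.
SELECT
    COALESCE(
        TRY_TO_DATE(order_date, 'YYYY-MM-DD'),
        TRY_TO_DATE(order_date, 'MM/DD/YYYY'),
        TRY_TO_DATE(order_date, 'DD-MON-YYYY')
    )                                    AS order_date_clean,
    NULLIF(TRIM(customer), '')           AS customer_clean,  -- blank strings to NULL
    TRY_TO_NUMBER(quantity)              AS quantity_clean   -- NULL if not numeric
FROM b2b_invoices_raw;
```

Counting the rows where order_date_clean is NULL but order_date is not shows how many values none of the assumed formats could parse.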

Is the data real?

No — it is 100% synthetic, generated in your browser, with no real people or companies. Free to use commercially.