B2B Distribution Invoice Data Generator — realistic sample sales data (CSV & Excel)

Example output — a peek before you generate

order_date	customer	product	category	quantity	unit_price	revenue
2026-06-01	Sparkling Wholesalers of Lakeville	Fresh 2-Ply Toilet Paper	paper & tissue	240	$18.40	$4,416.00
2026-06-01	Bright Custodians of Mountburg	Citrus Cleaner	cleaning supplies	18	$44.10	$793.80
2026-06-02	Polished Suppliers of Cedarton	Coffee Beans	breakroom supplies	6	$12.75	$76.50
2026-06-02	Gleaming Servicers of Rivercrest	Recycled Paper	office products	90	$9.20	$828.00

A real generated file has up to 16 columns (15 standard columns plus the optional anomaly column) and up to 200,000 rows; this is a 7-column, 4-row taste of the shape. Set your row count below and click Generate for your own.

Generate the dataset

Number of rows (approx.)

Seed (same seed → identical data; blank = random)

Inject anomalies / labels (adds an anomaly column flagging suspicious large orders)

Save / load scenario (stored only in this browser)

Quick-start presets

Preview

First 25 rows shown; downloads contain the full dataset. Red rows are injected anomalies.

What's in this dataset

Each row is one invoice line (one product on one order). The full schema:

Column	Type	Description
order_date	date	Business day the order was placed (weekends excluded).
invoice_no	integer	Invoice identifier; multiple lines share one invoice.
customer_id / customer	integer / text	The buying business.
product_id / product	integer / text	The SKU ordered.
category	text	One of five distributor categories (paper, office, cleaning, breakroom, copy paper).
segment	text	The customer's size tier within that category: segA (largest) → segD (smallest).
quantity	integer	Units ordered; scales with segment & volume tier.
unit_cost / unit_price	number	The distributor's cost per unit and the price charged per unit (unit_cost × markup).
revenue / cost / margin	number	Line economics: revenue = quantity × unit_price, cost = quantity × unit_cost, margin = revenue − cost.
ship_date	date	Fulfilment date (mostly next business day).
anomaly	0/1	Present only when anomaly injection is on; flags inflated-price large orders.
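
If you want to sanity-check a download against the schema above, a few invariants follow directly from the column descriptions. The sketch below is a minimal example in plain Python, assuming the rows are already parsed into dicts keyed by column name. The quantity, unit_price, and revenue figures mirror the example output on this page; the unit_cost, cost, and margin values and the check_line helper are invented for illustration. A real export may round values, so exact equality could need a small tolerance.

```python
from datetime import date
from decimal import Decimal

# Quantities, prices, and revenue mirror the example output on this page.
# unit_cost, cost, and margin are hypothetical values added for illustration.
rows = [
    {"order_date": date(2026, 6, 1), "quantity": 240,
     "unit_cost": Decimal("14.72"), "unit_price": Decimal("18.40"),
     "revenue": Decimal("4416.00"), "cost": Decimal("3532.80"),
     "margin": Decimal("883.20")},
    {"order_date": date(2026, 6, 1), "quantity": 18,
     "unit_cost": Decimal("31.50"), "unit_price": Decimal("44.10"),
     "revenue": Decimal("793.80"), "cost": Decimal("567.00"),
     "margin": Decimal("226.80")},
]


def check_line(row):
    """Return a list of human-readable problems found in one invoice line."""
    problems = []
    if row["order_date"].weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        problems.append("order_date falls on a weekend")
    if row["quantity"] * row["unit_price"] != row["revenue"]:
        problems.append("revenue != quantity * unit_price")
    if row["quantity"] * row["unit_cost"] != row["cost"]:
        problems.append("cost != quantity * unit_cost")
    if row["revenue"] - row["cost"] != row["margin"]:
        problems.append("margin != revenue - cost")
    return problems


for row in rows:
    print(check_line(row) or "ok")
```

Decimal is used here so that the currency arithmetic compares exactly; with floats, a tolerance check such as math.isclose would be the safer choice.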

Why it's realistic

Generic mock-data tools draw every cell independently, so quantities, prices, and customers have no relationship to one another. This simulator instead models distribution dynamics: each customer is assigned a segment per category, and that segment drives order frequency, order size, and the discount the customer commands. Big accounts (segA) buy more often and in larger quantities, but at thinner markups; convenience-driven small accounts (segD) pay a premium. Relationship momentum means that winning one category with a customer makes winning the next more likely, mirroring share-of-wallet stickiness. Occasional large buys create realistic volume spikes at compressed margins. The result behaves like a real sales export, with trends, seasonal buying patterns, and a long tail of small orders, which is what you need to demo a dashboard or set a take-home assignment that rewards real analysis.
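
To make the segment idea concrete, here is a toy sketch. It is not the generator's actual algorithm: SEGMENTS, simulate_lines, and every number in them are hypothetical, and the sketch simplifies by drawing a segment per line rather than assigning one per customer and category.

```python
import random

# Hypothetical per-segment parameters, invented for illustration only:
# (relative order frequency, typical quantity, markup over unit cost)
SEGMENTS = {
    "segA": (8, 200, 1.15),  # frequent, large orders, thin markup
    "segB": (4, 80, 1.25),
    "segC": (2, 30, 1.35),
    "segD": (1, 8, 1.50),    # rare, small orders, premium markup
}


def simulate_lines(n, unit_cost=10.0, seed=None):
    """Generate n toy invoice lines whose size and price depend on segment."""
    rng = random.Random(seed)
    names = list(SEGMENTS)
    weights = [SEGMENTS[name][0] for name in names]
    lines = []
    for _ in range(n):
        # Segments with a higher frequency weight are drawn more often.
        segment = rng.choices(names, weights=weights)[0]
        _, typical_qty, markup = SEGMENTS[segment]
        # Quantity varies around the segment's typical size, never below 1.
        quantity = max(1, round(rng.gauss(typical_qty, typical_qty * 0.25)))
        unit_price = round(unit_cost * markup, 2)
        lines.append({
            "segment": segment,
            "quantity": quantity,
            "unit_price": unit_price,
            "revenue": round(quantity * unit_price, 2),
        })
    return lines


lines = simulate_lines(2000, seed=1)
for name in SEGMENTS:
    subset = [line for line in lines if line["segment"] == name]
    avg_qty = sum(line["quantity"] for line in subset) / len(subset)
    print(name, len(subset), round(avg_qty, 1), subset[0]["unit_price"])
```

Under these assumed parameters, segA lines appear most often, carry the largest quantities, and have the lowest unit price, which is the kind of correlated structure that independent per-cell random draws cannot produce.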

Good for

Power BI / Tableau sales dashboards
SQL & pandas practice
Data-analyst portfolio projects
Margin / pricing analysis demos
Anomaly-detection starters
Load & ETL testing

FAQ

How many customers and SKUs are generated?

Roughly one customer per 20 rows and one SKU per 50 rows (five categories), so a 5,000-row file has ~250 customers and ~100 products. Both scale with the row count.
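
The ratios above work out as simple division. This is a rough sketch only: estimate_entities is a hypothetical helper, and the generator's exact rounding is not documented here.

```python
def estimate_entities(rows, rows_per_customer=20, rows_per_sku=50):
    """Approximate customer and SKU counts from the stated ratios."""
    return {
        "customers": rows // rows_per_customer,
        "skus": rows // rows_per_sku,
    }


print(estimate_entities(5000))  # {'customers': 250, 'skus': 100}
```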

Will the same seed give me the same file?

Yes. With a seed set, the exact dataset is reproducible — great for tutorials and shared examples. Leave the seed blank for fresh random data each time.
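
The same seeding behaviour is easy to see with any seeded pseudo-random generator. The sketch below uses Python's random module as an analogy; the tool itself runs in your browser, and sample_quantities is a hypothetical stand-in, not its real generator.

```python
import random


def sample_quantities(n, seed=None):
    """Draw n toy order quantities; seed=None gives fresh randomness per call."""
    rng = random.Random(seed)  # None seeds from system entropy or the clock
    return [rng.randint(1, 240) for _ in range(n)]


first = sample_quantities(5, seed=42)
second = sample_quantities(5, seed=42)
print(first == second)  # True: same seed, same draws
```

Passing seed=None plays the role of leaving the seed field blank: each call then starts from a different state.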

Is anything uploaded?

No — generation is 100% in your browser.