B2B Distribution Invoice Data Generator

Realistic invoice line items for a wholesale/janitorial distributor — driven by customer segments (A–D), relationship momentum, large-buy spikes, and category-specific markups. Perfect as a sample sales file for dashboards, BI demos, and SQL practice.

Seeded · CSV + Excel · Anomaly labels · 100% in-browser

Generate the dataset

Save / load scenario (stored only in this browser)

Quick-start presets

What's in this dataset

Each row is one invoice line (one product on one order). The full schema:

| Column | Type | Description |
| --- | --- | --- |
| order_date | date | Business day the order was placed (weekends excluded). |
| invoice_no | integer | Invoice identifier; multiple lines share one invoice. |
| customer_id / customer | int / text | The buying business. |
| product_id / product | int / text | The SKU ordered. |
| category | text | One of five distributor categories (paper, office, cleaning, breakroom, copy paper). |
| segment | text | The customer's role for that category: segA (largest) → segD (smallest). |
| quantity | integer | Units ordered; scales with segment and volume tier. |
| unit_cost / unit_price | number | Your cost and the price charged (cost × markup). |
| revenue / cost / margin | number | Line economics: revenue = quantity × unit_price, cost = quantity × unit_cost, margin = revenue − cost. |
| ship_date | date | Fulfilment date (mostly next business day). |
| anomaly | 0/1 | Present only when anomaly injection is on; flags inflated-price large orders. |
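
As a quick sanity check on the schema above, the sketch below re-derives the line economics and confirms that order dates avoid weekends. The two sample rows are made up for illustration, and the ISO date format, exact column headers, and one-cent rounding tolerance are assumptions about the export rather than documented behavior.

```python
import csv
import io
from datetime import date

# Hypothetical two-line sample in the schema above; the values are invented.
SAMPLE = """order_date,invoice_no,customer_id,customer,product_id,product,category,segment,quantity,unit_cost,unit_price,revenue,cost,margin,ship_date
2024-03-04,1001,17,Acme Dental,203,Paper Towels 12pk,paper,segB,10,20.00,26.00,260.00,200.00,60.00,2024-03-05
2024-03-04,1001,17,Acme Dental,411,Floor Cleaner 1gal,cleaning,segD,2,8.00,12.00,24.00,16.00,8.00,2024-03-05
"""

def check_row(row, tol=0.01):
    """Return a list of problems found in one invoice line (empty if none)."""
    problems = []
    qty = int(row["quantity"])
    revenue = float(row["revenue"])
    cost = float(row["cost"])
    margin = float(row["margin"])
    if abs(revenue - qty * float(row["unit_price"])) > tol:
        problems.append("revenue != quantity * unit_price")
    if abs(cost - qty * float(row["unit_cost"])) > tol:
        problems.append("cost != quantity * unit_cost")
    if abs(margin - (revenue - cost)) > tol:
        problems.append("margin != revenue - cost")
    # weekday() is 0 for Monday through 6 for Sunday; 5 and 6 are the weekend.
    if date.fromisoformat(row["order_date"]).weekday() >= 5:
        problems.append("order_date falls on a weekend")
    return problems

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = {i: check_row(r) for i, r in enumerate(rows)}
print(report)
```

To check a real export, replace `io.StringIO(SAMPLE)` with an open file handle; if the tool rounds prices differently, the tolerance may need loosening.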

Why it's realistic

Generic mock-data tools draw every cell independently, so quantities, prices, and customers have no relationship to one another. This simulator instead models distribution dynamics: each customer is assigned a segment per category, and that segment drives order frequency, order size, and the discount the customer commands. Big accounts (segA) buy more often and in larger quantities, but at thinner markups; convenience-driven small accounts (segD) pay a premium. Relationship momentum means that winning one category makes winning the next more likely, mirroring share-of-wallet stickiness. Occasional large buys create realistic volume spikes at compressed margins. The result behaves like a real sales export, with trends, buyer seasonality, and a long tail of small orders, which is what you need to demo a dashboard or set a take-home exercise that rewards real analysis.
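
The segment-driven idea can be sketched in a few lines. This is not the generator's actual code: the weights, typical quantities, markups, and large-buy rate below are illustrative assumptions, `make_line` and `average` are hypothetical helpers, and relationship momentum is left out for brevity.

```python
import random

# All numbers here are illustrative assumptions, not the tool's real parameters.
SEGMENTS = {
    "segA": {"weight": 8, "qty": 40, "markup": 1.15},  # frequent, large, thin markup
    "segB": {"weight": 4, "qty": 20, "markup": 1.25},
    "segC": {"weight": 2, "qty": 8, "markup": 1.35},
    "segD": {"weight": 1, "qty": 3, "markup": 1.50},   # rare, small, premium price
}

def make_line(rng, unit_cost=10.0, large_buy_rate=0.05):
    """Draw one invoice line whose size and price depend on the segment."""
    names = list(SEGMENTS)
    weights = [SEGMENTS[s]["weight"] for s in names]
    segment = rng.choices(names, weights=weights)[0]
    spec = SEGMENTS[segment]
    quantity = max(1, round(rng.gauss(spec["qty"], spec["qty"] * 0.25)))
    markup = spec["markup"]
    if rng.random() < large_buy_rate:
        # Large buy: volume spikes while the markup is compressed toward cost.
        quantity *= 5
        markup = 1 + (markup - 1) * 0.5
    unit_price = round(unit_cost * markup, 2)
    return {
        "segment": segment,
        "quantity": quantity,
        "unit_cost": unit_cost,
        "unit_price": unit_price,
        "margin": round(quantity * (unit_price - unit_cost), 2),
    }

def average(lines, segment, field):
    """Mean of one numeric field over the lines of a single segment."""
    values = [line[field] for line in lines if line["segment"] == segment]
    return sum(values) / len(values)

rng = random.Random(42)
lines = [make_line(rng) for _ in range(2000)]
for seg in SEGMENTS:
    print(seg, round(average(lines, seg, "quantity"), 1),
          round(average(lines, seg, "unit_price"), 2))
```

Because quantity and price both hang off the same segment draw, the columns stay correlated: segA rows should show larger quantities at lower unit prices than segD rows, which independent per-cell sampling cannot produce.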

Good for

- Power BI / Tableau sales dashboards
- SQL & pandas practice
- Data-analyst portfolio projects
- Margin / pricing analysis demos
- Anomaly-detection starters
- Load & ETL testing

FAQ

How many customers and SKUs are generated?

Roughly one customer per 20 rows and one SKU per 50 rows (five categories), so a 5,000-row file has ~250 customers and ~100 products. Both scale with the row count.
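
The ratios above reduce to simple arithmetic. In this sketch, `estimate_counts` is a hypothetical helper; the floor division and the minimum of one are assumptions, since the tool only promises "roughly" these ratios.

```python
def estimate_counts(rows, rows_per_customer=20, rows_per_sku=50):
    """Approximate customer and SKU counts for a file with `rows` invoice lines."""
    customers = max(1, rows // rows_per_customer)
    skus = max(1, rows // rows_per_sku)
    return customers, skus

for n in (1000, 5000, 20000):
    print(n, estimate_counts(n))
```

For 5,000 rows this gives 250 customers and 100 products, matching the figures quoted above.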

Will the same seed give me the same file?

Yes. With a seed set, the exact dataset is reproducible — great for tutorials and shared examples. Leave the seed blank for fresh random data each time.
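
The principle behind seeded output can be illustrated with Python's standard `random` module. This only demonstrates the idea: the tool runs in the browser with its own random number generator, so reusing a seed here will not reproduce the tool's files. `sample_quantities` is a hypothetical helper.

```python
import random

def sample_quantities(seed=None, n=5):
    """Draw n quantities; a fixed seed always yields the same sequence."""
    rng = random.Random(seed)  # seed=None seeds from system entropy instead
    return [rng.randint(1, 50) for _ in range(n)]

first = sample_quantities(seed=7)
second = sample_quantities(seed=7)
print(first == second)  # same seed, same draws
```

Using a dedicated `random.Random` instance instead of the module-level functions keeps the sequence isolated from any other code that also draws random numbers.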

Is anything uploaded?

No — generation is 100% in your browser.